Description
Main Project Goal
The main idea behind this project is to test an analytical thesis: for time series problems, the window length you use for training data is often the most important variable in your results. I also suspect that combining different window lengths into a final predictive output is a very powerful form of model improvement, with an effect that's akin to regularization but better suited to the time series domain.
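To make the idea concrete, here is a minimal sketch of what "combining window lengths" could look like. Everything in it is an assumption for illustration: the function names `fit_ar_forecast` and `multi_window_forecast`, the choice of a small least-squares autoregressive model, the window sizes, and the equal-weight average are all placeholders, and the project may well end up using other models or combination rules.

```python
import numpy as np

def fit_ar_forecast(history, window, n_lags=3):
    """Fit an AR(n_lags) model on the last `window` points; forecast one step ahead.

    Hypothetical helper: a plain least-squares autoregression stands in for
    whatever model the project ends up using.
    """
    recent = np.asarray(history, dtype=float)[-window:]
    # Each row holds n_lags consecutive values; the target is the value that follows.
    X = np.array([recent[i:i + n_lags] for i in range(len(recent) - n_lags)])
    y = recent[n_lags:]
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    last = np.concatenate([[1.0], recent[-n_lags:]])
    return float(last @ coef)

def multi_window_forecast(history, windows=(24, 48, 96), n_lags=3):
    """Train one model per window length and average their forecasts (equal weights)."""
    forecasts = [fit_ar_forecast(history, w, n_lags) for w in windows]
    return float(np.mean(forecasts)), forecasts
```

A call such as `multi_window_forecast(series, windows=(24, 48, 96))` returns the combined forecast along with the per-window forecasts, so the two can be compared directly. Equal weighting is only the simplest option; weighted or learned combinations fit the same shape.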
Why This Project is Interesting
If this idea is true, it would uncover an important methodological improvement in time series analysis that is both easy to implement and easy to verify. Enduring improvements in technique are often some of the most important academic work, and something like this would have an immediate impact on how many practitioners do their work. This project also has both an empirical component and a software one, because the technique could easily be put into modeling pipelines for different ML libraries.
Brief Description of Work Involved
- Collect different varieties of time series datasets: electricity, tourism, M3, M4, airline, etc.
- Write training scripts that run a wide variety of models and test performance on the different datasets when combining the different window sizes
- Probably spend a lot of time sifting through the results to find interesting relationships: Is the effect robust? How large is it? Does it work better for some models than for others?
- Find a good way to communicate the results: arXiv, Medium, Towards Data Science, etc.
- Potentially deploy it as a pipeline technique to use in ML libraries, so others can make use of it
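The evaluation part of this work could be sketched as a rolling-origin backtest that scores each window size on its own and then the combination. This is an illustration under stated assumptions, not the project's actual pipeline: `linear_trend_forecast` and `backtest` are hypothetical names, the linear-trend model is a stand-in for real models, the series is synthetic rather than one of the datasets listed above, and mean absolute error is just one possible metric.

```python
import numpy as np

def linear_trend_forecast(history, window):
    """Hypothetical stand-in model: fit a line to the last `window` points, extrapolate one step."""
    recent = np.asarray(history, dtype=float)[-window:]
    t = np.arange(len(recent))
    slope, intercept = np.polyfit(t, recent, deg=1)
    return slope * len(recent) + intercept

def backtest(series, windows, n_test, forecaster=linear_trend_forecast):
    """Rolling-origin one-step evaluation.

    Returns the mean absolute error for each window size and for the
    equal-weight combination of all windows (key "combined").
    """
    series = np.asarray(series, dtype=float)
    errors = {w: [] for w in windows}
    errors["combined"] = []
    for origin in range(len(series) - n_test, len(series)):
        history, actual = series[:origin], series[origin]
        preds = [forecaster(history, w) for w in windows]
        for w, p in zip(windows, preds):
            errors[w].append(abs(p - actual))
        errors["combined"].append(abs(np.mean(preds) - actual))
    return {name: float(np.mean(errs)) for name, errs in errors.items()}

# Synthetic data only, for illustration: trend + seasonality + noise.
rng = np.random.default_rng(0)
t = np.arange(400)
synthetic = 0.05 * t + np.sin(2 * np.pi * t / 50) + rng.normal(scale=0.3, size=t.size)
results = backtest(synthetic, windows=(10, 30, 90), n_test=100)
for name, mae in results.items():
    print(name, round(mae, 3))
```

Passing a different `forecaster` lets the same loop cover other model families, which is the comparison the questions above (robustness, effect size, model dependence) would need.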
First Steps
- Collect some of the common datasets to use for the project
- Would be useful to toy around with smaller examples in notebooks to make sure we understand the right approach
- Begin writing training scripts to train a lot of different models simultaneously and get the results