[PROJECT IDEA] Ensembling Different Window Lengths as a Form of Time Series Regularization #2

@JonathanBechtel

Main Project Goal

The main idea behind this project is to test an analytical thesis: for time series problems, the window length used for the training data is often the most important variable in your results. I suspect that combining forecasts from different window lengths into a final predictive output is a very powerful form of model improvement, with an effect akin to regularization but better suited to the time series domain.
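To make the idea concrete, here is a minimal sketch of what "combining window lengths" could look like. It is only an illustration under assumptions of mine: the base model is a plain least-squares trend line, the combination is an equal-weight average, and the function names (`fit_linear_trend`, `forecast_with_window`, `ensemble_forecast`) are hypothetical. The project itself would swap in real models and possibly a smarter weighting.

```python
from statistics import mean


def fit_linear_trend(ys):
    """Least-squares fit of y = intercept + slope * t for t = 0..len(ys)-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = mean(ys)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_with_window(series, window, horizon=1):
    """Fit only on the last `window` points, then extrapolate `horizon` steps."""
    if window < 2:
        raise ValueError("window must be at least 2")
    recent = series[-window:]
    intercept, slope = fit_linear_trend(recent)
    # The last training point sits at t = len(recent) - 1.
    return intercept + slope * (len(recent) - 1 + horizon)


def ensemble_forecast(series, windows, horizon=1):
    """Equal-weight average of the per-window forecasts (an assumed choice)."""
    return mean([forecast_with_window(series, w, horizon) for w in windows])


if __name__ == "__main__":
    # Toy series: a steady trend followed by a recent level shift.
    series = [float(t) for t in range(30)] + [40.0 + t for t in range(10)]
    for w in (5, 15, 40):
        print(w, forecast_with_window(series, w))
    print("ensemble", ensemble_forecast(series, [5, 15, 40]))
```

Short windows react quickly to the recent shift while long windows are more stable; averaging them is the hedge that the thesis compares to regularization.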

Why This Project is Interesting

If this idea is true, it would uncover an important methodological improvement in time series analysis, one that's both easy to implement and verify. Enduring improvements in technique are often some of the most important academic work, and something like this would have an immediate impact on how lots of practitioners do their work. This project also has both an empirical component and a software one, because the technique could easily be put into modeling pipelines for different ML libraries.

Brief Description of Work Involved

  • Collect different varieties of time series datasets: electricity, tourism, M3, M4, airline, etc.
  • Write training scripts that run a wide variety of models and test their performance on the different datasets when the different window sizes are combined
  • Probably spend a lot of time sifting through the results to find interesting relationships: is the effect robust? How large is it? Does it work better for some models than others? etc.
  • Find a good way to communicate the results: arXiv, Medium, Towards Data Science, etc.
  • Potentially deploy it as a pipeline technique to use in ML libraries, so others can make use of it
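The experimental loop described in the list above could be sketched roughly as follows. This is a toy harness, not a proposed final design: it assumes a rolling-origin, one-step-ahead evaluation scored by mean absolute error, uses a simple "mean of the last `window` points" forecaster as a stand-in for real models, and the names `window_mean_forecast` and `rolling_origin_mae` are made up for illustration.

```python
from statistics import mean


def window_mean_forecast(history, window):
    """One-step forecast: the mean of the last `window` observations."""
    return mean(history[-window:])


def rolling_origin_mae(series, windows, first_origin):
    """Score each window length and their equal-weight ensemble.

    At each origin t the forecasters see series[:t] and predict series[t].
    Returns a dict mapping each window (and "ensemble") to its MAE.
    """
    if first_origin < max(windows):
        raise ValueError("first_origin must cover the longest window")
    if first_origin >= len(series):
        raise ValueError("first_origin leaves nothing to evaluate")

    errors = {w: [] for w in windows}
    errors["ensemble"] = []
    for t in range(first_origin, len(series)):
        history, actual = series[:t], series[t]
        preds = {w: window_mean_forecast(history, w) for w in windows}
        for w, pred in preds.items():
            errors[w].append(abs(pred - actual))
        combined = mean(list(preds.values()))
        errors["ensemble"].append(abs(combined - actual))
    return {name: mean(errs) for name, errs in errors.items()}


if __name__ == "__main__":
    # Toy data only; the real project would loop over M3, M4, etc.
    series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0, 14.0, 16.0]
    for name, score in rolling_origin_mae(series, [2, 4], first_origin=4).items():
        print(name, round(score, 3))
```

Running the same function over many datasets and model families, then comparing the `"ensemble"` entry against the best single window, is one way to check whether the effect is robust and how large it is.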

First Steps

  • Collect some of the common datasets to use for the project
  • Would be useful to toy around with smaller examples in notebooks to make sure we understand the right approach
  • Begin writing training scripts to train a lot of different models simultaneously and get the results

Metadata

Labels

  • In Progress
  • analytical: will include experimental setup and analytical results
  • open source software: can be implemented into software
  • pandas: will require data cleaning
  • time series: idea is a time series problem
