
Commit 6732b1c

Abdul Fatir Ansari (abdulfatir) authored
Add a README file for the scripts (#67)
*Description of changes:* Adds usage examples for `scripts/`. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. --------- Co-authored-by: Abdul Fatir Ansari <[email protected]>
1 parent 1e102f6 commit 6732b1c

File tree

- README.md
- scripts/README.md
- scripts/training/train.py

3 files changed: +92 −2 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 ## 🚀 News

-- **10 May 2024**: 🚀 We added the code for pretraining and fine-tuning Chronos models. You can find it in [this folder](./scripts/training). We also added [a script](./scripts/kernel-synth.py) for generating synthetic time series data from Gaussian processes (KernelSynth; see Section 4.2 in the paper for details).
+- **10 May 2024**: 🚀 We added the code for pretraining and fine-tuning Chronos models. You can find it in [this folder](./scripts/training). We also added [a script](./scripts/kernel-synth.py) for generating synthetic time series data from Gaussian processes (KernelSynth; see Section 4.2 in the paper for details). Check out the [usage examples](./scripts/).
 - **19 Apr 2024**: 🚀 Chronos is now supported on [AutoGluon-TimeSeries](https://auto.gluon.ai/stable/tutorials/timeseries/index.html), the powerful AutoML package for time series forecasting which enables model ensembles, cloud deployments, and much more. Get started with the [tutorial](https://auto.gluon.ai/stable/tutorials/timeseries/forecasting-chronos.html).
 - **08 Apr 2024**: 🧪 Experimental [MLX inference support](https://github.com/amazon-science/chronos-forecasting/tree/mlx) added. If you have an Apple Silicon Mac, you can now obtain significantly faster forecasts from Chronos compared to CPU inference. This provides an alternative way to exploit the GPU on your Apple Silicon Macs together with the "mps" support in PyTorch.
 - **25 Mar 2024**: [v1.1.0 released](https://github.com/amazon-science/chronos-forecasting/releases/tag/v1.1.0) with inference optimizations and `pipeline.embed` to extract encoder embeddings from Chronos.

scripts/README.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Usage Examples

## Generating Synthetic Time Series (KernelSynth)

- Install this package with the `training` extra:
```
pip install "chronos[training] @ git+https://github.com/amazon-science/chronos-forecasting.git"
```
- Run `kernel-synth.py`:
```sh
# With defaults used in the paper (1M time series and 5 max_kernels)
python kernel-synth.py

# You may optionally specify num-series and max-kernels
python kernel-synth.py \
    --num-series <num of series to generate> \
    --max-kernels <max number of kernels to use per series>
```
The generated time series will be saved in a [GluonTS](https://github.com/awslabs/gluonts)-compatible arrow file `kernelsynth-data.arrow`; a quick way to inspect it is sketched below.
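As a quick sanity check (not part of this commit), the generated file can be loaded back with GluonTS. A minimal sketch, assuming a GluonTS version whose `FileDataset` accepts a file path and a frequency string; the path and the `"H"` frequency are placeholders:
```py
from pathlib import Path

from gluonts.dataset.common import FileDataset

# Load the KernelSynth output written by kernel-synth.py. The frequency is
# arbitrary: KernelSynth series are synthetic and carry no real-world timestamps.
dataset = FileDataset(Path("./kernelsynth-data.arrow"), freq="H")

# Each entry is a dict-like record with a "start" timestamp and a "target" array.
entry = next(iter(dataset))
print(entry["start"], len(entry["target"]))
```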
## Pretraining (and fine-tuning) Chronos models

- Install this package with the `training` extra:
```
pip install "chronos[training] @ git+https://github.com/amazon-science/chronos-forecasting.git"
```
- Convert your time series dataset into a GluonTS-compatible file dataset. We recommend using the arrow format. You may use the `convert_to_arrow` function from the following snippet for that; a hedged usage example with explicit start times follows the snippet. Optionally, you may use [synthetic data from KernelSynth](#generating-synthetic-time-series-kernelsynth) to follow along.
```py
from pathlib import Path
from typing import List, Optional, Union

import numpy as np
from gluonts.dataset.arrow import ArrowWriter


def convert_to_arrow(
    path: Union[str, Path],
    time_series: Union[List[np.ndarray], np.ndarray],
    start_times: Optional[Union[List[np.datetime64], np.ndarray]] = None,
    compression: str = "lz4",
):
    if start_times is None:
        # Set an arbitrary start time
        start_times = [np.datetime64("2000-01-01 00:00", "s")] * len(time_series)

    assert len(time_series) == len(start_times)

    dataset = [
        {"start": start, "target": ts} for ts, start in zip(time_series, start_times)
    ]
    ArrowWriter(compression=compression).write_to_file(
        dataset,
        path=path,
    )


if __name__ == "__main__":
    # Generate 20 random time series of length 1024
    time_series = [np.random.randn(1024) for _ in range(20)]

    # Convert to GluonTS arrow format
    convert_to_arrow("./noise-data.arrow", time_series=time_series)
```
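For real data you would typically pass explicit start times instead of relying on the arbitrary default. A hypothetical example reusing `convert_to_arrow` from the snippet above (the file name and values are made up):
```py
import numpy as np

# Two hypothetical series with their actual start timestamps
series = [np.random.randn(512), np.random.randn(256)]
starts = [
    np.datetime64("2023-01-01 00:00", "s"),
    np.datetime64("2023-06-15 08:00", "s"),
]

convert_to_arrow("./my-data.arrow", time_series=series, start_times=starts)
```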
- Modify the [training configs](training/configs) to use your data. Let's use the KernelSynth data as an example.
```yaml
# List of training data files
training_data_paths:
- "/path/to/kernelsynth-data.arrow"
# Mixing probability of each dataset file
probability:
- 1.0
```
You may optionally change other parameters of the config file, as required. For instance, if you're interested in fine-tuning the model from a pretrained Chronos checkpoint, you should change the `model_id`, set `random_init: false`, and (optionally) change other parameters such as `max_steps` and `learning_rate`.
- Start the training (or fine-tuning) job:
```sh
# On a single GPU
CUDA_VISIBLE_DEVICES=0 python training/train.py --config /path/to/modified/config.yaml

# On multiple GPUs (example with 8 GPUs)
torchrun --nproc-per-node=8 training/train.py --config /path/to/modified/config.yaml

# Fine-tune `amazon/chronos-t5-small` for 1000 steps
CUDA_VISIBLE_DEVICES=0 python training/train.py --config /path/to/modified/config.yaml \
    --model-id amazon/chronos-t5-small \
    --no-random-init \
    --max-steps 1000
```
The output and checkpoints will be saved in `output/run_{id}/`.
> [!TIP]
> If the initial training step is too slow, you might want to change the `shuffle_buffer_length` and/or set `torch_compile` to `false`.
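Once training finishes, the resulting checkpoint can be loaded for inference like any other Chronos model. A minimal sketch, assuming the final checkpoint is written in Hugging Face format under the run directory; the `output/run_0/checkpoint-final` path and the toy context are placeholders:
```py
import torch
from chronos import ChronosPipeline

# Load the fine-tuned checkpoint (hypothetical path) the same way as a hub model id
pipeline = ChronosPipeline.from_pretrained(
    "output/run_0/checkpoint-final",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)

# Sample forecasts 64 steps ahead from a toy context series
context = torch.randn(512)
forecast = pipeline.predict(context, prediction_length=64)
print(forecast.shape)  # (num_series, num_samples, prediction_length)
```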

scripts/training/train.py

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ def load_model(
         config.tie_word_embeddings = tie_embeddings
         model = AutoModelClass.from_config(config)
     else:
-        log_on_main("Using pretrained initialization", logger)
+        log_on_main(f"Using pretrained initialization from {model_id}", logger)
         model = AutoModelClass.from_pretrained(model_id)

     model.resize_token_embeddings(vocab_size)

0 commit comments
