Closed
Labels
bug: Something isn't working
Description
Bug report checklist
- I provided code that demonstrates a minimal reproducible example.
- I confirmed bug exists on the latest mainline of Chronos via source install.
Describe the bug
When I specify a single dataset in the config file, as follows:
# List of training data files
training_data_paths:
- "/path/to/kernelsynth-data.arrow"
# Mixing probability of each dataset file
probability:
- 1.0
training fails with the following ValueError:
File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/accelerate/data_loader.py", line 631, in _fetch_batches
batches.append(next(iterator))
^^^^^^^^^^^^^^
File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
return self._process_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/_utils.py", line 705, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
^^^^^^^^^^^^^^^^^^^^
File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
data.append(next(self.dataset_iter))
^^^^^^^^^^^^^^^^^^^^^^^
File "/export/home/chronos-forecasting/scripts/training/train.py", line 243, in __iter__
for element in self.base_dataset:
File "/export/home/chronos-forecasting/scripts/training/train.py", line 493, in __iter__
idx = np.random.choice(range(len(iterators)), p=probs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "numpy/random/mtrand.pyx", line 951, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken
The immediate cause is that probs is an empty list at the point of the np.random.choice call; printing the values gives probs: [], iterables: []. I am not sure why they are empty. I think this might be a bug. Has anyone else faced the same issue?
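The error is raised in DataLoader worker process 1, and the config sets dataloader_num_workers: 11 with only one data file. One possible explanation, which is an assumption and has not been verified against train.py, is that the dataset files and their probabilities are divided among the workers, so every worker except the first receives nothing to sample from. The sketch below illustrates that hypothesis; `shard_for_worker` and `sample_index` are hypothetical helpers written for this illustration, not functions from Chronos.

```python
import itertools

import numpy as np


def shard_for_worker(items, worker_id, num_workers):
    """Hypothetical helper: give worker `worker_id` every `num_workers`-th item.

    This is an assumed sharding scheme, not code taken from train.py.
    """
    return list(itertools.islice(items, worker_id, None, num_workers))


def sample_index(probs):
    """Pick a dataset index the same way the failing line in the traceback does."""
    return np.random.choice(range(len(probs)), p=probs)


# Values taken from the config in this report.
probabilities = [1.0]
num_workers = 11

# Under the assumed scheme, only worker 0 receives the single dataset.
shards = {
    w: shard_for_worker(probabilities, w, num_workers) for w in range(num_workers)
}
empty_workers = [w for w, probs in shards.items() if not probs]

# Sampling from an empty shard raises the ValueError seen in the traceback.
try:
    sample_index(shards[1])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print("workers with no data:", empty_workers)
print("error from an empty shard:", error_message)
```

If this hypothesis holds, setting dataloader_num_workers to a value no larger than the number of entries in training_data_paths (here, 1) would avoid the empty shards. That is a guess at a workaround, not a confirmed fix.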
Expected behavior
Training should run without errors when a single dataset with probability 1.0 is configured.
To reproduce
Full training config:
context_length: 512
prediction_length: 64
min_past: 60
max_steps: 200_000
save_steps: 100_000
log_steps: 500
per_device_train_batch_size: 128
learning_rate: 0.001
optim: adamw_torch_fused
num_samples: 20
shuffle_buffer_length: 100_000
gradient_accumulation_steps: 1
model_id: google/t5-efficient-tiny
model_type: seq2seq
random_init: true
tie_embeddings: true
output_dir: chronos_output/output-tiny_only_synth/
tf32: true
torch_compile: true
tokenizer_class: "MeanScaleUniformBins"
tokenizer_kwargs:
  low_limit: -15.0
  high_limit: 15.0
  n_tokens: 4096
lr_scheduler_type: linear
warmup_ratio: 0.0
dataloader_num_workers: 11
max_missing_prop: 0.9
use_eos_token: true
training_data_paths:
- "synth-data/kernelsynth-data.arrow"
probability:
- 1.0
Environment description
Operating system:
Python version: Python 3.11.5
PyTorch version: 2.3.1+cu121
HuggingFace transformers version: 4.41.2
HuggingFace accelerate version: 0.30.1