[BUG] ValueError when we use only one dataset

**Bug report checklist**

- [x] I provided code that demonstrates a minimal reproducible example.  
- [x] I confirmed bug exists on the latest mainline of Chronos via source install.  

**Describe the bug**

When I specify a single dataset in the config file like this:
```
# List of training data files
training_data_paths:
- "/path/to/kernelsynth-data.arrow"
# Mixing probability of each dataset file
probability:
- 1.0
```
I get the following `ValueError`:
```
  File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/accelerate/data_loader.py", line 631, in _fetch_batches
    batches.append(next(iterator))
                   ^^^^^^^^^^^^^^
  File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
    return self._process_data(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/_utils.py", line 705, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
           ^^^^^^^^^^^^^^^^^^^^
  File "/export/home/anaconda/envs/chronos/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/chronos-forecasting/scripts/training/train.py", line 243, in __iter__
    for element in self.base_dataset:
  File "/export/home/chronos-forecasting/scripts/training/train.py", line 493, in __iter__
    idx = np.random.choice(range(len(iterators)), p=probs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "numpy/random/mtrand.pyx", line 951, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken
```
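
For reference, the last frame of the traceback can be reproduced outside of training. This is only a minimal sketch: the empty `iterators` and `probs` lists are hard-coded here to mimic the state observed in the failing worker, and nothing else from `train.py` is involved.

```python
import numpy as np

# Assumption: mimic the state seen in the failing worker (both lists empty).
iterators = []
probs = []

try:
    # Same call shape as the line in train.py shown in the traceback.
    np.random.choice(range(len(iterators)), p=probs)
except ValueError as err:
    message = str(err)
    print(message)
```

So the NumPy call itself behaves as documented; the open question is why the lists are empty when it is reached.
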
This happens because `probs` is an empty list at that point; printing the values gives `probs: []` and `iterables: []`. I am not sure why they are empty. I think this might be a bug. Has anyone else faced the same issue?
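
One possible explanation, not confirmed against the source: the error is raised in DataLoader worker process 1, and the config under "To reproduce" sets `dataloader_num_workers: 11` with a single dataset file. If `train.py` splits the dataset files and their probabilities across workers in a strided fashion, every worker except worker 0 would end up with nothing. The sketch below illustrates that hypothesis only; `shard_for_worker` is a hypothetical helper, not a function from Chronos.

```python
import itertools


def shard_for_worker(items, worker_id, num_workers):
    """Hypothetical helper: give a worker every num_workers-th item,
    starting at its own worker_id."""
    return list(itertools.islice(items, worker_id, None, num_workers))


# Values taken from the config in this report.
probabilities = [1.0]
num_workers = 11

# Assumed sharding scheme: one strided slice per worker.
shards = {
    worker_id: shard_for_worker(probabilities, worker_id, num_workers)
    for worker_id in range(num_workers)
}
empty_workers = [w for w, shard in shards.items() if not shard]

print(shards[0])       # worker 0 keeps the only dataset
print(shards[1])       # worker 1 gets an empty list
print(empty_workers)   # workers left without any dataset
```

If this is what happens, setting `dataloader_num_workers` to at most the number of dataset files might be worth trying as a workaround.
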

**Expected behavior**

Training should run without errors when only one dataset is configured.

**To reproduce**

Full config:
```
context_length: 512
prediction_length: 64
min_past: 60
max_steps: 200_000
save_steps: 100_000
log_steps: 500
per_device_train_batch_size: 128
learning_rate: 0.001
optim: adamw_torch_fused
num_samples: 20
shuffle_buffer_length: 100_000
gradient_accumulation_steps: 1
model_id: google/t5-efficient-tiny
model_type: seq2seq
random_init: true
tie_embeddings: true
output_dir: chronos_output/output-tiny_only_synth/
tf32: true
torch_compile: true
tokenizer_class: "MeanScaleUniformBins"
tokenizer_kwargs:
  low_limit: -15.0
  high_limit: 15.0
n_tokens: 4096
lr_scheduler_type: linear
warmup_ratio: 0.0
dataloader_num_workers: 11
max_missing_prop: 0.9
use_eos_token: true
training_data_paths:
- "synth-data/kernelsynth-data.arrow"
probability:
- 1.0
```

**Environment description**

- Operating system:
- Python version: 3.11.5
- PyTorch version: 2.3.1+cu121
- HuggingFace transformers version: 4.41.2
- HuggingFace accelerate version: 0.30.1
