
Dataset.map ignores existing caches and remaps when run with a different num_proc #7433

Closed
@ringohoffman

Description


Describe the bug

If you `map` a dataset and save it to a specific `cache_file_name` with a specific `num_proc`, and then call `map` again with that same existing `cache_file_name` but a different `num_proc`, the dataset is re-mapped instead of being loaded from the cache.

Steps to reproduce the bug

1. Download a dataset:

```python
import datasets

dataset = datasets.load_dataset("ylecun/mnist")
```

```
Generating train split: 100%|██████████| 60000/60000 [00:00<00:00, 116429.85 examples/s]
Generating test split: 100%|██████████| 10000/10000 [00:00<00:00, 103310.27 examples/s]
```
2. `map` and cache it with a specific `num_proc`:

```python
cache_file_name = "./cache/train.map"
dataset["train"].map(lambda x: x, cache_file_name=cache_file_name, num_proc=2)
```

```
Map (num_proc=2): 100%|██████████| 60000/60000 [00:01<00:00, 53764.03 examples/s]
```
3. `map` it again with a different `num_proc` and the same `cache_file_name` as before:

```python
dataset["train"].map(lambda x: x, cache_file_name=cache_file_name, num_proc=3)
```

```
Map (num_proc=3): 100%|██████████| 60000/60000 [00:00<00:00, 65377.12 examples/s]
```

Expected behavior

If I specify an existing `cache_file_name`, I don't expect using a different `num_proc` than the one that was used to generate it to cause the dataset to be re-mapped.
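
A likely explanation, sketched here rather than quoted from the library internals: when `num_proc > 1`, `Dataset.map` derives one cache file per worker from `cache_file_name` via its `suffix_template` parameter (default `"_{rank:05d}_of_{num_proc:05d}"`), so the set of shard file names it looks for depends on `num_proc`. The helper below is an illustrative reimplementation of that naming scheme, not the actual `datasets` code:

```python
import os

def shard_cache_files(cache_file_name: str, num_proc: int) -> list[str]:
    """Sketch of how per-worker cache file names are derived from
    cache_file_name, mirroring Dataset.map's default suffix_template
    "_{rank:05d}_of_{num_proc:05d}" (illustrative, not the real code)."""
    if num_proc is None or num_proc <= 1:
        return [cache_file_name]
    base, ext = os.path.splitext(cache_file_name)
    return [
        f"{base}_{rank:05d}_of_{num_proc:05d}{ext}"
        for rank in range(num_proc)
    ]

print(shard_cache_files("./cache/train.map", 2))
# ['./cache/train_00000_of_00002.map', './cache/train_00001_of_00002.map']
print(shard_cache_files("./cache/train.map", 3))
# ['./cache/train_00000_of_00003.map', './cache/train_00001_of_00003.map',
#  './cache/train_00002_of_00003.map']
```

Because the `num_proc=2` and `num_proc=3` name sets share no files, the second `map` call finds none of the cache files it expects on disk and recomputes everything, even though the same data is already cached under the other shard names.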

Environment info

$ datasets-cli env

- `datasets` version: 3.3.2
- Platform: Linux-5.15.0-131-generic-x86_64-with-glibc2.35
- Python version: 3.10.16
- `huggingface_hub` version: 0.29.1
- PyArrow version: 19.0.1
- Pandas version: 2.2.3
- `fsspec` version: 2024.12.0
