
Tokenizers throwing warning "The current process just got forked, Disabling parallelism to avoid deadlocks.. To disable this warning, please explicitly set TOKENIZERS_PARALLELISM=(true | false)" #5486

Closed
saahiluppal opened this issue Jul 3, 2020 · 18 comments · Fixed by #5558
Labels
Core: Tokenization Internals of the library; Tokenization.

Comments

@saahiluppal

I know this warning appears because the transformers library was updated to 3.x.
I know the warning says to set TOKENIZERS_PARALLELISM = true / false.

My question is: where should I set TOKENIZERS_PARALLELISM = true / false?
Is it when defining the tokenizer, like

tok = Tokenizer.from_pretrained('xyz', TOKENIZERS_PARALLELISM=True)  # this doesn't work

or when encoding text, like

tok.encode_plus(text_string, some=some, some=some, TOKENIZERS_PARALLELISM=True)  # this also didn't work

Suggestions anyone?

@hadarishav

hadarishav commented Jul 5, 2020

This might help you: https://stackoverflow.com/questions/62691279/how-to-disable-tokenizers-parallelism-true-false-warning

@Vimos

Vimos commented Jul 5, 2020

I suspect this may be caused by loading data. In my case, it happens when my dataloader starts working.

@n1t0
Contributor

n1t0 commented Jul 6, 2020

This is happening whenever you use multiprocessing (often used by data loaders). The way to disable this warning is to set the TOKENIZERS_PARALLELISM environment variable to the value that makes the most sense for you. By default, we disable the parallelism to avoid any hidden deadlock that would be hard to debug, but you might be totally fine keeping it enabled in your specific use case.

You can try to set it to true, and if your process seems to be stuck, doing nothing, then you should use false.

We'll improve this message to help avoid any confusion (cf. huggingface/tokenizers#328)
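
A minimal sketch of how the variable is usually set (the model name below is only an example): either in the shell before launching the script, or in Python at the very top, before the tokenizer or any forking data loader is created.

# In the shell:
#   export TOKENIZERS_PARALLELISM=false
# Or in Python, before any tokenizer work happens:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # or "true" if no deadlock shows up

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")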

@patrickvonplaten added the Core: Tokenization label Jul 6, 2020
@nathan-chappell

I may be a rookie, but it seems like it would be useful to indicate that this is an environment variable in the warning message.

@n1t0
Contributor

n1t0 commented Jul 9, 2020

You are totally right! In the latest version 3.0.2, the warning message should be a lot better, and it will trigger only when necessary.

@ierezell
Contributor

ierezell commented May 6, 2021

Hi, sorry to bump this thread...

I'm having the same problem; however, the tokenizer is used only in my model.

Data loading is done with multiple workers, but it only loads raw text, which is then given to the model, and only the model uses the tokenizer.
I don't have multiple models or anything like that, just a classic PyTorch model.

So I was wondering how I can still be getting the warning.

Thanks in advance,
Have a great day :)

@n1t0
Contributor

n1t0 commented May 6, 2021

You must be using a tokenizer before using multiprocessing. When your process gets forked, you see this message because it detects that a fork is happening and that some kind of parallelism was used before.
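
A minimal sketch of the pattern being described, assuming a fast tokenizer does a batch encode in the parent process before a multi-worker DataLoader forks it (the dataset, texts, and model name are illustrative):

from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer

class RawTextDataset(Dataset):
    """Returns raw strings only; the workers never touch the tokenizer."""
    def __init__(self, texts):
        self.texts = texts
    def __len__(self):
        return len(self.texts)
    def __getitem__(self, idx):
        return self.texts[idx]

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    # A batch encode in the parent process exercises the Rust-level parallelism.
    _ = tokenizer(["some", "texts", "tokenized", "before", "forking"])

    # num_workers > 0 forks the process; the warning shows up when the workers
    # start, even though they never call the tokenizer themselves.
    loader = DataLoader(RawTextDataset(["a", "b", "c", "d"]), num_workers=2)
    for batch in loader:
        pass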

@ierezell
Contributor

ierezell commented May 6, 2021

@n1t0,
Thanks a lot for the fast reply,
I guess it detects a fork even when it's safe for me to do so... Yes, my process is forked, but not the tokenizer.

Then I will use the env variable to remove the warning.

@ritwikmishra

I use the tokenizer in my data loader.

If that is the source of this problem (hence parallelism gets disabled, hence slower training), then what is the solution?

Using the tokenizer in the pre-processing step?

@hbchen121

After testing, I found that the warning is triggered when data from a dataloader is run through the tokenizer and the dataloader is exited (e.g. with break) before it has been fully traversed.
Here is a code example:

# For example, the following code will trigger the warning:
for texts in train_dataloader:
    _ = tokenizer.batch_encode_plus(texts)
    # the loader has not been fully traversed,
    # but the texts have been tokenized
    break
for texts in test_dataloader:
    # the warning appears here
    pass  # (or break)

# The following code will not trigger the warning:
for texts in train_dataloader:
    # the loader has not been fully traversed
    # and the texts were never tokenized
    break
for texts in test_dataloader:
    # no warning
    pass  # (or break)

@ritwikmishra

ritwikmishra commented May 11, 2022

@hbchen121 my dataloader processes the text in the __init__ function.

At data loading time, the input_ids and attention masks are fetched directly, yet I still get this warning.
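
For reference, a minimal sketch of the pattern described above (class and model names are hypothetical): everything is tokenized once in __init__ and only pre-computed tensors are returned from __getitem__. Since that batch tokenization already used the Rust-level parallelism in the parent process, forking DataLoader workers afterwards can still trip the check even though the workers never call the tokenizer, which would be consistent with the explanation earlier in this thread.

from torch.utils.data import Dataset
from transformers import AutoTokenizer

class PreTokenizedDataset(Dataset):
    def __init__(self, texts, model_name="bert-base-uncased"):
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        # All tokenization happens once here, in the parent process.
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        self.input_ids = enc["input_ids"]
        self.attention_mask = enc["attention_mask"]

    def __len__(self):
        return self.input_ids.size(0)

    def __getitem__(self, idx):
        # Only pre-computed tensors are fetched at data-loading time.
        return self.input_ids[idx], self.attention_mask[idx]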

rom1504 added a commit to mlfoundations/open_clip that referenced this issue Nov 7, 2022
* Make HFTokenizer lazy.

The tokenizer is created lazily because Hugging Face tokenizers are not fork safe and prefer being created in each process.

* Disable tokenizer parallelism for HF.

Necessary, see https://stackoverflow.com/q/62691279
and huggingface/transformers#5486
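
A rough sketch of the lazy-creation pattern that commit describes (illustrative names, not the actual open_clip code): the underlying tokenizer is only built on first use, so each forked worker constructs its own instance instead of inheriting one from the parent.

import os
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")  # the commit also disables this for HF

class LazyHFTokenizer:
    def __init__(self, model_name):
        self.model_name = model_name
        self._tokenizer = None  # not created until first call

    def __call__(self, texts, **kwargs):
        if self._tokenizer is None:
            # Deferred creation: happens inside whichever process actually
            # uses the tokenizer, e.g. a DataLoader worker after the fork.
            from transformers import AutoTokenizer
            self._tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        return self._tokenizer(texts, **kwargs)
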
@Jadiker

Jadiker commented May 11, 2023

Despite the documentation saying that use_fast defaults to False, adding use_fast=False so that it's AutoTokenizer.from_pretrained(model_name, use_fast=False) removed this warning for me. If I just use AutoTokenizer.from_pretrained(model_name), the warning pops up again.

@hzphzp

hzphzp commented Aug 30, 2023

I want to know if we can ignore this warning. What bad effects will it have? Will it affect the training results? Or is it just a little slower? If the environment variables are changed according to the above solution, what is the cost of doing so?

@amyeroberts
Collaborator

cc @ArthurZucker

@crmuhsin

crmuhsin commented Aug 31, 2023

I want to know if we can ignore this warning. What bad effects will it have? Will it affect the training results? Or is it just a little slower? If the environment variables are changed according to the above solution, what is the cost of doing so?

@hzphzp there is an explanation on SO:
https://stackoverflow.com/questions/62691279/how-to-disable-tokenizers-parallelism-true-false-warning/72926996#72926996

@hzphzp

hzphzp commented Sep 1, 2023

I want to know if we can ignore this warning. What bad effects will it have? Will it affect the training results? Or is it just a little slower? If the environment variables are changed according to the above solution, what is the cost of doing so?

@hzphzp there is an explanation in SO https://stackoverflow.com/questions/62691279/how-to-disable-tokenizers-parallelism-true-false-warning/72926996#72926996

Thank you!

@ctwardy

ctwardy commented Jan 22, 2024

Though each notebook runs fine by itself, I get this warning when running multiple notebooks via nbdev_test (https://github.com/fastai/nbdev). Shortly afterwards it crashes with an out-of-memory error.

I assume it has something to do with multiprocessing in nbdev_test, even when setting --n_workers 1.

This gets a warning about disabling parallelism to avoid locks:

nbdev_test --n_workers 1 --pause 10 --do_print --file_glob "*nb"

This works fine:

$ for x in `ls nbs/*nb`; do nbdev_test --n_workers 1 --do_print --path "$x"; done

@HoseinHashemi

HoseinHashemi commented Jul 24, 2024

Despite the documentation saying that use_fast defaults to False, adding use_fast=False so that it's AutoTokenizer.from_pretrained(model_name, use_fast=False) removed this warning for me. If I just use AutoTokenizer.from_pretrained(model_name), the warning pops up again.

use_fast defaults to True, which enables a fast Rust-based tokenizer if one is available; otherwise a Python-based tokenizer is used.
