Tokenizers throwing warning "The current process just got forked, Disabling parallelism to avoid deadlocks.. To disable this warning, please explicitly set TOKENIZERS_PARALLELISM=(true | false)" #5486
Comments
I suspect this may be caused by loading data. In my case, it happens when my dataloader starts working.
This is happening whenever the process gets forked (for example by a DataLoader with multiple workers) after the tokenizer has already been used. You can try to set TOKENIZERS_PARALLELISM explicitly to true or false to silence it. We'll improve this message to help avoid any confusion (cf. huggingface/tokenizers#328).
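A minimal sketch of setting the variable explicitly, assuming it is done before the tokenizer is first used (the model name below is just an example):

```python
import os

# Choose a value explicitly before the tokenizer is first used / the process forks.
# Equivalent shell form: export TOKENIZERS_PARALLELISM=false
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # or "true"

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example model
```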
I may be a rookie, but it seems like it would be useful to indicate that this is an environment variable in the warning message.
You are totally right! In the latest version the warning message makes it explicit that this is an environment variable.
Hi, sorry to bump this thread... I'm having the same problem; however, the tokenizer is used only in my model. Data loading is done with multiple workers, but it only loads raw text, which is then given to the model, and only the model uses the tokenizer. So I was wondering how I can be getting the warning. Thanks in advance.
You must be using a tokenizer at some point before the fork happens (i.e. before the DataLoader workers are created), otherwise the warning would not be triggered.
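A rough sketch of how the warning typically gets triggered, under that explanation (dataset class and model name are illustrative): the fast tokenizer is called once in the parent process, then the DataLoader forks its workers.

```python
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
tokenizer("warm-up call in the main process")  # tokenizer used before the fork

class RawTextDataset(Dataset):
    """Returns raw strings only; no tokenization happens in the workers."""
    def __init__(self, texts):
        self.texts = texts

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.texts[idx]

loader = DataLoader(RawTextDataset(["a", "b", "c", "d"]), batch_size=2, num_workers=2)
for batch in loader:  # on Linux the workers are forked here -> the warning is printed
    pass
```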
@n1t0 Then I will use the env variable to remove the warning.
I use the tokenizer inside my DataLoader. If that is the source of this problem (hence disabling the parallelization, and hence slow training), then what is the solution? Using TOKENIZERS_PARALLELISM=true?
After testing, I found that this warning is triggered when the data in a DataLoader is processed by the tokenizer and the loop over the DataLoader is exited before iteration finishes.
@hbchen121 My dataloader processes the text in the __init__ function. At data loading time, the input_ids and attention masks are fetched directly, yet I still get this warning.
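If the explanation earlier in the thread is right, that scenario still counts as "tokenizer used before the fork": the tokenizer runs in __init__ in the main process, and the DataLoader forks its workers afterwards. A minimal sketch of that setup (class and model names are just examples):

```python
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer

class PreTokenizedDataset(Dataset):
    def __init__(self, texts):
        # Tokenization happens once, in the main process ...
        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        self.input_ids = enc["input_ids"]
        self.attention_mask = enc["attention_mask"]

    def __len__(self):
        return self.input_ids.size(0)

    def __getitem__(self, idx):
        # ... and the workers only fetch pre-computed tensors.
        return self.input_ids[idx], self.attention_mask[idx]

loader = DataLoader(PreTokenizedDataset(["hello", "world"]), batch_size=2, num_workers=2)
for input_ids, attention_mask in loader:  # fork happens here -> warning
    pass
```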
* Make HFTokenizer lazy. The tokenizer is created lazily because Hugging Face tokenizers are not fork-safe and prefer being created in each process.
* Disable tokenizer parallelism for HF. Necessary, see https://stackoverflow.com/q/62691279 and huggingface/transformers#5486
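A hedged sketch of the "lazy tokenizer" idea from that commit (the class name and details below are illustrative, not the actual code): the tokenizer is only created on first use, so each forked process builds its own instance instead of sharing one across the fork.

```python
import os
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")  # as suggested in the commit

from transformers import AutoTokenizer

class LazyHFTokenizer:
    def __init__(self, model_name="bert-base-uncased"):
        self.model_name = model_name
        self._tokenizer = None  # not created yet, so nothing is shared across a fork

    @property
    def tokenizer(self):
        # Created lazily, i.e. inside whichever process first needs it.
        if self._tokenizer is None:
            self._tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        return self._tokenizer

    def __call__(self, text, **kwargs):
        return self.tokenizer(text, **kwargs)
```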
Despite the documentation saying that ...
I want to know if we can ignore this warning. What bad effects will it have? Will it affect the training results, or will it just be a little slower? If the environment variable is changed according to the above solution, what is the cost of doing so?
@hzphzp there is an explanation in the Stack Overflow question linked above.
Thank you!
Though each notebook runs fine by itself, I get this warning when running multiple notebooks via a parallel runner. I assume it has something to do with its use of multiprocessing. This gets a warning about disabling parallelism to avoid deadlocks:
This works fine:
I know this warning appeared because the transformers library was updated to 3.x.
I know the warning says to set TOKENIZERS_PARALLELISM=(true | false).
My question is: where should I set TOKENIZERS_PARALLELISM=(true | false)?
Is it when defining the tokenizer, or when encoding text?
Suggestions, anyone?
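A minimal sketch of the usual placement, assuming the variable only needs to be in the environment before the tokenizer is used (it is an environment variable, not an argument to from_pretrained or to the encode call; model name below is just an example):

```python
import os

# Set before the tokenizer is created or any DataLoader workers are spawned.
# Equivalent shell form: export TOKENIZERS_PARALLELISM=false
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # or "true"

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")      # defining the tokenizer
encoded = tokenizer("some example text", return_tensors="pt")       # encoding text
```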