Conversation

@ydshieh
Collaborator

@ydshieh ydshieh commented Oct 9, 2025

What does this PR do?

I can confirm the change takes effect: the `add_prefix_space` value of the `ByteLevel` pre-tokenizer changes as expected, from

(Pdb) self.backend_tokenizer.pre_tokenizer
Sequence(pretokenizers=[Split(pattern=Regex(" ?[^(\s|[.,!?…。,、।۔،])]+"), behavior=Isolated, invert=False), ByteLevel(add_prefix_space=False, trim_offsets=True, use_regex=False)])

to

(Pdb) self.backend_tokenizer.pre_tokenizer
Sequence(pretokenizers=[Split(pattern=Regex(" ?[^(\s|[.,!?…。,、।۔،])]+"), behavior=Isolated, invert=False), ByteLevel(add_prefix_space=True, trim_offsets=True, use_regex=False)])
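
To illustrate what the `add_prefix_space` flag in the dumps above controls, here is a minimal sketch using the `tokenizers` Python bindings. It does not reproduce this PR's code: it simply builds the same `Split` + `ByteLevel` sequence twice, once per flag value, and compares the pre-tokenized pieces. The helper name `build_pre_tokenizer` and the simplified pattern `" ?[^\s]+"` are assumptions made for the example, not the model's actual regex.

```python
from tokenizers import Regex, pre_tokenizers


def build_pre_tokenizer(add_prefix_space: bool) -> pre_tokenizers.Sequence:
    """Hypothetical helper: a Split + ByteLevel sequence shaped like the dumps above."""
    return pre_tokenizers.Sequence(
        [
            # Simplified pattern (an assumption), not the model's real regex.
            pre_tokenizers.Split(Regex(r" ?[^\s]+"), behavior="isolated"),
            # use_regex=False matches the dumps: ByteLevel does no extra splitting.
            pre_tokenizers.ByteLevel(add_prefix_space=add_prefix_space, use_regex=False),
        ]
    )


text = "Hello world"
without_prefix = build_pre_tokenizer(False).pre_tokenize_str(text)
with_prefix = build_pre_tokenizer(True).pre_tokenize_str(text)

# Each entry is (piece, (start, end)); "Ġ" is the byte-level encoding of a space.
print([piece for piece, _ in without_prefix])
print([piece for piece, _ in with_prefix])
```

With the flag enabled, the first piece should come back with a leading `Ġ`, which is the visible difference between the two pre-tokenizer states shown above.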

@ydshieh ydshieh requested a review from ArthurZucker October 9, 2025 09:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh
Collaborator Author

ydshieh commented Oct 9, 2025

There are similar places, for example

src/transformers/models/cohere/tokenization_cohere_fast.py

but I would like confirmation before changing them.

Collaborator

@ArthurZucker ArthurZucker left a comment

Yes, and long overdue. This was not possible before; it is now, thanks to the update to tokenizers! It can be applied everywhere else!

@github-actions
Contributor

github-actions bot commented Oct 9, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: bloom, cohere

@ydshieh
Collaborator Author

ydshieh commented Oct 9, 2025

run-slow: bloom, cohere

1 similar comment
@ydshieh
Collaborator Author

ydshieh commented Oct 10, 2025

run-slow: bloom, cohere

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/bloom', 'models/cohere']
quantizations: [] ...

@ydshieh ydshieh merged commit f5f3457 into main Oct 10, 2025
22 of 23 checks passed
@ydshieh ydshieh deleted the i_dont_like_pickle branch October 10, 2025 08:52
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
* pickle 1

* pickle 1

* pickle 1

* pickle 1

* pickle 1

* pickle 1

---------

Co-authored-by: ydshieh <[email protected]>