Python hugging face tokenizer #8354
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8354
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 2 Cancelled Jobs
As of commit b0990da with merge base d99970b:
NEW FAILURE - The following job has failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
9ebd327 to f1573f2
f1573f2 to e3352fa
911b9d9 to 01ed13c
parser.add_argument(
    "--tokenizer_config_path",
    type=str,
    default=None,
)
Please add help to clarify how to use this argument.
Also this should be mutually exclusive with tokenizer_path? We should make sure only one is passed in.
The tokenizer.json is always the source of truth; the config is just for metadata. For example, it tells you what the eos_token is, which isn't in the tokenizer.json itself.
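To make the split between the two files concrete, here is a minimal sketch of how the config could supply the eos_token while tokenizer.json supplies the vocabulary. It is not the code in this PR: the helper names `read_eos_token` and `lookup_token_id` are hypothetical, and the assumed layout (an `eos_token` stored as either a plain string or an object with a `content` field, plus `added_tokens` and `model.vocab` in tokenizer.json) reflects commonly seen Hugging Face files rather than anything confirmed in this thread.

```python
from typing import Optional


def read_eos_token(config: dict) -> Optional[str]:
    """Return the EOS token string from a parsed tokenizer config, if any.

    Hypothetical helper: assumes "eos_token" is either a plain string or
    an object carrying the string under "content".
    """
    eos = config.get("eos_token")
    if isinstance(eos, dict):
        return eos.get("content")
    return eos


def lookup_token_id(tokenizer: dict, token: str) -> Optional[int]:
    """Resolve a token string to its id using parsed tokenizer.json data.

    Hypothetical helper: checks "added_tokens" first, then "model.vocab"
    when that vocab is a token-to-id mapping.
    """
    for added in tokenizer.get("added_tokens", []):
        if added.get("content") == token:
            return added.get("id")
    vocab = tokenizer.get("model", {}).get("vocab", {})
    if isinstance(vocab, dict):
        return vocab.get(token)
    return None


# In-memory stand-ins for the two parsed JSON files (made-up values).
example_config = {"eos_token": {"content": "</s>"}}
example_tokenizer = {
    "added_tokens": [{"id": 2, "content": "</s>"}],
    "model": {"vocab": {"hello": 10}},
}

eos_token = read_eos_token(example_config)
eos_id = lookup_token_id(example_tokenizer, eos_token)
```

In this sketch the config only names the token; the id still comes from tokenizer.json, which matches the "source of truth" point above.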
Can you add this info to the help string?
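One possible shape for that help string, as a sketch only: the wording is an assumption drawn from the explanation earlier in this thread, `build_parser` is a hypothetical wrapper, and the `--tokenizer_path` definition is a stand-in, since its real definition is not shown in this diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical wrapper that only illustrates the two arguments."""
    parser = argparse.ArgumentParser()
    # Stand-in for the existing argument mentioned in the review.
    parser.add_argument(
        "--tokenizer_path",
        type=str,
        default=None,
        help="Path to the tokenizer file, e.g. a Hugging Face tokenizer.json.",
    )
    parser.add_argument(
        "--tokenizer_config_path",
        type=str,
        default=None,
        help=(
            "Optional path to the tokenizer config JSON. It only supplies "
            "metadata such as the eos_token; tokenizer.json remains the "
            "source of truth for the tokenizer itself."
        ),
    )
    return parser


# Both flags can be passed together, since the config complements
# tokenizer.json rather than replacing it.
args = build_parser().parse_args(
    ["--tokenizer_path", "tokenizer.json", "--tokenizer_config_path", "cfg.json"]
)
```

Because the config is supplementary, this sketch does not put the two flags in a mutually exclusive group.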
237ebcb to b0990da
Summary
Add Python tokenizer for Hugging Face tokenizer format (-.json files)
Test plan