Add pcre2 as re2 fallback #50
Merged
Conversation
This was referenced Apr 15, 2025
larryliu0820 approved these changes on Apr 17, 2025
Branch updated from 6a4afd9 to aa70360
Branch updated from aa70360 to 0fc711a
@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request was exported from Phabricator. Differential Revision: D73295314
jackzhxng added a commit that referenced this pull request on Apr 21, 2025
Summary: Adds pcre2 to handle the negative lookbehinds in HuggingFace tokenizers.

Performance stays about the same from test runs [before](https://github.com/pytorch-labs/tokenizers/actions/runs/14480863330/job/40617329721#step:14:758) (run on the last commit on main) and [after](https://github.com/pytorch-labs/tokenizers/actions/runs/14526152504/job/40757962551#step:14:901) (this PR).

Tokenizer library size (from `ls -lh build/libtokenizers.a`): `13M` (on main) -> `15M`. This most likely comes from adding the `pcre2` lib.

🧱 Stack:
- [ ] #45
- [ ] #48
- [ ] #49
- [x] #50

Pull Request resolved: #50

Differential Revision: D73295314

Pulled By: jackzhxng
This pull request was exported from Phabricator. Differential Revision: D73295314
jackzhxng added a commit that referenced this pull request on Apr 21, 2025
Summary: Adds pcre2 to handle the negative lookbehinds in HuggingFace tokenizers.

Performance stays about the same from test runs [before](https://github.com/pytorch-labs/tokenizers/actions/runs/14480863330/job/40617329721#step:14:758) (run on the last commit on main) and [after](https://github.com/pytorch-labs/tokenizers/actions/runs/14526152504/job/40757962551#step:14:901) (this PR).

Tokenizer library size (from `ls -lh build/libtokenizers.a`): `13M` (on main) -> `15M`. This most likely comes from adding the `pcre2` lib.

🧱 Stack:
- [ ] #45
- [ ] #48
- [ ] #49
- [x] #50

Pull Request resolved: #50

Differential Revision: D73295314

Pulled By: jackzhxng
This pull request was exported from Phabricator. Differential Revision: D73295314
jackzhxng added a commit to pytorch/executorch that referenced this pull request on Apr 30, 2025
### Summary
Use the https://github.com/pytorch-labs/tokenizers huggingface tokenizer in the Llama runner.

Results on Qwen2.5 with `extension/llm/tokenizers` checked out to pytorch-labs/tokenizers#50:

```
Once upon a time, there was a little girl named Lily. She was very happy. She had a big garden in the back of her house. She planted many flowers in it. They were red, yellow and blue. They were very pretty. Lily loved them very much. One day, she was watering them. Suddenly, she heard a noise. It was a noise in the tree. She looked up. There was a big bird in the tree. It was eating one of Lily's flowers. Lily was very angry. She ran to the tree. "Hello!" she said to the bird. "What are you doing in my
I 00:00:08.624959 executorch:runner.cpp:294] RSS after finishing text generation: 2147.121094 MiB (0 if unsupported)
PyTorchObserver {"prompt_tokens":4,"generated_tokens":123,"model_load_start_ms":1744936315023,"model_load_end_ms":1744936318524,"inference_start_ms":1744936318524,"inference_end_ms":1744936323646,"prompt_eval_end_ms":1744936318580,"first_token_ms":1744936318580,"aggregate_sampling_time_ms":274877907025,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:00:08.625019 executorch:stats.h:106] Prompt Tokens: 4 Generated Tokens: 123
I 00:00:08.625021 executorch:stats.h:112] Model Load Time: 3.501000 (seconds)
I 00:00:08.625023 executorch:stats.h:119] Total inference time: 5.122000 (seconds) Rate: 24.014057 (tokens/second)
I 00:00:08.625033 executorch:stats.h:129] Prompt evaluation: 0.056000 (seconds) Rate: 71.428571 (tokens/second)
I 00:00:08.625038 executorch:stats.h:138] Generated 123 tokens: 5.066000 (seconds) Rate: 24.279510 (tokens/second)
I 00:00:08.625045 executorch:stats.h:149] Time to first generated token: 0.056000 (seconds)
I 00:00:08.625047 executorch:stats.h:155] Sampling time over 127 tokens: 274877907.025000 (seconds)
```

### Test plan
Build the llama runner locally (note the inclusion of `-DSUPPORT_REGEX_LOOKAHEAD=ON`):

```
cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DSUPPORT_REGEX_LOOKAHEAD=ON \
    -Bcmake-out/examples/models/llama \
    examples/models/llama
cmake --build cmake-out/examples/models/llama -j16 --config Release
```

Run on Qwen2.5:

```
cmake-out/examples/models/llama/llama_main --model_path=qwen2_5.pte --tokenizer_path ~/hf/models--Qwen--Qwen2.5-1.5B/snapshots/8faed761d45a263340a0528343f099c05c9a4323/tokenizer.json --prompt="Once upon a time" --temperature 0
```
Adds pcre2 to handle the negative lookbehinds in HuggingFace tokenizers.

Performance stays about the same from test runs before (run on the last commit on main) and after (this PR).

Tokenizer library size (from `ls -lh build/libtokenizers.a`): `13M` (on main) -> `15M`. This most likely comes from adding the `pcre2` lib.

🧱 Stack:
- [ ] #45
- [ ] #48
- [ ] #49
- [x] #50
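For illustration only, here is a minimal standalone sketch of the fallback idea described above. It is not the library's actual internal API, and the pattern is a made-up example: compile with RE2 first and, only when RE2 rejects the pattern (as it does for negative lookbehinds), fall back to PCRE2.

```cpp
// Sketch of an re2 -> pcre2 fallback (illustrative; assumes re2 and pcre2 are installed).
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
#include <re2/re2.h>

#include <iostream>
#include <string>

int main() {
  // A pattern with a negative lookbehind, the kind of construct some
  // HuggingFace tokenizer pre-tokenizers use. RE2 cannot compile it.
  const std::string pattern = R"((?<!\d)foo)";

  // Fast path: try RE2 first.
  re2::RE2 re2_regex(pattern);
  if (re2_regex.ok()) {
    std::cout << "compiled with re2\n";
    return 0;
  }
  std::cout << "re2 rejected pattern: " << re2_regex.error() << "\n";

  // Fallback: compile the same pattern with PCRE2.
  int error_code = 0;
  PCRE2_SIZE error_offset = 0;
  pcre2_code* code = pcre2_compile(
      reinterpret_cast<PCRE2_SPTR>(pattern.c_str()),
      PCRE2_ZERO_TERMINATED,
      PCRE2_UTF | PCRE2_UCP,
      &error_code,
      &error_offset,
      nullptr);
  if (code == nullptr) {
    std::cerr << "pcre2 failed to compile the pattern as well\n";
    return 1;
  }
  std::cout << "compiled with pcre2 fallback\n";
  pcre2_code_free(code);
  return 0;
}
```

Keeping RE2 on the fast path preserves its linear-time matching for the patterns it supports, so only lookbehind-using patterns pay for PCRE2's backtracking engine, which is consistent with the roughly unchanged benchmark numbers reported above.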