Update Japanese tokenizer config and add serialization #5562

Merged: 2 commits merged into explosion:master on Jun 8, 2020.

adrianeboyd
Contributor

Description

  • Use config dict for tokenizer settings
  • Add serialization of split mode setting
  • Add tests for tokenizer split modes and serialization of split mode setting
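The pattern the first two bullets describe can be sketched in plain Python. The tokenizer settings live in one `config` dict, and the split mode is written out and read back during serialization. This is a minimal, hypothetical illustration, not spaCy's `JapaneseTokenizer`. The class name `ConfigurableTokenizer`, the `_set_split_mode` helper, the `"cfg"` key and the JSON byte format are all assumptions. The accepted values `"A"`, `"B"` and `"C"` assume SudachiPy's three split modes, and here `None` simply means "not set".

```python
import json
from typing import Any, Dict, Optional

# Assumption: accepted values mirror SudachiPy's split modes A, B and C;
# None means "not set". This is NOT spaCy's JapaneseTokenizer.
VALID_SPLIT_MODES = (None, "A", "B", "C")


class ConfigurableTokenizer:
    """Hypothetical tokenizer whose settings live in a single config dict."""

    def __init__(self, config: Optional[Dict[str, Any]] = None) -> None:
        # Copy the caller's dict so later changes to it do not leak in.
        self.config: Dict[str, Any] = dict(config or {})
        self._set_split_mode(self.config.get("split_mode"))

    def _set_split_mode(self, split_mode: Optional[str]) -> None:
        # Reject values outside the assumed set of split modes.
        if split_mode not in VALID_SPLIT_MODES:
            raise ValueError(f"Unknown split_mode: {split_mode!r}")
        self.split_mode = split_mode
        self.config["split_mode"] = split_mode

    def to_bytes(self) -> bytes:
        # Serialize only the settings; any dictionary or model the backend
        # needs is assumed to be loaded separately.
        return json.dumps({"cfg": {"split_mode": self.split_mode}}).encode("utf8")

    def from_bytes(self, data: bytes) -> "ConfigurableTokenizer":
        # Restore the saved settings and return self so calls can be chained.
        cfg = json.loads(data.decode("utf8"))["cfg"]
        self._set_split_mode(cfg.get("split_mode"))
        return self


if __name__ == "__main__":
    original = ConfigurableTokenizer({"split_mode": "B"})
    restored = ConfigurableTokenizer().from_bytes(original.to_bytes())
    print(restored.split_mode)  # B
```

Routing the split mode through one helper means the same validation runs whether the value arrives through the constructor's `config` or through `from_bytes`. A serialization test can then check the round trip: save a tokenizer created with one split mode, load it into a fresh instance, and compare the setting.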

Based on #5561

Types of change

Enhancement.

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

adrianeboyd added the labels enhancement (Feature requests and improvements) and lang / ja (Japanese language data and models) on Jun 8, 2020
adrianeboyd mentioned this pull request on Jun 8, 2020
adrianeboyd merged commit 3bf1115 into explosion:master on Jun 8, 2020
hiroshi-matsuda-rit
Contributor

I think this works fine and thank you for adding test cases.
