Description
Following the official discussion about Chinese support in Meilisearch, it is relevant to normalize Chinese characters by unifying Z, Simplified, and Semantic variants before transliterating them into Pinyin.
To learn more about each variant, you can read the dedicated report on unicode.org.
There are several dictionaries listing variations that we can use. I suggest using the kvariants dictionary made by hfhchan (see the related documentation in the same repo).
Technical approach
Import the dictionary and rework it into a key-value binding of each variant. Then, in the Chinese normalizer, convert each provided character through that binding before transliterating it into Pinyin.
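As a rough illustration of the approach, here is a minimal sketch using only the standard library. Everything named here is hypothetical: `build_variant_map`, `unify_variants`, and `SAMPLE_VARIANTS` are not part of the existing code base, the two-column tab-separated layout is an assumption rather than the actual kvariants file format, and the sample pairs (including which side is treated as canonical) are only illustrative. The Pinyin transliteration itself is left out; it would run on the output of `unify_variants`.

```rust
use std::collections::HashMap;

// Illustrative data only: an assumed `variant<TAB>canonical` layout,
// not the real kvariants dictionary format or content.
const SAMPLE_VARIANTS: &str = "# variant\tcanonical\n漢\t汉\n語\t语\n説\t說\n";

/// Build a key-value binding from each variant to its chosen canonical form.
/// Blank lines and lines starting with `#` are skipped.
fn build_variant_map(raw: &str) -> HashMap<char, char> {
    let mut map = HashMap::new();
    for line in raw.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let mut fields = line.split('\t');
        if let (Some(variant), Some(canonical)) = (fields.next(), fields.next()) {
            // Only single-character entries are handled in this sketch.
            if let (Some(v), Some(c)) = (variant.chars().next(), canonical.chars().next()) {
                map.insert(v, c);
            }
        }
    }
    map
}

/// Replace every character that has a known variant binding;
/// characters without a binding are kept unchanged.
fn unify_variants(text: &str, map: &HashMap<char, char>) -> String {
    text.chars()
        .map(|c| map.get(&c).copied().unwrap_or(c))
        .collect()
}
```

In a real normalizer the map would likely be built once (for example at first use) rather than per call, and the unified characters would then be passed on to the existing Pinyin step.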
Files expected to be modified
Misc
Related to meilisearch/product#503
Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝