Description
Currently, the Chinese segmenter uses the Hidden Markov Model (HMM) algorithm provided by the cut
method of Jieba to segment unknown Chinese words.
drawback
Following the subdiscussion in the official discussion about Chinese support in Meilisearch, the HMM feature of Jieba does not seem relevant in the context of a search engine. This feature creates longer words and inconsistent segmentation, which reduces the recall of Meilisearch without significantly raising its precision.
enhancement
Deactivate the HMM feature in Chinese segmentation.
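To illustrate the effect on recall, here is a minimal, self-contained sketch. It does not call Jieba: the `segment` function, its `merge_unknown` flag, and the tiny dictionary are hypothetical stand-ins. Jieba itself builds a word graph from a prefix dictionary and hands runs of out-of-dictionary characters to an HMM decoder, whereas this toy uses greedy longest-match and simply glues the whole unknown run together when the flag is on.

```python
def segment(text, dictionary, merge_unknown):
    """Greedy longest-match segmentation over a toy dictionary.

    merge_unknown=True is a crude stand-in for HMM-style new-word detection:
    consecutive characters not covered by the dictionary become one token.
    merge_unknown=False emits each such character as its own token.
    """
    max_len = max(len(word) for word in dictionary)
    tokens, unknown, i = [], [], 0

    def flush():
        # Emit the pending run of out-of-dictionary characters, if any.
        if unknown:
            if merge_unknown:
                tokens.append("".join(unknown))
            else:
                tokens.extend(unknown)
            unknown.clear()

    while i < len(text):
        # Try the longest dictionary word starting at position i.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                flush()
                tokens.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here: remember the character.
            unknown.append(text[i])
            i += 1
    flush()
    return tokens


# Hypothetical mini-dictionary; "杭研" is deliberately missing from it.
TOY_DICT = {"他", "来到", "了", "网易", "大厦"}
SENTENCE = "他来到了网易杭研大厦"

with_merge = segment(SENTENCE, TOY_DICT, merge_unknown=True)
without_merge = segment(SENTENCE, TOY_DICT, merge_unknown=False)
```

With merging on, the unknown characters end up in a single longer token, so a query for one of those characters alone no longer matches any token. With merging off, each unknown character is indexed separately and stays findable, which is the recall argument made above.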
Files expected to be modified
Misc
related to product#503
Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝