
Disable HMM feature of Jieba #136

@ManyTheFish


Today, the Chinese segmenter uses the Hidden Markov Model (HMM) algorithm provided by Jieba's cut method to segment Chinese words that are not in its dictionary.

drawback

According to a sub-thread of the official discussion about Chinese support in Meilisearch, the HMM feature of Jieba appears ill-suited to a search engine. It produces longer words and inconsistent segmentation, which lowers the recall of Meilisearch without significantly raising its precision.
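To make the recall argument concrete, here is a minimal sketch of how a longer merged token can hide a document from a query. The token lists are hand-written for illustration and are not actual Jieba output, and the `matches` helper is a hypothetical, simplified exact-token matcher rather than how Meilisearch ranks results.

```python
def matches(query_tokens, doc_tokens):
    """Return True when every query token is one of the document's indexed tokens."""
    return set(query_tokens) <= set(doc_tokens)


# Two hypothetical segmentations of the same document text.
# Assumption: with HMM, two out-of-dictionary characters get merged into one word.
doc_with_hmm = ["他", "来到", "了", "网易", "杭研", "大厦"]
doc_without_hmm = ["他", "来到", "了", "网易", "杭", "研", "大厦"]

# A user searching for a single character that the HMM merged away.
query = ["研"]

found_with_hmm = matches(query, doc_with_hmm)        # merged token hides "研"
found_without_hmm = matches(query, doc_without_hmm)  # single characters stay searchable

print(found_with_hmm, found_without_hmm)
```

In this toy setup the document is only found when the unknown characters are left as separate tokens, which is the recall loss described above.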

enhancement

Deactivate the HMM feature in Chinese segmentation.
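As a rough sketch of what the change amounts to, assuming the Python `jieba` package (the segmenter in this repository may use a different binding with a similar switch), the HMM step is controlled by the `HMM` keyword of `cut` and `lcut`. The wrapper name `segment_for_search` is hypothetical.

```python
def segment_for_search(text, use_hmm=False):
    """Segment text with Jieba, keeping the HMM step off unless requested.

    With HMM disabled, characters that are not covered by the dictionary
    are emitted as-is instead of being merged into guessed words.
    """
    # Third-party dependency, imported lazily: pip install jieba
    import jieba

    # lcut returns a list of tokens; HMM=False skips the Viterbi-based
    # new-word discovery step.
    return jieba.lcut(text, HMM=use_hmm)
```

Calling `segment_for_search(text)` and `segment_for_search(text, use_hmm=True)` on the same input is a quick way to compare both behaviours side by side.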

Files expected to be modified

Misc

Related to product#503

Hey! 👋
Before starting any implementation, make sure that you have read the CONTRIBUTING.md file.
In addition to the general rules, you can find some guides on how to implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝
