Skip to content

Implement Pinyin normalizer #135

@ManyTheFish

Description

@ManyTheFish

Today Meilisearch normalizes Chinese characters by converting traditional characters into simplified ones.

drawback

This normalization process doesn't seem to enhance the recall of Meilisearch.

enhancement

Following the official discussion about Chinese support in Meilisearch, it is more relevant to normalize Chinese characters by transliterating them into a Phonological version.
In order to have accurate phonology for Mandarin, we should normalize Chinese characters into Pinyin using the pinyin crates.

Files expected to be modified

Misc

related to product#503

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the recurrent rules, you can find some guides to easily implement a Segmenter or a Normalizer.
Thanks a lot for your Contribution! 🤝

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions