Implement an efficient `Nonspacing Mark` Normalizer

In the Information Retrieval (IR) context, removing Nonspacing Marks like diacritics is a good way to increase recall without losing much precision, like in Latin, Arabic, or Hebrew. 

## Technical Approach
Implement a  new Normalizer, named `NonspacingMarkNormalizer`, that removes [the nonspacing marks](https://www.compart.com/en/unicode/category/Mn) from a provided token (find a naive implementation with the exhaustive list in the `Misc` section).
Because there are a lot of sparse character ranges to match, it would be inefficient to create a big if-forest to know if a character is a nonspacing mark.
This way, I suggest trying several implementations of the naive implementation below in a small local project.

### Interesting Rust Crates
- [hyperfine](https://crates.io/crates/hyperfine): a small command-line tool to benchmark several binaries
- [roaring-rs](https://crates.io/crates/roaring): a bitmap data structure that has an efficient `contains` method
- [once_cell](https://crates.io/crates/once_cell): a good Library to create lazy statics already used in the repository

## Misc
- [naive implementation of `is_nonspacing_mark`](https://gist.github.com/ManyTheFish/2e05698d1c76c0504c3f7738de40c95d)
- related discussion about [the Arabic Language](https://github.com/meilisearch/product/discussions/139)

> Hey! 👋 
Before starting any implementation, make sure that you read the [CONTRIBUTING.md](https://github.com/meilisearch/charabia/blob/main/CONTRIBUTING.md#contributing) file.
In addition to the recurrent rules, you can find some guides to easily implement a `Segmenter` or a `Normalizer`.
Thanks a lot for your Contribution! 🤝

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement an efficient `Nonspacing Mark` Normalizer #137

Technical Approach

Interesting Rust Crates

Misc

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Implement an efficient Nonspacing Mark Normalizer #137

Description

Technical Approach

Interesting Rust Crates

Misc

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Implement an efficient `Nonspacing Mark` Normalizer #137