Store detected Language per document during indexing

> ⚠️ This issue is not an easy one: it requires some knowledge of Rust and more work than the other issues.
> I strongly encourage beginners to pick another issue.

## Summary

Meilisearch automatically detects the Script and the Language during both indexing and search.
Because search queries contain only short texts, it is almost impossible to reliably detect the Language they use.
During indexing, however, Meilisearch receives complete documents, on which the Language is much easier to detect. So, instead of trying to determine the Language of the search query, we could rely on the Language of the data being searched.

related to: https://github.com/meilisearch/product/discussions/532#discussioncomment-3709627

## Technical approach
### Create a new database
The first step is to create a new database named `script_language_docids` [in the Index](https://github.com/meilisearch/milli/blob/main/milli/src/index.rs). Each key is the `Script` concatenated with the `Language`, and each value is a `RoaringBitmap` containing all the docids concerned. Be aware that the key needs a [specialized codec](https://github.com/meilisearch/milli/tree/main/milli/src/heed_codec).

**related files:**
- [index.rs](https://github.com/meilisearch/milli/blob/main/milli/src/index.rs)
- [heed_codec/](https://github.com/meilisearch/milli/tree/main/milli/src/heed_codec)
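
As a rough illustration of what such a key codec could look like, here is a minimal sketch using only the standard library. It assumes the `Script` and the `Language` are each available as a short name, and it joins them with a NUL byte so the two parts can be split again on decoding. The function names `encode_key` and `decode_key` are hypothetical; an actual codec would implement the encoding and decoding traits used by the existing codecs in `heed_codec/`.

```rust
// Hypothetical sketch: plain functions over byte slices stand in for a real
// heed codec. The key layout (`script\0language`) is an assumption.

/// Encodes a (script, language) pair as `script\0language`.
/// The names are assumed not to contain a NUL byte.
pub fn encode_key(script: &str, language: &str) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(script.len() + 1 + language.len());
    bytes.extend_from_slice(script.as_bytes());
    bytes.push(0); // separator between the two parts
    bytes.extend_from_slice(language.as_bytes());
    bytes
}

/// Decodes a key produced by `encode_key`.
/// Returns `None` if the separator is missing or a part is not valid UTF-8.
pub fn decode_key(bytes: &[u8]) -> Option<(&str, &str)> {
    let sep = bytes.iter().position(|&b| b == 0)?;
    let script = std::str::from_utf8(&bytes[..sep]).ok()?;
    let language = std::str::from_utf8(&bytes[sep + 1..]).ok()?;
    Some((script, language))
}
```

Putting the `Script` first means that all Languages of a given Script are stored next to each other, which could make prefix iteration by Script possible if that ever becomes useful.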

### Extract and index data
During [word position extraction](https://github.com/meilisearch/milli/blob/main/milli/src/update/index_documents/extract/extract_docid_word_positions.rs), we should store the detected Languages in a hashmap that associates them with the docids, then [send the hashmap](https://github.com/meilisearch/milli/blob/main/milli/src/update/index_documents/extract/mod.rs) to the main thread at the end of the extraction task.
The main thread will then have to [store this data](https://github.com/meilisearch/milli/blob/main/milli/src/update/index_documents/typed_chunk.rs) in the `script_language_docids` database.
Be aware that the same document can contain several Languages, so its docid should be stored under several Script/Language pairs.

**related files:**
- [extract_docid_word_positions.rs](https://github.com/meilisearch/milli/blob/main/milli/src/update/index_documents/extract/extract_docid_word_positions.rs)
- [(extract/)mod.rs](https://github.com/meilisearch/milli/blob/main/milli/src/update/index_documents/extract/mod.rs)
- [typed_chunk.rs](https://github.com/meilisearch/milli/blob/main/milli/src/update/index_documents/typed_chunk.rs)
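
The flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Script` and `Language` enums are tiny hypothetical stand-ins for the tokenizer's types, `BTreeSet<u32>` stands in for `RoaringBitmap`, and `record_detection` and `merge_into` are made-up names, not existing milli functions.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-ins for the tokenizer's `Script` and `Language` types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Script { Latin, Cyrillic }

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Language { Eng, Fra, Rus }

pub type DocumentId = u32;
// `BTreeSet` stands in for `RoaringBitmap` to keep the sketch dependency-free.
pub type ScriptLanguageDocids = HashMap<(Script, Language), BTreeSet<DocumentId>>;

/// Extraction side: records every Script/Language pair detected in a document.
/// A document with several Languages ends up under several keys.
pub fn record_detection(
    map: &mut ScriptLanguageDocids,
    docid: DocumentId,
    detected: &[(Script, Language)],
) {
    for &pair in detected {
        map.entry(pair).or_default().insert(docid);
    }
}

/// Main-thread side: merges the hashmap received from an extraction task
/// into the existing entries by taking the union of the docids.
pub fn merge_into(database: &mut ScriptLanguageDocids, chunk: ScriptLanguageDocids) {
    for (pair, docids) in chunk {
        database.entry(pair).or_default().extend(docids);
    }
}
```

Merging by union rather than overwriting matters here, because several extraction tasks may each report docids for the same Script/Language pair.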

### Delete data
When [removing documents](https://github.com/meilisearch/milli/blob/main/milli/src/update/delete_documents.rs), we should remove the corresponding docids from the `script_language_docids` database.
Likewise, [when the database is cleared](https://github.com/meilisearch/milli/blob/main/milli/src/update/clear_documents.rs), the `script_language_docids` database should be cleared too.

**related files:**
- [delete_documents.rs](https://github.com/meilisearch/milli/blob/main/milli/src/update/delete_documents.rs)
- [clear_documents.rs](https://github.com/meilisearch/milli/blob/main/milli/src/update/clear_documents.rs)
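
A minimal sketch of the deletion logic, again with an in-memory `HashMap` standing in for the database and `BTreeSet<u32>` standing in for `RoaringBitmap`. The names `remove_docids` and `clear_all` are hypothetical. Dropping entries whose set becomes empty is a design assumption of this sketch, made so that no Script/Language key is left pointing at nothing.

```rust
use std::collections::{BTreeSet, HashMap};
use std::hash::Hash;

/// Removes the deleted docids from every entry and drops the entries
/// that end up empty.
pub fn remove_docids<K: Eq + Hash>(
    database: &mut HashMap<K, BTreeSet<u32>>,
    deleted: &BTreeSet<u32>,
) {
    database.retain(|_, docids| {
        docids.retain(|docid| !deleted.contains(docid));
        !docids.is_empty()
    });
}

/// Clears the whole database, mirroring what happens when all documents
/// are cleared.
pub fn clear_all<K: Eq + Hash>(database: &mut HashMap<K, BTreeSet<u32>>) {
    database.clear();
}
```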

## Todo
- [x] create a new database
  - [x] implementation
- [x] update this database during indexing
  - [x] implementation
  - [x] tests
- [x] update this database during deletion
  - [x] implementation
  - [x] tests


