Georgian language support #751
thecotne
started this conversation in
Feedback & Feature Proposal
Replies: 1 comment
-
Hello @thecotne, sorry for the confusion; these two scripts/languages are unrelated. However, the tokenization process we apply to them is the same.
-
The Charabia README mentions Georgian, but it says "Cyrillic - Georgian", and I am confused about what that is supposed to mean. https://github.com/meilisearch/charabia?tab=readme-ov-file#supported-languages
There are three different Georgian scripts: Asomtavruli, Nuskhuri, and Mkhedruli. (Mkhedruli, [...], is now the standard script for modern Georgian and its related Kartvelian languages)
Cyrillic is completely unrelated to all three.
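To make the distinction concrete, here is a minimal Python sketch that prints the Unicode code point and character name for one sample letter from each script. The helper `script_prefix` and the choice of sample letters are my own, for illustration only. The first word of the Unicode name is just a rough stand-in for the script, since the standard library does not expose the Unicode Script property directly. This says nothing about how Charabia detects scripts internally.

```python
import unicodedata


def script_prefix(ch: str) -> str:
    """Return the first word of the character's Unicode name (rough script proxy)."""
    return unicodedata.name(ch).split()[0]


# One sample letter per script (illustrative choice, not from Charabia).
samples = {
    "Asomtavruli": "\u10A0",  # GEORGIAN CAPITAL LETTER AN
    "Nuskhuri": "\u2D00",     # GEORGIAN SMALL LETTER AN
    "Mkhedruli": "\u10D0",    # GEORGIAN LETTER AN
    "Cyrillic": "\u0430",     # CYRILLIC SMALL LETTER A
}

for label, ch in samples.items():
    print(f"{label:12} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

All three Georgian samples carry GEORGIAN names and sit in their own code point ranges, while the Cyrillic sample sits elsewhere under a CYRILLIC name.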
The Wikipedia article on the Cyrillic script https://en.wikipedia.org/wiki/Cyrillic_script#Lowercase_forms mentions "Georgia type" and "Odesa Script" as alternative names for the upright (printed) and cursive (handwritten) variants. That is the only place on the whole page where Georgia/Georgian comes up.
A Google search for "Cyrillic Georgian" gives weird results, since there is no such thing; it comes up with this abomination: https://www.omniglot.com/conscripts/karturilitsa.htm - "Karturilitsa is an adaptation of the Cyrillic alphabet for Georgian devised by Sakana Oji. It was created after wondering how the Cyrillic script would have been used in Georgia during the Soviet era." It's a meme ...
My suggestion would be to first change "Cyrillic - Georgian" in the README to something more meaningful, or at least to something that people can google to get an explanation of what it is. I would suggest "Cyrillic", and if there is some support for Georgian, add another row for Georgian.