ben417 commented May 4, 2023

Fixes the issue reported in #4061

  1. Added 0x102c and 0x1062 to the tone mark section; in Karen, these can also be tones.

  2. Added the optional 0x103a, 0x1037, and 0x1038 after the tones. Asat (0x103a) is part of the Sgaw tone mark, and dot below (0x1037) and visarga (0x1038) are used as nasal marks following the Pwo tones.
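
For readers who want to see the shape of the rule, here is a minimal Python sketch of the two changes. It is not the validator code touched by this PR: the names `TONE_TAIL` and `tone_tail_length` are hypothetical, only the two newly added tone code points are modeled, and it assumes that each trailing sign appears at most once and in the order listed above.

```python
import re

# Illustrative only: a regex model of the rule described above, not the
# actual validator implementation. Names here are hypothetical.
# Assumption: each trailing sign is optional, appears at most once, and
# follows the order asat, dot below, visarga.
TONE_TAIL = re.compile(
    "[\u102c\u1062]"  # the two code points newly accepted as Karen tones
    "\u103a?"         # optional asat (part of the Sgaw tone mark)
    "\u1037?"         # optional dot below (Pwo nasal mark)
    "\u1038?"         # optional visarga (Pwo nasal mark)
)


def tone_tail_length(text: str, pos: int) -> int:
    """Return how many code points at `pos` form a tone plus optional signs."""
    match = TONE_TAIL.match(text, pos)
    return match.end() - pos if match else 0
```

Calling `tone_tail_length("\u1000\u102c\u103a", 1)` consumes the tone and the asat that follows it, while a position that does not start with one of the two tone code points yields 0.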

And here are some text files for testing:

test_strings.txt - A few Sgaw and Pwo test strings highlighting the errors this PR fixes.
syllables_sgaw.txt - All possible Sgaw syllables.
syllables_pwo.txt - All possible Pwo syllables.

amitdo merged commit ed69e57 into tesseract-ocr:main on May 5, 2023

amitdo commented May 5, 2023

Thank you for your contribution.