added markdown document for ocr engine comparison #577
| * **Bad**, because increases support complexity with multiple engines | ||
| ### Confirmation | ||
Elaborate on how this is done. I would assume that you have the 100+ PDFs at hand and wrote a test suite?
No, I wrote this in advance, assuming that I would have that many tested later on, but I have deleted that section now. Given the low level of detail and sophistication in the other markdown files, I decided it's not needed.
An ADR can also have TODOs and links to existing drafts of the test suite.
| * Current implementation uses Tesseract 4.x with LSTM engine | ||
| * In benchmarks, Google Cloud Vision shows the highest overall accuracy | ||
| * Handwriting (categories 2 & 3) is the main differentiator among engines |
Where are these categories mentioned?
Same as above: I deleted this section.
| The web resources that informed this ADR: | ||
| 1. <https://www.mdpi.com/2073-8994/12/5/715> |
Link that to each pro/con argument.
| @@ -0,0 +1,153 @@ | |||
| # ADR-002: OCR Engine Selection for JabRef | |||
Try to follow the format given at JabRef's repo - and place it in the JabRef folder. https://github.com/JabRef/jabref/tree/main/docs/decisions
I think this is AI-generated, because I cannot otherwise explain why A) this takes number 0002, including in the heading.
(And does not follow the MADR format)
I adjusted the format a little bit, but it was already very similar to the other md files in the folder. I restructured the heading a little bit to make it even more similar.
See the new PR here: JabRef/jabref#13573
This should go to devdocs: jabref/docs/decisions
Follow-up PR is JabRef/jabref#13573. Therefore, I close this one.
This is related to the GSoC OCR project by Kaan Erdem.
JabRef/jabref#13313