Skip to content

Support comparing line GT directories with line OCR directories #64

@mikegerber

Description

@mikegerber

In #62, @stweil's original problem was - as I understand it - to compare a directory with line GT text files with a directory of line OCR text files. For now I've created fake test data to implement this fake-line-gt.zip. It looks like this:

% ls *
gt:
line001.gt.txt  line003.gt.txt  line005.gt.txt  line007.gt.txt  line009.gt.txt  line011.gt.txt
line002.gt.txt  line004.gt.txt  line006.gt.txt  line008.gt.txt  line010.gt.txt

some-ocr:
line001.some-ocr.txt  line003.some-ocr.txt  line005.some-ocr.txt  line007.some-ocr.txt  line009.some-ocr.txt  line011.some-ocr.txt
line002.some-ocr.txt  line004.some-ocr.txt  line006.some-ocr.txt  line008.some-ocr.txt  line010.some-ocr.txt

A first implementation should compare the text of pairs files (matching by filename) and produce a report of metrics & differences over all of the lines. First idea of the CLI interface:

dinglehopper-lines gt/ --gt-suffix .gt.txt some-ocr/ --ocr-suffix .some-ocr.txt

I'm not sure if this will be the final CLI interface but it's what seems necessary on first glance.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions