In #62, @stweil's original problem was, as I understand it, to compare a directory of line GT text files with a directory of line OCR text files. For now I've created fake test data to implement this: fake-line-gt.zip. It looks like this:
% ls *
gt:
line001.gt.txt line003.gt.txt line005.gt.txt line007.gt.txt line009.gt.txt line011.gt.txt
line002.gt.txt line004.gt.txt line006.gt.txt line008.gt.txt line010.gt.txt
some-ocr:
line001.some-ocr.txt line003.some-ocr.txt line005.some-ocr.txt line007.some-ocr.txt line009.some-ocr.txt line011.some-ocr.txt
line002.some-ocr.txt line004.some-ocr.txt line006.some-ocr.txt line008.some-ocr.txt line010.some-ocr.txt
A first implementation should compare the text of pairs of files (matched by filename, ignoring the suffix) and produce a report of metrics & differences over all of the lines. First idea for the CLI:
dinglehopper-lines gt/ --gt-suffix .gt.txt some-ocr/ --ocr-suffix .some-ocr.txt
I'm not sure if this will be the final CLI, but it's what seems necessary at first glance.
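To make the "matching by filename" part concrete, here is a rough sketch of how the pairing could work, assuming the suffixes are simply stripped from the file names and the remainders compared. The function name `pair_line_files` and its return shape are made up for illustration; this is not existing dinglehopper code.

```python
from pathlib import Path


def pair_line_files(gt_dir, gt_suffix, ocr_dir, ocr_suffix):
    """Pair GT and OCR line files whose names match once the suffix is removed.

    Hypothetical helper: returns (pairs, unmatched), where pairs is a sorted
    list of (stem, gt_path, ocr_path) and unmatched lists the stems that
    exist in only one of the two directories.
    """

    def by_stem(directory, suffix):
        # Map "line001" -> Path(".../line001.gt.txt") for files with the suffix.
        return {
            p.name[: len(p.name) - len(suffix)]: p
            for p in Path(directory).iterdir()
            if p.is_file() and p.name.endswith(suffix)
        }

    gt = by_stem(gt_dir, gt_suffix)
    ocr = by_stem(ocr_dir, ocr_suffix)

    pairs = [(stem, gt[stem], ocr[stem]) for stem in sorted(gt.keys() & ocr.keys())]
    unmatched = sorted(gt.keys() ^ ocr.keys())
    return pairs, unmatched
```

With the fake data above, `pair_line_files("gt", ".gt.txt", "some-ocr", ".some-ocr.txt")` would be expected to yield one pair per line file. Reporting the unmatched stems separately seems useful, since a missing OCR line is probably something the report should mention rather than silently skip.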