ML. Send smaller IIIF Images to the ML learning endpoints + don't index empties/no detections #102

@DiegoPino

What?

Now that we are using real data and large collections, sending a full/full image to the back end is just too processing intensive. The reason we were sending full/full was that images smaller than what the model needs (e.g. Insightface needs 640x640) could be upscaled by Python and still yield a good enough vector, so I could be lazy and not actually check the size at the processor level. But I should not be lazy anymore. Some portrait images we processed are 6 million pixels, and we end up moving gigabytes of data between back end and front end for a 640 representation.

Also, new: each post processor (adding this to the interface) will have two extra methods, validateForIndex and validateForChaining. The base implementation can simply return TRUE, but some processors should return FALSE if, e.g., the output is not what we need. ML models with empty vectors (and thus empty OCR) should not fill the Solr index with nothing.

This would also allow OCR that returns nothing (e.g. failed OCR) to have no index entry.
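
To make the idea concrete, here is a rough sketch of the two validation hooks, written in Python purely for illustration. Everything except the two method names is an assumption: the `PostProcessor` and `VectorProcessor` classes, the `run` method, and the shape of the output are hypothetical, and the snake_case names simply mirror validateForIndex and validateForChaining.

```python
from abc import ABC, abstractmethod


class PostProcessor(ABC):
    """Hypothetical base class; only the two validate hooks come from this issue."""

    @abstractmethod
    def run(self, data: dict) -> dict:
        """Produce the processor output (illustrative signature)."""

    def validate_for_index(self, output: dict) -> bool:
        # Base implementation: always allow indexing.
        return True

    def validate_for_chaining(self, output: dict) -> bool:
        # Base implementation: always allow passing output to the next processor.
        return True


class VectorProcessor(PostProcessor):
    """Hypothetical ML processor that refuses to index empty vectors."""

    def run(self, data: dict) -> dict:
        return {"vector": list(data.get("vector", []))}

    def validate_for_index(self, output: dict) -> bool:
        # An empty vector (no detections) should not create an index entry.
        return len(output.get("vector", [])) > 0


if __name__ == "__main__":
    processor = VectorProcessor()
    for payload in ({"vector": []}, {"vector": [0.12, 0.98]}):
        result = processor.run(payload)
        print(payload, "index?", processor.validate_for_index(result))
```

Whether an empty result should also stop chaining is left at the default here; that choice would probably depend on each processor.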

So what now

  • Check the original size.
  • Depending on the model, send a larger-than-needed size (e.g. for image segmentation) so we don't lose details (like the person standing in the back).
  • For others, send just a few extra %.
  • For images smaller than what the model wants, add a checkbox allowing people to "skip" ADOs that don't provide the best data.
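
The steps above could be sketched roughly as follows, again in Python only for illustration. The function name `iiif_size_for_model`, its parameters, and the choice to compare against the longest side are all assumptions, not the actual processor code. It returns the value for the size segment of an IIIF Image API request (the second `full` in full/full), using the best-fit `!w,h` form, or `None` when the image should be skipped.

```python
from typing import Optional


def iiif_size_for_model(
    orig_w: int,
    orig_h: int,
    model_side: int,
    margin_percent: int = 10,
    skip_small: bool = False,
) -> Optional[str]:
    """Pick an IIIF size segment for a model that wants model_side x model_side.

    Hypothetical helper: margin_percent is the extra size requested on top of
    the model input (a few % for most models, much more for segmentation).
    """
    # Integer ceiling of model_side * (1 + margin_percent / 100).
    target = (model_side * (100 + margin_percent) + 99) // 100
    longest = max(orig_w, orig_h)

    if longest < model_side:
        # Smaller than the model wants: skip if the checkbox is on,
        # otherwise send what we have and let the back end upscale.
        return None if skip_small else "full"
    if longest <= target:
        # Already close to the target size; no point in resizing.
        return "full"
    # Best fit inside a target x target box, preserving aspect ratio.
    return f"!{target},{target}"


if __name__ == "__main__":
    print(iiif_size_for_model(3000, 2000, 640))                      # a few extra %
    print(iiif_size_for_model(3000, 2000, 640, margin_percent=100))  # segmentation-style margin
    print(iiif_size_for_model(400, 300, 640, skip_small=True))       # too small, skipped
```

The margins shown (10% and 100%) are placeholders; the right values would need to be tuned per model.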

@alliomeria for your radar
