ML. Send smaller IIIF Images to the ML learning endpoints + don't index empties/no detections

# What?

Now that we are using real data and large collections, sending a `full/full` image to the back end is just too processing intensive. We were sending `full/full` because images smaller than the model needs (e.g. Insightface needs 640x640) could be upscaled by Python and still give a good enough vector, so I could be lazy and not actually check the size at the processor level. But I should not be lazy anymore. Some portrait images we processed are 6 million pixels, and that just moves gigabytes of data between back end and front end for a 640 representation.

Also new: each Post processor (adding this to the interface) will have two extra methods, `validateForIndex` and `validateForChaining`. The base implementation can just return TRUE. But some Processors should return FALSE if, e.g., the output is not what we need. ML models that return empty vectors (and, by the same logic, empty OCR) should not fill the Solr index with nothing.

This would also allow OCR that produces no text (e.g. failed OCR) to have no index entry.
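
To make the two-method idea concrete, here is a minimal sketch in Python (the real interface lives in the processor plugin code, so this is illustration only). The class names, the `validate_for_index` / `validate_for_chaining` spellings, and the `vector` / `text` output keys are all assumptions made for the example, not the actual API.

```python
from typing import Any, Dict


class PostProcessor:
    """Hypothetical base class: by default every output is accepted."""

    def validate_for_index(self, output: Dict[str, Any]) -> bool:
        # Base implementation: always allow indexing.
        return True

    def validate_for_chaining(self, output: Dict[str, Any]) -> bool:
        # Base implementation: always allow passing output to the next processor.
        return True


class VectorProcessor(PostProcessor):
    """Hypothetical ML processor: refuses to index a missing or all-zero vector."""

    def validate_for_index(self, output: Dict[str, Any]) -> bool:
        vector = output.get("vector") or []
        return any(value != 0 for value in vector)


class OcrProcessor(PostProcessor):
    """Hypothetical OCR processor: refuses to index empty or whitespace-only text."""

    def validate_for_index(self, output: Dict[str, Any]) -> bool:
        text = output.get("text") or ""
        return bool(text.strip())
```

The caller would check `validate_for_index` before writing to Solr and `validate_for_chaining` before handing the output to the next processor, so a processor can block one without blocking the other.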

# So what now

- Check the original size.
- Depending on the model, send a larger size than needed (e.g. for image segmentation) so we don't lose details (like the person standing in the back).
- For other models, send just a few extra %.
- For images smaller than the model's desired size, add a checkbox allowing people to "skip" ADOs that don't provide the best data.
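
The steps above could be sketched roughly like this in Python. `ModelSizing`, `iiif_size_segment`, the integer `margin_percent`, and the rule "smaller than desired means the longest side is below the target" are assumptions for illustration. The `!w,h` form is the IIIF Image API best-fit size, and `full` is the 2.x keyword for the original size (3.0 calls it `max`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSizing:
    """Hypothetical per-model settings."""
    target: int          # side length the model works with, e.g. 640
    margin_percent: int  # extra size to request on top of the target


def iiif_size_segment(
    width: int,
    height: int,
    sizing: ModelSizing,
    skip_if_smaller: bool = False,
) -> Optional[str]:
    """Return the IIIF size segment to request, or None to skip the image."""
    longest = max(width, height)

    # Assumption: "smaller than desired" means the longest side is below target.
    if skip_if_smaller and longest < sizing.target:
        return None

    # Target plus margin, rounded up using integer arithmetic only.
    requested = -(-sizing.target * (100 + sizing.margin_percent) // 100)

    # Never ask the image server to upscale; take the original as-is.
    if longest <= requested:
        return "full"

    # Best fit inside a requested x requested box, aspect ratio preserved.
    return f"!{requested},{requested}"


# A face model with a few extra %, and a segmentation model asking for double.
face = ModelSizing(target=640, margin_percent=5)
segmentation = ModelSizing(target=640, margin_percent=100)
print(iiif_size_segment(3000, 2000, face))          # !672,672
print(iiif_size_segment(3000, 2000, segmentation))  # !1280,1280
```

The returned segment would replace the second `full` in `full/full`, so the image server does the downscaling and only the smaller derivative travels to the ML endpoint.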

@alliomeria for your radar

