
Reconnecting media that failed to be moved from source to destination on ADO persistence #226

@DiegoPino

Description


What?

So first a disclaimer (and this is on me many times, sorry). The AWS SDK (S3) requires any file larger than 5 GB to be uploaded via multipart upload. We implement this correctly. But...

@alliomeria this is important!

...If your Archipelago (production) is routing an AWS S3 bucket through MinIO instead of using it directly from AWS via S3FS, then MinIO (a bug there) will incorrectly send a header to Amazon that is not in spec with the current API.

And uploading a temp file to its final destination (managed by Archipelago) will fail with this message in MinIO:

```
- API: CopyObjectPart(bucket=YOURBUCKET object=media/111/video-UUID-verylarge.mp4)
Time: 21:18:09 UTC 01/03/2025
DeploymentID: XXXX
RequestID: XXXX
RemoteHost: XXXXX
Host: esmero-minio:9000
UserAgent: aws-sdk-php/3.324.10 ua/2.0 OS/Linux#6.1.112-122.189.amzn2023.aarch64 lang/php#8.1.25 GuzzleHttp/7
Error: x-amz-server-side-encryption header is not supported for this operation. (minio.ErrorResponse)
```

If your storage is 100% managed by MinIO, no issue; if your storage on the Drupal side of things goes directly to AWS S3, also no issue. The problem only shows up when MinIO is acting as a gateway in front of an AWS S3 bucket.
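For reference, this is roughly the multipart path the PHP SDK takes for objects over 5 GB. This is only a hedged sketch: bucket, key, endpoint and source file are placeholders, not Archipelago's actual configuration, and the failure above actually happens on the copy variant of the same flow (CopyObjectPart) when the temp object is moved to its final key.

```php
<?php
// Sketch only: placeholder bucket/key/endpoint, not Archipelago's real config.
use Aws\S3\S3Client;
use Aws\S3\MultipartUploader;
use Aws\Exception\MultipartUploadException;

$client = new S3Client([
  'version' => 'latest',
  'region' => 'us-east-1',
  // When routing through MinIO the endpoint points at the gateway, not AWS.
  'endpoint' => 'http://esmero-minio:9000',
  'use_path_style_endpoint' => TRUE,
]);

// Objects over 5 GB cannot go through a single PutObject; the SDK splits them
// into parts and completes the multipart upload at the end.
$uploader = new MultipartUploader($client, '/tmp/verylarge.mp4', [
  'bucket' => 'YOURBUCKET',
  'key' => 'media/111/video-UUID-verylarge.mp4',
]);

try {
  $uploader->upload();
}
catch (MultipartUploadException $e) {
  // With the MinIO gateway bug above, the equivalent part-copy operation is
  // rejected because of the unexpected x-amz-server-side-encryption header.
  echo $e->getMessage();
}
```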

Ok, that said. Let's say your file was not correctly uploaded to media/111/video-UUID-verylarge.mp4. As a failsafe we will still "connect" it to the original source (to avoid total failure) and it will stay where it was (so you can still stream it), e.g. at /media/upload/verylarge.mp4. Issue with this? Well, if you delete your originals, everything breaks for that file. But most importantly, IIIF (Cantaloupe) won't see it and won't be able to provide anything.

Specifically, this one here does the job during a normal operation:

https://github.com/esmero/strawberryfield/blob/ce3aae811900f49996a1b2bb631b64b602a004f2/src/EventSubscriber/StrawberryfieldEventPresaveSubscriberFilePersister.php#L107

calling this here:

https://github.com/esmero/strawberryfield/blob/main/src/StrawberryfieldFilePersisterService.php#L764-L883

The thing is, once we pass that stage (whether the copy succeeded or not) we make the file permanent to avoid total failure, and that whole code will never run again. That makes re-establishing the Archipelago "we manage your file" contract impossible, except via custom code and manually copying things around.
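To make the "never runs again" part concrete, here is a hedged illustration (not the actual subscriber/service code linked above) of the gate being described: persistence only happens while the file entity is still temporary, so a failed copy that was followed by marking the file permanent is never retried.

```php
<?php
// Illustration only; the real logic lives in the subscriber/service linked above.
/** @var \Drupal\file\FileInterface[] $files Files referenced by the ADO's JSON (assumed). */
foreach ($files as $file) {
  if ($file->isPermanent()) {
    // Treated as "already persisted": the move-to-destination logic is skipped,
    // even if the bytes are actually still sitting in the upload location.
    continue;
  }
  // ... copy to the Archipelago-managed destination here ...
  // The file is then made permanent regardless of the outcome, so a failed
  // copy will not be retried on subsequent saves.
  $file->setPermanent();
  $file->save();
}
```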

Now that we know the problem, on to the solution.

What do we do?

A few options:

  • Allow a re-try on a next save. E.g. add a flag somewhere so we do not depend only on whether the "file" entity is permanent to decide if we run or not.
  • Emergency route. A VBO action that does exactly what ::persistFilesInJsonToDisks does, but without the "constraints" there. Moreover, it validates first. Here is how (a rough sketch follows after this list):
    • For every File in an ADO:
    • Get the current Storage Location.
    • If Current Storage Location != Desired Location:
      • Check if there is already a file in the Desired Location; if not, copy to the Desired Location.
      • Update the File Entity to use the Desired Location.
      • Save.
      • Done.
    • Else, do nothing.
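A rough sketch of what that validate-then-fix loop could look like. Everything here is an assumption for illustration: how the File entities are collected from the ADO, how the desired URI is computed (it would have to reuse the persister service's naming logic), and the use of the core file_system service. It is not the final VBO plugin.

```php
<?php
use Drupal\Core\File\Exception\FileException;
use Drupal\Core\File\FileSystemInterface;

/** @var \Drupal\Core\File\FileSystemInterface $file_system */
$file_system = \Drupal::service('file_system');

// Assumptions (placeholders): how files are gathered and how the desired URI is computed.
$files = []; // e.g. File entities referenced in the ADO's strawberryfield JSON.
$compute_desired_uri = function (\Drupal\file\FileInterface $file): string {
  // Would reuse the persister service's naming logic; placeholder only.
  return 's3://media/placeholder/' . $file->getFilename();
};

foreach ($files as $file) {
  $current_uri = $file->getFileUri();
  $desired_uri = $compute_desired_uri($file);

  if ($current_uri === $desired_uri) {
    // Already where Archipelago wants it: do nothing.
    continue;
  }

  try {
    if (!file_exists($desired_uri)) {
      // Copy first; only repoint the entity once the bytes are in place.
      $file_system->copy($current_uri, $desired_uri, FileSystemInterface::EXISTS_REPLACE);
    }
    $file->setFileUri($desired_uri);
    $file->save();
  }
  catch (FileException $e) {
    \Drupal::logger('strawberryfield')->error('Could not relocate @uri: @message', [
      '@uri' => $current_uri,
      '@message' => $e->getMessage(),
    ]);
  }
}
```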

My only concern is VBO. The files I saw fail today are 20 GB+. Not sure VBO will be able to do this in a single run. So probably the VBO part will only do the "check if it needs fixing" and then enqueue into the new AMI Action Queue to actually do the JOB.
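Hedged sketch of that split: the VBO action only does the cheap comparison and defers the actual (potentially 20 GB+) copy to a queue item. The queue name and payload shape below are made up for illustration, and $compute_desired_uri is the same assumed helper as in the previous sketch; in practice this would target the new AMI Action Queue.

```php
<?php
// Illustration only: queue name and payload are placeholders, not the AMI Action Queue API.
$queue = \Drupal::queue('strawberryfield_file_relocate');

foreach ($files as $file) {
  $desired_uri = $compute_desired_uri($file);
  if ($file->getFileUri() !== $desired_uri) {
    // The cheap "does it need fixing?" check runs inside the VBO action; the
    // heavy copy runs later, one item at a time, inside a queue worker.
    $queue->createItem([
      'file_id' => $file->id(),
      'desired_uri' => $desired_uri,
    ]);
  }
}
```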

Thanks


Labels

File processing (Everything is a file, even me.), VBO Actions (I got my head out the sunroof), enhancement (New feature or request), queue (FIFO), queue workers (Ones taking the FI and doing the FO)
