Description
S3 uses this algorithm for ETags for multipart uploads:
- Compute the MD5 sum of each part
- Compute the MD5 sum of the concatenation of the binary MD5 digests of all the parts
- The ETag is the hex digest of that + '-' + part count
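
For reference, here is a minimal Python sketch of the steps above. It assumes every part is exactly part_size bytes except possibly the last, which is the behavior this issue asks for. The function name multipart_etag and its parameters are made up for illustration and are not part of any library.

```python
import hashlib
from typing import BinaryIO


def multipart_etag(stream: BinaryIO, part_size: int) -> str:
    """Compute the ETag expected for a multipart upload of `stream`.

    Assumes each part is exactly `part_size` bytes, except the last one,
    and that `stream.read(n)` returns n bytes unless it reaches end of file
    (true for regular files and io.BytesIO).
    """
    part_digests = []
    while True:
        part = stream.read(part_size)
        if not part:
            break
        # Step 1: MD5 of each part, kept as raw bytes (not hex).
        part_digests.append(hashlib.md5(part).digest())

    # Step 2: MD5 of the concatenated binary digests.
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()

    # Step 3: hex digest + '-' + part count.
    return f"{combined}-{len(part_digests)}"
```

To check a local file, I would open it in binary mode, call multipart_etag(f, part_size) with the part size used for the upload, and compare the result to the object's ETag with the surrounding quotes stripped.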
Sadly, this means that to check whether a local file matches an S3 object that was uploaded via multipart, you must also know the size of each part, and this information is not available in S3 metadata.
One way to mitigate this problem is to use consistent part sizes when uploading files. For example, if I set maxPartSize to 5MB, then each part uploaded to S3 should be exactly 5MB, except for the last one. Currently the code flushes a part when the part size is slightly above maxPartSize. This makes it impossible to do client side ETag calculation.
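
To illustrate the buffering behavior I am asking for, here is a rough Python sketch. It is not this library's code; fixed_size_parts and max_part_size are hypothetical names. Incoming chunks of arbitrary size are accumulated, and a part is emitted only when exactly max_part_size bytes are available, with any remainder carried over to the next part.

```python
from typing import Iterable, Iterator


def fixed_size_parts(chunks: Iterable[bytes], max_part_size: int) -> Iterator[bytes]:
    """Regroup arbitrary-sized chunks into parts of exactly `max_part_size` bytes.

    Only the final part may be smaller than `max_part_size`.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit as many full parts as the buffer holds; keep the remainder.
        while len(buffer) >= max_part_size:
            yield bytes(buffer[:max_part_size])
            del buffer[:max_part_size]
    # Whatever is left at end of input becomes the (shorter) last part.
    if buffer:
        yield bytes(buffer)
```

With parts cut this way, the part boundaries depend only on the file contents and the configured part size, so a client can recompute the ETag later without any extra metadata.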
Note that s3cmd, which does 15MB multipart uploads by default, already behaves the way I am describing: each part is exactly 15MB (except for the last one).