use a consistent part size so that ETags are predictable #7

Closed
@andrewrk

Description

S3 uses this algorithm for ETags for multipart uploads:

  1. Compute the MD5 sum of each part
  2. Compute the MD5 sum of the concatenation of the MD5 sum digests of each part
  3. The ETag is the hex digest of that, plus '-', plus the part count (see the sketch below)
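
In other words, something like this minimal Python sketch (`multipart_etag` is a hypothetical name, not an existing API, and it assumes every part except the last is exactly `part_size` bytes):

```python
import hashlib

def multipart_etag(path, part_size):
    # 1. MD5 each fixed-size part of the file.
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            digests.append(hashlib.md5(chunk).digest())
    # 2. MD5 the concatenation of the per-part binary digests.
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    # 3. The ETag is that hex digest, '-', and the part count.
    return "{}-{}".format(combined, len(digests))
```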

Sadly this means that when you want to check whether a local file matches an S3 object that was uploaded via multipart, you must also know what size each part was, and this information is not available in S3 metadata.

One way to mitigate this problem is to use consistent part sizes when uploading files. For example, if I set maxPartSize to 5MB, then each part uploaded to S3 should be exactly 5MB, except for the last one. Currently the code flushes a part once its size is slightly above maxPartSize, so part sizes vary and client-side ETag calculation is impossible.
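
With exact part flushing, verification would reduce to recomputing the ETag with the known part size. A sketch building on the hypothetical `multipart_etag` above (the function name and the 5MB default are assumptions, not this project's API):

```python
def matches_s3_etag(path, s3_etag, part_size=5 * 1024 * 1024):
    # S3 reports ETags wrapped in double quotes, e.g. '"9b2cf...-3"'.
    return multipart_etag(path, part_size) == s3_etag.strip('"')
```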

Note that s3cmd, which by default does 15MB multipart uploads, behaves the way I am describing: each part is exactly 15MB, except for the last one.
