fix: Use separate header object for each upload in Transfer Manager MPU #1595
Summary of Changes
Hello @MattIrv, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a critical bug in the Transfer Manager's multipart upload functionality. Previously, concurrent upload tasks were inadvertently sharing the same HTTP headers dictionary, leading to data corruption errors such as checksum mismatches. The change ensures thread safety and reliable multipart uploads by passing a distinct copy of the headers to each individual upload task, thereby preventing unintended modifications across concurrent operations.
Code Review
This pull request correctly addresses a critical concurrency issue within the Transfer Manager's multipart upload functionality. By ensuring each worker thread receives a separate copy of the headers dictionary, the change effectively prevents race conditions that were causing data corruption errors. The fix is simple, targeted, and essential for the stability of concurrent uploads. I approve of this change.
2e3007f to 20f0d94
chandra-siri left a comment
Thanks for your fix!
@chandra-siri I don't see a way to re-run the approvers check so that I can submit this. Are you able to do so? Thanks!
done!
I didn't see test failures due to this (are the transfer manager tests running continuously?), but in our own repo (GoogleCloudPlatform/gcs-connector-for-pytorch#209) a copied implementation of this caused upload failures with errors like:
google.cloud.storage.exceptions.DataCorruption: Checksum mismatch: checksum calculated by client and server did not match. Error code: BadDigest, Error message: The MD5 you specified in Content-MD5 or x-goog-hash did not match what we computed., Error details: The specified CRC (-1631273713) does not match what we computed (1851139595)
This happens because the headers dict gets updated in upload.py, and passing the same headers dict to executor.submit here causes all threads to share one object.
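The shared-object problem described above can be illustrated with a minimal sketch. This is not the library's code: `upload_part` and `run_parts` are hypothetical stand-ins, and the checksum strings are placeholders. It only assumes that a per-part upload writes a part-specific hash header into the dict it receives, and shows the difference between submitting the same dict to every task and submitting a copy per task.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_part(part_number, headers):
    """Hypothetical per-part upload: writes a part-specific hash header
    into the dict it was given, then returns that same dict."""
    headers["x-goog-hash"] = f"crc32c=checksum-for-part-{part_number}"
    return headers


def run_parts(base_headers, part_numbers, copy_headers):
    """Submit one task per part.

    With copy_headers=False every task receives the same dict object, so
    each task's write overwrites the others'. With copy_headers=True each
    task receives its own shallow copy and the caller's dict is untouched.
    """
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [
            executor.submit(
                upload_part,
                n,
                dict(base_headers) if copy_headers else base_headers,
            )
            for n in part_numbers
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    shared_base = {"content-type": "application/octet-stream"}
    shared = run_parts(shared_base, [1, 2, 3], copy_headers=False)
    # All three results are the caller's dict, which now carries whichever
    # part's hash was written last.
    print(all(h is shared_base for h in shared), shared_base)

    copied_base = {"content-type": "application/octet-stream"}
    copied = run_parts(copied_base, [1, 2, 3], copy_headers=True)
    # Each result keeps its own part's hash; the caller's dict is unchanged.
    print([h["x-goog-hash"] for h in copied], copied_base)
```

A shallow copy is enough in this sketch because the header values are immutable strings; if the values were themselves mutable objects, a deeper copy would be needed.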