Added hash based check to prevent image duplication on product import #21146
…ation upon successive product CSV imports.
Hi @erfanimani. Thank you for your contribution
For more details, please review the Magento Contributor Assistant documentation.
@erfanimani I have used your PR and added image deletion to it in #21855. Thanks for the PR, it saved me quite a lot of time.
 *
 * @param array $images
 */
public function addImageHashes(&$images) {
Please make this method private and add strict types to it.
 *
 * @return string
 */
protected function getImportDir()
Please make this method private and add strict types to it.
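For illustration only, here is a rough sketch of what the two review requests above could look like once applied. The class name `ImageImportSketch`, the constructor, the `hashAll()` wrapper, the method bodies, and the shape of `$images` (a list of arrays with a `file` key) are all assumptions made for the example; only the method names and the docblock types come from the diff.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in class: the real class and method bodies are not
// shown in this thread, so everything except the two method names is assumed.
class ImageImportSketch
{
    /** @var string */
    private $importDir;

    public function __construct(string $importDir)
    {
        $this->importDir = rtrim($importDir, '/');
    }

    /**
     * Adds a 'hash' key to every image entry (assumed shape: ['file' => name]).
     *
     * @param array $images
     */
    private function addImageHashes(array &$images): void
    {
        foreach ($images as &$image) {
            $path = $this->getImportDir() . '/' . $image['file'];
            $hash = is_file($path) ? md5_file($path) : false;
            // md5_file() returns false on failure; store null in that case.
            $image['hash'] = $hash === false ? null : $hash;
        }
        unset($image); // break the reference left over from the loop
    }

    /**
     * @return string
     */
    private function getImportDir(): string
    {
        return $this->importDir;
    }

    // Public wrapper so the private methods can be exercised from outside.
    public function hashAll(array $images): array
    {
        $this->addImageHashes($images);
        return $images;
    }
}
```

The visibility change keeps both helpers out of the class's public surface, and the scalar and return type declarations let PHP reject wrongly typed arguments instead of silently coercing them.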
@erfanimani, I am closing this PR now due to inactivity.
Hi @erfanimani, thank you for your contribution!
I just realized that Magento stores separate copies of the same image when it is uploaded manually during new product creation. Would the approach here also work when creating a new product and re-uploading the same images? Would that also go through the hash check, or does the check run only during CSV import?
@cptX it's been a while, but I believe it just affects the CSV import.
Description
I've added an MD5-based hash check to prevent image duplication upon successive product CSV imports.
Note that this is a proof of concept, even though it works, sort of.
The problem is that the Add/Update product import method should be declarative, and thus idempotent: it should not cause side effects when importing the same CSV multiple times.
As it stands, this is not the case (confirmed): running the import multiple times causes the images to be added again each time. The "Replace" import method is beyond silly for production systems, as it trashes the old IDs (and takes quotes, reports, wish lists, comparison lists, and so on with it).
We also can't replace just the images, as that results in different filenames, so realistically a filename + hash comparison is the way to go. But filename checks are hard because we don't keep track of the file metadata on import (and because the file is renamed when Magento moves it into media/catalog/product, any reference to the old filename is lost), so the easiest solution that kind of works is a hash-based check.
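To make the idea concrete, here is a minimal, framework-free sketch of a hash-based duplicate check. It is not the code in this PR: the function names `fileHash()` and `findDuplicateImage()` and the flat list of existing file paths are assumptions for illustration, and it deliberately uses only PHP built-ins rather than any Magento API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Magento): returns the MD5 hash of a
 * file's contents, or null if the file is missing or unreadable.
 */
function fileHash(string $path): ?string
{
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }
    $hash = md5_file($path);
    return $hash === false ? null : $hash;
}

/**
 * Returns the path of an already stored image whose contents match the
 * incoming file, or null if there is no match.
 *
 * @param string   $incomingFile  file waiting in the import directory
 * @param string[] $existingFiles files already attached to the product
 */
function findDuplicateImage(string $incomingFile, array $existingFiles): ?string
{
    $incomingHash = fileHash($incomingFile);
    if ($incomingHash === null) {
        return null; // nothing to compare against
    }
    foreach ($existingFiles as $existing) {
        // Filenames are ignored on purpose: the stored file may have been renamed.
        if (fileHash($existing) === $incomingHash) {
            return $existing;
        }
    }
    return null;
}
```

An importer built this way would skip the copy step whenever `findDuplicateImage()` returns a path, which is what makes repeated imports of the same CSV leave the gallery unchanged. MD5 is adequate for spotting accidental duplicates, but it is not collision-resistant against deliberately crafted files; `hash_file('sha256', $path)` is a drop-in alternative if that matters.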
I want to know whether this is the right way to implement it. The class is already messy and mostly undocumented, with no straightforward way to implement this properly, so any guidance would be helpful.
Fixed Issues
Manual testing scenarios (*)
There are actually many testing scenarios. Some of them are:
Please let me know how to proceed and I can amend my commit message and implement it properly.