disk space usage by duplicate attachments #465

@patroqueeet

symptom

When I send a personalised mail to many users with the same attachment, the attachment file is duplicated for each mail. When the mail goes to thousands of recipients, this uses a large amount of disk space.

expected solution

Have a single Attachment object per file, linked to many Emails.
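
To illustrate the idea, here is a rough sketch, not the library's actual schema: key each stored file by its name plus a digest of its content, and reuse the existing record when the same bytes arrive again. `AttachmentStore`, `get_or_create` and `attach` are hypothetical names made up for this example.

```python
import hashlib


class AttachmentStore:
    """Hypothetical in-memory store: one record per distinct (name, content) pair."""

    def __init__(self):
        self._records = {}  # (name, digest) -> record dict

    def get_or_create(self, name, content):
        # Equal bytes under the same name map to the same key, so they share one record.
        key = (name, hashlib.sha256(content).hexdigest())
        return self._records.setdefault(
            key, {"name": name, "content": content, "emails": set()}
        )

    def attach(self, email_id, name, content):
        # Link the email to the shared record instead of storing a new copy.
        record = self.get_or_create(name, content)
        record["emails"].add(email_id)
        return record

    def __len__(self):
        return len(self._records)


store = AttachmentStore()
pdf = b"%PDF-1.4 example bytes"
for email_id in range(1000):
    store.attach(email_id, "terms.pdf", pdf)
print(len(store))  # number of stored files, not number of emails
```

In a real model the same lookup would presumably be a database query on a stored digest column, which is an assumption about how this could be implemented rather than something that exists today.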

workaround as of now

Run a script frequently to detect and consolidate duplicates:

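import hashlib
import os

# Assumption: Attachment is django-post_office's model; adjust the import to your project.
from post_office.models import Attachment


def file_md5(attachment):
    """Return the MD5 hex digest of the attachment's file, or None if it is missing on disk."""
    if not os.path.exists(attachment.file.path):
        return None
    with open(attachment.file.path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


removed = set()  # pks of duplicates that were already merged and deleted
for a in Attachment.objects.order_by("pk"):
    if a.pk in removed:
        continue
    # Candidates are the other attachments with the same name.
    candidates = Attachment.objects.filter(name=a.name).exclude(pk=a.pk)
    if not candidates.exists():
        continue
    hash0 = file_md5(a)
    if hash0 is None:
        continue
    for attachment in candidates:
        if file_md5(attachment) != hash0:
            continue
        print(f"{attachment} ({attachment.pk}) is duplicate of {a} ({a.pk})")
        # Re-link every email to the kept attachment before deleting the duplicate.
        for email in attachment.emails.all():
            print(f"for {email.pk} add {a} ({a.pk}) and delete {attachment} ({attachment.pk})")
            email.attachments.add(a)
        removed.add(attachment.pk)
        path = attachment.file.path
        attachment.delete()
        # Remove the duplicate file, but never the file of the attachment we keep.
        if path != a.file.path and os.path.exists(path):
            os.remove(path)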
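
The script above reads each file fully into memory to hash it. For large attachments, hashing in chunks may be gentler on memory. A minimal sketch using only the standard library; `file_digest` and its `chunk_size` parameter are hypothetical names, and SHA-256 is swapped in for MD5 here:

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with the same digest can then be treated as the same content, e.g. `file_digest(a.file.path) == file_digest(attachment.file.path)`.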
