
Conversation

@Aldeigja Aldeigja commented Nov 15, 2025

Helper module that migrates existing attachments to S3.

Module for migrating existing filestore
attachments into the S3 backend with queue_job orchestration.
@Aldeigja Aldeigja marked this pull request as draft November 15, 2025 14:05
Comment on lines 25 to 26
("res_field", "=", False),
("res_field", "!=", False),
Member

Why do we need such a domain leaf? Can't it simply be omitted?

Author

Good catch! This is a required workaround for Odoo's ir.attachment._search method
which automatically adds ('res_field', '=', False) to any domain that doesn't
already contain 'res_field'. Without this tautology, only attachments where
res_field=False would be found, missing all field-linked attachments.

I'll add an explicit comment explaining this.
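
For reference, a minimal sketch of the behaviour (illustrative only, assuming an `env` in scope; the tautological leaf is the one quoted above):

```python
# Minimal sketch (not the module's code) of why the tautology is needed:
# ir.attachment._search() silently appends ('res_field', '=', False) whenever
# 'res_field' is absent from the domain, hiding all field-linked attachments.
domain = [("type", "=", "binary")]

# With 'res_field' explicitly present, the implicit filter is not added and
# both standalone and field-linked attachments are returned:
domain = ["|", ("res_field", "=", False), ("res_field", "!=", False)] + domain
attachments = env["ir.attachment"].search(domain)
```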

if not checksum:
return None

rf_domain = ["|", ("res_field", "=", False), ("res_field", "!=", False)]
Member

Same question about domain here. If this is needed on purpose, then there should be an explicit comment explaining the reason behind it.

Author

Same reason as above.

This module lets users move existing `ir.attachment` files from the
standard filestore or database into an Amazon S3-backed `fs.storage`, using a
wizard directly on the storage form.

Member

Migrations are run in background batches, skipping attachments that are already stored in S3 or must remain in PostgreSQL.
This makes it safe to run the process repeatedly without creating duplicates.

Comment on lines 1 to 3
* Cetmix (cetmix.com)
* Ivan Sokolov
* George Smirnov
Member

Cetmix (cetmix.com)

  • Ivan Sokolov
  • George Smirnov

Member

This setting is neither documented nor explained; however, it's a very important one.

Author

Good point. I'll add a comment in the XML explaining the channel's purpose.

storage_id = fields.Many2one("fs.storage", required=True)
storage_code = fields.Char(required=True)
batch_size = fields.Integer(default=500)
channel = fields.Char(default="root.s3_migration")
Member

I don't think it's a good idea to hardcode it this way, especially taking into account that you add the channel via a data file.
You should assign the default value using a function that resolves env.ref("fs_attachment_s3_migration.queue_channel_s3_migration").
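
A minimal sketch of what that could look like (the class and model names are placeholders; the channel record is assumed to expose complete_name, as queue_job channels do):

```python
from odoo import fields, models


class S3MigrationWizard(models.TransientModel):
    # Placeholder model; only the default-resolution pattern is the point.
    _name = "s3.migration.wizard.default.sketch"
    _description = "Sketch: default channel resolved from the data file"

    def _default_channel(self):
        # Look up the channel record shipped in the module's data file instead
        # of hardcoding its name; fall back to "root" if it is not installed.
        channel = self.env.ref(
            "fs_attachment_s3_migration.queue_channel_s3_migration",
            raise_if_not_found=False,
        )
        return channel.complete_name if channel else "root"

    channel = fields.Char(default=lambda self: self._default_channel())
```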

default=0,
)

@api.onchange("storage_id")
Member

I would recommend avoiding @api.onchange and using computed stored fields with readonly=False instead.
This will simplify usage in automation, including external calls.
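
A minimal sketch of that pattern (placeholder class and model name; it assumes fs.storage exposes its technical code on a code field):

```python
from odoo import api, fields, models


class S3MigrationWizard(models.TransientModel):
    # Placeholder model; only the compute pattern itself is the point.
    _name = "s3.migration.wizard.compute.sketch"
    _description = "Sketch: computed stored field instead of @api.onchange"

    storage_id = fields.Many2one("fs.storage", required=True)
    storage_code = fields.Char(
        compute="_compute_storage_code", store=True, readonly=False
    )

    @api.depends("storage_id")
    def _compute_storage_code(self):
        for wizard in self:
            # User edits are preserved thanks to readonly=False; the value is
            # recomputed only when storage_id changes.
            wizard.storage_code = wizard.storage_id.code
```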


_logger.info(
"Completed batch migration: %d/%d attachments to storage %s (%d skipped). "
"Old files will be cleaned by garbage collection.",
Member

by the garbage collector

max_batches or "unlimited",
)

while True:
Member

I still have some concerns regarding this approach.

Pros

Can run multiple jobs in parallel -> faster

Cons

Jobs are created instantly using the current DB snapshot, so as soon as the migration is started no further changes are taken into account. E.g. if some attachments were removed, they will still be processed even though this is no longer needed.

When run in parallel, several batches can access the same physical files (same checksum). If we modify the data in one of those batches, the modification will not be saved in the database until the transaction is committed. So we either need to commit these changes explicitly (questionable) or prepare the batches grouped by checksum so that attachments with the same checksum always end up in the same batch.

Conclusion

Personally I would prefer to have those batches enqueued one after another to avoid potential issues and also to always work on an up-to-date database snapshot. However, I may be missing something here, so let's research this part further.
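
For illustration, a minimal sketch of the sequential approach (the _s3_migration_domain and _s3_migrate_batch helpers are hypothetical names, not the module's actual methods):

```python
from odoo import models


class IrAttachment(models.Model):
    _inherit = "ir.attachment"

    def _s3_migrate_next_batch(self, storage_code, batch_size):
        # Re-query at job run time so each batch works on a fresh DB snapshot:
        # deleted attachments are no longer picked up, already migrated ones
        # are skipped by the domain.
        attachments = self.search(
            self._s3_migration_domain(storage_code), limit=batch_size
        )
        if not attachments:
            return "Migration finished"
        attachments._s3_migrate_batch(storage_code)
        # Enqueue the next batch only once this one has been processed.
        self.with_delay(channel="root.s3_migration")._s3_migrate_next_batch(
            storage_code, batch_size
        )
        return f"Migrated {len(attachments)} attachments"
```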


# Avoid empty writes if source is temporarily unreadable
if attachment.file_size and not file_data:
resolved = self._s3_resolve_migration_bytes(attachment, storage_code)
Member

This leads to a query being run N times, which significantly degrades performance, especially on databases with a large number of attachments.
I would suggest fetching the data for the entire batch (or even the entire DB 🤔) with read_group and then iterating over it.
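
A minimal sketch of such a batch-level prefetch (sketched with search_read to the same single-query effect; the field usage is an assumption about what the per-attachment query resolves):

```python
def _s3_prefetch_siblings(env, attachments):
    # One query for the whole batch instead of one query per attachment:
    siblings = env["ir.attachment"].search_read(
        [
            ("checksum", "in", attachments.mapped("checksum")),
            ("store_fname", "!=", False),
        ],
        ["checksum", "store_fname"],
    )
    fname_by_checksum = {}
    for rec in siblings:
        # Keep one readable filestore path per checksum; bytes can then be
        # resolved from this map while iterating over the batch.
        fname_by_checksum.setdefault(rec["checksum"], rec["store_fname"])
    return fname_by_checksum
```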

@Aldeigja Aldeigja force-pushed the 16.0-t5048-fs_attachment_s3_migration-add-module branch from 2611c77 to ec02e7a Compare November 29, 2025 00:19