watch=true very slow with large mailboxes #584

@half-duplex

I have 17,000 historical reports in a folder in o365, which I'm trying to process with parsedmarc.
This seems to require watch=true to process more than one batch, but with watch=true it works through them extremely slowly (less than one message per second).

It appears this is because a batch_size param is passed to connector.fetch_messages() on the first call but not subsequent ones, so the first batch is reasonably fast and every one after that is running a slow, expensive query to list all 17,000 items in the mailbox.
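
To illustrate why this matters, here is a toy model of the behavior described above. It is not parsedmarc's actual code: `FakeConnector`, `archive`, `drain` and the `items_listed` counter are hypothetical names made up for this sketch, and only the `fetch_messages(batch_size=...)` call shape is taken from the description. It counts how many mailbox items have to be listed in total when batch_size is passed on every call versus only on the first one.

```python
class FakeConnector:
    """Hypothetical stand-in for a mailbox connector (not parsedmarc's real class)."""

    def __init__(self, total):
        self.remaining = list(range(total))  # message ids still in the folder
        self.items_listed = 0                # total items enumerated by all queries

    def fetch_messages(self, batch_size=None):
        # Assumption: without batch_size the whole folder is listed.
        limit = len(self.remaining) if batch_size is None else batch_size
        ids = self.remaining[:limit]
        self.items_listed += len(ids)
        return ids

    def archive(self, ids):
        # Simulate moving/deleting processed messages out of the folder.
        done = set(ids)
        self.remaining = [i for i in self.remaining if i not in done]


def drain(connector, batch_size, pass_batch_size_every_time):
    """Process the folder in batches and return the total number of items listed."""
    first = True
    while connector.remaining:
        if first or pass_batch_size_every_time:
            ids = connector.fetch_messages(batch_size=batch_size)
        else:
            ids = connector.fetch_messages()  # the suspected slow path
        first = False
        connector.archive(ids[:batch_size])  # only one batch is processed per pass
    return connector.items_listed


if __name__ == "__main__":
    print(drain(FakeConnector(1000), 10, pass_batch_size_every_time=True))   # 1000
    print(drain(FakeConnector(1000), 10, pass_batch_size_every_time=False))  # 49510
```

In this model, even a 1,000-message folder with batches of 10 goes from 1,000 listed items to 49,510 when the limit is dropped after the first call, and the gap grows roughly quadratically with mailbox size.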

Bad workaround: set batch_size=250 or so, so the expensive query runs only about 68 times instead of 1,700, and deal with the duplicates if it crashes between starting processing and moving/deleting the emails. (Also, don't use watch=true; see #416.)
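
A quick back-of-the-envelope check of how many batches, and therefore how many expensive listings, a given batch_size implies for 17,000 messages, assuming one full listing per batch. `listing_calls` is a hypothetical helper written for this sketch, and the batch size of 10 is an assumption inferred from the 1,700 figure.

```python
import math


def listing_calls(total_messages, batch_size):
    """Number of batches needed to drain the folder (one listing query each)."""
    return math.ceil(total_messages / batch_size)


if __name__ == "__main__":
    for size in (10, 250, 500):
        print(size, listing_calls(17_000, size))
    # 10 1700
    # 250 68
    # 500 34
```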
