Meilisearch auto-batching uses a lot of space on /tmp, "out of space" error #2844

@groentebroer

Describe the bug
When adding new sets of documents to the index through the API, the indexer stops indexing after a certain amount of time. The /tasks API endpoint then shows "No space left on device (os error 28)". The partition that holds the index has plenty of space available. /tmp has only a limited amount of space, because more was not needed (until now, maybe?)...
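
A minimal Python sketch of how the symptom could be confirmed, assuming the GET /tasks response has already been parsed into a dict with a "results" list whose entries carry "uid", "status" and "error.message". The sample payload is hypothetical and only mirrors the error text quoted above. The temp directory is resolved by Python and is assumed to be the same /tmp that Meilisearch writes to.

```python
import shutil
import tempfile

# Error text quoted in this report.
ENOSPC_MARKER = "os error 28"


def tasks_out_of_space(tasks_payload):
    """Return the uids of failed tasks whose error message mentions ENOSPC."""
    uids = []
    for task in tasks_payload.get("results", []):
        error = task.get("error") or {}
        if task.get("status") == "failed" and ENOSPC_MARKER in error.get("message", ""):
            uids.append(task["uid"])
    return uids


def free_gib(path):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


# Hypothetical payload shaped like a GET /tasks response (not real output).
sample = {
    "results": [
        {"uid": 2, "status": "failed",
         "error": {"message": "No space left on device (os error 28)"}},
        {"uid": 1, "status": "succeeded", "error": None},
        {"uid": 0, "status": "enqueued"},
    ]
}

print(tasks_out_of_space(sample))
print(f"temp dir free: {free_gib(tempfile.gettempdir()):.1f} GiB")
```

Calling free_gib on both the temp directory and the data folder shows which of the two filesystems is actually running out of space.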

Adding --disable-auto-batching makes 0.29.0 work like 0.28.1, which still indexes all items, so the excessive use of /tmp seems to be related to the auto-batching feature, which is enabled by default in 0.29.0.
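
To compare the two configurations side by side, the launch command could be built as in the sketch below. The binary location ./meilisearch and the data folder ./data.ms are assumptions for illustration; --disable-auto-batching is the flag named above.

```python
import shlex


def meilisearch_command(db_path, auto_batching=True, binary="./meilisearch"):
    """Build the argv for launching Meilisearch with or without auto-batching."""
    argv = [binary, "--db-path", db_path]
    if not auto_batching:
        # Flag from this report: makes 0.29.0 behave like 0.28.1.
        argv.append("--disable-auto-batching")
    return argv


# Print the command line instead of running it; pass the list to
# subprocess.run(...) to actually start the server.
print(shlex.join(meilisearch_command("./data.ms", auto_batching=False)))
```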

Is there any benefit to auto-batching when each new task already contains over 400 documents?
When adding or updating single documents, batching makes sense, but in this case the documents are already submitted in "batches"...

To Reproduce

  1. Create a clean install and data folder of 0.29.0.
  2. Prepare a dataset of about 150,000 articles.
  3. Split the dataset into JSON files of 400 to 25,000 articles each.
  4. Use a relatively small /tmp partition (4 GB in my case).
  5. Add all dataset JSON files through the API so that Meilisearch indexes them.
  6. Wait until the CPU goes idle. At that point none of the tasks created by adding the JSON files has completed.
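
Step 5 can be sketched in Python roughly as follows. The server address http://localhost:7700, the index name "articles" and the directory layout are assumptions, not details from this report. Each file is sent as one POST to /indexes/{index_uid}/documents, which enqueues one task per file.

```python
import json
import urllib.request
from pathlib import Path


def build_add_documents_request(base_url, index_uid, json_path, api_key=None):
    """Build (but do not send) a POST that adds one JSON file of documents."""
    body = Path(json_path).read_bytes()
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    url = f"{base_url.rstrip('/')}/indexes/{index_uid}/documents"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def enqueue_all(base_url, index_uid, directory):
    """Send every *.json file in `directory`; return the enqueued task uids."""
    task_uids = []
    for path in sorted(Path(directory).glob("*.json")):
        request = build_add_documents_request(base_url, index_uid, path)
        with urllib.request.urlopen(request) as response:
            # The enqueue response is assumed to carry the task id as "taskUid".
            task_uids.append(json.load(response)["taskUid"])
    return task_uids


# Example call (hypothetical address, index name and directory):
# enqueue_all("http://localhost:7700", "articles", "./dataset")
```

Because all files are enqueued before indexing finishes, many document-addition tasks are waiting at once, which is the situation in which auto-batching can merge them.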

Expected behavior
Successful indexing, despite the "small" (4 GB) /tmp size.

Meilisearch version:
{
"commitSha": "fa315352da8100e43da0f345712fb43cdf0271cb",
"commitDate": "2022-09-13T16:07:06Z",
"pkgVersion": "0.29.0"
}

Additional context
Linux, Intel
