standard-tests: some vector store tests mistakenly assume get_by_ids preserves document order #32820

@hemidactylus

Checked other resources

  • This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Example Code

Replace file libs/standard-tests/tests/unit_tests/test_in_memory_vectorstore.py with this:

from collections.abc import Sequence

import pytest
from langchain_core.documents import Document
from langchain_core.vectorstores import (
    InMemoryVectorStore,
    VectorStore,
)
from typing_extensions import override

from langchain_tests.integration_tests.vectorstores import VectorStoreIntegrationTests


class TestInMemoryVectorStore(VectorStoreIntegrationTests):
    @pytest.fixture
    def vectorstore(self) -> VectorStore:
        embeddings = self.get_embeddings()

        # In-memory store whose get_by_ids returns the documents in reverse
        # order, to simulate a store that does not preserve the input ID order.
        class ReverserInMemoryVectorStore(InMemoryVectorStore):
            @override
            def get_by_ids(
                self,
                ids: Sequence[str],
                /,
            ) -> list[Document]:
                return super().get_by_ids(ids)[::-1]

        return ReverserInMemoryVectorStore(embedding=embeddings)

This makes get_by_ids return results out of order (which is allowed, see the Description section below).

Now run uv run pytest tests/unit_tests/test_in_memory_vectorstore.py from libs/standard-tests: 6 tests fail.

Error Message and Stack Trace (if applicable)

Full failures reproduced by the above change:

[...]
================================================= short test summary info ==================================================
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_get_by_ids - AssertionError: assert [Document(id='2', metadata={'id': 2}, page_content='bar'), Document(id='1', metadata={'id': 1}, page_content='foo')] == [Document(id='1', metadata={'id': 1}, page_content='foo'), Document(id='2', metadata={'id': 2}, page_content='bar')]
  
  At index 0 diff: Document(id='2', metadata={'id': 2}, page_content='bar') != Document(id='1', metadata={'id': 1}, page_content='foo')
  
  Full diff:
    [
  +     Document(id='2', metadata={'id': 2}, page_content='bar'),
        Document(id='1', metadata={'id': 1}, page_content='foo'),
  -     Document(id='2', metadata={'id': 2}, page_content='bar'),
    ]
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_add_documents_documents - AssertionError: assert [Document(id='2d889f3c-d285-4eb8-b25e-988dd86311e6', metadata={'id': 2}, page_content='bar'), Document(id='b519212e-d42c-413e-8e1f-9368da10568b', metadata={'id': 1}, page_content='foo')] == [Document(id='b519212e-d42c-413e-8e1f-9368da10568b', metadata={'id': 1}, page_content='foo'), Document(id='2d889f3c-d285-4eb8-b25e-988dd86311e6', metadata={'id': 2}, page_content='bar')]
  
  At index 0 diff: Document(id='2d889f3c-d285-4eb8-b25e-988dd86311e6', metadata={'id': 2}, page_content='bar') != Document(id='b519212e-d42c-413e-8e1f-9368da10568b', metadata={'id': 1}, page_content='foo')
  
  Full diff:
    [
  +     Document(id='2d889f3c-d285-4eb8-b25e-988dd86311e6', metadata={'id': 2}, page_content='bar'),
        Document(id='b519212e-d42c-413e-8e1f-9368da10568b', metadata={'id': 1}, page_content='foo'),
  -     Document(id='2d889f3c-d285-4eb8-b25e-988dd86311e6', metadata={'id': 2}, page_content='bar'),
    ]
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_add_documents_with_existing_ids - AssertionError: assert [Document(id='813f4d88-f46c-4a37-a5aa-b8039caf706c', metadata={'id': 2}, page_content='bar'), Document(id='foo', metadata={'id': 1}, page_content='foo')] == [Document(id='foo', metadata={'id': 1}, page_content='foo'), Document(id='813f4d88-f46c-4a37-a5aa-b8039caf706c', metadata={'id': 2}, page_content='bar')]
  
  At index 0 diff: Document(id='813f4d88-f46c-4a37-a5aa-b8039caf706c', metadata={'id': 2}, page_content='bar') != Document(id='foo', metadata={'id': 1}, page_content='foo')
  
  Full diff:
    [
  +     Document(id='813f4d88-f46c-4a37-a5aa-b8039caf706c', metadata={'id': 2}, page_content='bar'),
        Document(id='foo', metadata={'id': 1}, page_content='foo'),
  -     Document(id='813f4d88-f46c-4a37-a5aa-b8039caf706c', metadata={'id': 2}, page_content='bar'),
    ]
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_get_by_ids_async - AssertionError: assert [Document(id='2', metadata={'id': 2}, page_content='bar'), Document(id='1', metadata={'id': 1}, page_content='foo')] == [Document(id='1', metadata={'id': 1}, page_content='foo'), Document(id='2', metadata={'id': 2}, page_content='bar')]
  
  At index 0 diff: Document(id='2', metadata={'id': 2}, page_content='bar') != Document(id='1', metadata={'id': 1}, page_content='foo')
  
  Full diff:
    [
  +     Document(id='2', metadata={'id': 2}, page_content='bar'),
        Document(id='1', metadata={'id': 1}, page_content='foo'),
  -     Document(id='2', metadata={'id': 2}, page_content='bar'),
    ]
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_add_documents_documents_async - AssertionError: assert [Document(id='47352b82-de6d-4329-9713-9f838887fcc7', metadata={'id': 2}, page_content='bar'), Document(id='1eec56b3-2efc-4c66-9d7c-69ad9b2a81c0', metadata={'id': 1}, page_content='foo')] == [Document(id='1eec56b3-2efc-4c66-9d7c-69ad9b2a81c0', metadata={'id': 1}, page_content='foo'), Document(id='47352b82-de6d-4329-9713-9f838887fcc7', metadata={'id': 2}, page_content='bar')]
  
  At index 0 diff: Document(id='47352b82-de6d-4329-9713-9f838887fcc7', metadata={'id': 2}, page_content='bar') != Document(id='1eec56b3-2efc-4c66-9d7c-69ad9b2a81c0', metadata={'id': 1}, page_content='foo')
  
  Full diff:
    [
  +     Document(id='47352b82-de6d-4329-9713-9f838887fcc7', metadata={'id': 2}, page_content='bar'),
        Document(id='1eec56b3-2efc-4c66-9d7c-69ad9b2a81c0', metadata={'id': 1}, page_content='foo'),
  -     Document(id='47352b82-de6d-4329-9713-9f838887fcc7', metadata={'id': 2}, page_content='bar'),
    ]
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_add_documents_with_existing_ids_async - AssertionError: assert [Document(id='d98a2614-ab9b-4c44-9490-fec885273e59', metadata={'id': 2}, page_content='bar'), Document(id='foo', metadata={'id': 1}, page_content='foo')] == [Document(id='foo', metadata={'id': 1}, page_content='foo'), Document(id='d98a2614-ab9b-4c44-9490-fec885273e59', metadata={'id': 2}, page_content='bar')]
  
  At index 0 diff: Document(id='d98a2614-ab9b-4c44-9490-fec885273e59', metadata={'id': 2}, page_content='bar') != Document(id='foo', metadata={'id': 1}, page_content='foo')
  
  Full diff:
    [
  +     Document(id='d98a2614-ab9b-4c44-9490-fec885273e59', metadata={'id': 2}, page_content='bar'),
        Document(id='foo', metadata={'id': 1}, page_content='foo'),
  -     Document(id='d98a2614-ab9b-4c44-9490-fec885273e59', metadata={'id': 2}, page_content='bar'),
    ]
========================================= 6 failed, 19 passed, 7 warnings in 0.36s =========================================

Description

Some tests in libs/standard-tests/langchain_tests/integration_tests/vectorstores.py mistakenly assume that the get_by_ids method of the target vector store returns documents in the same order as the provided ids argument. This is not a valid assumption, as stated in the vectorstore base class docstring:

Users should not assume that the order of the returned documents matches
the order of the input IDs. Instead, users should rely on the ID field of the
returned document
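
As a minimal sketch of what relying on the ID field can look like on the caller side: index the returned documents by id, then read them back in whatever order is needed. The Doc dataclass and the reorder_by_ids helper are hypothetical stand-ins (not part of LangChain), used so the snippet runs on its own; the same pattern should carry over to real Document objects.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document carrying an id (not the LangChain class)."""

    id: str
    page_content: str


def reorder_by_ids(ids: Sequence[str], docs: Sequence[Doc]) -> list[Doc]:
    """Arrange docs in the order of ids, skipping ids that have no matching document."""
    by_id = {doc.id: doc for doc in docs}
    return [by_id[i] for i in ids if i in by_id]


# Simulate a store that returns the documents in reverse order.
returned = [Doc(id="2", page_content="bar"), Doc(id="1", page_content="foo")]
reordered = reorder_by_ids(["1", "2", "missing"], returned)
```

Here the lookup is driven by each document's id, so the result does not depend on the order in which the store happened to return them.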

For some vectorstores, the affected tests in class VectorStoreIntegrationTests fail in ways such as the one pasted above. A case in point is AstraDBVectorStore, where we have added xfail for this reason.

The tests are:

  • test_add_documents_with_existing_ids / _async
  • test_get_by_ids / _async
  • test_add_documents_documents / _async
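
One possible way to make such assertions order-insensitive, shown only as a sketch and not necessarily what the fix will do, is to sort both sides by document ID before comparing. The Doc dataclass and the sorted_by_id helper below are hypothetical stand-ins, so that the example runs without LangChain installed.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document (not the LangChain class)."""

    id: str
    page_content: str
    metadata: dict = field(default_factory=dict)


def sorted_by_id(docs: Sequence[Doc]) -> list[Doc]:
    """Return the documents sorted by their id, for order-insensitive comparison."""
    return sorted(docs, key=lambda doc: doc.id)


expected = [
    Doc(id="1", page_content="foo", metadata={"id": 1}),
    Doc(id="2", page_content="bar", metadata={"id": 2}),
]
# Simulate a store whose get_by_ids returns the documents in reverse order.
result = expected[::-1]

# The direct list comparison depends on order; the sorted comparison does not.
order_sensitive_equal = result == expected
order_insensitive_equal = sorted_by_id(result) == sorted_by_id(expected)
```

With the reversed result, the direct comparison fails in the same way as the six tests above, while the comparison after sorting by id succeeds.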

Let me remark that I am about to submit a PR to fix this :)

System Info

System Information
------------------
> OS:  Linux
> OS Version:  #1 SMP PREEMPT_DYNAMIC Sat Aug 23 17:02:17 UTC 2025
> Python Version:  3.13.7 (main, Aug 14 2025, 00:00:00) [GCC 15.2.1 20250808 (Red Hat 15.2.1-1)]

Package Information
-------------------
> langchain_core: 0.3.75
> langsmith: 0.4.4
> langchain_tests: 0.3.21

Optional packages not installed
-------------------------------
> langserve

Other Dependencies
------------------
> httpx: 0.28.1
> httpx<1,>=0.28.1: Installed. No version info available.
> jsonpatch<2.0,>=1.33: Installed. No version info available.
> langchain-core<2.0.0,>=0.3.75: Installed. No version info available.
> langsmith-pyo3: Installed. No version info available.
> langsmith>=0.3.45: Installed. No version info available.
> numpy>=1.26.2;: Installed. No version info available.
> numpy>=2.1.0;: Installed. No version info available.
> openai-agents: Installed. No version info available.
> opentelemetry-api: Installed. No version info available.
> opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
> opentelemetry-sdk: Installed. No version info available.
> orjson: 3.10.15
> packaging: 24.2
> packaging>=23.2: Installed. No version info available.
> pydantic: 2.10.6
> pydantic>=2.7.4: Installed. No version info available.
> pytest: 8.3.4
> pytest-asyncio<2,>=0.20: Installed. No version info available.
> pytest-benchmark: 5.1.0
> pytest-codspeed: 3.2.0
> pytest-recording: 0.13.4
> pytest-socket<1,>=0.7.0: Installed. No version info available.
> pytest<9,>=7: Installed. No version info available.
> PyYAML>=5.3: Installed. No version info available.
> requests: 2.32.3
> requests-toolbelt: 1.0.0
> rich: 14.0.0
> syrupy<5,>=4: Installed. No version info available.
> tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
> typing-extensions>=4.7: Installed. No version info available.
> vcrpy>=7.0: Installed. No version info available.
> zstandard: 0.23.0


Labels

bug: Related to a bug, vulnerability, unexpected error with an existing feature
