Checked other resources
- This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
- I added a clear and descriptive title that summarizes this issue.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
- I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
- I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
Example Code
Replace the contents of libs/standard-tests/tests/unit_tests/test_in_memory_vectorstore.py with this:
from collections.abc import Sequence

import pytest
from langchain_core.documents import Document
from langchain_core.vectorstores import (
    InMemoryVectorStore,
    VectorStore,
)
from langchain_tests.integration_tests.vectorstores import VectorStoreIntegrationTests
from typing_extensions import override


class TestInMemoryVectorStore(VectorStoreIntegrationTests):
    @pytest.fixture
    def vectorstore(self) -> VectorStore:
        embeddings = self.get_embeddings()

        class ReverserInMemoryVectorStore(InMemoryVectorStore):
            # Same store, but get_by_ids returns its results in reverse order.
            @override
            def get_by_ids(
                self,
                ids: Sequence[str],
                /,
            ) -> list[Document]:
                return super().get_by_ids(ids)[::-1]

        return ReverserInMemoryVectorStore(embedding=embeddings)
This makes get_by_ids return its results in reverse order, which is allowed (see Description below).
Now run uv run pytest tests/unit_tests/test_in_memory_vectorstore.py from the libs/standard-tests directory: 6 tests fail.
Error Message and Stack Trace (if applicable)
Full failures reproduced by the above change:
[...]
================================================= short test summary info ==================================================
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_get_by_ids - AssertionError: assert [Document(id='2', metadata={'id': 2}, page_content='bar'), Document(id='1', metadata={'id': 1}, page_content='foo')] == [Document(id='1', metadata={'id': 1}, page_content='foo'), Document(id='2', metadata={'id': 2}, page_content='bar')]
At index 0 diff: Document(id='2', metadata={'id': 2}, page_content='bar') != Document(id='1', metadata={'id': 1}, page_content='foo')
Full diff:
[
+ Document(id='2', metadata={'id': 2}, page_content='bar'),
Document(id='1', metadata={'id': 1}, page_content='foo'),
- Document(id='2', metadata={'id': 2}, page_content='bar'),
]
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_add_documents_documents - AssertionError: assert [Document(id='2d889f3c-d285-4eb8-b25e-988dd86311e6', metadata={'id': 2}, page_content='bar'), Document(id='b519212e-d42c-413e-8e1f-9368da10568b', metadata={'id': 1}, page_content='foo')] == [Document(id='b519212e-d42c-413e-8e1f-9368da10568b', metadata={'id': 1}, page_content='foo'), Document(id='2d889f3c-d285-4eb8-b25e-988dd86311e6', metadata={'id': 2}, page_content='bar')]
At index 0 diff: Document(id='2d889f3c-d285-4eb8-b25e-988dd86311e6', metadata={'id': 2}, page_content='bar') != Document(id='b519212e-d42c-413e-8e1f-9368da10568b', metadata={'id': 1}, page_content='foo')
Full diff:
[
+ Document(id='2d889f3c-d285-4eb8-b25e-988dd86311e6', metadata={'id': 2}, page_content='bar'),
Document(id='b519212e-d42c-413e-8e1f-9368da10568b', metadata={'id': 1}, page_content='foo'),
- Document(id='2d889f3c-d285-4eb8-b25e-988dd86311e6', metadata={'id': 2}, page_content='bar'),
]
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_add_documents_with_existing_ids - AssertionError: assert [Document(id='813f4d88-f46c-4a37-a5aa-b8039caf706c', metadata={'id': 2}, page_content='bar'), Document(id='foo', metadata={'id': 1}, page_content='foo')] == [Document(id='foo', metadata={'id': 1}, page_content='foo'), Document(id='813f4d88-f46c-4a37-a5aa-b8039caf706c', metadata={'id': 2}, page_content='bar')]
At index 0 diff: Document(id='813f4d88-f46c-4a37-a5aa-b8039caf706c', metadata={'id': 2}, page_content='bar') != Document(id='foo', metadata={'id': 1}, page_content='foo')
Full diff:
[
+ Document(id='813f4d88-f46c-4a37-a5aa-b8039caf706c', metadata={'id': 2}, page_content='bar'),
Document(id='foo', metadata={'id': 1}, page_content='foo'),
- Document(id='813f4d88-f46c-4a37-a5aa-b8039caf706c', metadata={'id': 2}, page_content='bar'),
]
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_get_by_ids_async - AssertionError: assert [Document(id='2', metadata={'id': 2}, page_content='bar'), Document(id='1', metadata={'id': 1}, page_content='foo')] == [Document(id='1', metadata={'id': 1}, page_content='foo'), Document(id='2', metadata={'id': 2}, page_content='bar')]
At index 0 diff: Document(id='2', metadata={'id': 2}, page_content='bar') != Document(id='1', metadata={'id': 1}, page_content='foo')
Full diff:
[
+ Document(id='2', metadata={'id': 2}, page_content='bar'),
Document(id='1', metadata={'id': 1}, page_content='foo'),
- Document(id='2', metadata={'id': 2}, page_content='bar'),
]
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_add_documents_documents_async - AssertionError: assert [Document(id='47352b82-de6d-4329-9713-9f838887fcc7', metadata={'id': 2}, page_content='bar'), Document(id='1eec56b3-2efc-4c66-9d7c-69ad9b2a81c0', metadata={'id': 1}, page_content='foo')] == [Document(id='1eec56b3-2efc-4c66-9d7c-69ad9b2a81c0', metadata={'id': 1}, page_content='foo'), Document(id='47352b82-de6d-4329-9713-9f838887fcc7', metadata={'id': 2}, page_content='bar')]
At index 0 diff: Document(id='47352b82-de6d-4329-9713-9f838887fcc7', metadata={'id': 2}, page_content='bar') != Document(id='1eec56b3-2efc-4c66-9d7c-69ad9b2a81c0', metadata={'id': 1}, page_content='foo')
Full diff:
[
+ Document(id='47352b82-de6d-4329-9713-9f838887fcc7', metadata={'id': 2}, page_content='bar'),
Document(id='1eec56b3-2efc-4c66-9d7c-69ad9b2a81c0', metadata={'id': 1}, page_content='foo'),
- Document(id='47352b82-de6d-4329-9713-9f838887fcc7', metadata={'id': 2}, page_content='bar'),
]
FAILED tests/unit_tests/test_in_memory_vectorstore.py::TestInMemoryVectorStore::test_add_documents_with_existing_ids_async - AssertionError: assert [Document(id='d98a2614-ab9b-4c44-9490-fec885273e59', metadata={'id': 2}, page_content='bar'), Document(id='foo', metadata={'id': 1}, page_content='foo')] == [Document(id='foo', metadata={'id': 1}, page_content='foo'), Document(id='d98a2614-ab9b-4c44-9490-fec885273e59', metadata={'id': 2}, page_content='bar')]
At index 0 diff: Document(id='d98a2614-ab9b-4c44-9490-fec885273e59', metadata={'id': 2}, page_content='bar') != Document(id='foo', metadata={'id': 1}, page_content='foo')
Full diff:
[
+ Document(id='d98a2614-ab9b-4c44-9490-fec885273e59', metadata={'id': 2}, page_content='bar'),
Document(id='foo', metadata={'id': 1}, page_content='foo'),
- Document(id='d98a2614-ab9b-4c44-9490-fec885273e59', metadata={'id': 2}, page_content='bar'),
]
========================================= 6 failed, 19 passed, 7 warnings in 0.36s =========================================
Description
Some tests in libs/standard-tests/langchain_tests/integration_tests/vectorstores.py mistakenly assume that the get_by_ids method of the target vector store returns documents in the same order as the provided ids argument. This is not a valid assumption, as stated in the docstring of the VectorStore base class:
"Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned document"
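As a minimal sketch of what relying on the ID field means in practice, a caller that needs the input order can rebuild it from the returned documents' id fields. The Doc dataclass below is a hypothetical stand-in for langchain_core's Document, realign_by_ids is a hypothetical helper, and skipping IDs with no matching document is an assumption about how missing IDs should be handled:

```python
from collections.abc import Sequence
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""

    page_content: str
    id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def realign_by_ids(ids: Sequence[str], docs: Sequence[Doc]) -> list[Doc]:
    """Reorder docs to follow the order of ids, using each document's id field.

    Assumption: IDs with no matching document are simply skipped.
    """
    by_id = {doc.id: doc for doc in docs}
    return [by_id[i] for i in ids if i in by_id]


# Documents as an out-of-order get_by_ids might return them.
returned = [
    Doc("bar", id="2", metadata={"id": 2}),
    Doc("foo", id="1", metadata={"id": 1}),
]
print([d.id for d in realign_by_ids(["1", "2"], returned)])  # ['1', '2']
```

The lookup goes through the id field only, so it works no matter how the store orders its results.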
For some vector stores, the affected tests in the VectorStoreIntegrationTests class fail in the way shown above (case in point: AstraDBVectorStore; we have marked those tests as xfail there for this reason).
The tests are:
- test_add_documents_with_existing_ids / test_add_documents_with_existing_ids_async
- test_get_by_ids / test_get_by_ids_async
- test_add_documents_documents / test_add_documents_documents_async
Let me remark that I am about to submit a PR to fix this :)
System Info
System Information
------------------
> OS: Linux
> OS Version: #1 SMP PREEMPT_DYNAMIC Sat Aug 23 17:02:17 UTC 2025
> Python Version: 3.13.7 (main, Aug 14 2025, 00:00:00) [GCC 15.2.1 20250808 (Red Hat 15.2.1-1)]
Package Information
-------------------
> langchain_core: 0.3.75
> langsmith: 0.4.4
> langchain_tests: 0.3.21
Optional packages not installed
-------------------------------
> langserve
Other Dependencies
------------------
> httpx: 0.28.1
> httpx<1,>=0.28.1: Installed. No version info available.
> jsonpatch<2.0,>=1.33: Installed. No version info available.
> langchain-core<2.0.0,>=0.3.75: Installed. No version info available.
> langsmith-pyo3: Installed. No version info available.
> langsmith>=0.3.45: Installed. No version info available.
> numpy>=1.26.2;: Installed. No version info available.
> numpy>=2.1.0;: Installed. No version info available.
> openai-agents: Installed. No version info available.
> opentelemetry-api: Installed. No version info available.
> opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
> opentelemetry-sdk: Installed. No version info available.
> orjson: 3.10.15
> packaging: 24.2
> packaging>=23.2: Installed. No version info available.
> pydantic: 2.10.6
> pydantic>=2.7.4: Installed. No version info available.
> pytest: 8.3.4
> pytest-asyncio<2,>=0.20: Installed. No version info available.
> pytest-benchmark: 5.1.0
> pytest-codspeed: 3.2.0
> pytest-recording: 0.13.4
> pytest-socket<1,>=0.7.0: Installed. No version info available.
> pytest<9,>=7: Installed. No version info available.
> PyYAML>=5.3: Installed. No version info available.
> requests: 2.32.3
> requests-toolbelt: 1.0.0
> rich: 14.0.0
> syrupy<5,>=4: Installed. No version info available.
> tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
> typing-extensions>=4.7: Installed. No version info available.
> vcrpy>=7.0: Installed. No version info available.
> zstandard: 0.23.0