[langchain-sqlserver] Use binary collation for custom_id column #299

Open
Copilot wants to merge 6 commits into main from copilot/use-binary-collation-custom-id

Conversation

Copilot AI commented Feb 27, 2026

The custom_id column in EmbeddingStore was created without an explicit collation, so it used the database's default collation. This PR adds opt-in support for the Latin1_General_100_BIN2_UTF8 binary collation on custom_id, which the linked issue requests for best performance. The flag defaults to off, so existing deployments keep the database's collation and stay backward compatible.

Changes

  • BINARY_COLLATION: module-level constant set to "Latin1_General_100_BIN2_UTF8".
  • use_binary_collation_on_custom_id: bool = False: new __init__ parameter. Set it to True to apply the binary collation for best performance. It defaults to False to preserve backward compatibility with existing tables that were created without a forced collation.
  • _get_embedding_store(): passes collation=BINARY_COLLATION to VARCHAR(1000) for custom_id only when the flag is set.
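
To make the effect of the flag concrete, here is a minimal sketch of the column definition it would produce at the T-SQL level. This is not the PR's code: per the list above, the PR passes collation= to the VARCHAR type used by the ORM model. The helper custom_id_column_ddl is hypothetical and exists only for illustration.

```python
# Illustrative sketch only. The helper name is hypothetical; the PR itself
# sets the collation on the ORM column type rather than building DDL strings.
BINARY_COLLATION = "Latin1_General_100_BIN2_UTF8"


def custom_id_column_ddl(use_binary_collation_on_custom_id: bool = False) -> str:
    """Return the T-SQL column definition fragment for custom_id."""
    ddl = "custom_id VARCHAR(1000)"
    if use_binary_collation_on_custom_id:
        # Opt in: force the binary collation on this column only.
        ddl += f" COLLATE {BINARY_COLLATION}"
    # Without a COLLATE clause, SQL Server applies the database's
    # default collation to the column.
    return ddl


print(custom_id_column_ddl())
print(custom_id_column_ddl(use_binary_collation_on_custom_id=True))
```

With the flag off, no COLLATE clause is emitted, which is why tables created before this change are unaffected.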

Usage

from langchain_sqlserver import SQLServer_VectorStore

# Default: uses the database's collation (backward compatible)
store = SQLServer_VectorStore(
    connection_string=...,
    embedding_function=...,
    embedding_length=1536,
)

# Opt in: binary collation applied for best performance (recommended for new deployments)
store = SQLServer_VectorStore(
    connection_string=...,
    embedding_function=...,
    embedding_length=1536,
    use_binary_collation_on_custom_id=True,
)
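
One consequence that may be worth checking before opting in: binary collations compare values by code point, so custom_id matching becomes case-sensitive. The sketch below only approximates this in plain Python. It assumes the database default is a case-insensitive collation, which may not hold for a given deployment, and both helper names are hypothetical.

```python
def binary_equal(a: str, b: str) -> bool:
    """Approximate a BIN2 UTF-8 comparison: exact match on the encoded bytes."""
    return a.encode("utf-8") == b.encode("utf-8")


def case_insensitive_equal(a: str, b: str) -> bool:
    """Rough stand-in for a case-insensitive database collation (an assumption)."""
    return a.casefold() == b.casefold()


# IDs that differ only in case are distinct under the binary comparison
# but treated as the same ID by the case-insensitive stand-in.
print(binary_equal("Doc-1", "doc-1"))
print(case_insensitive_equal("Doc-1", "doc-1"))
```

If existing callers rely on case-insensitive custom_id lookups, that is a reason to leave the flag at its default.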
Original prompt

This section details the original issue you should resolve

<issue_title>[langchain-sqlserver] Use binary collation for custom_id column</issue_title>
<issue_description>To get the best performance possible, binary collation should be used for the custom_id column. Specifically the collation Latin1_General_100_BIN2_UTF8 should be used when creating the custom_id column in the EmbeddingStore function.

To keep backward compatibility and allow users to use the default collation defined for the database, instead of forcing the binary collation, a new option should be introduced in the SQLServer_VectorStore class so that one can opt out of the forced binary collation, for example:

MySqlServerVectorStore.ForceBinaryComparisonOnCustomId = false
</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>

Co-authored-by: santiagxf <32112894+santiagxf@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Use binary collation for custom_id column" to "[langchain-sqlserver] Use binary collation for custom_id column" on Feb 27, 2026
Facundo Santiago (@santiagxf) added the langchain-sqlserver and bug labels on Feb 27, 2026
Facundo Santiago (@santiagxf) marked this pull request as ready for review on February 27, 2026 22:50

Copilot AI left a comment

Pull request overview

This PR addresses a performance improvement request (#74) by applying Latin1_General_100_BIN2_UTF8 binary collation to the custom_id column in EmbeddingStore, with an opt-out flag for users who prefer the database's default collation.

Changes:

  • Adds a BINARY_COLLATION constant at module level in vectorstores.py
  • Introduces a new use_binary_collation_on_custom_id: bool = True constructor parameter that conditionally passes the collation to VARCHAR(1000) when building the ORM model
  • Adds a new test file test_binary_collation.py covering default behavior, explicit opt-in, opt-out, and the constant value

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| libs/sqlserver/langchain_sqlserver/vectorstores.py | Adds BINARY_COLLATION constant, new __init__ parameter, stores it as _use_binary_collation_on_custom_id, and conditionally applies collation in _get_embedding_store |
| libs/sqlserver/tests/unit_tests/test_binary_collation.py | New unit tests verifying binary collation is applied by default, when explicitly set to True, and not applied when opted out |

Comment thread libs/sqlserver/langchain_sqlserver/vectorstores.py Outdated