Commit 2360e5a

sharanshirodkar7, pre-commit-ci[bot], and lvliang-intel authored
adding lancedb to langchain vectorstores (#291)
* adding lancedb to langchain vectorstores

Signed-off-by: sharanshirodkar7 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sharanshirodkar7 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lvliang-intel <[email protected]>
1 parent 5b3053f commit 2360e5a

1 file changed: +139 -0 lines changed
  • comps/vectorstores/langchain/lancedb

@@ -0,0 +1,139 @@
# LanceDB

LanceDB is an embedded vector database for AI applications. It is open source and distributed under the Apache-2.0 license.

LanceDB datasets are persisted to disk, so they can be shared and reopened from Python.

## Setup

```bash
pip install lancedb
```

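Before running the examples, it can help to confirm that the packages they import are available. This is a minimal sketch using only the standard library. The module names `lancedb` and `langchain` are assumptions about the target environment, and `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed module names; adjust to the packages your environment uses.
for name in ("lancedb", "langchain"):
    print(name, "installed" if is_installed(name) else "missing")
```
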
## Usage

### Create a new index from texts

```python
import tempfile

import lancedb
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import LanceDB


def run():
    # Create a throwaway on-disk database and seed a table so its schema is known.
    db_dir = tempfile.mkdtemp(prefix="lancedb-")
    db = lancedb.connect(db_dir)
    table = db.create_table("vectors", [{"vector": [0.0] * 1536, "text": "sample", "id": 1}])

    # Python's from_texts takes (texts, embedding, metadatas). The keyword used to
    # hand over an existing table varies across LangChain releases; check the
    # signature of the installed version.
    vector_store = LanceDB.from_texts(
        ["Hello world", "Bye bye", "hello nice world"],
        OpenAIEmbeddings(),
        metadatas=[{"id": 2}, {"id": 1}, {"id": 3}],
        table=table,
    )

    result_one = vector_store.similarity_search("hello world", k=1)
    print(result_one)
    # [ Document(page_content='hello nice world', metadata={'id': 3}) ]


# Run the function
run()
```

API Reference:

- `LanceDB` from `langchain.vectorstores`
- `OpenAIEmbeddings` from `langchain.embeddings.openai`

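Conceptually, `from_texts` embeds each text and stores the vector beside the text and its metadata, and `similarity_search` ranks the stored rows by distance to the embedded query. The sketch below mimics that flow with no external services. `ToyVectorStore`, `toy_embed`, `cosine_distance`, and `Doc` are hypothetical stand-ins (a word-count "embedding" instead of `OpenAIEmbeddings`, a Python list instead of a LanceDB table), so its rankings will not match a real embedding model.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def toy_embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (not a real model)."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


class ToyVectorStore:
    def __init__(self):
        self.rows = []  # each row: vector + text + metadata, like a table row

    @classmethod
    def from_texts(cls, texts, metadatas):
        store = cls()
        for text, metadata in zip(texts, metadatas):
            store.rows.append({"vector": toy_embed(text), "text": text, "metadata": metadata})
        return store

    def similarity_search(self, query, k=1):
        query_vector = toy_embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine_distance(query_vector, row["vector"]))
        return [Doc(row["text"], row["metadata"]) for row in ranked[:k]]


store = ToyVectorStore.from_texts(
    ["Hello world", "Bye bye", "hello nice world"],
    [{"id": 2}, {"id": 1}, {"id": 3}],
)
top = store.similarity_search("nice world", k=1)
print(top)
```

A real vector store does the same three things (embed, store, rank), but with dense model embeddings and an indexed on-disk table.
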
### Create a new index from a loader

```python
import tempfile

import lancedb
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import LanceDB

# Create docs with a loader
loader = TextLoader("src/document_loaders/example_data/example.txt")
docs = loader.load()


def run():
    # Create a throwaway on-disk database and seed a table so its schema is known.
    db_dir = tempfile.mkdtemp(prefix="lancedb-")
    db = lancedb.connect(db_dir)
    table = db.create_table("vectors", [{"vector": [0.0] * 1536, "text": "sample", "source": "a"}])

    # The keyword used to hand over an existing table varies across LangChain
    # releases; check the signature of the installed version.
    vector_store = LanceDB.from_documents(docs, OpenAIEmbeddings(), table=table)

    result_one = vector_store.similarity_search("hello world", k=1)
    print(result_one)
    # [
    #   Document(page_content='Foo\nBar\nBaz\n\n', metadata={'source': 'src/document_loaders/example_data/example.txt'})
    # ]


# Run the function
run()
```

API Reference:

- `LanceDB` from `langchain.vectorstores`
- `OpenAIEmbeddings` from `langchain.embeddings.openai`
- `TextLoader` from `langchain.document_loaders`

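The loader step turns a file into a list of documents, each pairing the file's text with a `source` entry in its metadata, which is the shape shown in the example output above. A rough sketch of that idea follows. `Doc` and `load_text` are hypothetical stand-ins, not LangChain's `Document` or `TextLoader`, and the temporary file replaces the example path.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str) -> List[Doc]:
    """Read a whole text file into one document and record where it came from."""
    text = Path(path).read_text(encoding="utf-8")
    return [Doc(page_content=text, metadata={"source": path})]


with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example.txt"
    example.write_text("Foo\nBar\nBaz\n\n", encoding="utf-8")
    expected_source = str(example)
    docs = load_text(expected_source)

print(docs[0].metadata["source"] == expected_source)
print(docs[0].page_content.splitlines())
```

Because the `source` metadata travels with each document, it ends up as a column next to the text and vector once the documents are written to a table.
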
### Open an existing dataset

```python
import tempfile

import lancedb
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import LanceDB


def run():
    uri = create_test_db()
    db = lancedb.connect(uri)
    table = db.open_table("vectors")

    # Pass the embedding by keyword. The arguments accepted for the table or
    # connection vary across LangChain releases; check the installed version.
    vector_store = LanceDB(embedding=OpenAIEmbeddings(), table=table)

    result_one = vector_store.similarity_search("hello world", k=1)
    print(result_one)
    # [ Document(page_content='Hello world', metadata={'id': 1}) ]


def create_test_db():
    # Build a small on-disk dataset with placeholder vectors and return its path.
    db_dir = tempfile.mkdtemp(prefix="lancedb-")
    db = lancedb.connect(db_dir)
    db.create_table(
        "vectors",
        [
            {"vector": [0.0] * 1536, "text": "Hello world", "id": 1},
            {"vector": [0.0] * 1536, "text": "Bye bye", "id": 2},
            {"vector": [0.0] * 1536, "text": "hello nice world", "id": 3},
        ],
    )
    return db_dir


# Run the function
run()
```

136+
API Reference:
137+
138+
- `LanceDB` from `@langchain/community/vectorstores/lancedb`
139+
- `OpenAIEmbeddings` from `@langchain/openai`
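
One caveat about the test table above: every row stores the same all-zero placeholder vector, so all rows are equally far from any query, and which one comes back first is not meaningful. A small sketch of why, assuming Euclidean distance and using 4 dimensions in place of 1536. `l2_distance` and the query values are made up for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


DIM = 4  # small stand-in for the 1536-dimensional vectors in the example

rows = [
    {"vector": [0.0] * DIM, "text": "Hello world", "id": 1},
    {"vector": [0.0] * DIM, "text": "Bye bye", "id": 2},
    {"vector": [0.0] * DIM, "text": "hello nice world", "id": 3},
]

query_vector = [0.1, 0.2, 0.3, 0.4]  # hypothetical query embedding

distances = [l2_distance(query_vector, row["vector"]) for row in rows]
all_tied = len(set(distances)) == 1
print(all_tied)
```

With real embeddings in the `vector` column the distances differ and the ranking becomes informative.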
