Commit 92d3177

Cassandra database tool (#13423)
1 parent cad1eaf commit 92d3177

13 files changed, +1410 -0 lines changed

docs/docs/examples/tools/casssandra.ipynb

Lines changed: 396 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
llama_index/_static
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
bin/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
etc/
include/
lib/
lib64/
parts/
sdist/
share/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
.ruff_cache

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints
notebooks/

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
pyvenv.cfg

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Jetbrains
.idea
modules/
*.swp

# VsCode
.vscode

# pipenv
Pipfile
Pipfile.lock

# pyright
pyrightconfig.json
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
poetry_requirements(
    name="poetry",
)
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
GIT_ROOT ?= $(shell git rev-parse --show-toplevel)

help: ## Show all Makefile targets.
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[33m%-30s\033[0m %s\n", $$1, $$2}'

format: ## Run code autoformatters (black).
	pre-commit install
	git ls-files | xargs pre-commit run black --files

lint: ## Run linters: pre-commit (black, ruff, codespell) and mypy
	pre-commit install && git ls-files | xargs pre-commit run --show-diff-on-failure --files

test: ## Run tests via pytest.
	pytest tests

watch-docs: ## Build and watch documentation.
	sphinx-autobuild docs/ docs/_build/html --open-browser --watch $(GIT_ROOT)/llama_index/
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# Cassandra Database Tools

## Overview

The Cassandra Database Tools project is designed to help AI engineers efficiently integrate Large Language Models (LLMs) with Apache Cassandra® data. It facilitates optimized and safe interactions with Cassandra databases and supports deployments such as Apache Cassandra®, DataStax Enterprise™, and DataStax Astra™.

## Key Features

- **Fast Data Access:** Optimized queries are intended to keep most operations in the millisecond range.
- **Schema Introspection:** Enhances the reasoning capabilities of LLMs by providing detailed schema information.
- **Compatibility:** Supports several Cassandra deployments, ensuring wide applicability.
- **Safety Measures:** Limits operations to SELECT queries and schema introspection to protect data integrity.
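
To make the SELECT-only safety measure concrete, here is a minimal sketch of what such a guard could look like. It is not the wrapper's actual validation logic, which is not part of this README; `is_select_only` is a hypothetical helper written for illustration only.

```python
def is_select_only(query: str) -> bool:
    """Return True if `query` is a single CQL statement that starts with SELECT.

    Hypothetical helper for illustration; not part of the package.
    """
    # Drop surrounding whitespace and one optional trailing semicolon.
    cleaned = query.strip()
    if cleaned.endswith(";"):
        cleaned = cleaned[:-1].rstrip()
    # Any remaining semicolon suggests several chained statements, so reject.
    if not cleaned or ";" in cleaned:
        return False
    # Compare the first keyword case-insensitively.
    return cleaned.split(None, 1)[0].upper() == "SELECT"


print(is_select_only("SELECT * FROM ks.users LIMIT 5;"))  # prints True
print(is_select_only("DROP TABLE ks.users"))              # prints False
```

The check is deliberately conservative: a query with a semicolon inside a string literal is rejected too, which errs on the side of refusing rather than executing.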

## Installation

Ensure Python is installed, then install the required packages with pip:

```bash
pip install python-dotenv cassio llama-index-tools-cassandra
```

Create a `.env` file for the environment variables that hold your Cassandra or Astra configuration, following the example structure provided in the notebook.

## Environment Setup

- For Cassandra: Configure `CASSANDRA_CONTACT_POINTS`, `CASSANDRA_USERNAME`, `CASSANDRA_PASSWORD`, and `CASSANDRA_KEYSPACE`.
- For DataStax Astra: Set `ASTRA_DB_APPLICATION_TOKEN`, `ASTRA_DB_DATABASE_ID`, and `ASTRA_DB_KEYSPACE`.
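
As a rough sketch of how the variables listed above might be checked before connecting, the snippet below reads them from the process environment and fails early if any are missing. The helper `read_settings` and the two tuples are hypothetical names introduced here, not part of the package. If you use `python-dotenv`, calling its `load_dotenv()` first is the usual way to copy values from `.env` into the environment.

```python
import os
from typing import Dict, Mapping, Sequence

# Variable names taken from the list above.
CASSANDRA_VARS = (
    "CASSANDRA_CONTACT_POINTS",
    "CASSANDRA_USERNAME",
    "CASSANDRA_PASSWORD",
    "CASSANDRA_KEYSPACE",
)
ASTRA_VARS = (
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_DATABASE_ID",
    "ASTRA_DB_KEYSPACE",
)


def read_settings(
    names: Sequence[str], env: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Return the requested variables, raising KeyError if any is unset or empty.

    Hypothetical helper for illustration; not part of the package.
    """
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

Passing `env` explicitly, for example `read_settings(ASTRA_VARS, {"ASTRA_DB_APPLICATION_TOKEN": "..."})`, makes the check easy to exercise without touching the real environment.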

## How It Works

The toolkit uses the Cassandra Query Language (CQL) and integrates with LLMs to determine an efficient query path for each user request while following Cassandra query best practices. Through function calling, the LLM can invoke a tool rather than compose a custom query itself. The result is faster, more efficient access to Cassandra data for agents.

## Tools Included

- **`cassandra_db_schema`**: Fetches schema information, essential for the agent’s operation.
- **`cassandra_db_select_table_data`**: Allows selection of data from a specific keyspace and table.
- **`cassandra_db_query`**: An experimental tool that accepts fully formed query strings from the agent.
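
The function-calling flow can be pictured as a lookup from a tool name to a Python callable. The sketch below is an illustration under stated assumptions, not how the agent framework dispatches calls internally: `StubTools` is a hypothetical stand-in that only echoes its inputs, and the `{"name": ..., "arguments": ...}` shape of a tool call is assumed for the example.

```python
from typing import Any, Dict

# The three tool names listed above.
ALLOWED_TOOLS = {
    "cassandra_db_schema",
    "cassandra_db_select_table_data",
    "cassandra_db_query",
}


class StubTools:
    """Hypothetical stand-in for the real tool spec; it never touches a database."""

    def cassandra_db_schema(self, keyspace: str) -> str:
        return f"schema of {keyspace}"

    def cassandra_db_select_table_data(
        self, keyspace: str, table: str, predicate: str, limit: int
    ) -> str:
        return f"rows from {keyspace}.{table} where {predicate} limit {limit}"

    def cassandra_db_query(self, query: str) -> str:
        return f"ran: {query}"


def dispatch(tools: Any, call: Dict[str, Any]) -> str:
    """Invoke the named tool with keyword arguments chosen by the LLM."""
    name = call["name"]
    if name not in ALLOWED_TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return getattr(tools, name)(**call["arguments"])


call = {"name": "cassandra_db_schema", "arguments": {"keyspace": "demo"}}
print(dispatch(StubTools(), call))  # prints schema of demo
```

Restricting dispatch to an explicit allow-list keeps the model from reaching any other attribute on the object.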

## Example Usage

Initialize the `CassandraDatabase` and set up the agent with the tools provided. Then query the database by interacting with the agent, as shown in the example [notebook](https://docs.llamaindex.ai/en/latest/examples/tools/cassandra/).
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
python_sources()
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
from llama_index.tools.cassandra.base import CassandraDatabaseToolSpec


__all__ = ["CassandraDatabaseToolSpec"]
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
"""Tools for interacting with an Apache Cassandra database."""
from typing import List

from llama_index.core.bridge.pydantic import Field
from llama_index.core.schema import Document
from llama_index.core.tools.tool_spec.base import BaseToolSpec

from llama_index.tools.cassandra.cassandra_database_wrapper import (
    CassandraDatabase,
)


class CassandraDatabaseToolSpec(BaseToolSpec):
    """Base tool for interacting with an Apache Cassandra database."""

    db: CassandraDatabase = Field(exclude=True)

    spec_functions = [
        "cassandra_db_query",
        "cassandra_db_schema",
        "cassandra_db_select_table_data",
    ]

    def __init__(self, db: CassandraDatabase) -> None:
        """DB session in context."""
        self.db = db

    def cassandra_db_query(self, query: str) -> List[Document]:
        """Execute a CQL query and return the results as a list of Documents.

        Args:
            query (str): A CQL query to execute.

        Returns:
            List[Document]: A list of Document objects, each containing data from a row.
        """
        documents = []
        result = self.db.run_no_throw(query, fetch="Cursor")
        for row in result:
            doc_str = ", ".join([str(value) for value in row])
            documents.append(Document(text=doc_str))
        return documents

    def cassandra_db_schema(self, keyspace: str) -> List[Document]:
        """Input to this tool is a keyspace name, output is a table description
            of Apache Cassandra tables.
            If the query is not correct, an error message will be returned.
            If an error is returned, report back to the user that the keyspace
            doesn't exist and stop.

        Args:
            keyspace (str): The name of the keyspace for which to return the schema.

        Returns:
            List[Document]: A list of Document objects, each containing a table description.
        """
        return [Document(text=self.db.get_keyspace_tables_str_no_throw(keyspace))]

    def cassandra_db_select_table_data(
        self, keyspace: str, table: str, predicate: str, limit: int
    ) -> List[Document]:
        """Tool for getting data from a table in an Apache Cassandra database.
            Use the WHERE clause to specify the predicate for the query that uses the
            primary key. A blank predicate will return all rows. Avoid this if possible.
            Use the limit to specify the number of rows to return. A blank limit will
            return all rows.

        Args:
            keyspace (str): The name of the keyspace containing the table.
            table (str): The name of the table for which to return data.
            predicate (str): The predicate for the query that uses the primary key.
            limit (int): The maximum number of rows to return.

        Returns:
            List[Document]: A list of Document objects, each containing a row of data.
        """
        return [
            Document(
                text=self.db.get_table_data_no_throw(keyspace, table, predicate, limit)
            )
        ]
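
The row-to-text step inside `cassandra_db_query` can be tried in isolation. The sketch below copies only the join expression from the method above. The `rows_to_text` helper, the `User` named tuple, and the sample rows are hypothetical, standing in for whatever row objects the database wrapper actually yields.

```python
from collections import namedtuple
from typing import Iterable, List, Sequence


def rows_to_text(rows: Iterable[Sequence[object]]) -> List[str]:
    """Join each row's values with ", ", mirroring the loop in cassandra_db_query."""
    return [", ".join(str(value) for value in row) for row in rows]


# Hypothetical sample rows; a real result set would come from the database.
User = namedtuple("User", ["id", "name", "age"])
rows = [User(1, "ada", 36), User(2, "grace", None)]

print(rows_to_text(rows))  # prints ['1, ada, 36', '2, grace, None']
```

Two consequences of this formatting are worth keeping in mind when reading agent output: column names are not included in the text, and a null value appears as the literal string `None`.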
