Kedro RAG Chatbot

This project demonstrates how to use Kedro to build a chatbot based on Retrieval-Augmented Generation (RAG).

The chatbot is designed to assist users with Kedro-related questions by leveraging historical Q&A data from our Kedro Slack support channel. It creates a vector store from Slack conversations and employs a Generative AI-based agent to retrieve relevant context and generate accurate responses.

See the demo on YouTube.

Note: this project is a toy example, designed to explain how you can use Kedro to structure and manage GenAI workflows. While not production-ready, it provides a strong foundation for more advanced implementations.

Features

Extracts Q&A data from Slack conversations
Converts text data into embeddings and stores them in a vector database
Implements a retrieval-augmented chatbot using LangChain and OpenAI
Provides an interactive CLI for chatting with the agent
Compares RAG-based answers with responses from a standard LLM (without context retrieval)
Saves interaction logs, including user questions, retrieved context, and chatbot responses

Setup

1. Clone the Repository

git clone https://github.com/ElenaKhaustova/kedro-rag-chatbot.git
cd kedro-rag-chatbot

2. Install Dependencies

pip install -r requirements.txt

3. Add API Credentials

Create a credentials.yml file and place it in the conf/base/ directory with the following format:

openai:
  openai_api_base: <openai-api-base>
  openai_api_key: <openai-api-key>

4. Verify Data Availability

The necessary raw data for a test run is already included in data/01_raw.

Running the Project

Step 1: Create the Vector Store

This step processes the Slack Q&A data and stores embeddings in a vector database.

kedro run -p create_vector_store
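
To make the idea of a vector store concrete, here is a minimal sketch in plain Python. It is not the code this project runs: the bag-of-words `embed` function stands in for a real embedding model, and `ToyVectorStore` and the sample Q&A strings are hypothetical names and data chosen for illustration only.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory store of (text, vector) pairs with nearest-neighbour lookup."""

    def __init__(self) -> None:
        self._items = []

    def add(self, text: str) -> None:
        # Store the original text next to its vector so it can be returned as context.
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list:
        # Rank stored texts by similarity to the query and keep the top k.
        query_vec = embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(query_vec, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


# Hypothetical Q&A snippets standing in for the Slack data.
store = ToyVectorStore()
store.add("Q: how do I run a single pipeline? A: use kedro run with the pipeline option")
store.add("Q: where do credentials go? A: put them in a credentials file under conf")

top_match = store.search("where should credentials be stored", k=1)[0]
```

A real vector database replaces the linear scan in `search` with an index, and a real embedding model captures meaning rather than exact word overlap, but the flow is the same: embed once, store, then rank by similarity at query time.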

Step 2: Run the Chatbot Agent

This step initializes the AI agent, allowing it to query the vector store and generate responses.

kedro run -t agent_rag

Note: the agent is run by tag (-t agent_rag) rather than by pipeline name, so that the run also picks up the nodes from the create_vector_store pipeline that the agent reuses.
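
The effect of running by tag can be pictured with a toy model in plain Python. This is not Kedro's API or implementation, only an illustration of the selection rule; `ToyNode`, `select_by_tag`, and all three node names are hypothetical and do not come from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNode:
    """Stand-in for a pipeline node: a name plus the set of tags attached to it."""
    name: str
    tags: frozenset


def select_by_tag(nodes, tag):
    """Return the names of all nodes carrying the tag, whichever pipeline defines them."""
    return [n.name for n in nodes if tag in n.tags]


# Hypothetical nodes: the middle one is defined for the vector store pipeline
# but also carries the agent tag, so a tag-based run includes it.
all_nodes = [
    ToyNode("load_slack_data", frozenset({"create_vector_store"})),
    ToyNode("init_embeddings", frozenset({"create_vector_store", "agent_rag"})),
    ToyNode("run_agent", frozenset({"agent_rag"})),
]

selected = select_by_tag(all_nodes, "agent_rag")
```

Here `selected` contains `init_embeddings` and `run_agent` but not `load_slack_data`, which mirrors how a tag can cut across pipeline boundaries.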

Usage

Once the chatbot is running, you can interact with it via the CLI. For each question you ask, the chatbot will provide:

A response generated by the RAG agent using retrieved context.
A response from a standard LLM without context retrieval.

This allows you to compare the effectiveness of retrieval-augmented generation versus a general-purpose model.

After exiting the loop, all questions asked, retrieved context, and generated answers are saved in data/08_reporting/output.md.
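
As a rough sketch of how such a log could be assembled, the snippet below formats one interaction and writes a list of them to a file. The section titles, the field names, and the helper functions `format_interaction` and `save_log` are assumptions made for illustration; the actual layout of output.md may differ.

```python
from pathlib import Path


def format_interaction(question, context, rag_answer, llm_answer):
    """Render one interaction as titled plain-text sections (assumed layout)."""
    sections = [
        "User Input", question,
        "Retrieved Context", context,
        "Agent Output", rag_answer,
        "LLM Output", llm_answer,
    ]
    return "\n\n".join(sections) + "\n"


def save_log(interactions, path):
    """Write all interactions to the given path, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    body = "\n".join(format_interaction(**item) for item in interactions)
    path.write_text(body, encoding="utf-8")


# Hypothetical record; in the chatbot these values would come from the CLI loop.
entry = format_interaction(
    question="How can I force node execution order?",
    context="Slack thread about pass-through nodes",
    rag_answer="Create pass-through nodes.",
    llm_answer="Use the before and after arguments.",
)
```

Calling `save_log([...], "data/08_reporting/output.md")` with a list of dictionaries keyed by the four parameter names would produce a single file with one block per question.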

Output example

User Input

How can I force node execution order in the Kedro pipeline?

LLM Output ❌

To force node execution order in the Kedro pipeline, you can use the before and after arguments when defining your nodes in the pipeline.py file.

For example, if you have two nodes node1 and node2, and you want node1 to run before node2, you can define them like this:

from kedro.pipeline import node

def create_pipeline():
    return Pipeline(
        [
            node(func=node1, inputs="input_data", outputs="output_data", name="node1"),
            node(func=node2, inputs="output_data", outputs="final_output", name="node2", before="node1")
        ]
    )

In this example, node2 will run after node1 because of the before="node1" argument. You can also use the after argument to specify that a node should run after another node.

By using these arguments, you can control the execution order of nodes in your Kedro pipeline.

Agent Output ✅

In Kedro, the recommended approach is to rely on the topological sorting of nodes in the pipeline to determine the execution order. However, if you need to force a specific node execution order, you can create "pass-through" nodes to achieve this.

Here is an example of how you can force node execution order by creating pass-through nodes:

from kedro.pipeline import node

def pass_through_function(input_data):
    # Perform any necessary operations here
    return input_data

pass_through_node = node(pass_through_function, inputs="input_data", outputs="output_data")

# Define your pipeline with the pass-through nodes to enforce execution order
pipeline = Pipeline([
    pass_through_node,
    node(process_data, inputs="output_data", outputs="processed_data"),
    node(analyze_data, inputs="processed_data", outputs=None)
])

By inserting pass-through nodes between the nodes that need to be executed in a specific order, you can enforce the desired execution sequence in the pipeline.
