ElenaKhaustova/kedro-rag-chatbot

Kedro RAG Chatbot

This project demonstrates how to use Kedro to create a Retrieval-Augmented Generation (RAG)-based chatbot.

The chatbot is designed to assist users with Kedro-related questions by leveraging historical Q&A data from our Kedro Slack support channel. It creates a vector store from Slack conversations and employs a Generative AI-based agent to retrieve relevant context and generate accurate responses.

See the demo on YouTube.

Note: this project is a toy example, designed to explain how you can use Kedro to structure and manage GenAI workflows. While not production-ready, it provides a strong foundation for more advanced implementations.

Features

  • Extracts Q&A data from Slack conversations
  • Converts text data into embeddings and stores them in a vector database
  • Implements a retrieval-augmented chatbot using LangChain and OpenAI
  • Provides an interactive CLI for chatting with the agent
  • Compares RAG-based answers with responses from a standard LLM (without context retrieval)
  • Saves interaction logs, including user questions, retrieved context, and chatbot responses

Setup

1. Clone the Repository

git clone https://github.com/ElenaKhaustova/kedro-rag-chatbot.git
cd kedro-rag-chatbot

2. Install Dependencies

pip install -r requirements.txt

3. Add API Credentials

Create a credentials.yml file and place it in the conf/base/ directory with the following format:

openai:
  openai_api_base: <openai-api-base>
  openai_api_key: <openai-api-key>

4. Verify Data Availability

The necessary raw data for a test run is already included in data/01_raw.

Running the Project

Step 1: Create the Vector Store

This step processes the Slack Q&A data and stores embeddings in a vector database.

kedro run -p create_vector_store
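
To make the idea of a vector store concrete, here is a minimal sketch that uses only the Python standard library. It is not the project's implementation: the bag-of-words `embed` function stands in for a real embedding model, and `ToyVectorStore` is a hypothetical in-memory class invented for illustration. The sample Q&A strings are made up as well.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (embedding, text) pairs and ranks them by similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        # Embed once at insertion time so that searches only embed the query.
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(query_vec, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


# Made-up Q&A snippets standing in for the Slack data.
store = ToyVectorStore()
store.add("Q: How do I run a single pipeline? A: Use kedro run with the pipeline option.")
store.add("Q: Where do credentials go? A: Put them in a credentials.yml file.")

top = store.search("where should I put my credentials", k=1)
print(top[0])
```

A real setup would swap `embed` for an embedding model and the list for a vector database, but the add-then-search flow stays the same.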

Step 2: Run the Chatbot Agent

This step initializes the AI agent, allowing it to query the vector store and generate responses.

kedro run -t agent_rag

Note: the agent_rag pipeline is run by tag (-t agent_rag) rather than by pipeline name, so that it can reuse some nodes from the create_vector_store pipeline.
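
The effect of running by tag can be illustrated with a small standard-library sketch. This only models the idea of tag-based selection and is not Kedro's own code. `ToyNode`, `select_by_tag`, and all node names are hypothetical; which nodes the project actually shares between the two pipelines is not stated here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a pipeline node that carries a set of tags."""

    name: str
    tags: frozenset = field(default_factory=frozenset)


# Hypothetical node names; one node carries both tags and is therefore shared.
all_nodes = [
    ToyNode("load_slack_data", frozenset({"create_vector_store"})),
    ToyNode("init_embeddings", frozenset({"create_vector_store", "agent_rag"})),
    ToyNode("build_vector_store", frozenset({"create_vector_store"})),
    ToyNode("run_agent", frozenset({"agent_rag"})),
]


def select_by_tag(nodes, tag):
    """Keep only the nodes that carry the given tag, preserving their order."""
    return [n for n in nodes if tag in n.tags]


selected = [n.name for n in select_by_tag(all_nodes, "agent_rag")]
print(selected)
```

In this sketch, selecting by the agent_rag tag picks up the shared node together with the agent-specific one, without rerunning the rest of the vector store pipeline.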

Usage

Once the chatbot is running, you can interact with it via the CLI. For each question you ask, the chatbot will provide:

  1. A response generated by the RAG agent using retrieved context.
  2. A response from a standard LLM without context retrieval.

This allows you to compare the effectiveness of retrieval-augmented generation versus a general-purpose model.
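
The difference between the two responses comes down to what the model is shown. The sketch below builds both prompts for the same question. The prompt wording and the function names `build_plain_prompt` and `build_rag_prompt` are assumptions for illustration, not the project's actual prompts, and the context chunk is a made-up example.

```python
def build_plain_prompt(question: str) -> str:
    """Prompt for the standard LLM: the question only, no retrieved context."""
    return f"Question: {question}\nAnswer:"


def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Prompt for the RAG path: retrieved chunks are placed ahead of the question."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


question = "How can I force node execution order in the Kedro pipeline?"
# Made-up chunk standing in for text retrieved from the vector store.
chunks = ["Kedro orders nodes by their data dependencies; use pass-through nodes to force an order."]

plain_prompt = build_plain_prompt(question)
rag_prompt = build_rag_prompt(question, chunks)
print(rag_prompt)
```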

When you exit the chat loop, every question asked, the context retrieved for it, and the generated answers are saved to data/08_reporting/output.md.
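
As a rough sketch of how such a log could be assembled, the example below collects interactions and renders them as Markdown. The `Interaction` class, the `render_markdown` function, and the section headings are assumptions, not the project's code, and the sample strings are placeholders. The sketch writes to a temporary directory instead of data/08_reporting so it can run anywhere.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Interaction:
    """Hypothetical record of one chat turn."""

    question: str
    context: str
    rag_answer: str
    llm_answer: str


def render_markdown(interactions: list[Interaction]) -> str:
    """Render every interaction as a group of Markdown sections."""
    sections = []
    for item in interactions:
        sections.append(
            f"## User Input\n\n{item.question}\n\n"
            f"## Retrieved Context\n\n{item.context}\n\n"
            f"## Agent Output\n\n{item.rag_answer}\n\n"
            f"## LLM Output\n\n{item.llm_answer}\n"
        )
    return "\n".join(sections)


# Placeholder values; a real run would fill these in during the chat loop.
log = [
    Interaction(
        question="How can I force node execution order?",
        context="(retrieved Slack thread)",
        rag_answer="(answer from the RAG agent)",
        llm_answer="(answer from the plain LLM)",
    )
]

output_path = Path(tempfile.mkdtemp()) / "output.md"
output_path.write_text(render_markdown(log), encoding="utf-8")
print(output_path.read_text(encoding="utf-8"))
```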

Output example

User Input

How can I force node execution order in the Kedro pipeline?

LLM Output ❌

To force node execution order in the Kedro pipeline, you can use the before and after arguments when defining your nodes in the pipeline.py file.

For example, if you have two nodes node1 and node2, and you want node1 to run before node2, you can define them like this:

from kedro.pipeline import node

def create_pipeline():
    return Pipeline(
        [
            node(func=node1, inputs="input_data", outputs="output_data", name="node1"),
            node(func=node2, inputs="output_data", outputs="final_output", name="node2", before="node1")
        ]
    )

In this example, node2 will run after node1 because of the before="node1" argument. You can also use the after argument to specify that a node should run after another node.

By using these arguments, you can control the execution order of nodes in your Kedro pipeline.

Agent Output ✅

In Kedro, the recommended approach is to rely on the topological sorting of nodes in the pipeline to determine the execution order. However, if you need to force a specific node execution order, you can create "pass-through" nodes to achieve this.

Here is an example of how you can force node execution order by creating pass-through nodes:

from kedro.pipeline import node

def pass_through_function(input_data):
    # Perform any necessary operations here
    return input_data

pass_through_node = node(pass_through_function, inputs="input_data", outputs="output_data")

# Define your pipeline with the pass-through nodes to enforce execution order
pipeline = Pipeline([
    pass_through_node,
    node(process_data, inputs="output_data", outputs="processed_data"),
    node(analyze_data, inputs="processed_data", outputs=None)
])

By inserting pass-through nodes between the nodes that need to be executed in a specific order, you can enforce the desired execution sequence in the pipeline.
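
As an aside to the sample output above: the reason pass-through nodes work is that Kedro derives execution order from dataset dependencies. A node runs only after the nodes that produce its inputs. The sketch below reproduces that ordering for the three nodes in the example, using Python's standard `graphlib` module rather than Kedro itself. The `nodes` dictionary layout is an assumption made for illustration.

```python
from graphlib import TopologicalSorter

# Node name -> (input datasets, output datasets), mirroring the example above.
nodes = {
    "analyze_data": ({"processed_data"}, set()),
    "process_data": ({"output_data"}, {"processed_data"}),
    "pass_through": ({"input_data"}, {"output_data"}),
}

# Map each dataset to the node that produces it.
producer = {ds: name for name, (_, outputs) in nodes.items() for ds in outputs}

# A node depends on the producers of its inputs; raw inputs have no producer.
graph = {
    name: {producer[ds] for ds in inputs if ds in producer}
    for name, (inputs, _) in nodes.items()
}

order = list(TopologicalSorter(graph).static_order())
print(order)
```

The nodes are declared in reverse here on purpose: the resulting order follows the data dependencies, not the order of declaration.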
