PDFAnswerBot

Overview:

  • Description: AI-powered question answering for PDFs with user feedback.
  • Tech stack:
    • LLM: ChatGPT, LangChain
    • Frontend: JavaScript, HTML, CSS
    • Backend:
      • Language & framework: Python, Flask
      • Asynchronous job queue: Celery (Python specific)
      • Message broker: Redis
      • Vector database: Pinecone
      • Tracing & debugging: Langfuse
  • Demo: project demo

Setup:

  1. (For project isolation) create & activate a virtual environment, so that dependencies are installed within the virtual environment rather than system-wide; all subsequent steps are performed inside it:
    python -m venv .venv
    source .venv/bin/activate
    
  2. Upgrade the pip package manager to the latest version within the current Python environment: python -m pip install --upgrade pip
  3. Install libraries/packages/dependencies: pip install -r requirements.txt
  4. Initialize the database (if you ever need to clear out all the data from this application, just run this command again): flask --app app.web init-db
  5. Install Redis on macOS using Homebrew: brew install redis

Running the app:

There are 3 separate processes that need to be running for the app to work: the Python server, the Redis message broker, and the Celery background worker.

  • invoke the Python server (in development mode): inv dev

  • launch the Redis server (the message broker), allowing it to accept connections and handle data storage and retrieval:

    # launch the Redis server:
    redis-server
    
    # (only if needed) identify the process using port 6379:
    lsof -i :6379
    
    # (only if needed) terminate a process:
    kill <PID>
    
    # (only if needed) connect to the Redis instance:
    redis-cli
    
    # (only if needed) retrieve all info from a specific hash using the 'hash get all' command:
    HGETALL <key> # e.g.: HGETALL llm_score_counts
    
    
  • invoke the worker: inv devworker
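The `HGETALL llm_score_counts` command above returns a hash of score → count pairs that the app uses for feedback tracking. A minimal sketch of how an average score could be computed from such a hash (the hash name comes from the command above; in the running app the mapping would come from Redis rather than the plain dict used here, and the exact field layout is an assumption):

```python
# Sketch: computing an average feedback score from a Redis-style
# score -> count hash. In the app this mapping would come from
# HGETALL llm_score_counts; here a plain dict stands in (assumption).

def average_score(score_counts: dict) -> float:
    """Weighted average, where keys are scores and values are vote counts."""
    total_votes = sum(int(c) for c in score_counts.values())
    if total_votes == 0:
        return 0.0
    weighted = sum(float(score) * int(count) for score, count in score_counts.items())
    return weighted / total_votes

# Example: three 1.0 votes and one 0.0 vote -> average 0.75
counts = {"1.0": "3", "0.0": "1"}
print(average_score(counts))  # 0.75
```

Note that Redis returns hash fields and values as strings, which is why the sketch converts them before doing arithmetic.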

If you stop any of these processes, you will need to start them back up!

If you need to stop them, select the terminal window the process is running in and press ctrl+C to quit.

To reset the database: flask --app app.web init-db

Start by creating an account, using for example these credentials: email [email protected], password abc123

Structure & concepts:

  • App workflow: app workflow
  • The Python server: Python server
  • Conversational QA chain: conversational QA chain
  • Persistent message storage: persistent message storage
  • Conversational retrieval chain: conversational retrieval chain
  • Streaming text generation: streaming text generation. Streaming controls how OpenAI responds to LangChain, and the way the model is called controls both how OpenAI responds to LangChain and how LangChain responds to the user: transmission of a response
  • Random component parts in retrieval chain: random component parts in retrieval chain
  • The retrieval chain component maps: retrieval chain component maps
  • Self-improving text generation by collecting user feedback: self-improving text generation by collecting user feedback. The average feedback scores are calculated and stored in Redis.
  • Conversation buffer window memory: conversation buffer window memory
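Conversation buffer window memory keeps only the last k exchanges in the prompt, bounding token usage as a chat grows. A minimal illustration of the idea in plain Python (this is a sketch of the concept, not LangChain's actual `ConversationBufferWindowMemory` implementation; the class name and k=2 are illustrative):

```python
from collections import deque


class BufferWindowMemory:
    """Keeps only the last k (human, ai) exchanges — a sketch of the
    buffer-window idea, not LangChain's real class."""

    def __init__(self, k: int = 2):
        # deque with maxlen drops the oldest exchange automatically
        self.exchanges = deque(maxlen=k)

    def save_context(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    def load_history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.exchanges)


memory = BufferWindowMemory(k=2)
memory.save_context("What is this PDF about?", "It covers Redis basics.")
memory.save_context("Who wrote it?", "The author is not stated.")
memory.save_context("Summarize page 3.", "Page 3 explains hashes.")
print(memory.load_history())  # only the last 2 exchanges survive
```

The trade-off is that anything older than the window is forgotten, which is why a persistent message store (see above) is kept alongside the windowed prompt memory.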

Resources:

  1. .gitignore File – How to Ignore Files and Folders in Git
  2. Keyword Argument & Positional Argument in Python
  3. python-dotenv: Load configuration without altering the environment
  4. Pinecone Documentation
  5. Pinecone Examples
  6. Pinecone AI Support
  7. Langfuse | Quickstart
  8. Langfuse | Self-Hosting Guide
  9. Self-hosted Langfuse
