cBioPortal/cBioPubChat

🧬 cBioPubChat

Ask questions. Explore cancer publications. Discover insights.

⚠️ This project is a work in progress being developed for the cBioPortal Hackathon 2025.

cBioPubChat is an AI-powered chatbot designed to help researchers, clinicians, and enthusiasts interact with publications from cBioPortal studies. By combining vector search with large language models, the chatbot can:

  • Retrieve the most relevant studies based on a user question
  • Summarize the key findings from those studies
  • Provide direct links to the studies in cBioPortal

Sample Use Case

“Which pathways are most commonly altered in ovarian cancer?”

cBioPubChat will:

  • Search the text of every study publication by embedding similarity
  • Summarize relevant findings with an LLM
  • Provide links to those studies in cBioPortal
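The retrieval and linking steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's implementation: `embed` is a toy word-count stand-in for a real embedding model, the in-memory dictionary stands in for ChromaDB, the study IDs and texts are made-up placeholders, and the `STUDY_URL` pattern is an assumption about how cBioPortal study pages are addressed.

```python
import math
import re
from collections import Counter

# Assumed URL pattern for cBioPortal study pages; verify before relying on it.
STUDY_URL = "https://www.cbioportal.org/study/summary?id={study_id}"

# Placeholder data (hypothetical IDs and text, not real publications).
PUBLICATIONS = {
    "example_ovarian": "pathways altered in ovarian cancer samples",
    "example_breast": "subtypes of breast cancer tumors",
    "example_methods": "a sequencing pipeline for variant calling",
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, publications: dict[str, str], k: int = 2):
    """Return the k most similar studies as (study_id, score, link) tuples."""
    query_vec = embed(question)
    scored = [
        (cosine(query_vec, embed(text)), study_id)
        for study_id, text in publications.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [
        (study_id, score, STUDY_URL.format(study_id=study_id))
        for score, study_id in scored[:k]
    ]


if __name__ == "__main__":
    question = "Which pathways are most commonly altered in ovarian cancer?"
    for study_id, score, link in retrieve(question, PUBLICATIONS):
        print(f"{study_id}\t{score:.3f}\t{link}")
```

In a real pipeline the retrieved text would then be passed to an LLM for summarization; that step is omitted here because it depends on the provider chosen.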

Planned Tech Stack

  • Chainlit – Interactive chat UI
  • LangChain – LLM pipeline & orchestration
  • ChromaDB – Vector store for publication embeddings
  • Python 3.10+
  • LLMs – OpenAI or other LangChain-compatible providers

Planned Project Structure

cBioPubChat/
├── app/                          # Chainlit app frontend and config
│   ├── main.py                   # Chainlit entrypoint (UI + LangChain agent)
│   └── config.toml               # Chainlit config (title, theme, etc.)
├── backend/                      # Core logic: embeddings, indexing, QA
│   ├── ingest/
│   │   ├── parse_publications.py # PDF, HTML, or plain text loader
│   │   ├── embed_and_store.py    # Convert text → embeddings → store in ChromaDB
│   │   └── __init__.py
│   ├── qa/
│   │   ├── query_engine.py       # Embedding search + summarization pipeline
│   │   └── __init__.py
│   └── __init__.py
├── data/                         # Raw and processed publication data
│   ├── raw/                      # Raw PDFs or metadata
│   └── processed/                # Text chunks or cleaned files
├── chroma/                       # Local ChromaDB index directory (auto-created)
├── notebooks/                    # (Optional) Jupyter notebooks for exploration
│   └── analysis.ipynb
├── tests/                        # Unit and integration tests
│   ├── test_ingest.py
│   ├── test_query.py
│   └── ...
├── scripts/                      # Convenience scripts (e.g., bootstrap)
│   └── run_ingest.sh
├── .env                          # API keys, secrets (ignored by git)
├── .gitignore
├── README.md
└── requirements.txt              # Pip dependencies
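
The planned structure mentions text chunks under `data/processed/` and a text → embeddings step in `embed_and_store.py`. As a rough sketch of what the chunking part might look like, the function below splits text into overlapping word windows. The name `chunk_text`, its parameters, and the default sizes are all hypothetical; the project may well use a LangChain text splitter instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in at least one neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Each chunk would then be embedded and stored alongside its study ID, so that a match can be traced back to the study it came from.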

About

LLM-powered chatbot for exploring cBioPortal publications.
