yt-digest

A YouTube transcript digest generator that searches for YouTube videos, extracts transcripts, and generates AI-powered summaries using OpenAI's GPT models. The tool outputs summaries to an RSS feed for easy consumption in your favorite RSS reader.

Maintenance Mode: This project has reached maturity and is now in maintenance mode. It is stable and feature-complete for its intended use case. While critical bug fixes and security updates will be addressed, no major new features are planned. The project remains actively maintained and ready for use.

Local Setup

Prerequisites

Before setting up the project, ensure you have the following installed:

Python 3.10: This project requires Python 3.10 or higher
Conda: For managing the Python environment (Installation Guide)
API Keys: You'll need API keys for the following services:
- OpenAI API Key - For generating video summaries
- Webshare Proxy credentials - For accessing YouTube transcripts (username and password)

Installation

Clone the repository:

git clone https://github.com/greenbrettmichael/yt-digest.git
cd yt-digest

Create and activate the Conda environment:
```
conda env create -f environment.yaml
conda activate yt_digest
```
This will install all required dependencies including:
- scrapetube - For searching YouTube videos
- youtube-transcript-api - For fetching video transcripts
- openai - For generating AI-powered digests
- pytest and ruff - For testing and linting

Configure environment variables:

Copy the example environment file:

cp .env.example .env

Edit .env and add your credentials:

# Proxy configuration for YouTube Transcript API
PROXY_USERNAME=your_webshare_username
PROXY_PASSWORD=your_webshare_password

# OpenAI API Key
OPENAI_API_KEY=sk-your-openai-api-key

Troubleshooting

Conda environment creation fails

Ensure Conda is installed and updated: conda update conda
Try creating the environment with: conda env create -f environment.yaml --force

Proxy authentication errors

Verify your Webshare proxy credentials are correct in .env
Ensure your proxy subscription is active

OpenAI API errors

Check that your API key is valid and has available credits
Verify the model name in the code matches available models in your OpenAI account

For more detailed troubleshooting and advanced configuration, see the Advanced Usage Guide.

Basic Usage

Configuration Using queries.json

Create a queries.json file in the project root directory with the following structure:

[
    {
        "search_url": "https://www.youtube.com/results?search_query=python+tutorials&sp=EgIIAw%253D%253D"
    },
    {
        "channel_username": "LinusTechTips"
    },
    {
        "channel_id": "UC8butISFwT-Wl7EV0hUK0BQ"
    },
    {
        "channel_url": "https://www.youtube.com/@mkbhd"
    },
    {
        "channel_username": "ThePrimeagen",
        "search_url": "https://www.youtube.com/results?search_query=programming&sp=EgIIAw%253D%253D"
    }
]

Configuration File Format:

The file must be a JSON array of objects
Each object represents a video source query
Required fields for each entry:
- At least one video source (can have multiple):
  - search_url: Full YouTube search URL for keyword-based searches
  - channel_id: YouTube channel ID (e.g., "UC8butISFwT-Wl7EV0hUK0BQ")
  - channel_url: YouTube channel URL (e.g., "https://www.youtube.com/@mkbhd")
  - channel_username: YouTube channel username without @ (e.g., "LinusTechTips")

Using Channel Sources:

When you specify a channel (via channel_id, channel_url, or channel_username), the tool will:
- Query the channel for videos published in the last 24 hours
- Process transcripts for all videos found
- Include them in the RSS feed
You can specify multiple sources per query (e.g., both a channel and a search URL)
Channel videos are fetched using scrapetube.get_channel() sorted by newest first

How to Construct YouTube Search URLs:

Go to YouTube and perform your desired search
Apply any filters (upload date, duration, etc.)
Copy the complete URL from your browser's address bar
The URL should include the sp parameter for filters, e.g., sp=EgIIAw%253D%253D for videos uploaded this week

Finding Channel Identifiers:

Channel Username: The handle shown on the channel page (without the @), e.g., "LinusTechTips"
Channel URL: The full URL to the channel page, e.g., "https://www.youtube.com/@mkbhd"
Channel ID: Found in the page source or channel URL, e.g., "UC8butISFwT-Wl7EV0hUK0BQ"

Example: A queries.json.example file is provided in the repository for reference.

Running the Main Script

The project can be run directly using the main script:

python app.py

The application processes each entry in the configuration file
For each entry, it will:
1. Fetch videos from the last 24 hours from any specified channels
2. Fetch transcripts for videos matching any search URLs
3. Generate an AI summary for each video individually
4. Add each summary as a separate entry to the RSS feed
The RSS feed is written to feed.xml in the project root
Each execution overwrites the previous feed.xml with newly generated content
If any entry fails, the application logs the error and continues with the next entry
The tool logs the number of videos found and which channels were processed

RSS Feed Output

The generated feed.xml file:

Contains one RSS item per video summary
Includes video title, YouTube link, publication date, and AI-generated summary
Is compatible with standard RSS readers (Feedly, Inoreader, etc.)
Summaries are truncated to 10,000 characters if too long to protect against excessive size
The feed is completely regenerated on each run (previous entries are overwritten)

Core Functionality

The yt-digest tool provides several key functions:

Video Search and Transcript Extraction:
- Searches YouTube for videos by keyword
- Retrieves English transcripts (or falls back to other available languages)
- Handles videos with disabled or missing transcripts gracefully
AI-Powered Digest Generation:
- Uses OpenAI's GPT models to analyze transcripts
- Generates concise, structured summaries for each video
- Includes video titles, links, and key takeaways with timestamps
- Transcripts are truncated to 15,000 characters to handle large queries efficiently
RSS Feed Generation:
- Outputs all summaries to a single feed.xml file
- Each video gets its own RSS item entry
- Compatible with all standard RSS readers
- Summaries are truncated to 10,000 characters to prevent excessive size

Additional Documentation

Advanced Usage Guide - Customization options, advanced workflows, and programmatic usage examples
Development Guide - Setup instructions, code quality tools, and contribution guidelines
Deployment Guide - AWS Fargate deployment instructions for production use

Contributing

This project is in maintenance mode but we still welcome contributions for bug fixes and security updates. Please see the Development Guide for information on setting up your development environment and code quality standards.

License

This project is open source. Please refer to the repository for license information.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.github/workflows		.github/workflows
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.flake8		.flake8
.gitignore		.gitignore
ADVANCED_USAGE.md		ADVANCED_USAGE.md
DEPLOYMENT.md		DEPLOYMENT.md
DEVELOPMENT.md		DEVELOPMENT.md
Dockerfile		Dockerfile
LICENSE.md		LICENSE.md
README.md		README.md
app.py		app.py
environment.yaml		environment.yaml
pyproject.toml		pyproject.toml
queries.json.example		queries.json.example
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

yt-digest

Local Setup

Prerequisites

Installation

Troubleshooting

Basic Usage

Configuration Using queries.json

Running the Main Script

RSS Feed Output

Core Functionality

Additional Documentation

Contributing

License

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

License

greenbrettmichael/yt-digest

Folders and files

Latest commit

History

Repository files navigation

yt-digest

Local Setup

Prerequisites

Installation

Troubleshooting

Basic Usage

Configuration Using queries.json

Running the Main Script

RSS Feed Output

Core Functionality

Additional Documentation

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages