YouTube Video Downloader v0.1.2

A Python-based tool with a user-friendly interface for downloading YouTube videos, extracting high-quality audio tracks, and splitting audio files into smaller chunks with advanced overlap control.

Features

Easy-to-use interface for downloading videos and audio from YouTube
Real-time metadata display showing video information before download
Multiple format options for both video and audio downloads
Advanced audio splitting functionality with:
- Customizable chunk sizes
- Configurable overlap settings
- Detailed overlap validation
- Real-time splitting progress
Clean and organized file management
Audio transcription capabilities using whisper.cpp

New Feature: AI Analysis

The AI Analysis feature uses a model served by Ollama, such as "mistral", to analyze transcription quality. It focuses on the joins between chunks to help identify the best chunk size and overlap settings for your transcriptions.

How It Works

The system analyzes each transcription chunk in context with its neighboring chunks
It evaluates 8 key criteria for transcription quality:
- Overall Accuracy
- Joint Smoothness
- Contextual Continuity
- Grammar Integrity
- Word Completeness
- Redundancy
- Content Loss
- Joint Readability
Each criterion is scored on a scale of 0-10
Detailed analysis is provided for each chunk showing the reasoning process
Summary reports are generated for each combination of chunk/overlap settings
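
As a rough illustration of analyzing each chunk in context with its neighbors, the sketch below pairs every chunk with the one before and after it. The function name with_neighbors is hypothetical and is not taken from the project's code.

```python
from typing import Iterator, Optional, Sequence, Tuple

def with_neighbors(
    chunks: Sequence[str],
) -> Iterator[Tuple[Optional[str], str, Optional[str]]]:
    """Yield (previous, current, next) for every chunk; edges get None."""
    for i, current in enumerate(chunks):
        previous = chunks[i - 1] if i > 0 else None
        following = chunks[i + 1] if i + 1 < len(chunks) else None
        yield previous, current, following

# Three overlapping transcription chunks (made-up text for illustration).
windows = list(with_neighbors(["the quick brown", "brown fox jumps", "jumps over"]))
```

Each window could then be placed into the prompt sent to the model, so the join on either side of the current chunk is visible during scoring.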

Setup Requirements

Install Ollama: Follow the instructions at ollama.com
Start the Ollama service (important!):
- On macOS/Linux: Run ollama serve in a terminal
- On Windows: Ensure the Ollama application is running
Pull the models you want to use:
- For Mistral: ollama pull mistral
- For Llama3: ollama pull llama3
- For other models: ollama pull model_name
Install Python dependencies: pip install -r requirements.txt
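
To confirm from Python that the service is up and that a model has been pulled, something like the following may help. It assumes Ollama is listening on its default address (http://localhost:11434) and uses its /api/tags endpoint; the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default address; adjust if yours differs

def model_names(tags_payload: dict) -> list:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]

def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0):
    """Return the pulled model names, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        return None
```

A result of None suggests the service is not running; an empty list suggests it is running but no models have been pulled yet.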

Using the AI Analysis Feature

First prepare audio files using the "Prepare Audios" button
Transcribe the chunks using the "Transcribe Chunks" button
Make sure Ollama is running (on macOS/Linux, check with ps aux | grep ollama)
Navigate to the "AI Analysis" tab
Select the Ollama model to use from the dropdown menu (options include "mistral", "llama3", "gemma2:9b", etc.)
Click the "Start Analysis" button to begin the analysis process
Monitor progress in the "Analysis Status" section
View detailed results in the "Analysis Results" section

Note: The analysis process requires that each combination has been transcribed first. Combinations that haven't been transcribed will be skipped during analysis. If you see "Skipped combinations" in the analysis summary, make sure to run the transcription process for those combinations before analyzing them.

Enhanced Features

The AI Analysis system includes several supporting features:

Comprehensive Logging:
- Detailed logs for each analysis run
- Per-file analysis logs with timestamps
- Summary reports with statistics and recommendations
Error Handling:
- Automatic retry for Ollama API failures
- Graceful handling of missing files
- Recovery from incomplete transcriptions
- Detailed error reporting with error types
Detailed Analysis Reports:
- Summary files in both JSON and human-readable formats
- Individual chunk analysis with scores and reasoning
- Overall statistics and quality metrics
- Charts and visualizations of results
Metadata Management:
- Automatic creation of metadata.json if missing
- Recovery from combined_transcription.json
- Validation of transcription files
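
The automatic retry behavior listed under Error Handling could look roughly like the sketch below. This is a generic retry-with-exponential-backoff wrapper, not the project's actual implementation; call_with_retry and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on any exception with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            # Wait 1x, 2x, 4x ... base_delay between attempts, not after the last.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Include the error type so the failure can be reported in detail.
    raise RuntimeError(
        f"failed after {attempts} attempts: {type(last_error).__name__}: {last_error}"
    ) from last_error
```

The sleep argument is injectable so the wrapper can be exercised in tests without real delays.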

Understanding Analysis Results

Each chunk analysis provides:

Score for each criterion (0-10)
Step-by-step reasoning
Overall average score
Specific observations about the joint quality
Analysis duration and performance metrics
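
A minimal sketch of how the overall average score might be derived from the eight criterion scores is shown below. The dictionary keys are hypothetical, and the sketch assumes every score is already oriented so that higher is better.

```python
from statistics import mean

# Hypothetical key names for the eight criteria described in this README.
CRITERIA = (
    "overall_accuracy", "joint_smoothness", "contextual_continuity",
    "grammar_integrity", "word_completeness", "redundancy",
    "content_loss", "joint_readability",
)

def overall_average(scores: dict) -> float:
    """Average the eight criterion scores after checking they are valid."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for name in CRITERIA:
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"{name} must be in the range 0-10")
    return mean(scores[name] for name in CRITERIA)

example = dict(zip(CRITERIA, (9, 8, 7, 8, 10, 6, 7, 8)))
average = overall_average(example)
```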

The system generates several output files:

detailed_analysis.log: Complete log of the analysis process
summary.json: Machine-readable summary of all results
summary.txt: Human-readable summary with recommendations
Individual chunk analysis files in both JSON and TXT formats

Analysis Metrics Explained

Overall Accuracy: How accurately the transcription captures the spoken content
Joint Smoothness: How well the chunks connect at their boundaries
Contextual Continuity: Whether the meaning flows naturally across chunk boundaries
Grammar Integrity: Grammatical correctness at chunk boundaries
Word Completeness: Whether words are complete at chunk boundaries
Redundancy: Repeated content across chunk boundaries
Content Loss: Missing content at chunk boundaries
Joint Readability: How readable the text is across chunk boundaries

Recommendations

The system provides automatic recommendations based on analysis results:

Suggestions for optimal chunk size and overlap settings
Identification of problematic combinations

Audio AI Analysis Visualizations

The visualize_audio_analysis.py script generates visualizations and a comprehensive report from your audio AI analysis data.

Setup

Make sure you have Python installed, then install the required dependencies:

pip install -r requirements.txt

Running the Script

Simply execute the visualization script:

python visualize_audio_analysis.py

Output

The script will generate visualizations in the visualizations/ directory:

processing_times.png - Bar charts of processing times by chunk and overlap lengths
parameter_heatmap.png - Heatmap showing performance of different chunk/overlap combinations
metrics_comparison.png - Comparison of different evaluation metrics
top_combinations.png - The top performing parameter combinations
chunk_overlap_scatter.png - Scatter plot showing relationship between chunk length, overlap, and scores
comprehensive_report.html - An HTML report integrating all visualizations with summary statistics

Understanding the Visualizations

Processing Times

Shows the average processing time for different chunk lengths and overlap lengths. This helps identify efficiency tradeoffs.

Parameter Heatmap

A color-coded grid showing how each combination of chunk length and overlap performs. Darker colors indicate better performance.
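
The data behind such a heatmap can be thought of as a grid of mean scores indexed by chunk length and overlap. The sketch below builds that grid in plain Python; the input layout (tuples of chunk length, overlap, score) is an assumption and may not match the format the script actually reads.

```python
from collections import defaultdict
from statistics import mean

def score_grid(results):
    """Build a chunk-by-overlap grid of mean scores.

    results: iterable of (chunk_seconds, overlap_seconds, score) tuples.
    Returns (chunks, overlaps, grid), where grid[i][j] is the mean score for
    chunks[i] and overlaps[j], or None if that combination has no data.
    """
    buckets = defaultdict(list)
    for chunk, overlap, score in results:
        buckets[(chunk, overlap)].append(score)
    chunks = sorted({c for c, _ in buckets})
    overlaps = sorted({o for _, o in buckets})
    grid = [
        [mean(buckets[(c, o)]) if (c, o) in buckets else None for o in overlaps]
        for c in chunks
    ]
    return chunks, overlaps, grid

chunks, overlaps, grid = score_grid(
    [(10, 0, 6.0), (10, 0, 8.0), (10, 5, 7.5), (30, 5, 9.0)]
)
```

Cells with None mark combinations that were never analyzed, which is worth distinguishing from a low score when reading the heatmap.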

Top Combinations

Bar chart showing the highest-scoring parameter combinations, providing a quick view of which settings performed best.
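
Selecting the highest-scoring combinations amounts to a sort by score. The following sketch assumes scores are held in a dictionary keyed by (chunk length, overlap); the function name and data layout are hypothetical.

```python
def top_combinations(scores: dict, n: int = 5) -> list:
    """Return the n best (combination, score) pairs, highest score first.

    scores maps (chunk_seconds, overlap_seconds) to an average score.
    Ties are broken by the smaller chunk length, then the smaller overlap.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]

best = top_combinations(
    {(10, 0): 6.5, (30, 5): 8.2, (45, 10): 7.9, (60, 5): 8.2}, n=2
)
```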

Chunk vs Overlap Scatter Plot

Scatter plot showing each parameter combination, with chunk length on the x-axis, score on the y-axis, and color indicating overlap percentage.

Metrics Comparison

Comparison of different evaluation metrics across all parameter combinations, showing which aspects perform better or worse.

Customizing the Visualizations

To modify the visualizations, edit the visualize_audio_analysis.py file. You can adjust:

Colors and styles in the plotting functions
Number of top combinations to display (n parameter in plot_best_combinations())
HTML report layout and styling in the generate_html_report() function

File Structure

./
├── audio/           # Directory for extracted audio files
├── audio_split/     # Directory for split audio chunks
├── audio_text/      # Directory for transcription outputs
├── download/        # Core downloading functionality
├── logs/            # Application logs
├── metadata/        # Stored video metadata files
├── split/           # Audio splitting functionality
├── transcribe/      # Audio transcription functionality
├── utils/           # Utility functions
├── video/           # Directory for downloaded videos
├── whisper.cpp/     # Whisper.cpp library for transcription
├── main.py          # Main application entry point
├── requirements.txt # Project dependencies
└── metadata_schema.md  # Detailed metadata schema documentation

Function Reference

Main Application Functions

Metadata Management

save_metadata(metadata): Saves YouTube video metadata to a JSON file in the 'metadata' directory.
summarize_metadata(metadata): Processes metadata and extracts relevant information for display, including video title, duration, upload date, and available formats.
grab_metadata(url): Fetches metadata for a YouTube URL and updates the Gradio UI components.

Download Module Functions

get_youtube_metadata(url): Fetches comprehensive metadata for a YouTube video URL using yt_dlp.
download_video(url): Downloads a YouTube video in the highest quality format available, automatically selecting the best resolution.
download_audio(url): Downloads and extracts high-quality audio from a YouTube video, converting it to WAV format with a 16 kHz sample rate and a single (mono) channel.
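
For reference, a yt_dlp configuration that produces 16 kHz mono WAV output might look like the sketch below. The option values and output template are assumptions rather than the project's actual settings, and ffmpeg must be installed for the audio extraction step.

```python
def build_audio_options(output_dir: str = "audio") -> dict:
    """Build yt_dlp options for extracting 16 kHz mono WAV audio."""
    return {
        "format": "bestaudio/best",
        # Hypothetical naming scheme: one file per video ID.
        "outtmpl": f"{output_dir}/%(id)s.%(ext)s",
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "wav"},
        ],
        # Passed through to ffmpeg: 16000 Hz sample rate, 1 channel.
        "postprocessor_args": ["-ar", "16000", "-ac", "1"],
    }

def download_audio_sketch(url: str) -> None:
    """Download one URL using the options above."""
    import yt_dlp  # third-party package: pip install yt-dlp

    with yt_dlp.YoutubeDL(build_audio_options()) as ydl:
        ydl.download([url])
```

Whisper models expect 16 kHz mono input, which is why the conversion happens at download time rather than before each transcription.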

Audio Splitting Module Functions

ensure_directories(): Creates necessary directories for audio processing if they don't exist.
get_audio_files(): Returns a list of audio files from the audio directory.
clean_audio_split_directory(preserve_state=True): Removes all files from the audio_split directory, with an option to preserve the split state file.
validate_audio_file(file_path): Validates and locates an audio file, checking both the provided path and the audio directory.
validate_overlap_calculations(chunk_length_ms, overlap_ms, duration, total_chunks): Validates overlap calculations before processing chunks to ensure consistency.
split_audio(file_path, split_size, overlap_size=0): Splits an audio file into chunks of specified size with optional overlap.
verify_split_results(): Verifies the results of the audio splitting process.
fix_split_overlaps(): Fixes any issues with overlaps between audio chunks.
test_overlap_calculations(duration_seconds, chunk_sizes, overlap_sizes): Tests overlap calculations with various parameters.
run_overlap_test(duration, chunk_size, overlap_size): Runs a specific overlap test with given parameters.
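
The overlap arithmetic these functions deal with can be sketched as follows: each chunk starts one step (chunk length minus overlap) after the previous one, and the last chunk is trimmed to the end of the audio. The function chunk_boundaries is hypothetical and only illustrates the calculation, not the project's exact logic.

```python
def chunk_boundaries(duration_ms: int, chunk_length_ms: int, overlap_ms: int = 0) -> list:
    """Return (start_ms, end_ms) pairs covering the whole duration."""
    if chunk_length_ms <= 0:
        raise ValueError("chunk length must be positive")
    if not 0 <= overlap_ms < chunk_length_ms:
        raise ValueError("overlap must be >= 0 and smaller than the chunk length")
    step = chunk_length_ms - overlap_ms
    bounds = []
    start = 0
    while start < duration_ms:
        end = min(start + chunk_length_ms, duration_ms)
        bounds.append((start, end))
        if end == duration_ms:
            break  # the final chunk reached the end of the audio
        start += step
    return bounds

# 25 s of audio, 10 s chunks, 2 s overlap.
bounds = chunk_boundaries(25_000, 10_000, 2_000)
```

Requiring the overlap to be strictly smaller than the chunk length guarantees a positive step, so the loop always terminates.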

Transcription Module Functions

TranscriptionService Class

__init__(model_path): Initializes the transcription service with a specified whisper.cpp model.
_verify_dependencies(): Verifies that the required dependencies (whisper.cpp binary and model) exist.
transcribe_chunk(audio_chunk, lang_code="en", initial_prompt=None): Transcribes an audio chunk using whisper.cpp with optional language code and initial prompt.
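
Under the hood, transcribing a chunk with whisper.cpp typically means invoking its command-line binary. The sketch below builds such a command; the binary and model paths are assumptions (the binary name varies between whisper.cpp versions), and build_whisper_command is a hypothetical helper.

```python
import subprocess

def build_whisper_command(binary, model_path, audio_chunk,
                          lang_code="en", initial_prompt=None):
    """Assemble a whisper.cpp command line for one audio chunk."""
    cmd = [
        binary,
        "-m", model_path,   # model file
        "-f", audio_chunk,  # input audio (16 kHz WAV)
        "-l", lang_code,    # spoken language
        "-nt",              # print text without timestamps
    ]
    if initial_prompt:
        cmd += ["--prompt", initial_prompt]
    return cmd

def transcribe_chunk_sketch(cmd):
    """Run the command and return the transcription printed to stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Paths below are placeholders for illustration only.
cmd = build_whisper_command(
    "./whisper.cpp/main", "models/ggml-base.en.bin", "audio_split/chunk_000.wav"
)
```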

Utility Functions

Logger Class

__init__(name='audio_dataset'): Initializes the logger with console and file handlers.
clear_logs(): Clears the log file by truncating it to zero size.
debug(message), info(message), warning(message), error(message), critical(message): Log messages at different severity levels.
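
A logger with both console and file handlers can be set up with the standard logging module, roughly as sketched below. The log file name, format string, and function names are assumptions, not the project's actual choices.

```python
import logging
from pathlib import Path

def make_logger(name: str = "audio_dataset", log_dir: str = "logs") -> logging.Logger:
    """Create (or reuse) a logger that writes to the console and to a file."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        handlers = (
            logging.StreamHandler(),
            logging.FileHandler(Path(log_dir) / f"{name}.log"),
        )
        for handler in handlers:
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger

def clear_log(logger: logging.Logger) -> None:
    """Truncate every log file attached to the logger to zero size."""
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler):
            handler.flush()
            with open(handler.baseFilename, "w"):
                pass  # opening in write mode truncates the file
```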

Cleanup Functions

cleanup_python_cache(): Removes all Python cache files and directories to keep the codebase clean.
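
Cache cleanup of this kind usually means deleting __pycache__ directories and stray compiled files. A minimal sketch, with a hypothetical function name and return value, could be:

```python
import shutil
from pathlib import Path

def cleanup_python_cache_sketch(root: str = ".") -> int:
    """Remove __pycache__ directories and stray .pyc/.pyo files under root.

    Returns the number of directories and files removed.
    """
    removed = 0
    root_path = Path(root)
    # Materialize the matches first so deletion does not disturb the walk.
    for cache_dir in list(root_path.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir, ignore_errors=True)
            removed += 1
    for pattern in ("*.pyc", "*.pyo"):
        for compiled in list(root_path.rglob(pattern)):
            compiled.unlink(missing_ok=True)
            removed += 1
    return removed
```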

User Interface Guide

Download Tab

YouTube Video/Audio Download

URL Input
- Paste a YouTube URL into the text box
- Click "Grab Information" to fetch video details
Metadata Display
- View comprehensive video information including:
  - Title
  - Duration
  - Upload date
  - Available video and audio formats
Download Options
- "Download Video": Downloads the highest quality video format
- "Download Audio": Extracts high-quality audio track

Split Audio Tab

Audio Processing

Input Options
- Upload a new audio file using the audio upload component
- Select an existing audio file from the dropdown menu
Split Settings
- Choose from preset split durations (10s, 30s, 45s, 60s, 100s)
- Select "Custom" for a specific duration in seconds
- Set overlap duration between chunks:
  - Choose from preset overlap durations
  - Enter custom overlap duration
  - Disable overlap with "None" option
Processing
- Click "Split Audio" to divide the file into chunks
- View detailed processing information:
  - Chunk timecodes and durations
  - Overlap validation results
  - Processing summary

Transcription Tab

Audio Transcription

Model Selection
- Choose from available whisper.cpp models
- Models vary in size and accuracy
Input Selection
- Select audio files to transcribe from the audio_split directory
- Batch processing of multiple files is supported
Transcription Options
- Set language code for transcription (default: English)
- Provide optional initial prompt to guide transcription
- Configure processing parameters
Output
- View transcription results in real-time
- Results are saved to the audio_text directory

Installation

Clone the repository
Install dependencies:
```
pip install -r requirements.txt
```
Set up whisper.cpp:
```
python setup_whisper.py
```
Run the application:
```
python main.py
```

Version History

v0.1.2 (Current)

Added customizable overlap settings for audio splitting
Implemented detailed overlap validation
Enhanced progress reporting with chunk information
Improved error handling and validation
Added transcription capabilities with whisper.cpp integration

v0.1.1

Initial release with user-friendly interface
Support for high-quality video and audio downloads
Implementation of audio splitting feature
Real-time metadata display
Organized file management

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
