Automated Google Slides → Narrated Videos → Final Course Video
`course_pipeline` is a deterministic, reproducible, dependency-aware video generation pipeline that converts a Google Slides deck into a fully narrated course video.
It supports:
- Human voice recording
- macOS native TTS (`say`)
- ElevenLabs high-quality TTS
- Incremental rebuilds
- Slide-level redo
- Fully ordered, dependency-driven execution using `doit`
This is designed for course creators, educators, and AI-first content pipelines.
Features:

- 🔗 Google Slides → Video (single source of truth)
- 🧠 Dependency-aware pipeline (no reruns unless needed)
- 🔢 Strict numeric ordering (1 → N slides, no randomness)
- 🎙️ Multiple audio modes:
  - Interactive human recording
  - macOS `say`
  - ElevenLabs API (studio-quality voices)
- 🎞️ Per-slide video generation
- 🎬 Final concatenation into `final.mp4`
- ♻️ Incremental & resumable (crash-safe)
- 🧹 Redo individual slides easily
Pipeline overview:

```
Google Slides URL
      │
      ▼
extract_notes → notes/1.txt ... notes/N.txt
screenshot    → screenshots/1.png ... N.png
audio / tts_* → audio/1.wav ... N.wav
video         → videos/1.mp4 ... N.mp4
final         → final.mp4
```
Each step is a `doit` task with explicit dependencies.
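For example, a per-slide stage can be declared as a `doit` sub-task whose outputs are rebuilt only when its inputs change. The sketch below is illustrative; the actual task definitions, slide-count handling, and helper names in this repo may differ.

```python
# dodo.py (illustrative sketch, not the repo's actual task definitions)
SLIDE_COUNT = 3  # in the real pipeline this would come from the deck

def render_slide_video(n: int) -> None:
    """Placeholder for the ffmpeg call that builds videos/<n>.mp4."""
    ...

def task_video():
    """One sub-task per slide; doit re-runs it only when its inputs change."""
    for n in range(1, SLIDE_COUNT + 1):
        yield {
            "name": str(n),
            "file_dep": [f"screenshots/{n}.png", f"audio/{n}.wav"],
            "targets": [f"videos/{n}.mp4"],
            "actions": [(render_slide_video, [n])],
        }
```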
Requirements:

- Python 3.11+
- macOS (for `say`; optional if using ElevenLabs)
- ffmpeg
- Google Cloud credentials (Slides + Drive API)
Install ffmpeg:

```bash
brew install ffmpeg
```

Clone and set up the project:

```bash
git clone https://github.com/<your-org>/course_pipeline.git
cd course_pipeline
python -m venv .venv
source .venv/bin/activate
pip install uv
uv sync
```

In the Google Cloud Console, enable:
- Google Slides API
- Google Drive API
Create OAuth credentials and save them as `credentials.json`.

The first run will open a browser for authentication and cache tokens locally.
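Under the hood this is the standard installed-app OAuth flow; a minimal sketch, assuming `google-auth-oauthlib` and a `token.json` cache (the repo's exact scopes and filenames may differ):

```python
import os

from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = [
    "https://www.googleapis.com/auth/presentations.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
]

def get_credentials() -> Credentials:
    """Reuse cached credentials if present, otherwise run the browser flow."""
    if os.path.exists("token.json"):
        return Credentials.from_authorized_user_file("token.json", SCOPES)
    flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
    creds = flow.run_local_server(port=0)  # opens the browser on first run
    with open("token.json", "w") as f:
        f.write(creds.to_json())
    return creds
```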
If using ElevenLabs, set your API key:

```bash
export ELEVENLABS_API_KEY="your_api_key_here"
```

Edit settings in `scripts/settings.py`.
Typical options:
- Video resolution (e.g. `1920x1080`)
- FPS
- Output directories
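A plausible shape for `scripts/settings.py`; the actual option names and defaults may differ:

```python
# scripts/settings.py — illustrative values only; option names are assumptions.
RESOLUTION = "1920x1080"   # output video resolution
FPS = 30                   # frame rate for the still-image video
BUILD_DIR = "build"        # where per-deck artifacts are written
NOTES_DIR = "notes"
SCREENSHOTS_DIR = "screenshots"
AUDIO_DIR = "audio"
VIDEOS_DIR = "videos"
```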
Run the full pipeline:

```bash
uv run doit
```

This will:

- Extract notes
- Take slide screenshots
- Generate audio
- Create per-slide videos
- Concatenate into `final.mp4`
Record your own narration interactively:

```bash
uv run doit audio
```

- Displays slide text
- Records mic input
- Options:
  - Enter → save & continue
  - r → re-record
  - s → skip
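A minimal sketch of the recording step, assuming `sounddevice` and `soundfile`; the repo's actual recorder and prompts may differ:

```python
import queue

import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 44100

def record_until_enter(path: str) -> None:
    """Capture mono mic input until the user presses Enter, then write a WAV."""
    frames = queue.Queue()

    def callback(indata, n_frames, time_info, status):
        frames.put(indata.copy())

    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=callback):
        input("Recording... press Enter to stop: ")

    with sf.SoundFile(path, "w", samplerate=SAMPLE_RATE, channels=1) as f:
        while not frames.empty():
            f.write(frames.get())
```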
Generate narration with macOS TTS:

```bash
uv run doit tts_slides
```

Uses the native `say` command.
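Roughly what that step does per slide (a sketch; the voice, rate, and conversion options used by the actual task are assumptions):

```python
import subprocess

def synthesize_slide(slide_num: int) -> None:
    """Speak a slide's notes with macOS `say`, then normalize to WAV."""
    notes = f"notes/{slide_num}.txt"
    aiff = f"audio/{slide_num}.aiff"
    wav = f"audio/{slide_num}.wav"
    # `say -f` reads the text from a file; `-o` writes an AIFF audio file.
    subprocess.run(["say", "-f", notes, "-o", aiff], check=True)
    # Convert AIFF → WAV so every audio backend produces the same format.
    subprocess.run(["ffmpeg", "-y", "-i", aiff, wav], check=True)
```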
Generate narration with ElevenLabs:

```bash
uv run doit elevenlabs_tts
```

- Studio-quality voices
- Handles empty slides safely
- Converts MP3 → WAV automatically
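A rough sketch of what the underlying call might look like, using the public ElevenLabs REST endpoint; the voice ID, model, and empty-slide handling below are assumptions, not the task's actual implementation:

```python
import os
import subprocess

import requests

VOICE_ID = "your_voice_id"  # assumption: configured elsewhere (e.g. settings.py)

def elevenlabs_slide(slide_num: int, text: str) -> None:
    """Synthesize one slide's narration and convert it to WAV."""
    if not text.strip():
        return  # the real task handles empty slides; it may emit silence instead
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": text, "model_id": "eleven_multilingual_v2"},
        timeout=120,
    )
    resp.raise_for_status()
    mp3 = f"audio/{slide_num}.mp3"
    with open(mp3, "wb") as f:
        f.write(resp.content)  # the endpoint returns MP3 audio bytes
    # MP3 → WAV so downstream video tasks see a uniform format.
    subprocess.run(["ffmpeg", "-y", "-i", mp3, f"audio/{slide_num}.wav"], check=True)
```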
Force regeneration:
```bash
uv run doit elevenlabs_tts --force
```

Generate per-slide videos:
```bash
uv run doit video
```

Each slide:

```
1.png + 1.wav → 1.mp4
```
Optimized for:
- QuickTime
- VLC
- YouTube compatibility
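Conceptually, each slide video pairs a looped still image with its narration; a sketch of the kind of ffmpeg invocation involved (the task's exact flags may differ, but `yuv420p`, AAC, and `+faststart` are what keep the output broadly playable):

```python
import subprocess

def build_slide_video(slide_num: int) -> None:
    """Render one MP4 from a slide screenshot and its narration track."""
    subprocess.run([
        "ffmpeg", "-y",
        "-loop", "1", "-i", f"screenshots/{slide_num}.png",   # still image
        "-i", f"audio/{slide_num}.wav",                        # narration
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-movflags", "+faststart",
        "-shortest", f"videos/{slide_num}.mp4",
    ], check=True)
```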
Concatenate everything into the final video:

```bash
uv run doit final
```

Result:

```
build/<deck_id>/final.mp4
```
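The concatenation step typically uses ffmpeg's concat demuxer with stream copy, so nothing is re-encoded unless requested; a sketch (the list-file name and output paths are assumptions):

```python
import subprocess

def concat_slides(num_slides: int) -> None:
    """Stitch per-slide MP4s into final.mp4 without re-encoding."""
    with open("concat.txt", "w") as f:
        for n in range(1, num_slides + 1):
            f.write(f"file 'videos/{n}.mp4'\n")
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", "concat.txt", "-c", "copy", "final.mp4"],
        check=True,
    )
```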
Force re-encode (if needed):
```bash
uv run doit final --reencode
```

Redo slide 12 completely:

```bash
uv run doit redo --slide 12
uv run doit
```

Redo only audio:

```bash
uv run doit redo --slide 12 --what audio
```

To list all available tasks:

```bash
doit list
```

Example:
```
setup
extract_notes
screenshot
audio
tts_slides
elevenlabs_tts
video
final
redo
```
This pipeline does not rely on ad-hoc scripts.
`doit` gives:
- True dependency graphs
- Incremental builds
- Crash-safe resumption
- Deterministic ordering
This makes it suitable for large decks (100+ slides) and CI-like automation.
Design principles:

- Single source of truth: Google Slides
- Artifacts are explicit: text, image, audio, video
- No hidden state
- No silent reprocessing
- Human-friendly override at every stage
MIT License. Free to use, modify, and distribute.
PRs welcome for:
- Windows/Linux TTS backends
- Speaker diarization
- Subtitle (SRT/VTT) generation
- Video transitions
- Parallel execution
Built with:

- Google Slides API
- FFmpeg
- ElevenLabs
- `doit` task runner