Practical, runnable examples for Gemini Embedding 2 (`gemini-embedding-2-preview`), the first fully multimodal embedding model that maps text, images, video, audio and PDFs into the same vector space.

| Script | Description |
|---|---|
| `01_text_embedding.py` | Single & batch text embedding, task types, 768D normalization |
| `02_video_embedding.py` | Inline upload, Files API, video + text cross-modal embedding, long-video chunking |
| `03_multimodal_embedding.py` | Cross-modal search: find videos with a text query |
| `04_search.py` | Semantic search over saved embeddings |
| `05_describe.py` | Analyze a saved video embedding against 100 predefined topics to describe what the video is about |

```bash
git clone https://github.com/YOUR_USERNAME/gemini-multimodal-embedding-examples.git
cd gemini-multimodal-embedding-examples
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Open .env and paste your Google AI API key
```

Get a free API key at aistudio.google.com/app/apikey.
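
The setup above stores the key in `.env`. The scripts presumably load it with a library such as `python-dotenv` (not confirmed here); the minimal sketch below shows what that step amounts to using only the standard library. `load_dotenv_file` is a hypothetical helper, and the variable name `GEMINI_API_KEY` is an assumption: check `.env.example` for the name the scripts actually expect.

```python
import os
from pathlib import Path


def load_dotenv_file(path=".env", override=False):
    """Hypothetical helper: read KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and '#' comments, drops an optional 'export ' prefix and
    strips matching surrounding quotes. Multi-line values and variable
    interpolation are not supported.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Remove one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        loaded[key] = value
        # Do not clobber variables already set in the shell unless asked to.
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded


if __name__ == "__main__":
    load_dotenv_file()
    # GEMINI_API_KEY is an assumed name; the real scripts may use another one.
    print("API key found" if os.environ.get("GEMINI_API_KEY") else "API key missing")
```

`genai.Client()` from the `google-genai` SDK looks for `GEMINI_API_KEY` (or `GOOGLE_API_KEY`) in the environment, so once the variable is set no key needs to be passed in code.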

Download a sample video for the video examples:

```bash
curl -L "https://www.w3schools.com/html/mov_bbb.mp4" -o sample.mp4
```

```bash
python 01_text_embedding.py
```

Demonstrates:
- Single text → 3072-dimensional vector
- Batch embedding (multiple texts in one call)
- Task-type-aware embeddings (`RETRIEVAL_QUERY` vs `RETRIEVAL_DOCUMENT`)
- Output dimensionality reduction (3072 → 768) with L2 normalization
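
After the dimensionality is reduced, the vector is no longer unit length, so dot-product scores stop matching cosine similarity until it is rescaled. A minimal, dependency-free sketch of that step follows. `truncate_and_normalize` is a hypothetical helper that assumes Matryoshka-style embeddings, where the shorter vector is the leading prefix of the full 3,072D one; if you request a smaller `output_dimensionality` from the API instead, only `l2_normalize` is needed.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (returns a new list)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def truncate_and_normalize(vec, dim=768):
    """Hypothetical helper: keep the first `dim` components, then re-normalize.

    Assumes Matryoshka-style embeddings, where a reduced embedding is the
    leading prefix of the full-size one.
    """
    if dim > len(vec):
        raise ValueError(f"dim={dim} exceeds vector length {len(vec)}")
    return l2_normalize(vec[:dim])


if __name__ == "__main__":
    # Synthetic stand-in for a real 3,072D embedding.
    full = l2_normalize([float(i % 7 - 3) for i in range(3072)])
    short = truncate_and_normalize(full, 768)
    print(len(short), round(math.sqrt(sum(x * x for x in short)), 6))
```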

```bash
python 02_video_embedding.py sample.mp4
```

Three different strategies:
- Inline – reads the file as bytes (best for small videos < 10 MB)
- Files API – uploads the video first (recommended for large files)
- Video + Text – combines video bytes with a textual description into a single embedding
Embeddings are saved as JSON in the embeddings/ folder for later reuse.
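
The choice between the inline and Files API strategies, plus the JSON save step, can be sketched as follows. The 10 MB threshold mirrors the guideline above; the keys written by the hypothetical `save_embedding` are an assumed schema, not necessarily what `02_video_embedding.py` writes. `inline_video_part` wraps the bytes with `types.Part.from_bytes` from the `google-genai` SDK and imports it lazily, so the other helpers work without the SDK installed.

```python
import json
import time
from pathlib import Path

# The README's "< 10 MB" guideline for inline uploads.
INLINE_LIMIT_BYTES = 10 * 1024 * 1024


def choose_upload_strategy(video_path):
    """Return "inline" for small files and "files_api" for larger ones."""
    size = Path(video_path).stat().st_size
    return "inline" if size < INLINE_LIMIT_BYTES else "files_api"


def inline_video_part(video_path, mime_type="video/mp4"):
    """Wrap raw video bytes as a google-genai Part (inline strategy)."""
    from google.genai import types  # lazy import: only needed for real API calls

    return types.Part.from_bytes(data=Path(video_path).read_bytes(), mime_type=mime_type)


def save_embedding(values, source, out_dir="embeddings", model="gemini-embedding-2-preview"):
    """Hypothetical helper: write one embedding to <out_dir>/<source stem>.json.

    The JSON layout (model, source, dims, created, values) is an assumed
    schema; the repository's own files may use different keys.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{Path(source).stem}.json"
    record = {
        "model": model,
        "source": str(source),
        "dims": len(values),
        "created": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "values": list(values),
    }
    path.write_text(json.dumps(record), encoding="utf-8")
    return path
```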

```bash
python 03_multimodal_embedding.py
```

Embeds multiple videos and a text query, then ranks the videos by cosine similarity to the query; no separate index is needed.
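
Because text and video embeddings live in the same vector space, the ranking step is a plain brute-force comparison. A minimal sketch with toy 3-D vectors standing in for real embeddings; the function names are illustrative and not taken from `03_multimodal_embedding.py`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_query(query_vec, items):
    """Sort (name, vector) pairs by cosine similarity to the query, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors in place of real video and query embeddings.
    videos = [("cooking.mp4", [0.9, 0.1, 0.0]), ("football.mp4", [0.1, 0.9, 0.2])]
    for name, score in rank_by_query([1.0, 0.0, 0.0], videos):
        print(f"{score:.3f}  {name}")
```

A linear scan is fine for a handful of videos; for large collections an approximate-nearest-neighbour index or a vector database is the usual replacement.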

```bash
python 04_search.py
```

Loads previously saved embedding JSON files and performs nearest-neighbour search against a text query.
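
A sketch of loading saved files and running an exact top-k search over them. It assumes each JSON file stores its vector under a `"values"` key (a hypothetical schema; adapt it to what the scripts actually write) and uses `heapq.nlargest` so only the best `k` results are kept.

```python
import heapq
import json
import math
from pathlib import Path


def load_embeddings(folder="embeddings"):
    """Return {file stem: vector} for every *.json file that has a "values" list."""
    store = {}
    for path in sorted(Path(folder).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        # Skip files that do not follow the assumed {"values": [...]} layout.
        if isinstance(record, dict) and isinstance(record.get("values"), list):
            store[path.stem] = record["values"]
    return store


def _unit(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query_vec, store, k=5):
    """Exact nearest-neighbour search by cosine similarity; returns [(name, score)]."""
    q = _unit(query_vec)
    scored = ((name, sum(a * b for a, b in zip(q, _unit(vec)))) for name, vec in store.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```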

```bash
python 05_describe.py embeddings/<your_video_embedding>.json
python 05_describe.py embeddings/<your_video_embedding>.json --top 10
```

Compares the video embedding against 100 predefined topic phrases across categories such as food, sports, nature, technology, and emotions, and outputs a ranked list with a visual score bar:

```
=================================================================
This video is most likely about (Top 8):
=================================================================
#1 72.3% [██████████████████░░░░░░░] eating, eating at a restaurant
#2 68.1% [█████████████████░░░░░░░░] eating fast food
...
=================================================================
```

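
Each score in that output is presumably the cosine similarity between the video embedding and the embedding of one topic phrase, shown as a percentage. Assuming so, the ranking and score bar can be reproduced with a helper like the hypothetical `format_ranking` below; the 25-character bar width matches the sample output, but this is not the code in `05_describe.py`.

```python
def score_bar(score, width=25):
    """Render a similarity in [0, 1] as filled and empty block characters."""
    clamped = min(max(score, 0.0), 1.0)
    filled = int(round(clamped * width))
    return "█" * filled + "░" * (width - filled)


def format_ranking(scores, top=8):
    """Format (topic, similarity) pairs as ranked lines, best first."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]
    return [
        f"#{i} {s * 100:.1f}% [{score_bar(s)}] {topic}"
        for i, (topic, s) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    # The two scores shown in the sample output.
    sample = [("eating fast food", 0.681), ("eating, eating at a restaurant", 0.723)]
    print("\n".join(format_ranking(sample)))
```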
| Property | Value |
|---|---|
| Model ID | gemini-embedding-2-preview |
| Supported inputs | Text, Image (PNG/JPEG), Video (MP4/MOV), Audio (MP3/WAV), PDF |
| Max input | 8,192 tokens |
| Default output dims | 3,072 |
| Configurable dims | 128 – 3,072 |
| Recommended dims | 768, 1,536, 3,072 |

| Task Type | Use When |
|---|---|
| `RETRIEVAL_QUERY` | Embedding a search query |
| `RETRIEVAL_DOCUMENT` | Embedding documents to be indexed |
| `SEMANTIC_SIMILARITY` | Comparing two pieces of content |
| `CLASSIFICATION` | Sentiment analysis, topic classification |
| `CLUSTERING` | Grouping similar content |
| `QUESTION_ANSWERING` | The question side of a QA system |
| `CODE_RETRIEVAL_QUERY` | Code search queries |
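
A sketch of passing a task type and output dimensionality through `client.models.embed_content` in the `google-genai` SDK. It assumes `gemini-embedding-2-preview` accepts the same `task_type` and `output_dimensionality` config fields as earlier Gemini embedding models; `build_embed_request` and `embed` are hypothetical helpers, and their validation ranges come from the tables above.

```python
MODEL_ID = "gemini-embedding-2-preview"

# Task types from the table above.
TASK_TYPES = {
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "QUESTION_ANSWERING",
    "CODE_RETRIEVAL_QUERY",
}


def build_embed_request(contents, task_type, output_dimensionality=3072):
    """Return keyword arguments for client.models.embed_content()."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type}")
    if not 128 <= output_dimensionality <= 3072:
        raise ValueError("output_dimensionality must be between 128 and 3072")
    return {
        "model": MODEL_ID,
        "contents": contents,
        # google-genai accepts a plain dict in place of types.EmbedContentConfig.
        "config": {"task_type": task_type, "output_dimensionality": output_dimensionality},
    }


def embed(client, contents, task_type, output_dimensionality=3072):
    """Embed with a google-genai Client; returns one list of floats per input."""
    result = client.models.embed_content(
        **build_embed_request(contents, task_type, output_dimensionality)
    )
    return [e.values for e in result.embeddings]
```

With the SDK installed and a key configured, create `client = genai.Client()`, then embed the corpus with `embed(client, documents, "RETRIEVAL_DOCUMENT", 768)` and the query with `embed(client, [query], "RETRIEVAL_QUERY", 768)`. Pairing the two task types is the asymmetric retrieval setup the table describes.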

- Normalization – The default 3,072D output is already normalized. If you reduce to 768D or 1,536D, apply L2 normalization manually.
- Video limit – Each embedding call accepts at most 128 seconds of video. Use the chunking helper in `02_video_embedding.py` for longer videos.
- Model incompatibility – Embeddings from `gemini-embedding-001` and `gemini-embedding-2-preview` live in different vector spaces, so vectors from the two models cannot be compared. If you upgrade, you must re-embed all your data.
- Cost tip – The Batch API offers a discount of up to 50% when latency is not critical.
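
The 128-second limit means long videos must be split before embedding. The sketch below covers the window arithmetic plus one common way to merge per-chunk vectors into a single video vector (mean pooling followed by re-normalization). Both helpers are hypothetical, not the chunking helper in `02_video_embedding.py`; actually cutting the clips would need a tool such as `ffmpeg` (for example its `-ss` and `-t` options).

```python
import math


def chunk_windows(duration_s, max_len_s=128.0, overlap_s=0.0):
    """Split [0, duration_s) into (start, end) windows no longer than max_len_s.

    Consecutive windows overlap by overlap_s seconds, so content near a cut
    point appears in two chunks.
    """
    if max_len_s <= 0 or not 0 <= overlap_s < max_len_s:
        raise ValueError("need max_len_s > 0 and 0 <= overlap_s < max_len_s")
    step = max_len_s - overlap_s
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows


def mean_pool(vectors):
    """Average equal-length chunk vectors and rescale the result to unit length."""
    n = len(vectors)
    averaged = [sum(col) / n for col in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in averaged))
    return [x / norm for x in averaged]
```

Keeping one vector per chunk instead of pooling preserves timing information, which helps when search results should point at a specific segment.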

```
.
├── 01_text_embedding.py        # Text embedding examples
├── 02_video_embedding.py       # Video embedding (inline / Files API / cross-modal)
├── 03_multimodal_embedding.py  # Cross-modal text → video search
├── 04_search.py                # Semantic search over saved embeddings
├── 05_describe.py              # Video content analysis via topic matching
├── embeddings/                 # Saved embedding JSON files (git-ignored)
├── requirements.txt
├── .env.example
└── README.md
```

Pull requests are welcome! Feel free to:
- Add more modalities (images, audio, PDFs)
- Improve the topic list in `05_describe.py`
- Add vector database integration examples (Pinecone, Qdrant, pgvector…)

License: MIT