
🧠 Gemini Multimodal Embedding – Examples

Practical, runnable examples for Gemini Embedding 2 (gemini-embedding-2-preview) — the first fully multimodal embedding model that maps text, images, video, audio and PDFs into the same vector space.


✨ What's Inside

| Script | Description |
| --- | --- |
| `01_text_embedding.py` | Single & batch text embedding, task types, 768D normalization |
| `02_video_embedding.py` | Inline upload, Files API, video+text cross-modal, long-video chunking |
| `03_multimodal_embedding.py` | Cross-modal search: find videos with a text query |
| `04_search.py` | Semantic search over saved embeddings |
| `05_describe.py` | Analyze a saved video embedding against 100 predefined topics to describe what the video is about |

🚀 Quick Start

1. Clone & install dependencies

```bash
git clone https://github.com/YOUR_USERNAME/gemini-multimodal-embedding-examples.git
cd gemini-multimodal-embedding-examples

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install -r requirements.txt
```

2. Set up your API key

```bash
cp .env.example .env
# Open .env and paste your Google AI API key
```

Get a free API key at → aistudio.google.com/app/apikey

3. (Optional) Get a sample video

```bash
curl -L "https://www.w3schools.com/html/mov_bbb.mp4" -o sample.mp4
```

📖 Usage

Text embedding

```bash
python 01_text_embedding.py
```

Demonstrates:

  • Single text → 3072-dimensional vector
  • Batch embedding (multiple texts in one call)
  • Task-type-aware embeddings (RETRIEVAL_QUERY vs RETRIEVAL_DOCUMENT)
  • Output dimensionality reduction (3072 → 768) with L2 normalization
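The last bullet hides a subtle step: truncating a 3,072D embedding to its first 768 values breaks the unit-length property, so the vector must be re-normalized before cosine similarity is meaningful. A minimal NumPy sketch (the random vector here is stand-in data, not real model output):

```python
import numpy as np

def l2_normalize(vec: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = np.linalg.norm(vec)
    return vec if norm == 0 else vec / norm

# Stand-in for a 3072-dimensional embedding returned by the model.
full = np.random.default_rng(0).normal(size=3072)

# Truncate to the first 768 dimensions, then re-normalize:
reduced = l2_normalize(full[:768])

print(round(float(np.linalg.norm(reduced)), 6))  # → 1.0
```

After this step, the dot product of two reduced vectors is directly their cosine similarity.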

Video embedding

```bash
python 02_video_embedding.py sample.mp4
```

Three different strategies:

  1. Inline – reads the file as bytes (best for small videos < 10 MB)
  2. Files API – uploads the video first (recommended for large files)
  3. Video + Text – combines video bytes with a textual description into a single embedding

Embeddings are saved as JSON in the embeddings/ folder for later reuse.
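The save-for-reuse step needs no special tooling; a plain JSON file holding the source name and the vector is enough. A sketch of that pattern (the exact schema used by the repo's scripts may differ, and the vector below is a hypothetical stand-in):

```python
import json
from pathlib import Path

def save_embedding(vector: list[float], source: str, out_dir: str = "embeddings") -> Path:
    """Persist an embedding vector as JSON so later scripts can reuse it."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    path = out / f"{Path(source).stem}_embedding.json"
    path.write_text(json.dumps({"source": source, "vector": vector}))
    return path

def load_embedding(path: Path) -> list[float]:
    """Read back a vector saved by save_embedding."""
    return json.loads(path.read_text())["vector"]

# Hypothetical vector standing in for a real video embedding:
p = save_embedding([0.1, 0.2, 0.3], "sample.mp4")
print(load_embedding(p))  # → [0.1, 0.2, 0.3]
```

Storing the source filename alongside the vector lets the search scripts report *which* video matched, not just a score.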


Cross-modal search

```bash
python 03_multimodal_embedding.py
```

Embeds multiple videos and a text query, then ranks the videos by cosine similarity to the query — no separate index needed.
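Because all modalities share one vector space, the ranking step reduces to cosine similarity between the query vector and each video vector. A sketch with toy 3-D vectors standing in for real embeddings (real ones have thousands of dimensions, but the math is identical):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_similarity(query: np.ndarray, candidates: dict[str, np.ndarray]) -> list[tuple[str, float]]:
    """Return (name, score) pairs, best match first."""
    scores = {name: cosine_similarity(query, vec) for name, vec in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy vectors standing in for real embeddings:
query = np.array([1.0, 0.0, 0.0])
videos = {
    "cooking.mp4": np.array([0.9, 0.1, 0.0]),
    "football.mp4": np.array([0.0, 1.0, 0.2]),
}
print(rank_by_similarity(query, videos)[0][0])  # → cooking.mp4
```

For a handful of videos this brute-force scan is all you need; a vector database only pays off at much larger scale.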


Semantic search over saved embeddings

```bash
python 04_search.py
```

Loads previously saved embedding JSON files and performs nearest-neighbour search against a text query.


Describe a video

```bash
python 05_describe.py embeddings/<your_video_embedding>.json
python 05_describe.py embeddings/<your_video_embedding>.json --top 10
```

Compares the video embedding against 100 predefined topic phrases across categories like food, sports, nature, technology, and emotions. Outputs a ranked list with a visual score bar.

```text
=================================================================
  This video is most likely about (Top 8):
=================================================================
  #1   72.3%  [██████████████████░░░░░░░]  eating food, eating at a restaurant
  #2   68.1%  [█████████████████░░░░░░░░]  eating fast food
  ...
=================================================================
```
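The bar display above is a few lines of string formatting once the topic scores exist. A sketch with hypothetical scores (the actual script's column widths and rounding may differ):

```python
def score_bar(score: float, width: int = 25) -> str:
    """Render a similarity score in [0, 1] as a fixed-width bar."""
    filled = round(score * width)
    return "[" + "█" * filled + "░" * (width - filled) + "]"

def format_ranking(scores: dict[str, float], top: int = 8) -> list[str]:
    """Sort topics by score descending and format the top entries."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return [
        f"#{i}   {s * 100:.1f}%  {score_bar(s)}  {topic}"
        for i, (topic, s) in enumerate(ranked, start=1)
    ]

# Hypothetical topic scores (the real ones come from cosine similarity):
for line in format_ranking({"eating at a restaurant": 0.723,
                            "eating fast food": 0.681,
                            "playing football": 0.12}):
    print(line)
```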

🔬 Model Reference

| Property | Value |
| --- | --- |
| Model ID | `gemini-embedding-2-preview` |
| Supported inputs | Text, Image (PNG/JPEG), Video (MP4/MOV), Audio (MP3/WAV), PDF |
| Max input | 8,192 tokens |
| Default output dims | 3,072 |
| Configurable dims | 128 – 3,072 |
| Recommended dims | 768, 1,536, 3,072 |

Task Types

| Task Type | Use When |
| --- | --- |
| `RETRIEVAL_QUERY` | Embedding a search query |
| `RETRIEVAL_DOCUMENT` | Embedding documents to be indexed |
| `SEMANTIC_SIMILARITY` | Comparing two pieces of content |
| `CLASSIFICATION` | Sentiment analysis, topic classification |
| `CLUSTERING` | Grouping similar content |
| `QUESTION_ANSWERING` | The question side of a QA system |
| `CODE_RETRIEVAL_QUERY` | Code search queries |

⚠️ Important Notes

  1. Normalization – The default 3,072D output is already normalized. If you reduce to 768D or 1,536D, apply L2 normalization manually.
  2. Video limit – Each embedding call accepts at most 128 seconds of video. Use the chunking helper in 02_video_embedding.py for longer videos.
  3. Model incompatibility – Embeddings from gemini-embedding-001 and gemini-embedding-2-preview live in different vector spaces. If you upgrade, you must re-embed all your data.
  4. Cost tip – The Batch API offers up to 50% discount if latency is not critical.
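The 128-second limit in note 2 only requires splitting the timeline into consecutive spans before embedding each one. A minimal sketch of such a helper (the actual chunking code in `02_video_embedding.py` may differ):

```python
def chunk_spans(duration_s: float, max_chunk_s: float = 128.0) -> list[tuple[float, float]]:
    """Split a video duration into consecutive (start, end) spans,
    each at most max_chunk_s seconds long."""
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        spans.append((start, end))
        start = end
    return spans

# A 5-minute video needs three calls:
print(chunk_spans(300))  # → [(0.0, 128.0), (128.0, 256.0), (256.0, 300.0)]
```

Each span is embedded separately; the per-chunk vectors can then be searched individually or averaged (and re-normalized) into one whole-video vector.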

📁 Project Structure

```text
.
├── 01_text_embedding.py       # Text embedding examples
├── 02_video_embedding.py      # Video embedding (inline / Files API / cross-modal)
├── 03_multimodal_embedding.py # Cross-modal text → video search
├── 04_search.py               # Semantic search over saved embeddings
├── 05_describe.py             # Video content analysis via topic matching
├── embeddings/                # Saved embedding JSON files (git-ignored)
├── requirements.txt
├── .env.example
└── README.md
```

🤝 Contributing

Pull requests are welcome! Feel free to:

  • Add more modalities (images, audio, PDFs)
  • Improve the topic list in 05_describe.py
  • Add vector database integration examples (Pinecone, Qdrant, pgvector…)

📄 License

MIT
