A practical tutorial project demonstrating how to replace traditional database queries with semantic vector search in a realistic movie database application scenario.
This repository accompanies an article about implementing vector search in Laravel applications. It showcases a real-world use case: building a movie search API that goes beyond simple keyword matching to understand the semantic meaning of search queries.
Key Learning Objectives:
- Integrate MongoDB Atlas Vector Search with Laravel
- Implement semantic search using embeddings (Voyage AI)
- Build a movie search API
- Framework: Laravel 12
- Database: MongoDB Atlas (cloud-hosted, sample_mflix database)
- Vector Embeddings: Voyage AI (voyage-3-lite model, 512 dimensions)
- Search Technology: MongoDB Atlas Vector Search (semantic search)
- MongoDB Atlas integration with Laravel (sample_mflix database)
- Movie model with MongoDB Eloquent
- Voyage AI service integration (voyage-3-lite model)
- Vector embeddings generation via CLI
- Vector search index creation (512 dimensions, cosine similarity)
- Semantic search endpoint with query vectorization
Note: the examples below assume you're running on the default port 8000 when you develop locally. Adapt the URLs to your environment.
| Endpoint | Method | Status | Description |
|---|---|---|---|
| /api/hello | GET | ✅ | Test endpoint to verify API routing |
Example Request:
curl http://localhost:8000/api/hello
Example Response:
{
"response": "hello world"
}

| Endpoint | Method | Status | Description |
|---|---|---|---|
| /api/mongodb-test | GET | ✅ | Test MongoDB Atlas connection and display database info |
Example Request:
curl http://localhost:8000/api/mongodb-test
Example Response:
{
"status": "success",
"connection": "MongoDB connection successful",
"database": "sample_mflix",
"collections_found": 6,
"collections": ["users", "embedded_movies", "movies", "sessions", "theaters", "comments"],
"movies_collection": {
"exists": true,
"document_count": 21349
}
}

| Endpoint | Method | Status | Description |
|---|---|---|---|
| /api/get-movie-by-title/{title} | GET | ✅ | Retrieve movie by exact title match |
Example Request:
curl http://localhost:8000/api/get-movie-by-title/Titanic
Example Response:
{
"_id": {"$oid": "573a139af29313caabcebf1b"},
"title": "Titanic",
"year": 1996,
"plot": "A woman's heart is divided between love and duty...",
"genres": ["Drama", "Romance"],
"cast": ["Peter Gallagher", "George C. Scott", "Catherine Zeta-Jones"],
"directors": ["Robert Lieberman"]
}

| Endpoint | Method | Status | Description |
|---|---|---|---|
| /api/embedding-model-info | GET | ✅ | Test Voyage AI connection and get model info |
Example Request:
curl http://localhost:8000/api/embedding-model-info
Example Response:
{
"status": "connected",
"model": "voyage-3-lite",
"embedding_dimensions": 512,
"api_response": {
"model": "voyage-3-lite",
"usage": {
"total_tokens": 2
}
},
"configured": true
}

| Endpoint | Method | Status | Description |
|---|---|---|---|
| /api/embedding-model-vectorize/{input} | GET | ✅ | Generate embedding for a single text input |
Example Request:
curl http://localhost:8000/api/embedding-model-vectorize/adventure
Example Response:
{
"input": "adventure",
"embedding": [0.123, -0.456, 0.789, ...],
"embedding_dimensions": 512,
"model": "voyage-3-lite",
"usage": {
"total_tokens": 1
}
}

| Endpoint | Method | Status | Description |
|---|---|---|---|
| /api/movie-search-vector | POST | ✅ | Semantic search using vector embeddings |
Example Request:
curl -X POST http://localhost:8000/api/movie-search-vector \
-H "Content-Type: application/json" \
-d '{"query": "outlaws on the run from law enforcement"}'
Example Response:
{
"query": "outlaws on the run from law enforcement",
"results": [
{
"_id": {"$oid": "573a1390f29313caabcd42e8"},
"title": "The Great Train Robbery",
"plot": "A group of bandits stage a brazen train hold-up...",
"score": 0.8234567
}
],
"count": 10,
"embedding_model": "voyage-3-lite",
"vector_dimensions": 512
}

Generate vector embeddings for movies using Voyage AI:
# Generate embeddings for movies without embeddings (up to 100)
php artisan embeddings:generate
# Force regenerate embeddings for all movies (up to 100)
php artisan embeddings:generate --force
# Generate embeddings for a specific number of movies
php artisan embeddings:generate --limit=20
Command Details:
- Location: app/Console/Commands/GenerateEmbeddings.php
- Batch Processing: Processes 10 movies at a time to avoid API timeouts
- Safety Limit: Hard limit of 100 movies per invocation to prevent high API costs
- Text Preparation: Combines movie title and plot for embedding generation
- Progress Tracking: Shows progress bar with current movie being processed
Flags:
- --force: Regenerate embeddings even if they already exist
- --limit=N: Process only N movies (subject to 100 max safety limit)
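The interaction between --limit and the safety limit can be sketched as a simple clamp. This is only an illustration of the behaviour described above, not the code in GenerateEmbeddings.php; the function name resolveEffectiveLimit is hypothetical, and treating a missing or non-positive value as "use the safety limit" is an assumption.

```php
<?php
// Hypothetical helper: clamp a requested --limit to the safety limit.
// Assumption: a missing or non-positive request falls back to the safety limit.
function resolveEffectiveLimit(?int $requested, int $safetyLimit = 100): int
{
    if ($requested === null || $requested <= 0) {
        return $safetyLimit;
    }

    return min($requested, $safetyLimit);
}

echo resolveEffectiveLimit(20), PHP_EOL;   // 20
echo resolveEffectiveLimit(500), PHP_EOL;  // 100
echo resolveEffectiveLimit(null), PHP_EOL; // 100
```

With this shape, raising EMBEDDING_SAFETY_LIMIT is the only way a single run could process more than 100 movies.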
Delete all embeddings from the movies collection (useful for debugging):
# Delete embeddings with confirmation prompt
php artisan embeddings:delete
# Delete embeddings without confirmation
php artisan embeddings:delete --force
Command Details:
- Location: app/Console/Commands/DeleteEmbeddings.php
- Operation: Uses MongoDB $unset operation to remove embeddings field
- Verification: Checks remaining embeddings after deletion
Create MongoDB Atlas Vector Search index for the movies collection:
# Create vector index (checks if already exists)
php artisan vector:create-index
# Force recreate index (deletes existing index first)
php artisan vector:create-index --force
Command Details:
- Location: app/Console/Commands/CreateVectorIndex.php
- Index Configuration: Uses environment variables for dimensions (512) and similarity (cosine)
- Smart Detection: Checks for existing index before creating
- Force Mode: With --force flag, deletes existing index and creates new one
- Wait Logic: Waits up to 30 seconds for index deletion to propagate in MongoDB Atlas
Flags:
- --force: Delete existing index and create a new one
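The wait logic for --force can be pictured as a polling loop with a timeout. The sketch below is an assumption about the general shape, not the actual code in CreateVectorIndex.php: waitForIndexDeletion and its parameters are hypothetical names, and the index check is passed in as a callable so the loop runs without a MongoDB connection.

```php
<?php
/**
 * Poll $indexExists until it returns false or the time budget is used up.
 * Hypothetical helper; $sleep is injectable so the loop can run without real waiting.
 */
function waitForIndexDeletion(
    callable $indexExists,
    int $maxWaitSeconds = 30,
    int $intervalSeconds = 2,
    ?callable $sleep = null
): bool {
    $sleep ??= static fn (int $seconds) => sleep($seconds);
    $waited = 0;

    while ($indexExists()) {
        if ($waited >= $maxWaitSeconds) {
            return false; // index still present after the full wait
        }
        $sleep($intervalSeconds);
        $waited += $intervalSeconds;
    }

    return true; // index is gone, safe to recreate
}

// Demo: a simulated check that reports the index as present three times.
$checks = 0;
$gone = waitForIndexDeletion(
    function () use (&$checks): bool {
        return ++$checks <= 3;
    },
    30,
    2,
    static function (int $seconds): void {
        // no-op sleep for the demo
    }
);

var_dump($gone); // bool(true)
```

The defaults mirror the 30 second wait and 2 second check interval that this README lists for VECTOR_INDEX_DELETE_WAIT_TIME and VECTOR_INDEX_DELETE_WAIT_INTERVAL.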
Vector search enables semantic understanding of queries, finding relevant results based on meaning rather than exact keyword matches. Here's how it works in this project:
What: Convert your text data into numerical vectors that capture semantic meaning.
How: Use the Voyage AI service to generate 512-dimensional embeddings from movie titles and plots.
Code Location: app/Console/Commands/GenerateEmbeddings.php
Key Implementation Details:
- Text Preparation (GenerateEmbeddings.php:183-200):

      private function prepareMovieText(Movie $movie): string
      {
          $parts = [];

          if (!empty($movie->title)) {
              $parts[] = "Title: {$movie->title}";
          }

          if (!empty($movie->fullplot)) {
              $parts[] = "Plot: {$movie->fullplot}";
          } elseif (!empty($movie->plot)) {
              $parts[] = "Plot: {$movie->plot}";
          }

          return implode("\n", $parts);
      }

- Batch Processing (GenerateEmbeddings.php:102-154):
  - Processes 10 movies at a time using Laravel's chunk() method
  - Calls Voyage AI API with batch of texts
  - Updates each movie document with its embedding array
- Voyage AI Service (app/Services/VoyageAIService.php):
  - Centralized API communication
  - Uses voyage-3-lite model (512 dimensions)
  - Method: generateEmbeddings() (supports single or batch processing)
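The batch flow above can be sketched without Laravel or a real API call. This is a simplified stand-in, not the project's implementation: it uses plain array_chunk() instead of Laravel's chunk(), the name embedInBatches is hypothetical, and the embedder is a callable so a fake can replace the Voyage AI service.

```php
<?php
/**
 * Embed texts in fixed-size batches, pausing between batches.
 * Hypothetical helper: $embedBatch receives a list of texts and must
 * return one vector per text, in the same order.
 */
function embedInBatches(array $texts, callable $embedBatch, int $batchSize = 10, int $delayMs = 100): array
{
    $embeddings = [];
    $batches = array_chunk($texts, $batchSize, true); // keep the original keys
    $lastIndex = count($batches) - 1;

    foreach ($batches as $index => $batch) {
        $vectors = $embedBatch(array_values($batch));

        // Map each returned vector back to the key of the text it came from.
        foreach (array_keys($batch) as $position => $key) {
            $embeddings[$key] = $vectors[$position];
        }

        if ($index < $lastIndex && $delayMs > 0) {
            usleep($delayMs * 1000); // crude rate limiting between batches
        }
    }

    return $embeddings;
}

// Fake embedder for the demo: one 2-number "vector" per text, no API call.
$calls = 0;
$fakeEmbedder = function (array $batch) use (&$calls): array {
    $calls++;
    return array_map(static fn (string $text): array => [strlen($text), 0.0], $batch);
};

$texts = [];
for ($i = 1; $i <= 25; $i++) {
    $texts["movie-$i"] = "Title: Movie $i";
}

$embeddings = embedInBatches($texts, $fakeEmbedder, 10, 0);
echo count($embeddings), ' embeddings in ', $calls, ' API calls', PHP_EOL; // 25 embeddings in 3 API calls
```

Sending one request per batch rather than per movie keeps the number of API calls low, which is the same reason the command processes 10 movies at a time.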
Run the command:
php artisan embeddings:generate --limit=100

What: Create a MongoDB Atlas Vector Search index that enables efficient similarity searches.
How: Use the CLI command to configure an index with the correct dimensions (512) and similarity function (cosine).
Code Location: app/Console/Commands/CreateVectorIndex.php
Key Implementation Details:
- Index Configuration (CreateVectorIndex.php:42-46):

      // Get vector configuration from environment
      $vectorDimensions = (int) env('VECTOR_DIMENSIONS', 512);
      $vectorSimilarity = env('VECTOR_SIMILARITY', 'cosine');

- Collection Access (CreateVectorIndex.php:48-50):

      // Get the MongoDB collection instance via Laravel DB facade
      $connection = DB::connection('mongodb');
      $collection = $connection->getCollection('movies');

- Index Creation (CreateVectorIndex.php:62-79):

      $result = $collection->createSearchIndex(
          [
              'fields' => [
                  [
                      'type' => 'vector',
                      'path' => 'embeddings',
                      'numDimensions' => $vectorDimensions,
                      'similarity' => $vectorSimilarity
                  ]
              ]
          ],
          [
              'name' => $indexName,
              'type' => 'vectorSearch'
          ]
      );

Run the command:
php artisan vector:create-index
Environment Variables (.env):
VECTOR_DIMENSIONS=512
VECTOR_SIMILARITY=cosine

What: Convert search queries into vectors and find similar movies using semantic similarity.
How: Use Voyage AI to vectorize the query, then perform MongoDB vector search using Laravel Eloquent.
Code Location: app/Http/Controllers/MovieSearchVectorController.php
Key Implementation Details:
- Query Vectorization (MovieSearchVectorController.php:33-42):

      // Generate embedding for the query using VoyageAI
      $result = $voyageAI->generateEmbeddings([$query]);

      if (!$result['success']) {
          return response()->json([
              'error' => 'Failed to generate query embedding',
              'message' => $result['error']
          ], 500);
      }

      $queryVector = $result['embeddings'][0]['embedding'];

- Vector Search Using Eloquent (MovieSearchVectorController.php:45-51):

      // Perform vector search using Eloquent method
      $results = Movie::vectorSearch(
          index: config('vector.index.name'),
          path: config('vector.field_path'),
          queryVector: $queryVector,
          limit: config('vector.search.limit'),
          numCandidates: config('vector.search.num_candidates')
      );

- Result Formatting (MovieSearchVectorController.php:54-67):

      // Format results with score and selected fields
      $formattedResults = $results->map(function ($movie) {
          return [
              '_id' => ['$oid' => (string) $movie->_id],
              'title' => $movie->title,
              'plot' => $movie->plot,
              'fullplot' => $movie->fullplot,
              'genres' => $movie->genres,
              'year' => $movie->year,
              'cast' => $movie->cast,
              'directors' => $movie->directors,
              'poster' => $movie->poster,
              'score' => $movie->vectorSearchScore
          ];
      });

Run a search:
curl -X POST http://localhost:8000/api/movie-search-vector \
-H "Content-Type: application/json" \
-d '{"query": "outlaws on the run from law enforcement"}'
How It Works:
- User sends a natural language query (e.g., "outlaws on the run")
- Query is vectorized using Voyage AI (same model as data embeddings)
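- MongoDB Atlas Vector Search finds the movie embeddings nearest to the query vector using cosine similarity (an approximate search that considers up to VECTOR_SEARCH_NUM_CANDIDATES candidates, 100 by default)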
- Returns top 10 most similar movies with relevance scores
- Movies are ranked by semantic similarity, not keyword matching
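To make the ranking step concrete, here is a toy brute-force version of cosine-similarity search in plain PHP. It is a teaching sketch only: Atlas performs this inside the index with an approximate search, and, as far as I know, reports a normalized score rather than the raw cosine value computed here. The function names and the 3-dimensional vectors are made up for illustration.

```php
<?php
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b) || count($a) === 0) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // a zero vector has no direction to compare
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Score every document against the query and keep the best $limit.
function rankBySimilarity(array $queryVector, array $documents, int $limit = 10): array
{
    $scored = [];
    foreach ($documents as $title => $vector) {
        $scored[$title] = cosineSimilarity($queryVector, $vector);
    }

    arsort($scored); // highest similarity first, keys preserved

    return array_slice($scored, 0, $limit, true);
}

// Toy data: "A" points almost the same way as the query, "C" the opposite way.
$documents = [
    'A' => [0.9, 0.1, 0.0],
    'B' => [0.0, 1.0, 0.0],
    'C' => [-1.0, 0.0, 0.0],
];

$top = rankBySimilarity([1.0, 0.0, 0.0], $documents, 2);
echo implode(', ', array_keys($top)), PHP_EOL; // A, B
```

Because cosine similarity depends only on direction, the query and the stored embeddings must come from the same model, which is why the query is vectorized with voyage-3-lite as well.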
- Add pagination for search results
- Add filtering by genre, year, cast
- Create CRUD endpoints for movie management
- Add rate limiting and authentication
- Performance optimization and caching
- Implement query result ranking and relevance tuning
- PHP 8.2+
- Composer
- MongoDB PHP extension (pecl install mongodb)
- MongoDB Atlas account with sample_mflix database loaded
- Voyage AI API key (get one at voyageai.com)
- Clone the repository
git clone <repository-url>
cd laravel-books-retrieval-api-tutorial
- Install dependencies
composer install
- Configure environment
cp .env.example .env
php artisan key:generate
- Add MongoDB Atlas and Voyage AI credentials to .env
DB_CONNECTION=mongodb
DB_DSN=mongodb+srv://username:password@cluster.mongodb.net/sample_mflix?retryWrites=true&w=majority
DB_DATABASE=sample_mflix
VOYAGE_AI_API_KEY=your_voyage_ai_api_key_here
VECTOR_DIMENSIONS=512
VECTOR_SIMILARITY=cosine
- Start development server
php artisan serve
- Test the API
curl http://localhost:8000/api/hello
- Generate embeddings for movies
php artisan embeddings:generate --limit=100
- Create vector search index
php artisan vector:create-index
- Try a semantic search
curl -X POST http://localhost:8000/api/movie-search-vector \
-H "Content-Type: application/json" \
-d '{"query": "outlaws on the run from law enforcement"}'

No migrations needed! If you're coming from Laravel with SQL databases, you might be looking for migration files. MongoDB is schema-less, so you don't need to run migrations. The sample_mflix database already exists in MongoDB Atlas with the movie data. Just connect and start querying!
Other differences from SQL-based Laravel:
- Models extend MongoDB\Laravel\Eloquent\Model instead of standard Eloquent\Model
- No migration files or php artisan migrate needed
- Use .env for DB_DSN (connection string) instead of separate host/port/database variables
- Session, cache, and queue should use file or array drivers (not database)
All vector search parameters are configured in config/vector.php. Most have sensible defaults, but you can override them in your .env file:
# Collection and field configuration
MONGODB_COLLECTION=movies # Default: movies
VECTOR_FIELD_PATH=embeddings # Default: embeddings
# Vector index configuration
VECTOR_INDEX_NAME=movies_vector_index # Default: movies_vector_index
VECTOR_DIMENSIONS=512 # Must match your embedding model
VECTOR_SIMILARITY=cosine # Options: cosine, euclidean, dotProduct
VECTOR_INDEX_DELETE_WAIT_TIME=30 # Seconds to wait for index deletion
VECTOR_INDEX_DELETE_WAIT_INTERVAL=2 # Check interval during deletion
# Embedding generation configuration
EMBEDDING_BATCH_SIZE=10 # Movies processed per batch
EMBEDDING_SAFETY_LIMIT=100 # Max movies per command invocation
EMBEDDING_BATCH_DELAY_MS=100 # Delay between batches (rate limiting)
# Vector search query configuration
VECTOR_SEARCH_LIMIT=10 # Number of results to return
VECTOR_SEARCH_NUM_CANDIDATES=100 # Candidates to consider during search

Important: VECTOR_DIMENSIONS must match your embedding model:
- Voyage AI voyage-3-lite: 512 dimensions (this project's default)
- Other models: Check their documentation for dimensions
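A mismatch between VECTOR_DIMENSIONS and the model's output is easy to catch before anything is written to MongoDB. The guard below is a hypothetical addition, not something this project is known to include; assertEmbeddingDimensions is an illustrative name.

```php
<?php
// Hypothetical guard: fail fast when an embedding does not match the index dimensions.
function assertEmbeddingDimensions(array $embedding, int $expected): void
{
    $actual = count($embedding);
    if ($actual !== $expected) {
        throw new LengthException(
            "Embedding has {$actual} dimensions, but the index expects {$expected}."
        );
    }
}

// A 512-number vector passes silently against a 512-dimension index.
assertEmbeddingDimensions(array_fill(0, 512, 0.0), 512);

// A vector from a larger model is rejected.
try {
    assertEmbeddingDimensions(array_fill(0, 1024, 0.0), 512);
} catch (LengthException $e) {
    echo $e->getMessage(), PHP_EOL;
    // Embedding has 1024 dimensions, but the index expects 512.
}
```

A check like this could run on the first embedding of each batch, so a wrong model or a stale .env value surfaces before any documents are updated.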
For Production:
- Increase EMBEDDING_SAFETY_LIMIT if you need to process more movies (but watch API costs!)
- Adjust EMBEDDING_BATCH_DELAY_MS if you hit Voyage AI rate limits
- Increase VECTOR_SEARCH_LIMIT for more search results per query
For Development:
- Keep defaults - they're optimized for tutorial usage
- Use --limit flag when testing: php artisan embeddings:generate --limit=10
Problem: Failed to connect to MongoDB or connection timeouts
Solutions:
- Verify your MongoDB Atlas cluster is running (check atlas.mongodb.com)
- Ensure your IP address is whitelisted in MongoDB Atlas Network Access
- Check your .env connection string format: DB_DSN=mongodb+srv://username:password@cluster.mongodb.net/sample_mflix?retryWrites=true&w=majority
- If username/password contain special characters, URL-encode them
- Verify sample_mflix database is loaded (it's a free sample dataset in Atlas)
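For credentials with special characters, PHP's rawurlencode() produces the percent-encoding a connection string needs. The helper below is a small sketch; buildAtlasDsn is a hypothetical name and the credentials are placeholders.

```php
<?php
// Hypothetical helper: build a DB_DSN value with percent-encoded credentials.
function buildAtlasDsn(string $username, string $password, string $host, string $database): string
{
    return sprintf(
        'mongodb+srv://%s:%s@%s/%s?retryWrites=true&w=majority',
        rawurlencode($username),
        rawurlencode($password),
        $host,
        $database
    );
}

echo buildAtlasDsn('app_user', 'p@ss:w/rd', 'cluster.mongodb.net', 'sample_mflix'), PHP_EOL;
// mongodb+srv://app_user:p%40ss%3Aw%2Frd@cluster.mongodb.net/sample_mflix?retryWrites=true&w=majority
```

Left unencoded, characters such as @, : and / in a password would be read as parts of the URL itself, which typically shows up as an authentication or parsing failure.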
Problem: VOYAGE_AI_API_KEY is not set or API errors
Solutions:
- Get a free API key from voyageai.com
- Add it to .env: VOYAGE_AI_API_KEY=pa-xxxxxxxxxxxxx
- Test the connection: curl http://localhost:8000/api/embedding-model-info
Problem: Rate limiting or quota errors
Solutions:
- Reduce EMBEDDING_BATCH_SIZE in .env (try 5 instead of 10)
- Increase EMBEDDING_BATCH_DELAY_MS to 500 or 1000
- Use --limit flag to process fewer movies at once
Problem: Search returns no results or errors
Solutions:
- Verify embeddings exist: Check a movie in MongoDB Atlas - does it have an embeddings field?
- Check index status: Run php artisan vector:create-index - it should show "already exists" if working
- Wait for index: MongoDB Atlas indexes can take a few minutes to become active after creation
- Verify dimensions match: Check .env has VECTOR_DIMENSIONS=512 (must match Voyage AI model)
- Test with simple query: Try {"query": "adventure"} first before complex queries
Problem: Index not found error
Solution: The vector index hasn't been created yet or is still building:
# Check if index exists
php artisan vector:create-index
# If it says "already exists", wait 2-3 minutes for MongoDB Atlas to build it
# Then try your search again

Problem: embeddings:generate command times out or stops
Solutions:
- Use --limit flag to process fewer movies: php artisan embeddings:generate --limit=20
- The safety limit caps at 100 movies per run - this is intentional to prevent high API costs
- To process more, increase EMBEDDING_SAFETY_LIMIT in .env (carefully!)
Problem: Tests fail with MongoDB connection errors
Solution: The phpunit.xml is configured to use MongoDB for testing. Ensure your MongoDB Atlas connection is active when running tests.
- routes/api.php - API endpoint definitions
- app/Models/Movie.php - MongoDB Movie model
- app/Services/VoyageAIService.php - Voyage AI API integration
- app/Console/Commands/GenerateEmbeddings.php - CLI for generating embeddings
- app/Console/Commands/DeleteEmbeddings.php - CLI for deleting embeddings
- app/Console/Commands/CreateVectorIndex.php - CLI for creating vector search index
- config/database.php - MongoDB configuration
- config/vector.php - Vector search configuration (NEW)
Movies collection structure (sample_mflix database):
- _id: MongoDB ObjectId (primary key)
- title: Movie title
- plot: Short plot summary
- fullplot: Full plot description
- genres: Array of genre strings
- cast: Array of actor names
- directors: Array of director names
- year: Release year
- runtime: Movie runtime in minutes
- rated: MPAA rating
- imdb: Embedded document with rating, votes, id
- tomatoes: Embedded document with Rotten Tomatoes data
- embeddings: Vector embeddings (512 dimensions) - generated by this project
This project demonstrates how to implement semantic vector search in a practical movie database application context. By following along, you'll learn:
- How to implement vector search with Laravel & MongoDB
- Best practices for MongoDB Atlas Vector Search integration
- Vector embedding generation workflows with Voyage AI
- Building semantic search that understands meaning, not just keywords
- Example of batch processing for embedding generation
These search queries demonstrate the power of semantic/vector search by finding relevant movies without using obvious keywords from titles or plots. They showcase how the search understands meaning, concepts, and themes rather than just matching exact words.
1. "outlaws on the run from law enforcement"
- Finds: The Great Train Robbery (1903)
- Why it works: Understands "outlaws" = bandits, "law enforcement" = posse/sheriff
- Keywords NOT used: train, robbery, western
- Finds: A Corner in Wheat (1909)
- Why it works: Identifies economic injustice themes
- Keywords NOT used: wheat, tycoon, bread, poverty
- Finds: Winsor McCay films, Gertie the Dinosaur (1914)
- Why it works: Conceptually understands animation = bringing drawings to life
- Keywords NOT used: animation, cartoon, dinosaur
- Finds: Traffic in Souls (1913)
- Why it works: Understands trafficking = kidnapping/prostitution ring
- Keywords NOT used: prostitution, kidnapping, crime
- Finds: Gertie the Dinosaur (1914)
- Why it works: "prehistoric creature" = dinosaur concept
- Keywords NOT used: dinosaur, animation, McCay
- Finds: Little Caesar (1931), Scarface (1932)
- Why it works: Identifies gangster rise narrative arc
- Keywords NOT used: gangster, mob, bootlegger, mafia
- Finds: The Public Enemy (1931), Little Caesar (1931)
- Why it works: "alcohol smuggling" = bootlegging concept
- Keywords NOT used: bootlegging, prohibition specific terms
- Finds: A Woman of Paris (1923), The Italian (1915), Morocco (1930)
- Why it works: Identifies class-based romantic barriers
- Keywords NOT used: specific character names, locations
- Finds: Laugh, Clown, Laugh (1928), He Who Gets Slapped (1924)
- Why it works: Understands emotional state + profession
- Keywords NOT used: clown, circus, slapped
- Finds: The Front Page (1931)
- Why it works: "newsroom" = newspaper office, "deadline pressure" = journalism stress
- Keywords NOT used: newspaper, reporter, journalist, press
- Finds: Civilization (1916), Four Sons (1928), Wings (1927)
- Why it works: Identifies selfless/heroic narrative themes
- Keywords NOT used: war, soldier, battle
12. "identity hidden by disguise"
- Finds: Robin Hood (1922), Shanghai Express (1932)
- Why it works: Understands concealment and dual identity themes
- Keywords NOT used: mask, outlaw, secret
- Finds: Regeneration (1915), The Sin of Madelon Claudet (1931)
- Why it works: Identifies salvation/transformation narrative
- Keywords NOT used: redemption, salvation, reform
- Finds: Modern Times (1936), À Nous la Liberté (1931)
- Why it works: David vs. Goliath narrative structure
- Keywords NOT used: factory, worker, industrial
- Finds: Wings (1927), Four Sons (1928)
- Why it works: Identifies relationship + external pressure themes
- Keywords NOT used: war, friends, aviator
- Finds: Shanghai Express (1932), Red Dust (1932)
- Why it works: Understands geographic + mysterious atmosphere
- Keywords NOT used: Shanghai, China, Asia, train
- Finds: The Big House (1930), Mädchen in Uniform (1931)
- Why it works: Identifies restricted/controlled environments
- Keywords NOT used: prison, jail, boarding school
- Finds: The Big Trail (1930), The Iron Horse (1924)
- Why it works: American frontier movement concept
- Keywords NOT used: wagon, pioneer, frontier, west
- Finds: Tabu (1931), Red Dust (1932)
- Why it works: Idyllic setting disrupted by conflict
- Keywords NOT used: island, South Pacific, plantation
- Finds: Wings (1927), Four Sons (1928), All Quiet on the Western Front (1930)
- Why it works: Identifies wartime camaraderie theme
- Keywords NOT used: war, soldier, WWI, battle
These semantic search queries successfully find relevant movies because:
- Synonym Recognition: The search understands "outlaws" = bandits, "law enforcement" = sheriff/posse
- Conceptual Understanding: "Bringing drawings to life" is semantically understood as animation
- Thematic Matching: Abstract concepts like "wealth inequality" match plot themes about economic injustice
- Emotional States: "Broken heart" finds stories with emotional suffering and loss
- Professional Contexts: "Newsroom chaos" correctly identifies journalism/media settings
- Narrative Structures: "Rags to riches" identifies character progression arcs
- Atmospheric Understanding: "Tropical paradise turns complicated" finds settings + conflict patterns
Built on Laravel 12 - a web application framework with expressive, elegant syntax. Learn more at laravel.com
To verify functionality and spot regressions during refactoring, this project includes a test suite covering vector configuration, API endpoints, and CLI commands.
Run tests with:
php artisan test
The test suite includes:
- Unit tests for vector configuration and Voyage AI service
- Feature tests for all API endpoints
- Feature tests for CLI commands
Note: Tests require an active MongoDB Atlas connection as configured in phpunit.xml.
This project is licensed under the Apache License, Version 2.0.