Evaluate Your Geospatial Models Vibes

Yeah benchmarks are cool and stuff, but how are your model's vibes? With this tooling you'll hopefully be able to see via the magic of search/retrieval from your laptop!

This repo was originally inspired by the Earth Genome notebook tooling. GeoVibes supports comparison of multiple embeddings models through nearest neighbors search. It supports Google's AlphaEarth embeddings as well as other benchmarks such as ViT and Clay.

Highly experimental. This repo is not production-grade code.

Installation

GeoVibes can be installed using uv.

# 1) Install uv (if not already)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2) Create and activate a virtual environment
uv venv .venv
source .venv/bin/activate

# 3) Install GeoVibes in editable mode
uv pip install -e .

# 4) Register a Jupyter kernel for the notebook app
python -m ipykernel install --user --name geovibes --display-name "Python (geovibes)"

Quick Start

Prepare Local Data

With your virtual environment active, run the interactive downloader to fetch geometries and DuckDB/FAISS bundles listed in manifest.csv:

uv run download_embeddings.py

The script lets you select regions to download, stores geometries in geometries/, and extracts the model artifacts into local_databases/. You can rerun it at any time; previously downloaded files are skipped. We recommend starting with a small database: for example a google database, or a quantized one.

Configuration

YAML Configuration (Recommended)

Create a config.yaml file to configure GeoVibes. Paths to DuckDB databases and boundaries are optional—GeoVibes now discovers downloaded models automatically via manifest.csv:

# Basemap date range
start_date: "2024-01-01"
end_date: "2025-01-01"

Environment Variables

The main basemap relies on MapTiler. You need to create a MapTiler account and get an API key in order for this to work. Create a .env file in the repository root for sensitive configuration:

# Required for MapTiler satellite basemaps
MAPTILER_API_KEY=your_maptiler_api_key_here

Notebook (`vibe_checker.ipynb`)

Launch Jupyter Lab (or Notebook) and open vibe_checker.ipynb:

uv run jupyter lab

When the notebook opens, select the Python (geovibes) kernel registered during installation. The notebook loads the complete GeoVibes UI and expects the assets downloaded by download_embeddings.py in local_databases/ and geometries/.

Features

Multiple basemaps: MapTiler satellite, Sentinel-2 RGB/NDVI/NDWI composites, Google Hybrid maps
Flexible labeling: Point-click and polygon selection for positive/negative examples
Iterative search: Query vector updates with each labeling iteration using 2×positive_avg - negative_avg
Save/load: Persist labeled datasets as GeoJSON for continued refinement

Interactive Search Examples

Label a point and search
Start your search by picking a point for which you would like to find similar ones in your area, then click Search. This will open a tile panel showing your search results but also show them on the map.

Polygon Labeling
Search is iterative: positives get added to your query vector and negatives get subtracted. Use polygon labeling mode for bulk positive/negative selection.

Tile Panel Labeling You can use the tile panel to pan the map to a given tile, as well as label them.

Change basemap You can cycle through different basemaps as you are searching. This will cycle through in on both the leaflet map and the individual tiles in the tile panel. NB: we currently do not support pulling the S2 basemaps into the tiles, these are only available as tiles on the leaflet map served via Google Earth Engine.

Load Previous Datasets
Save your search results as GeoJSON and reload them to continue searching.

Use Google Street View You can also use google maps/street view to help you label.

Architecture

The GeoVibes system is designed for efficient large-scale geospatial similarity search. The core of the system is a hybrid architecture combining a FAISS index for fast vector search with a DuckDB database for storing metadata and geometries.

This dual-component system is built using the script in geovibes/database/faiss_db.py and is optimized for performance with the following features:

FAISS Index: It uses a highly optimized FAISS (Facebook AI Similarity Search) index for efficient similarity search on high-dimensional vector embeddings. The script builds an Inverted File (IVF) index with Product Quantization (PQ) for float embeddings or Scalar Quantization for int8 embeddings, enabling fast approximate nearest neighbor search.
DuckDB Metadata Store: A DuckDB database stores metadata associated with each vector, including a unique ID and the geometry. This allows for quick retrieval of spatial and other attributes after a vector search.

This combination of a dedicated vector index and a spatial database allows GeoVibes to perform complex queries that combine both content-based similarity and geographic location.

Earth Engine Authentication (Optional - for NDVI/NDWI basemaps)

Earth Engine authentication is completely optional. GeoVibes works perfectly without it!

If you want to use NDVI and NDWI basemaps, you'll need to authenticate with Google Earth Engine and explicitly opt in to the Earth Engine tiles:

# Make sure dependencies are installed
uv pip install -e .

# Authenticate with Earth Engine
earthengine authenticate

Follow the authentication flow in your browser. This is only required if you want the NDVI/NDWI basemap options. GeoVibes keeps Earth Engine disabled unless you opt in via:

Adding enable_ee: true to your config.yaml
Passing enable_ee=True when you construct GeoVibes
Exporting GEOVIBES_ENABLE_EE=1 in your environment

Step 3: Build Searchable Database

This method uses Facebook AI's FAISS library to create a highly optimized index, which is stored separately from the metadata in a DuckDB database. you can point this script at either a local set of embeddings or embeddings stored in the cloud. In this example we use the Earth Genome Softcon Embeddings on Source Cooperative.

# Build a full Alabama index using a ROI and remote embeddings
python geovibes/database/faiss_db.py \
  --roi-file geometries/alabama.geojson \
  --mgrs-reference-file geometries/mgrs_tiles.parquet \
  --embedding-dir s3://us-west-2.opendata.source.coop/earthgenome/earthindexembeddings/2024/ \
  --name earthgenome_softcon_2024_2025 \
  --tile-pixels 32 \
  --tile-overlap 16 \
  --tile-resolution 10 \
  --output_dir local_databases

# Build a smaller dry-run index for testing
python geovibes/database/faiss_db.py \
  --roi-file geometries/alabama.geojson \
  --mgrs-reference-file geometries/mgrs_tiles.parquet \
  --embedding-dir s3://us-west-2.opendata.source.coop/earthgenome/earthindexembeddings/2024/ \
  --name earthgenome_softcon_2024_2025 \
  --tile-pixels 32 --tile-overlap 16 --tile-resolution 10 \
  --output_dir local_databases \
  --dry-run --dry-run-size 5

This script processes parquet files from an input directory, builds a FAISS index, and creates a separate DuckDB file containing the metadata for the embeddings.

Classification Pipeline

GeoVibes includes a classification pipeline for training binary classifiers on labeled embeddings. This enables workflows like: label examples → train classifier → review detections → refine with corrections.

Basic Usage

# Train classifier on labeled data
uv run python -m geovibes.classification.pipeline \
    --positives labeled_dataset.geojson \
    --output results/ \
    --db path/to/embeddings.duckdb \
    --threshold 0.5

Iterative Retraining with Corrections

After reviewing detection results in the GeoVibes UI, export corrections and retrain:

# Combine original labels with corrections from detection review
uv run python -m geovibes.classification.pipeline \
    --positives original_labels.geojson corrections.geojson \
    --output results_v2/ \
    --db path/to/embeddings.duckdb \
    --correction-weight 2.5

The --correction-weight parameter emphasizes correction samples (class relabel_pos/relabel_neg) during training, helping the model learn from its mistakes.

Detection Review Mode

When you load a GeoJSON file with a probability field (classifier output), GeoVibes automatically enters detection review mode:

Threshold slider: Filter detections by probability, with min/max set from your dataset
Colormap visualization: Probability-scaled colors (plasma colormap)
Interactive labeling: Click or polygon-select to mark false positives/negatives
Export corrections: Save as augmented dataset for retraining

Spatial Cross-Validation

For robust model evaluation, use 5-fold spatial cross-validation instead of a single train/test split:

uv run python -m geovibes.classification.pipeline \
    --positives labeled_data.geojson corrections.geojson \
    --negatives sampled_negatives.geojson \
    --db path/to/embeddings.duckdb \
    --output results/ \
    --cv \
    --cv-buffer 500

The spatial CV algorithm:

Buffers positive points (default 500m) and unions into contiguous clusters
Explodes union into separate polygon regions
Distributes clusters across 5 folds with greedy sample balancing
Assigns negatives to the fold of their nearest positive cluster
Reports mean ± std for all metrics

Example output:

=================================================================
SPATIAL 5-FOLD CROSS-VALIDATION RESULTS
=================================================================
Metric       Mean ± Std
-----------------------------------------------------------------
Accuracy:    0.9734 ± 0.0297
Precision:   0.9387 ± 0.0988
Recall:      0.9805 ± 0.0115
F1:          0.9566 ± 0.0590
AUC-ROC:     0.9970 ± 0.0034
-----------------------------------------------------------------
Spatial clusters: 59 (distributed across 5 folds)
=================================================================

This tests generalization to new geographic areas, not just new samples from areas the model has seen.

CLI Options

Argument	Description
`--positives`	One or more GeoJSON files with training examples
`--negatives`	Optional separate file for negative examples
`--output`	Output directory for results
`--db`	Path to DuckDB database with embeddings
`--threshold`	Probability threshold for detection (default: 0.5)
`--correction-weight`	Weight multiplier for correction samples (default: 1.0)
`--batch-size`	Batch size for inference (default: 100000)
`--test-fraction`	Fraction of data for test set (default: 0.2)
`--cv`	Run 5-fold spatial cross-validation (skips inference)
`--cv-buffer`	Buffer distance in meters for spatial clustering (default: 500)

Output Files

The pipeline generates:

classification_detections.geojson - All detections above threshold
classification_union.geojson - Merged detection polygons
model.json - Trained XGBoost model (can be reloaded)

Performance & Limitations

Database scaling: Tested up to 5M embeddings
Memory management: Two-layer approach for handling large datasets
- DuckDB memory limits: Default 24GB (MEMORY_LIMIT = '24GB') controls database operations
- Application chunking: 10,000 embeddings per chunk prevents Python memory overflow during data transfer
- Both are necessary: DuckDB limits control internal operations, chunking controls Python data loading
- Modify DatabaseConstants.MEMORY_LIMIT and EMBEDDING_CHUNK_SIZE for custom allocation
Index performance: FAISS index creation time varies depending on the number of vectors and parameters. For example, training on a sample of a few million vectors can take several minutes.
Future work: Investigating external vector databases (e.g., Qdrant) and custom embedding pipelines.

Contributing

GeoVibes is experimental research code. Contributions welcome for:

Alternative vector index backends (lanceDB etc...)
Front-end/UI/UX (help please)
Custom embedding model support
Performance optimizations
Documentation improvements

Contact: [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 406 Commits
.cursor/rules		.cursor/rules
geometries		geometries
geovibes		geovibes
images		images
notebooks		notebooks
test_data		test_data
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
config.yaml		config.yaml
download_embeddings.py		download_embeddings.py
manifest.csv		manifest.csv
pyproject.toml		pyproject.toml
uv.lock		uv.lock
vibe_checker.ipynb		vibe_checker.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Evaluate Your Geospatial Models Vibes

Installation

Quick Start

Prepare Local Data

Configuration

YAML Configuration (Recommended)

Environment Variables

Notebook (`vibe_checker.ipynb`)

Features

Interactive Search Examples

Architecture

Earth Engine Authentication (Optional - for NDVI/NDWI basemaps)

Step 3: Build Searchable Database

Classification Pipeline

Basic Usage

Iterative Retraining with Corrections

Detection Review Mode

Spatial Cross-Validation

CLI Options

Output Files

Performance & Limitations

Contributing

About

Uh oh!

Releases

Packages

Contributors 5

Uh oh!

Languages

License

cr458/geovibes

Folders and files

Latest commit

History

Repository files navigation

Evaluate Your Geospatial Models Vibes

Installation

Quick Start

Prepare Local Data

Configuration

YAML Configuration (Recommended)

Environment Variables

Notebook (vibe_checker.ipynb)

Features

Interactive Search Examples

Architecture

Earth Engine Authentication (Optional - for NDVI/NDWI basemaps)

Step 3: Build Searchable Database

Classification Pipeline

Basic Usage

Iterative Retraining with Corrections

Detection Review Mode

Spatial Cross-Validation

CLI Options

Output Files

Performance & Limitations

Contributing

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Uh oh!

Languages

Notebook (`vibe_checker.ipynb`)

Packages