This repository aggregates multiple command-line loaders that migrate and/or incrementally sync data from common SQL-capable systems into FalkorDB using declarative JSON/YAML mappings. It also includes a control plane web tool to configure, start, and track data migration (ETL/CDC) runs.
- Prerequisites
- Tools
- Metrics exposed by each tool
- Common concepts
- Scaffold schema + template generation behavior
- FalkorDB connection
- Rust toolchain (Cargo)
- Node.js + npm (optional; for the control plane UI)
- Network access to your source system (BigQuery / ClickHouse / Databricks / MariaDB / MySQL / PostgreSQL / Snowflake / Spark / SQL Server)
- A reachable FalkorDB endpoint (for example falkor://127.0.0.1:6379)
Tool list view in the Control Plane
Configuration File Editor with Graph Schema preview
- Location: BigQuery-to-FalkorDB/
- What it does: Loads and incrementally syncs tabular data from BigQuery into FalkorDB using GoogleSQL (ANSI SQL mode) over BigQuery REST APIs, with optional purge modes and daemon mode.
- Scaffolding: supports --introspect-schema and --generate-template using BigQuery INFORMATION_SCHEMA metadata; emits fully-qualified source tables and inferred node/edge mappings where metadata is available.
- Documentation: BigQuery-to-FalkorDB/README.md
- Scaffold behavior: see Scaffold schema + template generation behavior
Quick start (from the crate directory):
cd BigQuery-to-FalkorDB/bigquery-to-falkordb
cargo build --release
# Run once
cargo run --release -- --config ../bigquery_sample.yaml
# Continuous sync
cargo run --release -- --config ../bigquery_sample.yaml --daemon --interval-secs 60
- Location: ClickHouse-to-FalkorDB/
- What it does: Migrates and continuously syncs data from ClickHouse into FalkorDB (supports full/incremental modes, optional purge modes, and daemon mode).
- Scaffolding: supports schema introspection and starter template generation via --introspect-schema and --generate-template; infers node mappings from tables and conservative relationship mappings from id-like columns with review notes.
- Documentation: ClickHouse-to-FalkorDB/readme.md
- Scaffold behavior: see Scaffold schema + template generation behavior
Quick start (from the crate directory):
cd ClickHouse-to-FalkorDB
cargo build --release
# Single run
cargo run --release -- --config clickhouse.incremental.yaml
# Continuous sync
cargo run --release -- --config clickhouse.incremental.yaml --daemon --interval-secs 60
- Location: Databricks-to-FalkorDB/
- What it does: Loads and incrementally syncs tabular data from Databricks (Databricks SQL / warehouses) into FalkorDB based on a JSON/YAML mapping config.
- Scaffolding: supports --introspect-schema and --generate-template using Databricks information_schema; emits catalog/schema-qualified source tables and inferred node/edge mappings where metadata is available.
- Documentation: Databricks-to-FalkorDB/README.md
- Scaffold behavior: see Scaffold schema + template generation behavior
Quick start (from the crate directory):
cd Databricks-to-FalkorDB/databricks-to-falkordb
cargo build --release
# Run once
cargo run --release -- --config path/to/config.yaml
- Location: MariaDB-to-FalkorDB/
- What it does: Migrates and continuously syncs data from MariaDB into FalkorDB (supports full/incremental modes, optional purge modes, and daemon mode).
- Scaffolding: supports schema extraction and template generation from metadata (information_schema) with PK/UK/FK-based inference, join-table heuristics, and schema-qualified source table output.
- Documentation: MariaDB-to-FalkorDB/readme.md
- Scaffold behavior: see Scaffold schema + template generation behavior
- End-to-end sample: MariaDB-to-FalkorDB/sample_data/ + MariaDB-to-FalkorDB/mariadb_sample_to_falkordb.yaml
Quick start (from the crate directory):
cd MariaDB-to-FalkorDB
cargo build --release
# Single run
cargo run --release -- --config mariadb.incremental.yaml
# Continuous sync
cargo run --release -- --config mariadb.incremental.yaml --daemon --interval-secs 60
- Location: MySQL-to-FalkorDB/
- What it does: Migrates and continuously syncs data from MySQL into FalkorDB (supports full/incremental modes, optional purge modes, and daemon mode).
- Scaffolding: supports schema extraction and template generation from metadata (information_schema) with PK/UK/FK-based inference, join-table heuristics, and schema-qualified source table output.
- Documentation: MySQL-to-FalkorDB/readme.md
- Scaffold behavior: see Scaffold schema + template generation behavior
- End-to-end sample: MySQL-to-FalkorDB/sample_data/ + MySQL-to-FalkorDB/mysql_sample_to_falkordb.yaml
Quick start (from the crate directory):
cd MySQL-to-FalkorDB
cargo build --release
# Single run
cargo run --release -- --config mysql.incremental.yaml
# Continuous sync
cargo run --release -- --config mysql.incremental.yaml --daemon --interval-secs 60
- Location: PostgreSQL-to-FalkorDB/
- What it does: Migrates and continuously syncs data from PostgreSQL into FalkorDB (supports full or incremental mode; optional daemon mode).
- Scaffolding: supports schema introspection and template generation from PostgreSQL catalogs with qualified schema.table output and incremental delta detection including last_update.
- Documentation: PostgreSQL-to-FalkorDB/README.md
- Scaffold behavior: see Scaffold schema + template generation behavior
Quick start (from the crate directory):
cd PostgreSQL-to-FalkorDB/postgres-to-falkordb
cargo build --release
# Single run
cargo run --release -- --config example.config.yaml
# Continuous sync
cargo run --release -- --config example.config.yaml --daemon --interval-secs 60
- Location: Snowflake-to-FalkorDB/
- What it does: Migrates and continuously syncs structured data from Snowflake into FalkorDB (supports incremental watermarks, optional purge modes, and daemon mode).
- Scaffolding: supports schema introspection and template generation from Snowflake metadata views, emitting fully-qualified source tables and best-effort FK-derived edge mappings with ambiguity notes.
- Documentation: Snowflake-to-FalkorDB/README.md
- Scaffold behavior: see Scaffold schema + template generation behavior
Quick start (from the crate directory):
cd Snowflake-to-FalkorDB
cargo build --release
# Single run
cargo run --release -- --config path/to/config.yaml
# Continuous sync
cargo run --release -- --config path/to/config.yaml --daemon --interval-secs 300
- Location: Spark-to-FalkorDB/
- What it does: Loads and incrementally syncs Spark SQL result sets into FalkorDB using a declarative mapping config.
- Source transport: Apache Livy interactive sessions (spark.livy_url + spark.session_id), supporting table-based and custom query sources.
- Parity controls: supports source.query_count, partition hints/ranges (source.partition.*), schema strategy controls (preserve/json_stringify/drop_complex/flatten), and edge endpoint match shorthand.
- Hardening: includes transient Livy retry/backoff controls and clearer classified error messages for auth/throttle/timeout/statement failures.
- Scaffolding: supports --introspect-schema and --generate-template via SHOW TABLES and DESCRIBE TABLE metadata.
- Documentation: Spark-to-FalkorDB/README.md
- Scaffold behavior: see Scaffold schema + template generation behavior
Quick start (from the crate directory):
cd Spark-to-FalkorDB/spark-to-falkordb
cargo build --release
# Run once
cargo run --release -- --config path/to/config.yaml
- Location: SQLServer-to-FalkorDB/
- What it does: Migrates and continuously syncs data from SQL Server into FalkorDB (supports full/incremental modes, optional purge modes, and daemon mode).
- Scaffolding: supports schema introspection and template generation from SQL Server system catalogs with PK/UK/FK inference, join-table detection, and qualified schema.table sources.
- Documentation: SQLServer-to-FalkorDB/readme.md
- Scaffold behavior: see Scaffold schema + template generation behavior
- End-to-end sample: SQLServer-to-FalkorDB/sample_data/ + SQLServer-to-FalkorDB/sqlserver_sample_to_falkordb.yaml
Quick start (from the crate directory):
cd SQLServer-to-FalkorDB
cargo build --release
# Single run
cargo run --release -- --config sqlserver.incremental.yaml
# Continuous sync
cargo run --release -- --config sqlserver.incremental.yaml --daemon --interval-secs 60
- Location: control-plane/ (control-plane/server + control-plane/ui)
- What it does: Runs alongside the loaders and provides a web UI + REST API to:
  - Discover tools by scanning the repo for tool.manifest.json
  - Create/edit per-tool configs (YAML or JSON) with a syntax-highlighted editor
  - Start runs (one-shot or daemon) and stop running jobs
  - Auto-configure internal metrics collector ports for metrics-capable tools when launching runs
  - Stream logs live (SSE) and keep run history (SQLite)
  - View run log output after the fact (persisted per-run log file)
  - Inspect and clear file-backed incremental state (watermarks) per config
  - View per-tool runtime metrics (including per-mapping counters where supported), persisted in the control-plane database
Quick start (server):
cd control-plane/server
# Optional: require an API key for all /api routes (except /api/health)
export CONTROL_PLANE_API_KEY="..."
cargo run --release
# UI (if built) + API will be on http://localhost:3003
UI development (optional):
cd control-plane/ui
npm install
npm run dev
# Vite runs on http://localhost:5173 and proxies /api to http://localhost:3003
Configuration:
- CONTROL_PLANE_BIND (default: 0.0.0.0:3003)
- CONTROL_PLANE_REPO_ROOT (optional; migration repo root to scan for tool manifests)
- CONTROL_PLANE_DATA_DIR (default: control-plane/data/)
- CONTROL_PLANE_UI_DIST (default: control-plane/ui/dist/; if missing, the API still works)
- CONTROL_PLANE_API_KEY (optional; if set, calls must include Authorization: Bearer <key>)
Notes:
- The config editor supports YAML/JSON syntax highlighting (Auto/YAML/JSON selector).
- The config editor provides 4 viewer tabs: Config file, Extracted schema, Generated template, and Graph visualization.
- Graph visualization is derived from the selected config file mappings (config-first), with scaffold-template fallback only when mappings are missing/unusable.
- The UI has an "API key" button that stores the key in browser localStorage.
- The log stream endpoint uses Server-Sent Events. Since EventSource can’t set headers, the UI falls back to ?api_key=<token> for SSE when an API key is configured.
- Runtime data lives under CONTROL_PLANE_DATA_DIR (by default control-plane/data/), including a SQLite DB (control-plane.sqlite) and per-run artifacts/logs under runs/<run_id>/.
- Runs are executed locally on the machine running the control plane server (it spawns the underlying CLI tools).
- Metrics endpoints/ports are internal collector settings from each tool manifest and are not shown in the Metrics UI.
Selected API endpoints:
- GET /api/health
- GET /api/tools, GET /api/tools/:tool_id
- POST /api/tools/:tool_id/scaffold-template (generate mapping template from source schema for supported tools)
- POST /api/tools/:tool_id/schema-graph-preview (build canvas graph preview from config mappings; returns warnings and derivation source)
- GET /api/configs, POST /api/configs
- GET /api/configs/:config_id, PUT /api/configs/:config_id
- GET /api/configs/:config_id/state, POST /api/configs/:config_id/state/clear
- GET /api/runs, POST /api/runs
- GET /api/runs/:run_id, POST /api/runs/:run_id/stop
- GET /api/runs/:run_id/events (SSE)
- GET /api/runs/:run_id/logs (persisted log lines for viewing past runs)
- GET /api/metrics (all tools metrics snapshot; optional ?config_id=<uuid> to scope to one ETL config)
- GET /api/metrics/:tool_id (single tool metrics snapshot; optional ?config_id=<uuid>)
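As a rough illustration of calling these endpoints, the sketch below builds (without sending) an authenticated request and an SSE URL using only the Python standard library. The helper names `api_request` and `sse_url` are hypothetical, and the base URL assumes the default bind port; the Bearer header, the `?api_key=` fallback, and the `?config_id=` filter are taken from the notes and endpoint list above.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:3003"  # assumption: server running on the default port


def api_request(path: str, api_key: Optional[str] = None,
                config_id: Optional[str] = None, method: str = "GET") -> Request:
    """Build (but do not send) a request for a control plane endpoint."""
    url = BASE_URL + path
    if config_id is not None:
        # Optional scoping documented for /api/metrics and /api/metrics/:tool_id.
        url += "?" + urlencode({"config_id": config_id})
    req = Request(url, method=method)
    if api_key:
        # Required on /api routes when CONTROL_PLANE_API_KEY is set on the server.
        req.add_header("Authorization", f"Bearer {api_key}")
    return req


def sse_url(run_id: str, api_key: Optional[str] = None) -> str:
    """URL for the live event stream; the key travels as a query parameter."""
    url = f"{BASE_URL}/api/runs/{run_id}/events"
    if api_key:
        url += "?" + urlencode({"api_key": api_key})
    return url
```

A request built this way can be sent with `urllib.request.urlopen(req)`; the response schema is not described here, so inspect the returned JSON before relying on specific fields.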
For tools that expose runtime metrics, configure both:
- capabilities.supports_metrics: true
- a metrics section in the manifest
The control plane uses this in two places:
- Run start: when a run is started, the server parses the port from metrics.endpoint and adds --metrics-port <port> to the tool invocation.
- Metrics collection + persistence: while a run is active, the server polls the raw endpoint, filters samples by metricPrefix, groups per-mapping samples by mappingLabel (default mapping), and stores snapshots in SQLite.
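The filter-and-group step can be pictured with a small sketch. This is not the control plane's implementation (the server is a Rust program); it is a simplified Python illustration that assumes plain Prometheus text lines of the form `name value` or `name{label="value"} value`, and the function name `summarize` is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as: name{label="value"} 12   or   name 12
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def summarize(text, metric_prefix, mapping_label="mapping"):
    """Split scraped samples into tool-level totals and per-mapping groups."""
    totals = {}
    per_mapping = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or not match.group(1).startswith(metric_prefix):
            continue  # sample belongs to another tool or is malformed
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        if mapping_label in labels:
            per_mapping[labels[mapping_label]][name] = float(raw_value)
        else:
            totals[name] = float(raw_value)
    return totals, dict(per_mapping)
```

For example, calling `summarize(scraped_text, "mysql_to_falkordb_")` keeps only that tool's samples and groups the labelled ones under each mapping name.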
/api/metrics and /api/metrics/:tool_id now serve the latest persisted snapshot, so metrics remain available even after tool processes stop.
Both endpoints support optional config_id filtering, which is useful when multiple ETL configurations use the same tool (for example, different source tables mapped to different destination graphs).
When config_id is provided, the control plane returns the latest persisted snapshot for that specific config context instead of the latest snapshot across all configs of the tool.
The Metrics UI reads these persisted snapshots and does not display raw scrape endpoint/port details.
metrics fields:
- endpoint: HTTP endpoint to scrape (internal collector setting, not shown in UI; for example http://127.0.0.1:9993/)
- format: currently prometheus_text
- metricPrefix: prefix used to match this tool’s metric names
- mappingLabel: label key used for per-mapping metrics
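To make the run-start behavior concrete, here is a minimal Python sketch of deriving the `--metrics-port` argument from an `endpoint` value. The function name `metrics_port_args` is hypothetical, and raising an error when the URL carries no explicit port is an assumption of this sketch, not documented server behavior.

```python
from urllib.parse import urlsplit


def metrics_port_args(endpoint):
    """Return the extra CLI arguments implied by a manifest metrics endpoint."""
    port = urlsplit(endpoint).port
    if port is None:
        # Assumption: an endpoint without an explicit port is treated as invalid.
        raise ValueError(f"no explicit port in metrics endpoint: {endpoint!r}")
    return ["--metrics-port", str(port)]
```

With the example endpoint above, `metrics_port_args("http://127.0.0.1:9993/")` yields the two arguments `--metrics-port` and `9993`.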
Adding a new tool to the control plane:
- Add a tool.manifest.json anywhere under the repo root (the control plane scans to depth 4).
- The manifest declares how to run the tool and which optional features it supports (daemon/purge/etc.).
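To check where a new manifest would be picked up, a depth-limited scan can be approximated as below. This is a sketch rather than the server's code: the function name `find_manifests` is hypothetical, and it assumes the repo root counts as depth 0, so directories up to four levels below the root are examined.

```python
import os


def find_manifests(repo_root, max_depth=4):
    """List tool.manifest.json files at most max_depth directories below repo_root."""
    root = os.path.abspath(repo_root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if "tool.manifest.json" in filenames:
            found.append(os.path.join(dirpath, "tool.manifest.json"))
        if depth >= max_depth:
            dirnames[:] = []  # prune: do not descend below the depth limit
    return sorted(found)
```

If a manifest does not show up in the tool list, placing it closer to the repo root is the first thing to try.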
Minimal example:
{
  "id": "my_tool",
  "displayName": "My Source → FalkorDB",
  "description": "...",
  "workingDir": "path/to/tool/dir",
  "executable": {
    "type": "cargo",
    "manifestPath": "path/to/Cargo.toml",
    "release": true
  },
  "capabilities": {
    "supports_daemon": false,
    "supports_purge_graph": false,
    "supports_purge_mapping": false,
    "supports_metrics": false
  },
  "config": {
    "fileExtensions": [".yaml", ".yml", ".json"],
    "examples": []
  },
  "metrics": {
    "endpoint": "http://127.0.0.1:9999/",
    "format": "prometheus_text",
    "metricPrefix": "my_tool_to_falkordb_",
    "mappingLabel": "mapping"
  }
}
Metrics exposed by each tool:
BigQuery:
- bigquery_to_falkordb_runs
- bigquery_to_falkordb_failed_runs
- bigquery_to_falkordb_rows_fetched
- bigquery_to_falkordb_rows_written
- bigquery_to_falkordb_rows_deleted
- bigquery_to_falkordb_mapping_runs{mapping="<name>"}
- bigquery_to_falkordb_mapping_failed_runs{mapping="<name>"}
- bigquery_to_falkordb_mapping_rows_fetched{mapping="<name>"}
- bigquery_to_falkordb_mapping_rows_written{mapping="<name>"}
- bigquery_to_falkordb_mapping_rows_deleted{mapping="<name>"}
ClickHouse:
- clickhouse_to_falkordb_runs
- clickhouse_to_falkordb_failed_runs
- clickhouse_to_falkordb_rows_fetched
- clickhouse_to_falkordb_rows_written
- clickhouse_to_falkordb_rows_deleted
- clickhouse_to_falkordb_mapping_runs{mapping="<name>"}
- clickhouse_to_falkordb_mapping_failed_runs{mapping="<name>"}
- clickhouse_to_falkordb_mapping_rows_fetched{mapping="<name>"}
- clickhouse_to_falkordb_mapping_rows_written{mapping="<name>"}
- clickhouse_to_falkordb_mapping_rows_deleted{mapping="<name>"}
Databricks:
- databricks_to_falkordb_runs
- databricks_to_falkordb_failed_runs
- databricks_to_falkordb_rows_fetched
- databricks_to_falkordb_rows_written
- databricks_to_falkordb_rows_deleted
- databricks_to_falkordb_mapping_runs{mapping="<name>"}
- databricks_to_falkordb_mapping_failed_runs{mapping="<name>"}
- databricks_to_falkordb_mapping_rows_fetched{mapping="<name>"}
- databricks_to_falkordb_mapping_rows_written{mapping="<name>"}
- databricks_to_falkordb_mapping_rows_deleted{mapping="<name>"}
MariaDB:
- mariadb_to_falkordb_runs
- mariadb_to_falkordb_failed_runs
- mariadb_to_falkordb_rows_fetched
- mariadb_to_falkordb_rows_written
- mariadb_to_falkordb_rows_deleted
- mariadb_to_falkordb_mapping_runs{mapping="<name>"}
- mariadb_to_falkordb_mapping_failed_runs{mapping="<name>"}
- mariadb_to_falkordb_mapping_rows_fetched{mapping="<name>"}
- mariadb_to_falkordb_mapping_rows_written{mapping="<name>"}
- mariadb_to_falkordb_mapping_rows_deleted{mapping="<name>"}
MySQL:
- mysql_to_falkordb_runs
- mysql_to_falkordb_failed_runs
- mysql_to_falkordb_rows_fetched
- mysql_to_falkordb_rows_written
- mysql_to_falkordb_rows_deleted
- mysql_to_falkordb_mapping_runs{mapping="<name>"}
- mysql_to_falkordb_mapping_failed_runs{mapping="<name>"}
- mysql_to_falkordb_mapping_rows_fetched{mapping="<name>"}
- mysql_to_falkordb_mapping_rows_written{mapping="<name>"}
- mysql_to_falkordb_mapping_rows_deleted{mapping="<name>"}
PostgreSQL:
- postgres_to_falkordb_runs
- postgres_to_falkordb_failed_runs
- postgres_to_falkordb_rows_fetched
- postgres_to_falkordb_rows_written
- postgres_to_falkordb_rows_deleted
- postgres_to_falkordb_mapping_runs{mapping="<name>"}
- postgres_to_falkordb_mapping_failed_runs{mapping="<name>"}
- postgres_to_falkordb_mapping_rows_fetched{mapping="<name>"}
- postgres_to_falkordb_mapping_rows_written{mapping="<name>"}
- postgres_to_falkordb_mapping_rows_deleted{mapping="<name>"}
Snowflake:
- snowflake_to_falkordb_runs
- snowflake_to_falkordb_failed_runs
- snowflake_to_falkordb_rows_fetched
- snowflake_to_falkordb_rows_written
- snowflake_to_falkordb_rows_deleted
- snowflake_to_falkordb_mapping_runs{mapping="<name>"}
- snowflake_to_falkordb_mapping_failed_runs{mapping="<name>"}
- snowflake_to_falkordb_mapping_rows_fetched{mapping="<name>"}
- snowflake_to_falkordb_mapping_rows_written{mapping="<name>"}
- snowflake_to_falkordb_mapping_rows_deleted{mapping="<name>"}
Spark:
- spark_to_falkordb_runs
- spark_to_falkordb_failed_runs
- spark_to_falkordb_rows_fetched
- spark_to_falkordb_rows_written
- spark_to_falkordb_rows_deleted
- spark_to_falkordb_mapping_runs{mapping="<name>"}
- spark_to_falkordb_mapping_failed_runs{mapping="<name>"}
- spark_to_falkordb_mapping_rows_fetched{mapping="<name>"}
- spark_to_falkordb_mapping_rows_written{mapping="<name>"}
- spark_to_falkordb_mapping_rows_deleted{mapping="<name>"}
SQL Server:
- sqlserver_to_falkordb_runs
- sqlserver_to_falkordb_failed_runs
- sqlserver_to_falkordb_rows_fetched
- sqlserver_to_falkordb_rows_written
- sqlserver_to_falkordb_rows_deleted
- sqlserver_to_falkordb_mapping_runs{mapping="<name>"}
- sqlserver_to_falkordb_mapping_failed_runs{mapping="<name>"}
- sqlserver_to_falkordb_mapping_rows_fetched{mapping="<name>"}
- sqlserver_to_falkordb_mapping_rows_written{mapping="<name>"}
- sqlserver_to_falkordb_mapping_rows_deleted{mapping="<name>"}
- Declarative mapping: You define how source rows map to graph nodes and edges.
- Idempotent upserts: Writes use Cypher UNWIND + MERGE based on configured keys.
- FalkorDB index creation (performance):
  - Loaders create required indexes before writes for node key properties and edge endpoint match_on properties.
  - You can also define explicit indexes via falkordb.indexes in connector configs.
  - Explicit and implicit indexes are deduplicated and applied for both initial and incremental runs.
  - Scaffold-generated templates may include suggested falkordb.indexes entries when source index metadata is available (best effort by source engine).
- Incremental sync: When configured with a watermark column (e.g. updated_at), the loader fetches only rows newer than the last successful run.
- Soft deletes (optional): A configured deleted-flag column/value can be interpreted as deletes in FalkorDB.
- State: Watermarks are typically stored in a file-backed state JSON so runs can resume safely.
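The incremental sync, soft-delete, and state concepts above fit together roughly as in the following Python sketch. It is illustrative only: the loaders are Rust programs, the state file layout (a JSON object keyed by mapping name) is hypothetical, and the sketch assumes watermark values are ISO-8601 strings in a single timezone so that string comparison matches time order.

```python
import json
import os
import tempfile


def load_watermark(state_path, mapping):
    """Return the stored watermark for a mapping, or None on the first run."""
    if not os.path.exists(state_path):
        return None
    with open(state_path) as f:
        return json.load(f).get(mapping)


def save_watermark(state_path, mapping, watermark):
    """Persist the watermark; write to a temp file first so a crash cannot truncate state."""
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    state[mapping] = watermark
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, state_path)


def plan_batch(rows, last_watermark, watermark_col="updated_at",
               deleted_col="is_deleted", deleted_value=True):
    """Split rows newer than the watermark into upserts and deletes."""
    fresh = [r for r in rows
             if last_watermark is None or r[watermark_col] > last_watermark]
    deletes = [r for r in fresh if r.get(deleted_col) == deleted_value]
    upserts = [r for r in fresh if r.get(deleted_col) != deleted_value]
    new_watermark = max((r[watermark_col] for r in fresh), default=last_watermark)
    return upserts, deletes, new_watermark
```

In this sketch the new watermark would be saved only after the writes to FalkorDB succeed, which is what lets a failed run be retried from the previous position.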
Most SQL-style loaders in this repository support scaffold mode:
- BigQuery
- ClickHouse
- Databricks
- MariaDB
- MySQL
- PostgreSQL
- Snowflake
- Spark
- SQL Server
Scaffold mode is exposed through CLI flags:
- --introspect-schema: introspects source metadata and prints a normalized schema summary.
- --generate-template: generates a starter YAML mapping template inferred from schema metadata.
- --output <path>: writes generated template to file (otherwise prints to stdout).
- Default rule: each table becomes a node mapping.
- Foreign keys become edge mappings.
- Join tables (tables dominated by FK columns) may be inferred as edge mappings with optional edge properties.
- Key selection prefers:
  - primary key,
  - single-column unique key,
  - fallback first/id-like column with review notes.
- Incremental delta is inferred only when common update/delete columns are found (for example updated_at, last_update, is_deleted).
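The key-selection preference and delta inference described above could look roughly like this. It is a sketch under stated assumptions, not the scaffold implementation: the table dictionary shape, the output field names, and the exact "id-like" test (a column named `id` or ending in `_id`) are all hypothetical.

```python
def choose_key(table):
    """Pick key columns for a node mapping; return (columns, review_note_or_None)."""
    if table.get("primary_key"):
        return list(table["primary_key"]), None
    for unique_key in table.get("unique_keys", []):
        if len(unique_key) == 1:
            return list(unique_key), None
    columns = table["columns"]
    id_like = [c for c in columns if c.lower() == "id" or c.lower().endswith("_id")]
    chosen = id_like[0] if id_like else columns[0]
    return [chosen], f"review: no PK/unique key found; fell back to column '{chosen}'"


def infer_delta(columns):
    """Suggest incremental settings only when well-known columns exist."""
    updated = next((c for c in columns if c.lower() in ("updated_at", "last_update")), None)
    deleted = next((c for c in columns if c.lower() == "is_deleted"), None)
    if updated is None and deleted is None:
        return None  # nothing recognizable: leave the mapping as a full load
    return {"updated_column": updated, "deleted_flag_column": deleted}
```

The review note in the fallback case mirrors the point that generated templates need a human check before use.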
- Generated templates are starter scaffolds, not guaranteed production-ready configs.
- Scaffold relies on schema metadata and cannot reliably infer business-specific joins that require custom source.select SQL.
- You should always review and adjust:
  - relationship names,
  - key/property choices,
  - incremental/delete semantics,
  - custom edge sources that depend on multi-table joins.
In the control plane Config Editor:
- Preview schema calls scaffold introspection and shows extracted schema.
- Generate template calls scaffold template generation and shows generated YAML.
- Use as config copies generated template into the editable config tab.
- Preview graph renders graph topology in the Graph visualization tab from the selected config mappings (node/edge), including non-fatal warnings for partial mappings.
Each tool’s config describes the FalkorDB endpoint and graph name. Typical endpoints look like:
falkor://127.0.0.1:6379
See each tool’s README for the exact configuration schema.
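When sanity-checking an endpoint value before a run, the URL can be split with the Python standard library as sketched below. The function name `parse_falkor_endpoint` is hypothetical, and falling back to port 6379 when none is given is an assumption of this sketch; the loaders' own parsing rules are defined in each tool's README.

```python
from urllib.parse import urlsplit


def parse_falkor_endpoint(endpoint):
    """Split a falkor:// endpoint into (host, port)."""
    parts = urlsplit(endpoint)
    if parts.scheme != "falkor" or not parts.hostname:
        raise ValueError(f"not a falkor:// endpoint: {endpoint!r}")
    # Assumption: default to 6379 when the URL omits the port.
    return parts.hostname, parts.port or 6379
```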