35 changes: 22 additions & 13 deletions docs/contributors/dev-guide.md
@@ -80,26 +80,35 @@ except Exception as e:

### DuckDB Session Management

MXCP uses a specific pattern for DuckDB connection management:
MXCP manages DuckDB connections:

- **No connection pooling**: DuckDB is embedded, so we use single connections per session
- **One connection per operation**: Each CLI command creates its own session
- **Session-scoped setup**: All Python functions, secrets, and plugins are loaded per session at startup
- **Context manager pattern**: Ensures proper cleanup of connections
- **Connection pooling**: MXCP manages a pool of connections for efficient resource usage
- **Graceful reloads**: Connection pool intelligently drains and refreshes without service interruption
- **Thread-safe operations**: Each request gets its own connection from the pool
- **Context manager pattern**: Ensures proper acquisition and return of connections to the pool
- **Zero-downtime updates**: Database changes are visible to new connections without stopping the service

```python
# Session creation and management pattern
with DuckDBSession(user_config, site_config, profile, readonly=readonly) as session:
# Modern connection management pattern using DuckDBRuntime
runtime = DuckDBRuntime(database_config, plugins, plugin_config, secrets)

# Get a connection from the pool
with runtime.get_connection() as session:
    # All database operations happen here
    result = session.conn.execute(sql, params).fetchdf()
    result = session.execute_query_to_dict(sql, params)

# Connection automatically closed when context exits
# Connection automatically returned to pool when context exits

# For graceful shutdown
runtime.shutdown()
```

**Server vs CLI sessions:**
- **CLI commands**: Create new session per operation
- **Server mode**: Single shared session with thread-safe locking
- **Session initialization**: Extensions, secrets, and plugins loaded once per session
**Connection Management:**
- **CLI commands**: Create their own `DuckDBRuntime` instance for the operation
- **Server mode**: Shared `DuckDBRuntime` with connection pool (default size: 2 × CPU cores)
- **Thread safety**: Each request gets its own connection from the pool
- **Initialization**: Extensions, secrets, and plugins loaded once when pool is created
- **Reload behavior**: Pool gracefully drains and refreshes without downtime
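
The pooling behaviour described above can be illustrated with a small, dependency-free sketch. It is not MXCP's `DuckDBRuntime`: the `ConnectionPool` class and its attributes are hypothetical, and `sqlite3` stands in for DuckDB so the example runs anywhere. It only shows the acquire/return contract of a context-managed pool with the documented default size of 2 × CPU cores.

```python
import os
import queue
import sqlite3
from contextlib import contextmanager
from typing import Optional


class ConnectionPool:
    """Hypothetical fixed-size pool; sqlite3 is a stand-in for DuckDB."""

    def __init__(self, database: str, size: Optional[int] = None):
        # Mirror the documented default of 2 x CPU cores.
        self.size = size or 2 * (os.cpu_count() or 1)
        self._idle: queue.Queue = queue.Queue(maxsize=self.size)
        for _ in range(self.size):
            # check_same_thread=False lets any worker thread use the connection.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def get_connection(self, timeout: float = 5.0):
        conn = self._idle.get(timeout=timeout)  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back to the pool

    def shutdown(self) -> None:
        # Close every idle connection; callers must have returned theirs first.
        while not self._idle.empty():
            self._idle.get_nowait().close()


pool = ConnectionPool(":memory:", size=2)
with pool.get_connection() as conn:
    answer = conn.execute("SELECT 40 + 2").fetchone()[0]
idle_after = pool._idle.qsize()  # both connections are idle again
pool.shutdown()
```

Returning the connection in a `finally` block is what keeps the pool from leaking connections when a request raises.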

### Common CLI Patterns

2 changes: 1 addition & 1 deletion docs/features/overview.md
@@ -94,7 +94,7 @@ MXCP provides a comprehensive set of enterprise features designed for production

### [Python Reference](../reference/python.md)
- **Runtime APIs**: Database, config, secrets access
- **Lifecycle Hooks**: Server initialization/shutdown
- **Lifecycle Hooks**: Server initialization/shutdown/reload
- **Thread Safety**: Concurrent execution support
- **Type Compatibility**: Seamless SQL/Python integration

157 changes: 157 additions & 0 deletions docs/features/python-endpoints.md
@@ -167,6 +167,163 @@ def cleanup():

**Important:** These hooks are for managing Python resources (HTTP clients, connections to external services, etc.), NOT for database management. The DuckDB connection is managed automatically by MXCP.

## Dynamic Reload with Database Rebuild

MXCP lets Python endpoints trigger a safe reload of the server. This allows you, for example, to update your DuckDB database with an external tool without restarting the server.

### Why Use DuckDB Reload?

**In most cases, you don't need this feature.** Your Python endpoints can perform database operations directly using the `db` proxy. DuckDB's concurrency model allows a single process (MXCP) to own the connection while multiple threads operate on it safely.

Even if you're using dbt, you can invoke the dbt Python API directly from your endpoints. Since it runs in the same process, dbt can apply changes to the DuckDB database without issues - this works correctly under DuckDB's MVCC transactional model.

However, sometimes you may need to run external tools or processes that require exclusive access to the DuckDB database file. In these cases, MXCP must temporarily release its hold on the database so the external tool can operate safely.

This is where `reload_duckdb` comes in: it gives you a safe way to rebuild your database while the server stays up. Requests that arrive during the rebuild wait and are processed afterwards.

### How It Works

```python
from mxcp.runtime import reload_duckdb
import subprocess
import pandas as pd

def update_analytics_data():
    """Endpoint that triggers a data refresh."""

    def rebuild_database():
        """This runs with all connections closed."""
        # Option 1: Run dbt to rebuild models
        # NOTE: This is just an example of running an external tool.
        # In most cases, you should use the dbt Python API directly instead.
        subprocess.run(["dbt", "run", "--target", "prod"], check=True)

        # Option 2: Replace with a pre-built database
        import shutil
        shutil.copy("/staging/analytics.duckdb", "/app/data/analytics.duckdb")

        # Option 3: Load fresh data from APIs/files
        df = pd.read_parquet("s3://bucket/latest-data.parquet")
        # DuckDB file is exclusively ours during rebuild
        import duckdb
        conn = duckdb.connect("/app/data/analytics.duckdb")
        conn.execute("CREATE OR REPLACE TABLE sales AS SELECT * FROM df")
        conn.close()

    # Schedule the reload with our rebuild function
    # The payload function only runs after the server has drained all connections
    # and released its hold on the database. This ensures safe external access.
    # Afterwards, everything automatically comes back up with the updated data.
    reload_duckdb(
        payload_func=rebuild_database,
        description="Updating analytics data"
    )

    # Return immediately - reload happens asynchronously
    return {"status": "Data refresh scheduled", "message": "Reload will complete in background"}
```

### The Reload Process

When you call `reload_duckdb`, MXCP:

1. **Queues the reload request** - Function returns immediately
2. **Drains active requests** - Existing requests complete normally
3. **Shuts down runtime components** - Closes Python hooks and DuckDB connections
4. **Runs your payload function** - With all connections closed
5. **Restarts runtime components** - Fresh configuration and connections
6. **Processes waiting requests** - With the updated data

The reload happens asynchronously after your request completes.
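
The sequence above can be modelled in a few lines of standard-library Python. This is a toy sketch, not MXCP's implementation: `ReloadCoordinator`, `run_request`, and the `events` trace are hypothetical names used only to show how a gate plus an active-request counter produces the drain, shutdown, payload, restart ordering.

```python
import threading


class ReloadCoordinator:
    """Toy model of drain -> shutdown -> payload -> restart (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._reload_lock = threading.Lock()  # only one reload at a time
        self._active = 0                      # requests currently running
        self._reloading = False               # True while the gate is closed
        self.events = []                      # ordered trace, for illustration

    def run_request(self, handler):
        with self._cond:
            # New requests wait while a reload is in progress.
            self._cond.wait_for(lambda: not self._reloading)
            self._active += 1
        try:
            return handler()
        finally:
            with self._cond:
                self._active -= 1
                self._cond.notify_all()

    def reload(self, payload_func=None):
        with self._reload_lock:
            with self._cond:
                self._reloading = True                          # close the gate
                self._cond.wait_for(lambda: self._active == 0)  # drain
            try:
                self.events.append("shutdown")   # stand-in for closing connections
                if payload_func is not None:
                    payload_func()               # runs with "connections" closed
                self.events.append("restart")    # stand-in for reopening them
            finally:
                with self._cond:
                    self._reloading = False      # reopen the gate
                    self._cond.notify_all()


coordinator = ReloadCoordinator()
coordinator.reload(payload_func=lambda: coordinator.events.append("rebuilding"))
result = coordinator.run_request(lambda: "ok")
```

In this sketch `reload` blocks its caller; the documented `reload_duckdb` instead queues the work and returns immediately.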

**Important:** Remember that you normally don't need to use this feature. Only use `reload_duckdb` if you absolutely must have an external process update the DuckDB database file. In general, direct database operations through the `db` proxy are preferred.

### Real-World Example: Scheduled Data Updates

```python
from mxcp.runtime import reload_duckdb
from datetime import datetime
import requests

def scheduled_update(source: str = "api") -> dict:
    """Endpoint called by cron to update data."""

    def rebuild_from_api():
        """Fetch latest data and rebuild database."""
        import duckdb
        import pandas as pd

        # Fetch data from external API
        response = requests.get("https://api.example.com/analytics/export", timeout=30)
        response.raise_for_status()
        # Assumes the API returns a JSON array of flat records
        df = pd.DataFrame(response.json())

        # Write to DuckDB (we have exclusive access)
        conn = duckdb.connect("/app/data/analytics.duckdb")
        try:
            # Clear old data
            conn.execute("DROP TABLE IF EXISTS daily_metrics")

            # Load new data; DuckDB resolves `df` to the local DataFrame.
            # (read_json_auto expects a file path, not parsed JSON.)
            conn.execute("CREATE TABLE daily_metrics AS SELECT * FROM df")

            # Update metadata (assumes the update_log table already exists)
            conn.execute("""
                INSERT INTO update_log (timestamp, source, record_count)
                VALUES (?, ?, ?)
            """, [datetime.now(), source, len(df)])
        finally:
            conn.close()

    # Schedule the rebuild
    reload_duckdb(
        payload_func=rebuild_from_api,
        description=f"Scheduled update from {source}"
    )

    # Return immediately - the reload happens asynchronously
    return {
        "status": "scheduled",
        "source": source,
        "timestamp": datetime.now().isoformat(),
        "message": "Data update will complete in background"
    }
```

### Best Practices

**Primary recommendation: Avoid using `reload_duckdb` when possible.** Use direct database operations through the `db` proxy instead - this works fine for most use cases and is much simpler.

When you do need to use `reload_duckdb`:

1. **Keep payload functions focused** - Do one thing well in your payload function
2. **Handle errors gracefully** - Failed reloads leave the server in its previous state
3. **Return quickly** - The reload happens asynchronously, so return a status immediately
4. **Test thoroughly** - Payload functions run with all connections closed
5. **Use for data updates** - Not for schema migrations or structural changes
6. **Check completion indirectly** - Query data or use monitoring to verify reload completed
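
Because the reload is asynchronous, a caller that needs to know when fresh data is available has to poll for it. The helper below is a minimal sketch of that idea; `wait_until` is a hypothetical name, not part of `mxcp.runtime`, and the `check` callable is whatever indirect signal you choose, such as a query that looks for a newer row in a metadata table.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True          # the indirect signal says the reload finished
        if time.monotonic() >= deadline:
            return False         # give up; the caller decides how to report it
        time.sleep(interval)
```

Polling should happen in a later request or an external monitor, not in the request that scheduled the reload, since the reload only starts after that request completes.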

### Configuration-Only Reloads

You can also reload just the configuration (secrets, environment variables) without a payload:

```python
from mxcp.runtime import reload_duckdb

def rotate_secrets():
    """Endpoint to reload after secret rotation."""
    # Schedule config reload without database rebuild
    reload_duckdb(description="Reloading after secret rotation")

    # Return immediately - new secrets will be active after reload
    return {
        "status": "Reload scheduled",
        "message": "Configuration will refresh in background"
    }
```

**Important Note:** Since `reload_duckdb` is asynchronous, you cannot immediately use the new configuration values. The reload happens after your current request completes.

## Async Functions

Python endpoints support both synchronous and asynchronous functions:
7 changes: 4 additions & 3 deletions docs/guides/configuration.md
@@ -575,9 +575,10 @@ kill -HUP <pid>
```

The reload process:
1. **Only external references are refreshed** - the configuration file structure is NOT re-read
2. **Service remains available** - queries wait for the reload to complete
3. **Automatic rollback on failure** - if new values cause errors, the server continues with old values
1. **SIGHUP handler waits synchronously** - up to 60 seconds for the reload to complete
2. **Only external references are refreshed** - the configuration file structure is NOT re-read
3. **Service remains available** - new requests wait while reload completes
4. **Automatic rollback on failure** - if new values cause errors, the server continues with old values
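
The rollback rule can be pictured as "resolve and validate the new values first, swap them in only if both succeed". The function below is a hedged sketch of that pattern; `refresh_with_rollback`, `resolve`, and `validate` are hypothetical names, and MXCP's actual reload code may be structured differently.

```python
from typing import Callable, Dict

Config = Dict[str, str]


def refresh_with_rollback(
    current: Config,
    resolve: Callable[[], Config],
    validate: Callable[[Config], None],
) -> Config:
    """Return freshly resolved values, or `current` unchanged if anything fails."""
    try:
        candidate = resolve()   # e.g. re-read env vars, vault:// or file:// references
        validate(candidate)     # raise if the new values are unusable
    except Exception:
        return current          # roll back: keep serving with the old values
    return candidate
```

Nothing is mutated until the candidate has passed validation, so a failed refresh leaves the running configuration untouched.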

What gets refreshed:
- ✅ Vault secrets (vault://)
15 changes: 13 additions & 2 deletions docs/guides/operational.md
@@ -871,7 +871,7 @@ http:

### SIGHUP Configuration Reload

MXCP supports hot configuration reload via SIGHUP:
MXCP supports hot configuration reload via SIGHUP. The reload process is designed to be safe and minimize disruption:

```bash
# Send SIGHUP to reload configuration
@@ -881,16 +881,27 @@ kill -HUP <mxcp-pid>
docker kill -s HUP mxcp-container
```

**Reload Process:**
1. SIGHUP handler queues a reload request
2. Active requests are allowed to complete (drained)
3. Runtime components are shut down, including DuckDB connection pool
4. Configuration files are re-read from disk
5. Runtime components are restarted with new configuration, including a new DuckDB connection pool
6. The handler waits up to 60 seconds for completion
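
A signal handler that queues a reload and then waits up to 60 seconds could be sketched as follows. The names `reload_requests`, `handle_sighup`, `reload_worker`, and `install_handler` are hypothetical; the sketch shows only the queue-and-wait handshake between the handler and a worker thread, not MXCP's actual handler.

```python
import queue
import signal
import threading

RELOAD_TIMEOUT_SECONDS = 60  # the documented upper bound on the wait

reload_requests: queue.Queue = queue.Queue()


def handle_sighup(signum, frame) -> bool:
    """Queue a reload, then block until it finishes or the timeout expires."""
    done = threading.Event()
    reload_requests.put(done)
    return done.wait(timeout=RELOAD_TIMEOUT_SECONDS)  # False means it timed out


def reload_worker(do_reload) -> None:
    """Process queued reloads one at a time; `do_reload` is a hypothetical callback."""
    while True:
        done = reload_requests.get()
        if done is None:        # sentinel: stop the worker
            break
        try:
            do_reload()         # drain, shut down, re-read config, restart
        finally:
            done.set()          # wake the waiting signal handler


def install_handler() -> None:
    """Register the handler; call this from the main thread."""
    if hasattr(signal, "SIGHUP"):   # SIGHUP does not exist on Windows
        signal.signal(signal.SIGHUP, handle_sighup)
```

The worker must run in a separate thread, because Python executes signal handlers in the main thread and the handler blocks while it waits.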

What gets reloaded:
- External configuration values (environment variables, vault://, file://)
- Secret values
- Database connections
- Database connection pool (gracefully, without downtime)
- Python runtime hooks

What doesn't reload:
- Endpoint definitions (requires restart)
- OAuth configuration (requires restart)
- Transport settings (requires restart)

**Note:** New requests that arrive during reload will wait until the reload completes before being processed.

### Graceful Shutdown

MXCP handles SIGTERM for graceful shutdown:
63 changes: 62 additions & 1 deletion docs/reference/python.md
@@ -19,7 +19,7 @@ Python endpoints in MXCP have access to the `mxcp.runtime` module, which provide
## Quick Example

```python
from mxcp.runtime import db, config, plugins, on_init, on_shutdown
from mxcp.runtime import db, config, plugins, on_init, on_shutdown, reload_duckdb

def my_endpoint(param: str) -> dict:
# Query database
@@ -139,6 +139,67 @@ def cleanup():
    print("Server shutting down")
```

## Reload Management

### `reload_duckdb(payload_func=None, description="")`
Request an asynchronous system reload with an optional payload function.

This feature allows Python endpoints to trigger a safe reload of the MXCP server, optionally executing custom logic like rebuilding the DuckDB database with new data. The reload process:
1. Queues the reload request and returns immediately
2. Active requests are drained (allowed to complete)
3. Runtime components (Python hooks + DuckDB) are shut down
4. Your payload function runs (if provided)
5. Runtime components are restarted with fresh configuration

**Important:** For most database updates, you don't need to use `reload_duckdb`. You can perform database operations directly using the `db` proxy without triggering any reload. This is the recommended approach since DuckDB supports concurrent operations through its MVCC transactional model.

Only use `reload_duckdb` when you need external tools to have exclusive access to the database file.

```python
from mxcp.runtime import reload_duckdb
import subprocess
import shutil

def replace_database():
    """Payload function - runs with all connections closed."""
    # Run dbt to rebuild models
    subprocess.run(["dbt", "run"], check=True)

    # Or copy a new database file
    shutil.copy("/data/updated.duckdb", "/app/data.duckdb")

    # Or fetch and load new data (placeholder helpers you would define yourself)
    fetch_latest_data()
    load_into_duckdb()

def refresh_data() -> dict:
    """Example endpoint (hypothetical name) that schedules the reload."""
    # Schedule reload with database replacement
    reload_duckdb(
        payload_func=replace_database,
        description="Replacing database with updated version"
    )

    # Or just reload configuration (refreshes secrets, env vars, etc.):
    # reload_duckdb()

    # Return immediately - reload happens asynchronously
    return {"status": "Reload scheduled"}
```

**Use Cases:**
- Updating DuckDB data without server restart
- Running ETL pipelines on demand
- Refreshing materialized views
- Swapping in pre-built database files
- Reloading configuration after secret rotation

**Important Notes:**
- This function returns immediately (non-blocking)
- The reload happens asynchronously after the current request completes
- The payload function runs with all connections closed
- Only one reload can be processing at a time
- From MCP tools, you cannot wait for completion - check status indirectly
- Only available when called from within MXCP endpoints

## Context Availability

The runtime context is automatically set when your function is called by MXCP. All functions are thread-safe and maintain proper isolation between concurrent requests.
9 changes: 5 additions & 4 deletions pyproject.toml
@@ -109,7 +109,6 @@ packages = [
# Server - Executor
"mxcp.server.executor",
"mxcp.server.executor.runners",
"mxcp.server.executor.session",
# Server - Interfaces
"mxcp.server.interfaces",
"mxcp.server.interfaces.cli",
@@ -134,11 +133,10 @@ packages = [
"mxcp.sdk.core.analytics",
"mxcp.sdk.core.config",
"mxcp.sdk.core.config.resolvers",
"mxcp.sdk.core.config.schemas",
"mxcp.sdk.duckdb",
"mxcp.sdk.evals",
"mxcp.sdk.executor",
"mxcp.sdk.executor.plugins",
"mxcp.sdk.executor.plugins.duckdb_plugin",
"mxcp.sdk.executor.plugins.python_plugin",
"mxcp.sdk.policy",
"mxcp.sdk.telemetry",
@@ -151,7 +149,7 @@ include-package-data = true
[tool.setuptools.package-data]
"mxcp" = ["py.typed"]
"mxcp.server.schemas" = ["*.json"]
"mxcp.sdk.core.config.schemas" = ["*.json"]
"mxcp.sdk.core.config" = ["schemas/*.json"]
"mxcp.sdk.validator.decorators.schemas" = ["*.json"]

[tool.pytest.ini_options]
@@ -167,6 +165,9 @@ markers = [
[tool.black]
line-length = 100
target-version = ["py310"]
extend-exclude = '''
^/examples/
'''

[tool.ruff]
line-length = 100