gpc-rs

Rust implementation of Generative Robot Policies via Predictive World Modeling. A diffusion policy proposes candidate action sequences, a learned world model imagines their consequences, and an evaluator scores and selects the best trajectory — all composed at inference time in a closed-loop replanning cycle.

Built on Burn for training and Tract for ONNX inference.

Demos

Interactive Web Demo

A browser-based visualization of the full GPC pipeline running in real time against a 2-link robot arm navigating around obstacles.

The native Rust server trains both models from synthetic demonstrations in ~1 second, then serves a REST API that the React frontend consumes to visualize closed-loop replanning: candidate trajectories from the diffusion policy, the raw policy baseline, world-model rollouts, GPC-RANK selection, GPC-OPT refinement, and the resulting executed path.

The demo lets you switch between policy, rank, and opt planner modes. policy shows the unscored diffusion sample directly, rank evaluates K candidates and selects the best one, and opt starts from a policy sample and refines it with the world model.

# Terminal 1 — start the planner server
cargo run --release -p gpc-demo-server

# Terminal 2 — start the frontend
cd web-demo && npm install && npm run dev

# Open http://localhost:5174

See docs/demo-explained.md for a detailed walkthrough of every visual element and how it maps to the paper.

Terminal TUI Demo

cargo run -p world-models-gpc-cli -- demo          # interactive TUI
cargo run -p world-models-gpc-cli -- demo --plain   # plain terminal output

Pipeline

The codebase follows the same high-level decomposition as the paper:

Training (offline):
  Expert demonstrations
      ├── Phase 1: single-step MSE ──→ World Model
      ├── Phase 2: rollout MSE ──────→ World Model (refined)
      └── DDPM noise prediction ─────→ Diffusion Policy

Inference (per step, closed-loop):
  Observation history
      │
      ├── policy.sample_k(K) ──→ K candidate action sequences
      │         │
      │         ▼
      │   world_model.rollout() ──→ K predicted trajectories
      │         │
      │         ▼
      │   reward.score() ──→ K scalar scores
      │         │
      │         ├── GPC-RANK: argmax ──→ best trajectory
      │         └── GPC-OPT:  gradient ascent ──→ refined trajectory
      │                   │
      └──── execute first action only, then replan

Diffusion policy generates diverse candidate action sequences via DDPM reverse diffusion.
World model rolls each candidate forward, predicting future states step by step.
Reward function scores each imagined trajectory (progress, goal proximity, obstacle clearance).
GPC-RANK selects the highest-scoring candidate. GPC-OPT gradient-refines a candidate; CLI evaluation uses Burn autodiff while the demo runtimes keep a finite-difference fallback.
Policy baseline executes the raw diffusion-policy sample with no ranking or refinement. This is useful as a sanity check for what the generative policy proposes before evaluation.
Only the first action of the selected trajectory executes. The system replans from the actual state every step.

Scope

This is a clean systems-oriented Rust implementation of the GPC framework. It is useful for:

Experimenting with the GPC decomposition (policy + world model + evaluator)
Training on synthetic data and inspecting the full pipeline
Running end-to-end demos through the CLI or the interactive web demo
Exercising the crate APIs from Rust code
Inspecting ONNX models and checkpoint metadata

The official Python implementation is at han20192019/gpc_code. Use that as the primary reference for paper-aligned training and benchmark reproduction.

Workspace Layout

Crate	Responsibility
`gpc-core`	Shared config types, error handling, core traits, tensor utilities, diffusion schedules
`gpc-policy`	Diffusion policy with DDPM-style denoising network
`gpc-world`	State dynamics model (residual MLP) and reward functions
`gpc-eval`	`GpcRank` and `GpcOpt` evaluators
`gpc-train`	Dataset handling, synthetic data generation, training orchestration
`gpc-compat`	Checkpoint metadata I/O and ONNX inspection/runtime
`gpc-cli`	Command-line entrypoint, training, evaluation, and TUI demo
`gpc-demo-server`	Native HTTP server for the web demo (axum, trains on startup)
`gpc-wasm`	WebAssembly build of the engine (experimental)
`web-demo`	React + TypeScript frontend for the interactive demo

Shared dependencies: Burn 0.20.1, stable Rust 1.85, CI for check, test, clippy -D warnings, and fmt --check.

Published Packages

The crates are published to crates.io with a world-models- prefix:

Workspace crate	crates.io package
`gpc-core`	`world-models-gpc-core`
`gpc-policy`	`world-models-gpc-policy`
`gpc-world`	`world-models-gpc-world`
`gpc-eval`	`world-models-gpc-eval`
`gpc-train`	`world-models-gpc-train`
`gpc-compat`	`world-models-gpc-compat`
`gpc-cli`	`world-models-gpc-cli`

Quick Start

Requirements

Rust 1.85+ (pinned in rust-toolchain.toml)
Node.js 18+ (for the web demo only)

Build and Verify

cargo build --workspace
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --all -- --check

Run the Web Demo

# Build the server in release mode (first time only)
cargo build --release -p gpc-demo-server

# Start the planner server (trains models in ~1s, serves on :3100)
cargo run --release -p gpc-demo-server

# In another terminal
cd web-demo && npm install && npm run dev
# Open http://localhost:5174

Run the TUI Demo

cargo run -p world-models-gpc-cli -- demo

Smaller smoke-test:

cargo run -p world-models-gpc-cli -- demo --plain --epochs 1 --episodes 4 --episode-length 8 --num-candidates 4

CLI

Command	Purpose	Notes
`demo`	Run the end-to-end synthetic pipeline	Interactive TUI by default, `--plain` for log output
`train`	Train policy, world model, or both	Uses synthetic data or `DATA_DIR/episodes.json`; accepts explicit overrides for batch size, learning rate, weight decay, clipping, warmup, log cadence, and seed; writes `train_report.json` under the output dir unless `--report-output` is set
`eval`	Evaluate saved policy/world-model checkpoints	Supports `policy`, `rank`, or `opt` against synthetic or JSON datasets; `-o/--output` writes a summary JSON artifact and `--details-output` writes detailed telemetry
`benchmark`	Compare `policy`, `rank`, and `opt` on checkpoint-backed suites	Runs deterministic synthetic closed-loop or dataset open-loop benchmarks, emits JSON, and can gate regressions against a saved baseline
`checkpoint`	Inspect, verify, and convert `.onnx`, `.bin`, `.mpk`, and `.meta.json` files	`inspect` shows parsed config and file sizes; `verify` proves checkpoints reload cleanly
`init-config`	Write a default JSON config file	Prints the generated config to stdout

CLI Examples

# Generate a default config
cargo run -p world-models-gpc-cli -- init-config --output gpc_config.json

# Train on synthetic data
cargo run -p world-models-gpc-cli -- train --synthetic --component all --epochs 20

# Override the effective training settings explicitly
cargo run -p world-models-gpc-cli -- train --synthetic --component all --epochs 20 --batch-size 128 --learning-rate 5e-4 --weight-decay 1e-6 --grad-clip-norm 0.5 --warmup-steps 250 --log-every 5 --seed 7

# Train from a dataset directory
cargo run -p world-models-gpc-cli -- train --data data --component world-model --epochs 50 --horizon 8

# Save checkpoints to a custom directory
cargo run -p world-models-gpc-cli -- train --synthetic --component all --output runs/exp-001

# Write the training report to a custom path
cargo run -p world-models-gpc-cli -- train --synthetic --component all --output runs/exp-001 --report-output runs/exp-001/reports/train-report.json

# Run checkpoint-backed evaluation on reproducible synthetic data
cargo run -p world-models-gpc-cli -- eval --checkpoint-dir runs/exp-001 --strategy policy --synthetic --episodes 8 --episode-length 24 --seed 42
cargo run -p world-models-gpc-cli -- eval --checkpoint-dir runs/exp-001 --strategy rank --synthetic --episodes 8 --episode-length 24 --seed 42 --num-candidates 64
cargo run -p world-models-gpc-cli -- eval --checkpoint-dir runs/exp-001 --strategy opt --synthetic --episodes 8 --episode-length 24 --seed 42 --opt-steps 10 --opt-learning-rate 0.01

# Evaluate checkpoints against a dataset directory or episodes.json
cargo run -p world-models-gpc-cli -- eval --checkpoint-dir runs/exp-001 --data data --strategy rank --num-candidates 64

# Save evaluation summary and detailed telemetry artifacts
cargo run -p world-models-gpc-cli -- eval --checkpoint-dir runs/exp-001 --strategy rank --synthetic --episodes 8 --episode-length 24 --seed 42 --num-candidates 64 --output runs/exp-001/eval-rank-summary.json --details-output runs/exp-001/eval-rank-details.json

# Run a multi-seed closed-loop benchmark and save JSON output
cargo run -p world-models-gpc-cli -- benchmark --checkpoint-dir runs/exp-001 --episodes 8 --episode-length 24 --seed 42 --seed 43 --seed 44 --num-candidates 64 --opt-steps 10 --opt-learning-rate 0.01 --output runs/exp-001/benchmark.json

# Benchmark dataset windows instead of synthetic closed-loop episodes
cargo run -p world-models-gpc-cli -- benchmark --checkpoint-dir runs/exp-001 --data data --strategy rank --strategy opt --num-candidates 64 --opt-steps 10 --opt-learning-rate 0.01 --output runs/exp-001/dataset-benchmark.json

# Compare a fresh benchmark run against a saved baseline and fail on regressions
cargo run -p world-models-gpc-cli -- benchmark --checkpoint-dir runs/exp-001 --episodes 8 --episode-length 24 --seed 42 --seed 43 --seed 44 --num-candidates 64 --opt-steps 10 --opt-learning-rate 0.01 --output runs/exp-001/benchmark-candidate-64.json --baseline runs/exp-001/benchmark-baseline.json --max-rollout-mse-increase 0.002 --max-terminal-distance-increase 0.01 --max-reward-drop 0.05 --max-success-rate-drop 0.02

# Run evaluator demos with random models
cargo run -p world-models-gpc-cli -- eval --demo --strategy policy
cargo run -p world-models-gpc-cli -- eval --demo --strategy rank --num-candidates 64
cargo run -p world-models-gpc-cli -- eval --demo --strategy opt --opt-steps 10

# Inspect artifacts
cargo run -p world-models-gpc-cli -- checkpoint --action inspect --path model.onnx
cargo run -p world-models-gpc-cli -- checkpoint --action inspect --path checkpoints/policy_final.bin

# Verify that a checkpoint or ONNX artifact is loadable
cargo run -p world-models-gpc-cli -- checkpoint --action verify --path checkpoints/policy_final.bin
cargo run -p world-models-gpc-cli -- checkpoint --action verify --path model.onnx

# Convert a Burn checkpoint between formats
cargo run -p world-models-gpc-cli -- checkpoint --action convert --path checkpoints/policy_final.bin
cargo run -p world-models-gpc-cli -- checkpoint --action convert --path checkpoints/world_model_final.mpk --output exported/world_model_final

Training now persists real Burn artifacts by default:

policy_final.bin plus policy_final.meta.json
policy_best.bin plus policy_best.meta.json when a validation split is used
world_model_phase1.bin plus world_model_phase1.meta.json
world_model_phase1_best.bin plus world_model_phase1_best.meta.json when a validation split is used
world_model_final.bin plus world_model_final.meta.json
world_model_best.bin plus world_model_best.meta.json when a validation split is used

Each checkpoint is reloaded and verified immediately after it is written, so failed saves surface during train instead of later during eval or benchmark.

Use --validation-split <fraction> on train to hold out episodes for validation and select the best epoch by validation loss. The checkpoint metadata stores the model kind, epoch, loss, timestamp, and the serialized config used to reconstruct the module during conversion or verification.

Within gpc-train, TrainingConfig.seed now seeds model initialization, minibatch shuffling, and policy diffusion noise/timestep sampling so repeated runs on the same backend/device are reproducible.

TrainingConfig.warmup_steps now applies a linear per-optimizer-step learning-rate warmup, and TrainingConfig.grad_clip_norm enables AdamW norm clipping during policy and world-model training when set above 0.0.

The train CLI flags above override the effective TrainingConfig before dataset loading and training start, and the resulting train_report.json records the applied values verbatim.

When benchmark is run with --baseline <previous-report.json>, the emitted JSON includes a comparison section for every shared strategy and the CLI can turn metric deltas into a hard failure via --max-rollout-mse-increase, --max-terminal-distance-increase, --max-action-mse-increase, --max-reward-drop, and --max-success-rate-drop.

Minimal Library Example

Wire up randomly initialized components and run GPC-RANK:

use burn::backend::NdArray;
use burn::prelude::*;
use gpc_core::traits::Evaluator;
use gpc_eval::GpcRankBuilder;
use gpc_policy::DiffusionPolicyConfig;
use gpc_world::reward::L2RewardFunctionConfig;
use gpc_world::world_model::StateWorldModelConfig;

type B = NdArray;

fn main() -> gpc_core::Result<()> {
    let device = <B as Backend>::Device::default();

    let policy = DiffusionPolicyConfig {
        obs_dim: 20,
        action_dim: 2,
        obs_horizon: 2,
        pred_horizon: 16,
        hidden_dim: 128,
        time_embed_dim: 64,
        num_res_blocks: 2,
    }
    .init::<B>(&device);

    let world_model = StateWorldModelConfig {
        state_dim: 20,
        action_dim: 2,
        hidden_dim: 128,
        num_layers: 2,
    }
    .init::<B>(&device);

    let reward = L2RewardFunctionConfig { state_dim: 20 }
        .init::<B>(&device)
        .with_goal(Tensor::<B, 1>::zeros([20], &device));

    let evaluator = GpcRankBuilder::new(policy, world_model, reward)
        .num_candidates(32)
        .build();

    let obs = Tensor::<B, 3>::zeros([1, 2, 20], &device);
    let state = Tensor::<B, 2>::zeros([1, 20], &device);

    let action = evaluator.select_action(&obs, &state, &device)?;
    assert_eq!(action.dims(), [1, 16, 2]);

    Ok(())
}

Configuration

The top-level config type is gpc_core::config::GpcConfig with sections for policy, world_model, training, gpc_rank, and gpc_opt. Generate a valid config:

cargo run -p world-models-gpc-cli -- init-config --output gpc_config.json

All config structs are serde-serializable. See gpc-core/src/config.rs.

Documentation

Demo Explained — Deep dive into what the web demo shows, from GPC fundamentals through every visual element
Paper — Generative Robot Policies via Predictive World Modeling
Reference Implementation — Official Python implementation

Development

cargo check --workspace --all-targets
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --all -- --check

CI workflow: .github/workflows/ci.yml.

Limitations

benchmark supports deterministic synthetic closed-loop suites and dataset-backed open-loop windows, but not paper-specific real-world closed-loop task datasets yet.
checkpoint convert only covers Burn .bin and .mpk checkpoints for policy and world-model modules.

References

Paper: Generative Robot Policies via Predictive World Modeling
Burn: https://burn.dev
Tract: https://github.com/sonos/tract
Sister project: jepa-rs

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
.agents		.agents
.claude		.claude
.codex/skills		.codex/skills
.github/workflows		.github/workflows
docs		docs
gpc-cli		gpc-cli
gpc-compat		gpc-compat
gpc-core		gpc-core
gpc-demo-server		gpc-demo-server
gpc-eval		gpc-eval
gpc-policy		gpc-policy
gpc-train		gpc-train
gpc-wasm		gpc-wasm
gpc-world		gpc-world
web-demo		web-demo
.editorconfig		.editorconfig
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md
agents.md		agents.md
rust-toolchain.toml		rust-toolchain.toml
rustfmt.toml		rustfmt.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

gpc-rs

Demos

Interactive Web Demo

Terminal TUI Demo

Pipeline

Scope

Workspace Layout

Published Packages

Quick Start

Requirements

Build and Verify

Run the Web Demo

Run the TUI Demo

CLI

CLI Examples

Minimal Library Example

Configuration

Documentation

Development

Limitations

References

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

gpc-rs

Demos

Interactive Web Demo

Terminal TUI Demo

Pipeline

Scope

Workspace Layout

Published Packages

Quick Start

Requirements

Build and Verify

Run the Web Demo

Run the TUI Demo

CLI

CLI Examples

Minimal Library Example

Configuration

Documentation

Development

Limitations

References

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages