
πŸ‹οΈβ€β™‚οΈ CartPoleRL

Train and ship a PPO agent for CartPole using Stable-Baselines3. The project uses uv for fast, reproducible environments and includes ready-to-run experiment variants, TensorBoard monitoring, ONNX export, and seamless ProtoTwin integration for inference.

It’s designed to take you from a naive baseline to industry‑standard practice with structured experiments, reproducible training, model packaging, and deployment. You can train on ProtoTwin, export to ONNX, and run inference on ProtoTwin by writing simple control logic in TypeScript.

Quickstart • Structure • Training • Monitoring • ONNX • Troubleshooting • Roadmap


✨ Features

| Area | Capability |
| --- | --- |
| Algorithms | PPO (easily extensible to A2C, DQN, etc.) |
| Experimentation | Versioned training entrypoints: main-v1.py, main-v2.py, ... |
| Monitoring | TensorBoard logs per variant: tensorboard-v1/, tensorboard-v2/ |
| Export | ONNX conversion via export_onnx.py |
| Deployment | Ready for ProtoTwin (upload ONNX or Python policy) |
| Extensibility | Clean project layout – add new envs or models fast |
| Dev UX | Minimal commands to get started |

📚 Important Links

Important

• Notes PDF: Important understandings
• Curated Resource: TLDRAW board


✅ Prerequisites

  • Python 3.10+
  • ProtoTwin Connect for training and deployment
  • NVIDIA GPU + CUDA-capable PyTorch build (Optional)

Note

The project runs fine on CPU; a GPU is not required.


⚡ Quickstart

git clone https://github.com/amugoodbad229/CartPoleRL.git
cd CartPoleRL

Install uv (one time):

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
winget install --id=astral-sh.uv -e

# Check installation
uv --version

Sync environment + dependencies:

uv sync

Activate environment:

# Linux/macOS
source .venv/bin/activate

# Windows PowerShell
.venv\Scripts\Activate.ps1

Run a training variant:

ls main-v*.py          # discover available variants
python main-v1.py      # or main-v2.py, etc.

# OPTIONAL: For custom CLI commands
python main-v1.py --num_envs 32 --initial_lr 0.001 --num_timesteps 500000
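
The exact flags depend on how each variant defines its parser. As a hedged illustration only (the real scripts may differ), the flags shown above could be wired with argparse like this:

import argparse

# Hypothetical sketch: flag names mirror the README example above; defaults are illustrative.
parser = argparse.ArgumentParser(description="PPO training variant")
parser.add_argument("--num_envs", type=int, default=8)         # parallel environments
parser.add_argument("--initial_lr", type=float, default=3e-4)  # starting learning rate
parser.add_argument("--num_timesteps", type=int, default=200_000)
args = parser.parse_args()
# args.num_envs, args.initial_lr and args.num_timesteps would then feed
# make_vec_env(...) and PPO(...).learn(...) inside the script.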

📊 Monitoring

Launch TensorBoard (choose the appropriate variant path):

python -m tensorboard.main --logdir tensorboard-v1

Or to watch all:

python -m tensorboard.main --logdir .

Tip

If nothing appears, ensure training produced events:
find tensorboard-v1 -type f -name "*tfevent*"


🧪 Training & Experiment Variants

Each main-vX.py file encapsulates a slightly different configuration (a minimal sketch follows the list):

  • Hyperparameters (learning rate, gamma, entropy coefficient)
  • Network architecture (default or custom)
  • Logging folder (where agent models / checkpoints are written)
  • Callback setup
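
A minimal sketch of what a new variant (say, a hypothetical main-v3.py) might contain, assuming the scripts follow the standard Stable-Baselines3 pattern; folder names and values are illustrative:

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.callbacks import CheckpointCallback

# Vectorized CartPole environments (the number of envs is one knob to vary)
env = make_vec_env("CartPole-v1", n_envs=8)

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,                      # hyperparameter knobs
    gamma=0.99,
    ent_coef=0.01,
    policy_kwargs=dict(net_arch=[64, 64]),   # network architecture
    tensorboard_log="tensorboard-v3",        # per-variant TensorBoard folder
    verbose=1,
)

# Checkpoints land in the per-variant logs folder
checkpoint_cb = CheckpointCallback(save_freq=10_000, save_path="logs-v3/checkpoints")
model.learn(total_timesteps=200_000, callback=checkpoint_cb)
model.save("logs-v3/ppo_cartpole_v3")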

Tip

Duplicate an existing file to create a new experiment:
cp main-v1.py main-v3.py → edit run name, log path, and hyperparameters.

Suggested Naming Convention

| Variant | Purpose |
| --- | --- |
| main-v0.py | Baseline PPO |
| main-v1.py | Tuned learning rate / entropy |
| main-v2.py | Different network width |
| main-v3.py | Longer training horizon |
| main-vN.py | Custom experiment |

📦 ONNX Export & Deployment

Generate an ONNX policy (after training):

python export_onnx.py

Note

If the script uses hardcoded paths, edit export_onnx.py or extend it with argparse.

ProtoTwin usage: Upload the ONNX file to ProtoTwin or deploy the Python inference.
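
If you need to adapt or rewrite the export step, here is a minimal sketch of SB3-to-ONNX conversion. The checkpoint path and output names are illustrative, not necessarily what export_onnx.py does:

import torch
from stable_baselines3 import PPO

# Hypothetical checkpoint path; point this at your trained model.
model = PPO.load("logs-v1/checkpoints/ppo_cartpole", device="cpu")

class OnnxablePolicy(torch.nn.Module):
    """Wraps the SB3 policy so ONNX export traces the action path."""
    def __init__(self, policy):
        super().__init__()
        self.policy = policy

    def forward(self, observation):
        # ActorCriticPolicy returns (actions, values, log_prob)
        return self.policy(observation, deterministic=True)

dummy_obs = torch.randn(1, *model.observation_space.shape)  # CartPole: (1, 4)
torch.onnx.export(
    OnnxablePolicy(model.policy),
    dummy_obs,
    "cartpole_policy.onnx",
    opset_version=17,
    input_names=["observation"],
    output_names=["action", "value", "log_prob"],
)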


🧱 Project Structure

.
├── main-v1.py             # Training variant 1
├── main-v2.py             # Training variant 2 (extend as needed)
├── export_onnx.py         # Convert trained model to ONNX
├── logs-v1/               # Training logs + checkpoints (variant 1)
│   └── checkpoints/
├── tensorboard-v1/        # TensorBoard event files (variant 1)
├── pyproject.toml         # Project + dependency definitions
├── uv.lock                # Locked, reproducible dependency set
└── README.md

Note

Additional variants (e.g., logs-v2/, tensorboard-v2/) appear after running those scripts.


🔧 Extending the Project

| Task | How |
| --- | --- |
| Add a new algorithm | Replace the PPO import with another SB3 algorithm |
| Add custom policy | Define policy_kwargs in the training script |
| Change environment | Swap CartPole-v1 for another Gymnasium env |
| Add callbacks | Implement BaseCallback and register it in training (see sketch below) |
| Log extra metrics | Use a custom callback + self.logger.record() |
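
For the last two rows, a minimal callback sketch (the class and metric names are illustrative):

from stable_baselines3.common.callbacks import BaseCallback

class ExtraMetricsCallback(BaseCallback):
    """Hypothetical example: log an extra scalar alongside SB3's built-in metrics."""

    def _on_step(self) -> bool:
        if self.n_calls % 1_000 == 0:
            # Written to the same TensorBoard run as the built-in PPO metrics
            self.logger.record("custom/num_timesteps", self.num_timesteps)
        return True  # returning False would stop training early

# Register it when training:
# model.learn(total_timesteps=200_000, callback=ExtraMetricsCallback())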

🧪 Evaluating a Policy

Add (or use) a snippet like:

from stable_baselines3.common.evaluation import evaluate_policy
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"Mean: {mean_reward:.2f} Β± {std_reward:.2f}")

🛠 Useful Git Commands

git status
git add .
git commit -m "Experiment: tuned lr and entropy"
git push origin main

Tip

Use branches for big experiments:
git checkout -b feat/entropy-sweep


🚑 Troubleshooting

| Symptom | Fix |
| --- | --- |
| uv: command not found | Reinstall uv, then restart the terminal |
| No TensorBoard data | Confirm the correct tensorboard-vX/ path |
| CPU instead of GPU | Check: python -c "import torch; print(torch.cuda.is_available())" |
| ImportError (SB3) | Run uv sync again (the env might be stale) |
| Permission denied on activate | On Unix: chmod +x .venv/bin/activate (rare) |

Caution

Paths are case-sensitive. Use cd CartPoleRL, not cd cartpolerl.


🧭 Roadmap

  • Add evaluation script (e.g., evaluate.py)
  • Hyperparameter sweeps integration (Optuna / WandB)
  • Dockerfile for containerized deployment
  • Unified config system (config/ + YAML)
  • CI workflow (lint + format + smoke test)

🤝 Contributing

  1. Fork the repo
  2. Create a feature branch: git checkout -b feat/new-idea
  3. Submit PR with: description, metrics, rationale

🙏 Acknowledgements


📄 License

MIT License © 2025 Ayman Khan

⭐ Support

If this helps you learn or prototype faster:

  • Star the repo
  • Share feedback
  • Open issues for improvements

Happy balancing! 🛠️🧠
