Ludo King AI

A modern Ludo rules engine wrapped in a Gymnasium-compatible reinforcement learning (RL) environment. The project couples a differentiable feature extractor with Stable Baselines3's MaskablePPO agent to explore self-play training for Ludo King.

Highlights

  • Full Ludo simulator with movement validation, captures, blockades, and reward shaping.
  • LudoEnv Gymnasium environment exposing rich observations and mandatory action masking.
  • Custom 1D CNN feature extractor (LudoCnnExtractor) tailored to the stacked board representation.
  • Ready-to-run PPO training script that handles vectorised environments, checkpoints, and TensorBoard logging.
  • Configuration-first design (ludo_rl/ludo_king/config.py, reward.py) to tweak board, network, and reward parameters in one place.
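
The snippet below is a minimal sketch of how these pieces could be wired together with stock sb3_contrib calls; the module paths, constructor arguments, and policy_kwargs shown are assumptions rather than the project's exact API (train.py is the canonical entry point).

from sb3_contrib import MaskablePPO

from ludo_rl.ludo_env import LudoEnv            # Gymnasium env with mandatory action masking
from ludo_rl.extractor import LudoCnnExtractor  # custom 1D CNN feature extractor (assumed import path)

env = LudoEnv()  # assumed default constructor

model = MaskablePPO(
    "MultiInputPolicy",  # dict observations: board tensor + dice roll
    env,
    policy_kwargs=dict(features_extractor_class=LudoCnnExtractor),
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("ludo_maskable_ppo")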

Project Layout

ludo_rl
├─ __init__.py → loads .env so simulator/env can read opponent strategy settings
├─ ludo_env.LudoEnv (Gymnasium Env)
│  ├─ wraps ludo_king.simulator.Simulator to expose observation dict & masked Discrete(4) actions
│  ├─ handles invalid moves, turn limit, reward shaping, and rendering snapshots
│  └─ converts Game.board into a 10‑channel board tensor + dice_roll token
├─ ludo_king.simulator.Simulator
│  ├─ owns Game, tracks agent_index
│  ├─ runs opponents’ turns (respecting extra rolls)
│  └─ can be driven by environment strategies (registry) or random fallback
├─ ludo_king.game.Game
│  ├─ instantiates 2 or 4 Player objects and provides dice + rule enforcement
│  ├─ enforces entry, home column, safe squares, captures, blockades, extra turns
│  └─ builds per‑agent board tensors via Board.build_tensor
├─ ludo_king.board.Board
│  ├─ absolute↔relative mapping, safe squares and channel construction
│  └─ counts pieces per player/channel for tensor features
├─ ludo_king.player.Player
│  ├─ keeps Piece state, win detection
│  ├─ chooses moves via strategies (lazy instantiation by name)
│  └─ falls back to random legal move if requested heuristic is unknown
├─ strategy package
│  ├─ features.build_move_options turns env observation into StrategyContext
│  ├─ BaseStrategy + concrete heuristics (defensive, killer, etc.) score MoveOption
│  └─ registry.create/available expose factories to simulator & players
├─ extractor.LudoCnnExtractor / LudoTransformerExtractor
│  ├─ convert observation dict into feature vectors for MaskablePPO
│  └─ fuse CNN/Transformer encodings with per‑piece embeddings and dice token
├─ tools (arguments, scheduler, evaluate, tournaments, imitation)
│  ├─ evaluate.py — supports 2‑player (opposite seats) and 4‑player lineups
│  ├─ tournament.py — strategy league for 2 or 4 players
│  └─ llm_vs_models.py — LLM/RL/Static mixed matches for 2 or 4 players
└─ train.py
   ├─ parses CLI args, configures MaskablePPO w/ custom extractor
   └─ runs vectorized envs, callbacks (checkpoints, entropy annealing, profiler)

How the pieces interact:

  • LudoEnv mediates RL interaction: builds masked actions, enforces rewards, loops until the player or opponents advance, and emits 10‑channel observations.
  • Simulator orchestrates turns: applies the agent's move (in the env), simulates opponents with heuristic strategies, and handles extra‑turn logic.
  • Core rules live in the ludo_king Game, Board, and Piece/Player classes; reward.compute_move_rewards produces shaped returns for PPO.
  • The strategy module supplies configurable heuristics; features.build_move_options transforms env data into a StrategyContext so Player.choose can score moves consistently.
  • extractor.py houses the CNN/Transformer feature pipelines that embed board channels, per‑piece context, and the dice roll before feeding MaskablePPO during training (train.py).
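
As an illustration of where the environment and agent meet at each step, here is a hedged sketch of a single masked rollout; the action_masks() helper and loading path are assumptions inferred from the layout above rather than verified API.

from sb3_contrib import MaskablePPO

from ludo_rl.ludo_env import LudoEnv

env = LudoEnv()
model = MaskablePPO.load("ludo_maskable_ppo", env=env)

obs, info = env.reset()
done = False
while not done:
    mask = env.action_masks()  # boolean mask over the four piece actions (assumed helper)
    action, _ = model.predict(obs, action_masks=mask, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated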

Getting Started

Prerequisites

  • Python 3.11+
  • A virtual environment is recommended (python -m venv .venv && source .venv/bin/activate).

Installation

Install the package and dependencies in editable mode:

pip install -e .

Alternatively, install the raw dependencies:

pip install -r requirements.txt

Training the Agent

The train.py script configures MaskablePPO with the custom feature extractor and launches multi-process self-play training.

python train.py

What the script does:

  1. Creates timestamped subdirectories under training/ludo_logs/ and training/ludo_models/.
  2. Spawns SubprocVecEnv workers (half the available CPU cores) and wraps them with VecMonitor.
  3. Sets up checkpointing every 10k steps and periodic evaluation (20k step cadence).
  4. Trains for 1,000,000 timesteps, saves the initial and final policies, and performs a short interactive rollout.
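
For orientation, the following hedged sketch reproduces those steps with stock Stable Baselines3 / sb3_contrib utilities; the env factory, output paths, and callback settings are illustrative and may differ from what train.py actually does.

import os

from sb3_contrib import MaskablePPO
from stable_baselines3.common.callbacks import CheckpointCallback
from stable_baselines3.common.vec_env import SubprocVecEnv, VecMonitor

from ludo_rl.ludo_env import LudoEnv

if __name__ == "__main__":  # required because SubprocVecEnv spawns worker processes
    n_envs = max(1, (os.cpu_count() or 2) // 2)  # half the available CPU cores
    vec_env = VecMonitor(SubprocVecEnv([LudoEnv for _ in range(n_envs)]))

    checkpoint_cb = CheckpointCallback(
        save_freq=10_000, save_path="training/ludo_models", name_prefix="ppo_ludo"
    )

    model = MaskablePPO(
        "MultiInputPolicy", vec_env, tensorboard_log="training/ludo_logs", verbose=1
    )
    model.learn(total_timesteps=1_000_000, callback=checkpoint_cb)
    model.save("training/ludo_models/final_policy")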

TensorBoard logs end up in the run-specific training/ludo_logs/<run_id>/ directory:

tensorboard --logdir training/ludo_logs

Customisation

  • Rewards: Adjust per-event incentives in ludo_rl/ludo_king/reward.py.
  • Network: Tune convolution and MLP widths in ludo_rl/ludo_king/config.py (NetworkConfig).
  • Environment: Modify truncation length (MAX_TURNS) or add render hooks in ludo_rl/ludo_env.py.
  • Training Hyperparameters: Tweak PPO arguments and callback intervals in train.py.
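
As a quick illustration of the last point, these are typical MaskablePPO arguments one might override; the values are examples only, and the defaults actually used in train.py (as well as the field names in NetworkConfig) may differ.

from sb3_contrib import MaskablePPO

from ludo_rl.ludo_env import LudoEnv

model = MaskablePPO(
    "MultiInputPolicy",
    LudoEnv(),
    learning_rate=3e-4,  # PPO step size
    n_steps=2048,        # rollout length per environment between updates
    ent_coef=0.01,       # exploration bonus (cf. the entropy-annealing callback)
    gamma=0.995,         # long Ludo games favour a high discount factor
)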

Current Status & Roadmap

  • ✅ Environment, simulator, and training loop are in place.
  • ✅ Evaluation tooling is in place (scripted benchmarks, head-to-head matches).

Development

Static analysis and linting use tox:

tox

The default configuration runs formatting checks (via Ruff/Black, if installed) and will run unit tests once they are introduced.

License

Released under the Apache License. See LICENSE for details.
