AI-Driven Adaptive PID Tuning via Soft Actor-Critic Reinforcement Learning
A hybrid control system that augments classical cascaded PID with a Soft Actor-Critic (SAC) agent, achieving up to 71% RMSE improvement in position tracking under wind disturbances.
- Problem Statement
- Proposed Solution
- System Architecture
- Key Results
- Project Structure
- Requirements
- Quick Start
- Project Phases
- Technical Details
- Visualization
- License
Classical cascaded PID controllers achieve precise position tracking for quadcopter UAVs in calm conditions. However, fixed PID gains represent an inherent compromise between responsiveness and stability — they cannot adapt to changing environmental conditions such as:
- Sudden crosswind gusts
- Sustained sinusoidal wind patterns
- Stochastic atmospheric turbulence
This results in significant performance degradation (up to 0.41 m RMSE) when the quadcopter encounters external disturbances.
A Soft Actor-Critic (SAC) reinforcement learning agent acts as a supervisory layer that dynamically adjusts the inner PID loop's gain parameters (Kp, Ki, Kd) in real time. The agent does not replace the PID controller — it continuously tunes it based on observed tracking errors and current system state.
| Feature | Benefit |
|---|---|
| Off-policy | Sample-efficient; reuses past experience via replay buffer |
| Entropy regularisation | Naturally explores the gain space; avoids premature convergence |
| Continuous actions | Directly outputs smooth gain adjustments without discretisation artifacts |
| Twin critic architecture | Prevents Q-value overestimation, ensuring stable training |
┌─────────────┐
Setpoint ────────►│ Outer PID │───► desired φ, θ
[x, y, z] │ (50 Hz) │
└─────────────┘
│
┌──────▼──────┐ ┌────────────┐
│ Inner PID │◄───│ SAC Agent │
│ (500 Hz) │ │ (50 Hz) │
│ φ, θ, ψ │ │ ΔKp,Ki,Kd │
└──────┬──────┘ └──────▲──────┘
│ │
┌──────▼──────┐ 15-dim observation:
│ Thrust │ [pos err, att err,
│ Mixer │ vel err, curr gains]
└──────┬──────┘
│
┌──────▼──────┐ ┌─────────────┐
│ 6-DOF │◄───│ Wind │
│ Plant │ │ Disturbance │
└─────────────┘ └─────────────┘
The SAC agent observes a 15-dimensional state vector (position errors, attitude errors, velocity errors, angular rates, and current effective gains) and outputs 6 gain deltas that adjust the roll and pitch inner-loop PID parameters in real time.
| Scenario | Baseline PID RMSE | AI-Augmented PID RMSE | Improvement |
|---|---|---|---|
| Calm air | 0.002 m | 0.002 m | — |
| Step wind (5 N) | 0.24 m | 0.07 m | 71% |
| Sinusoidal wind (2 Hz, 3 N) | 0.31 m | 0.11 m | 65% |
| Stochastic turbulence | 0.41 m | 0.14 m | 66% |
The SAC-augmented controller matches baseline performance in calm conditions while dramatically reducing tracking error under all disturbance scenarios.
Quadcopter_control_system/
├── setup.m # Master entry point — loads params & creates agent
├── README.md
│
├── scripts/
│ ├── init_params.m # Physical constants, PID gains, simulation parameters
│ ├── create_sac_agent.m # SAC agent with actor/critic neural networks
│ ├── build_simulink_model.m # Programmatic Simulink model builder
│ ├── train_agent.m # RL training loop with domain randomisation
│ ├── reward_function.m # Reward logic with built-in unit tests
│ ├── evaluate_and_plot.m # Metrics computation and figure generation
│ └── visualize_flight.m # 3D animated quadcopter flight demo
│
├── models/
│ ├── quadcopter_rl_env.slx # Main Simulink model (RL environment)
│ ├── quadcopter_baseline_pid.slx # PID-only baseline model
│ └── subsystems/
│ └── plant_6dof_notes.md # 6-DOF plant modelling notes
│
├── docs/
│ ├── architecture_overview.md # Detailed system design narrative
│ └── simulink_build_guide.md # Step-by-step Simulink wiring guide
│
├── saved_agents/ # Trained agent checkpoints (.mat files)
└── results/ # Generated figures and evaluation outputs
- MATLAB R2024b or later
- Simulink
- Aerospace Blockset — 6-DOF rigid body dynamics
- Reinforcement Learning Toolbox — SAC agent, training, environment
- Control System Toolbox — PID controller blocks
- (Optional) Parallel Computing Toolbox — for faster training
% 1. Open MATLAB and navigate to the project root
cd path/to/Quadcopter_control_system
% 2. Run setup — loads physical parameters and creates the SAC agent
run('setup.m')
% 3. Build the Simulink models programmatically
run('scripts/build_simulink_model.m')
% 4. (Optional) Open the model to inspect wiring
open_system('models/quadcopter_rl_env.slx')
% 5. Train the SAC agent (~30–90 min depending on hardware)
run('scripts/train_agent.m')
% 6. Evaluate performance and generate comparison figures
run('scripts/evaluate_and_plot.m')
% 7. (Optional) Run the 3D flight visualization demo
visualize_flight()- Newton-Euler 6-DOF rigid body dynamics via Aerospace Blockset
- 250mm X-frame quadcopter: mass = 0.468 kg, arm length = 0.225 m
- Thrust/torque mixing matrix maps motor commands to body forces and torques
- Cascaded architecture: outer position loop (0.8 Hz bandwidth) → inner attitude loop (8 Hz bandwidth)
- 6 independent PID controllers for X, Y, Z, Roll, Pitch, and Yaw
- Manually tuned via step-response analysis
- Validates stable hover at z = 1 m in calm conditions
- Step wind: 5 N lateral force at t = 5 s (crosswind gust)
- Sinusoidal wind: 3 N amplitude at 2 Hz (oscillatory turbulence)
- Stochastic turbulence: random force perturbations
- Demonstrates baseline PID degradation under disturbances
- SAC agent observes 15-dimensional normalised state vector
- Outputs 6 gain deltas (ΔKp, ΔKi, ΔKd for roll and pitch)
- Effective gains:
K_eff = K_base + α · ΔKwhere α = 0.5 - Trained over 3000 episodes with domain randomisation (wind timing, magnitude, initial positions)
| Index | Signal | Normalisation |
|---|---|---|
| 1–3 | Position errors (x, y, z) | ÷ 3 m |
| 4–6 | Attitude errors (φ, θ, ψ) | ÷ 0.5 rad |
| 7–9 | Velocity errors (dx, dy, dz) | ÷ 5 m/s |
| 10–12 | Angular rate errors (dφ, dθ, dψ) | ÷ 3 rad/s |
| 13–15 | Current effective gains (Kp, Ki, Kd) | ÷ [10, 2, 5] |
| Index | Parameter | Target Loop |
|---|---|---|
| 1–3 | ΔKp, ΔKi, ΔKd | Inner PID — Roll (φ) |
| 4–6 | ΔKp, ΔKi, ΔKd | Inner PID — Pitch (θ) |
R = −(w_pos · ‖e_pos‖² + w_att · ‖e_att‖² + w_gain · ‖ΔK‖²) + bonus − crash_penalty
| Component | Weight | Purpose |
|---|---|---|
| Position cost | 1.0 | Primary tracking objective |
| Attitude cost | 0.5 | Prevent excessive tilting |
| Gain regularisation | 0.1 | Penalise large gain adjustments |
| Stability bonus | 0.3 | Reward sustained low-error hover |
| Crash penalty | 100.0 | Discourage unsafe states |
Actor (Gaussian Policy):
obs(15) → FC(256) → ReLU → FC(256) → ReLU → FC(128) → ReLU → FC(6) → tanh
Twin Critics (Q-Value):
obs(15) → FC(128) ──┐
├─ Add → ReLU → FC(256) → ReLU → FC(1)
act(6) → FC(128) ──┘
| Parameter | Value |
|---|---|
| Max episodes | 3,000 |
| Steps per episode | 500 (10 s at 50 Hz) |
| Learning rate | 3 × 10⁻⁴ |
| Replay buffer | 10⁶ transitions |
| Mini-batch size | 256 |
| Discount factor (γ) | 0.99 |
| Target smoothing (τ) | 0.005 |
| Warm-start steps | 5,000 |
| Entropy target | −6 (auto-tuned) |
The project includes a 3D animated flight visualization (visualize_flight.m) that demonstrates:
- Takeoff — Smooth ascent from ground to 1 m hover altitude
- Stable Hover — Position holding with minimal drift
- Wind Disturbance — Lateral gust impact with visible displacement
- PID + RL Compensation — Real-time gain adaptation and recovery
- Recovery — Return to stable hover position
Run it with:
visualize_flight()This project is released under the MIT License.
Built with MATLAB & Simulink · Soft Actor-Critic · Deep Reinforcement Learning