Drl ppo #487

kim-mskw · 2024-11-20T07:53:19Z

Pull Request

Description

This PR merges the implementation of mini-batch sampling, which solves the torch problem of iterating redundantly over identical tensors. It also adds hyperparameter tuning and changes to the action handling that resulted from the discussion between @kim-mskw and @adiwied.

This now results in stable single-unit learning with PPO.
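The mini-batch sampling described above can be sketched roughly as follows. This is a minimal plain-Python illustration, not the code merged in this PR. The helper names `minibatch_indices` and `run_epochs` are hypothetical, and the buffer is a plain list rather than torch tensors. Each epoch draws a fresh random permutation of the buffer indices and splits it into mini-batches, so every stored transition is used exactly once per epoch instead of the same full tensors being fed through the update over and over. With torch, `torch.randperm(n)` would provide the permutation and `tensor[idx]` would select a batch.

```python
import random
from typing import Iterator, List, Sequence


def minibatch_indices(buffer_size: int, batch_size: int, rng: random.Random) -> Iterator[List[int]]:
    """Yield shuffled index batches that together cover the buffer exactly once.

    Hypothetical helper for illustration only.
    """
    indices = list(range(buffer_size))
    rng.shuffle(indices)  # fresh random order for this epoch
    for start in range(0, buffer_size, batch_size):
        # the last batch is smaller if batch_size does not divide buffer_size
        yield indices[start:start + batch_size]


def run_epochs(buffer: Sequence[float], n_epochs: int, batch_size: int, seed: int = 0) -> List[List[List[float]]]:
    """Collect the mini-batches an update loop would see, grouped per epoch."""
    rng = random.Random(seed)
    epochs = []
    for _ in range(n_epochs):
        batches = []
        for idx in minibatch_indices(len(buffer), batch_size, rng):
            batch = [buffer[i] for i in idx]
            # a PPO update step would compute the clipped loss on `batch` here
            batches.append(batch)
        epochs.append(batches)
    return epochs
```

With a buffer of 10 transitions and `batch_size=4`, each epoch yields batches of sizes 4, 4 and 2, in a different order every epoch.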

Changes Proposed

removed action clipping as it introduces non-stationarity
adjusted learning hyperparameters
implemented mini-batch sampling
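The first bullet can be illustrated with a small sketch of how PPO typically computes its probability ratio. It assumes a one-dimensional Gaussian policy (the PR does not state the distribution), and the helper names are hypothetical. The ratio is evaluated on the raw sampled action. If the stored action had been clipped to a bound first, its log-probability would no longer match the density that actually generated it, which is one concrete way action clipping can interfere with the update. PPO's own ratio clipping is a separate mechanism; `eps=0.2` here is a common default, not a value taken from this PR.

```python
import math


def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    """Log-density of a 1-D Gaussian N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def ppo_ratio(raw_action: float, old_mean: float, old_std: float, new_mean: float, new_std: float) -> float:
    """Ratio pi_new(a) / pi_old(a), evaluated on the raw (unclipped) sampled action."""
    return math.exp(
        gaussian_log_prob(raw_action, new_mean, new_std)
        - gaussian_log_prob(raw_action, old_mean, old_std)
    )


def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped objective for one sample (to be maximised); clips the ratio, not the action."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

For example, under N(0, 1) a raw sample of 2.5 and the same sample clipped to 1.0 have clearly different log-probabilities, so a ratio computed on the clipped value would describe a different action than the one the policy drew.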

Testing

Algorithm tested with example_02a locally and in Docker.

Checklist

Not applicable yet, as we only merge into a branch on assume and not yet into main.


adiwied added 4 commits November 13, 2024 20:13

add mini batch sampling to ppo

5ad6e15

fix clamping of action distribution

8928bf7

ppo is now stable in ex2a base, added orthogonal initialization and w…

949ba73

…orking hyper params

improve hyperparams

68c16a9
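Commit 949ba73 mentions orthogonal initialization. As a rough illustration of what that means, the sketch below builds a weight matrix whose rows (or columns, for tall matrices) are orthonormal, scaled by a gain. It runs Gram-Schmidt on a Gaussian random matrix in plain Python. The function name `orthogonal_init` is hypothetical, and this is not the PR's code, which presumably relies on PyTorch's built-in initializer.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def orthogonal_init(rows: int, cols: int, gain: float = 1.0, seed: int = 0) -> Matrix:
    """Return a rows x cols matrix with orthonormal rows (rows <= cols)
    or orthonormal columns (rows > cols), scaled by `gain`.

    Hypothetical plain-Python sketch: Gram-Schmidt on Gaussian vectors.
    """
    rng = random.Random(seed)
    n_vec, dim = min(rows, cols), max(rows, cols)
    basis: Matrix = []
    while len(basis) < n_vec:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # remove the components along the already accepted basis vectors
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-10:  # skip numerically dependent draws
            basis.append([vi / norm for vi in v])
    if rows <= cols:
        weights = basis  # each basis vector becomes a row of length cols
    else:
        # each basis vector becomes a column of length rows (transpose)
        weights = [[basis[j][i] for j in range(n_vec)] for i in range(rows)]
    return [[gain * x for x in row] for row in weights]
```

In PyTorch the equivalent would be a call such as `torch.nn.init.orthogonal_(layer.weight, gain=math.sqrt(2))`, often followed by `torch.nn.init.constant_(layer.bias, 0.0)`. A gain of `math.sqrt(2)` for hidden layers and a small gain for the policy output layer is a common PPO convention, not necessarily what commit 949ba73 uses.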

kim-mskw merged commit 8b3196a into assume-framework:drl-ppo Nov 20, 2024
1 check was pending
