Commit 892705c

update README
1 parent b385d4c commit 892705c

5 files changed: 20 additions, 0 deletions

mujoco_playground/_src/manipulation/tetheria_hand/README.md

Lines changed: 20 additions & 0 deletions
@@ -71,6 +71,26 @@ To train policies for the Tetheria Hand:
 python learning/train_jax_ppo.py --env_name TetheriaCubeRotateZAxis
 ```
 
+Although the reward curves from different training runs may vary due to stochasticity in the learning process, they consistently **converge toward a positive reward**.
+The plots below show an example set of reward curves obtained from training with the **PPO algorithm**.
+
+
+Overall Reward:
+
+![overall](imgs/reward_overall.png)
+
+Angular Velocity Reward
+
+![reward_angvel](imgs/reward_angvel.png)
+
+Action-Rate Penalty:
+
+![penalty_action_rate](imgs/penalty_action_rate.png)
+
+Termination Penalty
+
+![penalty_termination](imgs/penalty_termination.png)
+
 ## 3. Running a pretrained policy
 
 
4 image files (not shown): 183 KB, 185 KB, 177 KB, 156 KB
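
The README text added in this commit states that reward curves vary between runs but converge toward a positive reward. One way to check that claim against logged training output is to average the final portion of each run's curve and confirm the result is above zero. The sketch below is a minimal illustration in plain Python. The function names `tail_mean` and `converges_positive`, the `tail_fraction` parameter, and the sample values are all hypothetical; none of them come from the repository or from the plotted data.

```python
from statistics import mean


def tail_mean(rewards, tail_fraction=0.2):
    """Average the final `tail_fraction` of a reward curve."""
    if not rewards:
        raise ValueError("reward curve is empty")
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")
    # Always keep at least one point so very short curves still work.
    n_tail = max(1, int(len(rewards) * tail_fraction))
    return mean(rewards[-n_tail:])


def converges_positive(runs, tail_fraction=0.2):
    """Return True if every run's tail average is above zero."""
    return all(tail_mean(r, tail_fraction) > 0.0 for r in runs)


# Hypothetical per-evaluation rewards from two training runs (made-up numbers).
run_a = [-3.0, -1.5, 0.2, 1.1, 1.8, 2.0, 2.1, 2.2, 2.1, 2.3]
run_b = [-2.5, -2.0, -0.4, 0.6, 1.2, 1.9, 1.7, 2.0, 2.2, 2.1]

print(converges_positive([run_a, run_b]))
```

Averaging only the tail ignores the early negative rewards that are expected while the policy is still learning, so the check reflects where the curve settles and not how it started.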