Description
Acme currently lacks first-class support for Gymnasium environments (the successor to OpenAI Gym). This issue proposes implementing a new wrapper (acme/wrappers/gymnasium_wrapper.py) or significantly refactoring the existing gym_wrapper.py to correctly support Gymnasium’s updated APIs, semantics, and space definitions.
This change is required for compatibility with modern RL environments and reproducible experimentation.
📌 Motivation
Gymnasium has breaking API changes compared to legacy Gym (reset/step semantics, seeding, spaces).
Many Acme users now rely on Gymnasium-based environments.
A robust wrapper enables seamless interoperability between Gymnasium → dm_env → Acme.
Dependency Management
Gymnasium must be imported safely (guarded import).
The wrapper should not break Acme installations where Gymnasium is not installed.
Gymnasium should be treated as an optional dependency.
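A minimal sketch of the guarded import. It assumes the code lives in the proposed acme/wrappers/gymnasium_wrapper.py module, and `require_gymnasium` is a hypothetical helper name. The import failure is deferred until someone actually needs Gymnasium, so importing Acme keeps working on installations without it.

```python
# Guarded import sketch for the proposed acme/wrappers/gymnasium_wrapper.py.
try:
    import gymnasium
except ImportError:
    # Gymnasium is optional: importing this module must not fail without it.
    gymnasium = None


def require_gymnasium():
    """Returns the gymnasium module, or raises an actionable ImportError."""
    # Hypothetical helper: call it when the wrapper is constructed, not at import time.
    if gymnasium is None:
        raise ImportError(
            "This wrapper needs the optional 'gymnasium' package; "
            "install it with `pip install gymnasium`."
        )
    return gymnasium
```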
Space Conversion
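Acme uses dm_env.specs, while Gymnasium uses gymnasium.spaces, so the wrapper must convert each Gymnasium space (e.g. Box, Discrete, Dict, Tuple) into the equivalent dm_env spec.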
Reset Semantics
Gymnasium reset() returns (observation, info).
Legacy Gym returned only observation.
The wrapper should:
Capture (obs, info)
Optionally expose info to callers (dm_env.TimeStep has no extras field, so info cannot live on the TimeStep itself)
Return dm_env.restart(obs)
Step Semantics
Gymnasium step() returns:
(obs, reward, terminated, truncated, info)
This must be mapped to:
dm_env.TimeStep(step_type, reward, discount, observation)
Decision Logic:
terminated → natural termination
step_type = LAST
discount = 0.0
truncated → artificial termination (time limit)
step_type = LAST
discount = 1.0 (or configurable γ)
otherwise → mid-episode
step_type = MID
discount = 1.0
Seeding
Gymnasium seeds an environment through reset(seed=...) rather than a separate env.seed() method.
Acme environments often expose a global seed setter.
The wrapper must:
Bridge Acme-style seeding to Gymnasium’s reset(seed=...)
Ensure that the same seed reproduces the same sequence of episodes
🧠 Implementation Strategy
Initialization (__init__)
Store wrapped environment
Convert:
observation_spec
action_spec
reward_spec
Initialize internal episode state (_reset_next_step)
Accept optional discount override for infinite-horizon tasks
Reset (reset)
Call env.reset(seed=...)
Capture (obs, info)
Set _reset_next_step = False
Return dm_env.restart(obs)
Step (step)
If _reset_next_step is True, call reset()
Call env.step(action)
Unpack (obs, reward, terminated, truncated, info)
Apply termination logic
Return dm_env.TimeStep(...)