Add Gymnasium Wrapper for Acme (dm_env compatible) #359

@Arsh123344423

Description

Acme currently lacks first-class support for Gymnasium environments (the successor to OpenAI Gym). This issue proposes implementing a new wrapper (acme/wrappers/gymnasium_wrapper.py) or significantly refactoring the existing gym_wrapper.py to correctly support Gymnasium’s updated APIs, semantics, and space definitions.

This change is required for compatibility with modern RL environments and reproducible experimentation.

📌 Motivation
Gymnasium has breaking API changes compared to legacy Gym (reset/step semantics, seeding, spaces).
Many Acme users now rely on Gymnasium-based environments.
A robust wrapper enables seamless interoperability between Gymnasium → dm_env → Acme.

  1. Dependency Management
    Gymnasium must be imported safely (guarded import).
    The wrapper should not break Acme installations where Gymnasium is not installed.
    Gymnasium should be treated as an optional dependency.
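
A minimal sketch of the guarded import described above, using only the standard library. The helper names `try_import` and `require_gymnasium` are hypothetical and not part of Acme; the point is that importing the wrapper module never fails, and a clear error appears only when the wrapper is actually used.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Evaluated once at import time; does not raise if Gymnasium is absent.
gymnasium = try_import("gymnasium")


def require_gymnasium() -> ModuleType:
    """Raise a descriptive error only when the wrapper is actually needed."""
    if gymnasium is None:
        raise ImportError(
            "The Gymnasium wrapper needs the optional 'gymnasium' package; "
            "install it with `pip install gymnasium`."
        )
    return gymnasium
```

The wrapper's constructor could call `require_gymnasium()` so that Acme installations without Gymnasium are unaffected until someone instantiates the wrapper.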

  2. Space Conversion
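    Acme uses dm_env.specs, while Gymnasium uses gymnasium.spaces.
    The wrapper must translate each space into the matching spec (for example, Discrete to specs.DiscreteArray and Box to specs.BoundedArray), recursing into Tuple and Dict spaces.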

  3. Reset Semantics
    Gymnasium reset() returns (observation, info).
    Legacy Gym returned only observation.
    The wrapper should:
    Capture (obs, info)
    Optionally keep info on the wrapper itself (dm_env.TimeStep has no extras field; it holds only step_type, reward, discount, observation)
    Return dm_env.restart(obs)

  4. Step Semantics
    Gymnasium step() returns:
    (obs, reward, terminated, truncated, info)
    This must be mapped to:
    dm_env.TimeStep(step_type, reward, discount, observation)
    Decision Logic:
    terminated → natural termination (dm_env.termination)
        step_type = LAST
        discount = 0.0
    truncated → artificial termination, e.g. a time limit (dm_env.truncation)
        step_type = LAST
        discount = 1.0 (or configurable γ)
    otherwise → mid-episode (dm_env.transition)
        step_type = MID
        discount = 1.0
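
The decision logic above can be sketched as a small pure function. `StepType` here is a local stand-in that mirrors `dm_env.StepType`, so the snippet runs without dm_env installed. The function name and the `truncation_discount` parameter are hypothetical. Checking `terminated` before `truncated` is an assumption for the case where both flags are set, which the list above does not cover.

```python
import enum
from typing import Tuple


class StepType(enum.IntEnum):
    """Stand-in mirroring dm_env.StepType (FIRST, MID, LAST)."""
    FIRST = 0
    MID = 1
    LAST = 2


def step_type_and_discount(
    terminated: bool,
    truncated: bool,
    truncation_discount: float = 1.0,
) -> Tuple[StepType, float]:
    """Map Gymnasium's two end-of-episode flags to (step_type, discount)."""
    if terminated:
        # Natural termination: there is no future return to bootstrap from.
        # Assumption: terminated takes priority when both flags are set.
        return StepType.LAST, 0.0
    if truncated:
        # Artificial cut-off (e.g. a time limit): the episode ends, but the
        # agent may still bootstrap from the final observation.
        return StepType.LAST, truncation_discount
    # Ordinary mid-episode transition.
    return StepType.MID, 1.0
```

With dm_env available, the three branches correspond to `dm_env.termination(reward, observation)`, `dm_env.truncation(reward, observation, discount)`, and `dm_env.transition(reward, observation, discount)`.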

  5. Seeding
    Gymnasium enforces strict seeding via reset(seed=...).
    Acme environments often expose a global seed setter.
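    The wrapper must:
    Bridge Acme-style seeding to Gymnasium’s reset(seed=...), e.g. by accepting a seed at construction, since dm_env's reset() takes no arguments
    Ensure reproducibility: forward the seed on the first reset only, so the same seed yields the same sequence of episodes rather than replaying the first episode on every reset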

🧠 Implementation Strategy

Initialization (__init__)
Store wrapped environment
Convert:
observation_spec
action_spec
reward_spec
Initialize internal episode state (_reset_next_step = True)
Accept optional discount override for infinite-horizon tasks

Reset (reset)
Call env.reset(seed=...)
Capture (obs, info)
Set _reset_next_step = False
Return dm_env.restart(obs)

Step (step)
If _reset_next_step is True, call reset() and return its TimeStep
Call env.step(action)
Unpack (obs, reward, terminated, truncated, info)
Apply termination logic; set _reset_next_step = True if terminated or truncated
Return dm_env.TimeStep(...)
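
A rough sketch of the reset/step control flow outlined above, assuming nothing beyond the standard library. `StepType` and `TimeStep` are stand-ins that mirror the dm_env types so the snippet runs on its own; a real wrapper would subclass `dm_env.Environment`, return real dm_env time steps, and also implement the spec conversion, which is omitted here. `GymnasiumWrapperSketch`, `get_info`, and `CountdownEnv` are hypothetical names used only for illustration.

```python
import enum
from typing import Any, NamedTuple, Optional


class StepType(enum.IntEnum):  # stand-in for dm_env.StepType
    FIRST = 0
    MID = 1
    LAST = 2


class TimeStep(NamedTuple):  # stand-in with dm_env.TimeStep's four fields
    step_type: StepType
    reward: Optional[float]
    discount: Optional[float]
    observation: Any


class GymnasiumWrapperSketch:
    """Hypothetical wrapper showing only the episode state machine."""

    def __init__(self, environment, seed: Optional[int] = None,
                 truncation_discount: float = 1.0):
        self._environment = environment
        self._seed = seed  # forwarded to the first reset only
        self._truncation_discount = truncation_discount
        self._reset_next_step = True
        self._last_info = None

    def reset(self) -> TimeStep:
        # Gymnasium-style reset returns (observation, info).
        obs, self._last_info = self._environment.reset(seed=self._seed)
        self._seed = None  # later resets continue the same random stream
        self._reset_next_step = False
        return TimeStep(StepType.FIRST, None, None, obs)

    def step(self, action) -> TimeStep:
        if self._reset_next_step:
            return self.reset()
        obs, reward, terminated, truncated, self._last_info = (
            self._environment.step(action))
        if terminated:
            self._reset_next_step = True
            return TimeStep(StepType.LAST, reward, 0.0, obs)
        if truncated:
            self._reset_next_step = True
            return TimeStep(StepType.LAST, reward,
                            self._truncation_discount, obs)
        return TimeStep(StepType.MID, reward, 1.0, obs)

    def get_info(self):
        """Return the info dict from the most recent reset or step."""
        return self._last_info


class CountdownEnv:
    """Toy environment with Gymnasium-style reset/step signatures."""

    def __init__(self, horizon: int = 2):
        self._horizon = horizon
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        return 0, {"seed": seed}

    def step(self, action):
        self._t += 1
        truncated = self._t >= self._horizon  # time limit, not a terminal state
        return self._t, 1.0, False, truncated, {}
```

Calling `step()` on a fresh `GymnasiumWrapperSketch(CountdownEnv(), seed=7)` first triggers a reset, then yields a MID step, then a LAST step with discount 1.0 because the toy episode is truncated rather than terminated.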
