
fix: Remove --local-files-only from the README and update root/repo_id path in lerobot_dataset.py #1057

Open
wants to merge 21 commits into
base: main

Conversation

tc-huang
Contributor

@tc-huang tc-huang commented Apr 30, 2025

What This PR Does

This PR introduces two changes:

Change 1: Remove the Instruction for the --local-files-only Argument from README.md

  • The --local-files-only argument and its instructions were removed from README.md to align with commit 3354d91 (LeRobotDataset v2.1, PR #711), which removed --local-files-only from lerobot/scripts/visualize_dataset.py (commit diff) and lerobot/scripts/visualize_dataset_html.py (commit diff).
  • Because these scripts no longer accept --local-files-only, users should pass the --root and --repo-id options instead, as shown in the updated README.md example:
    python lerobot/scripts/visualize_dataset.py \
        --repo-id lerobot/pusht \
        --root ./my_local_data_dir \
    -    --local-files-only 1 \
        --episode-index 0
    • In the above example, the dataset is expected in ./my_local_data_dir/lerobot/pusht, as described in the LeRobotDataset class parameter docstring:
      Args:
      repo_id (str): This is the repo id that will be used to fetch the dataset. Locally, the dataset
      will be stored under root/repo_id.
      root (Path | None, optional): Local directory to use for downloading/writing files. You can also
      set the LEROBOT_HOME environment variable to point to a different location. Defaults to
      '~/.cache/huggingface/lerobot'.

Updated Files for Change 1:

  • README.md

Change 2: Update the root path for locally stored datasets and metadata

  • Updated the __init__ methods of the LeRobotDataset and LeRobotDatasetMetadata classes to support local datasets stored under root/repo_id:

    - self.root = Path(root) if root else HF_LEROBOT_HOME / repo_id
    + self.root = Path(root) / repo_id if root else HF_LEROBOT_HOME / repo_id
    - self.root = Path(root) if root is not None else HF_LEROBOT_HOME / repo_id
    + self.root = Path(root) / repo_id if root is not None else HF_LEROBOT_HOME / repo_id
    • Previously, the __init__ method did not correctly handle local dataset paths with the root/repo_id structure, causing lerobot/scripts/visualize_dataset.py and lerobot/scripts/visualize_dataset_html.py to fail to load datasets located at root/repo_id.
  • Updated the LeRobotDatasetMetadata init call in the LeRobotDataset class to pass root instead of self.root, so that repo_id is not appended twice:

      self.meta = LeRobotDatasetMetadata(
    - self.repo_id, self.root, self.revision, force_cache_sync=force_cache_sync
    + self.repo_id, root, self.revision, force_cache_sync=force_cache_sync
      )
  • Updated the create method of the LeRobotDatasetMetadata class to support local metadata stored under root/repo_id:

    - obj.root = Path(root) if root is not None else HF_LEROBOT_HOME / repo_id
    + obj.root = Path(root) / repo_id if root is not None else HF_LEROBOT_HOME / repo_id
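
The path rule that Change 2 applies in each of these places can be sketched in isolation. This is a minimal illustration rather than the library code: the helper names `resolve_root_old` and `resolve_root_new` are hypothetical, and the value given to `HF_LEROBOT_HOME` is a stand-in taken from the default quoted in the docstring above.

```python
from __future__ import annotations

from pathlib import Path

# Stand-in for the library constant; the value follows the docstring default.
HF_LEROBOT_HOME = Path("~/.cache/huggingface/lerobot").expanduser()


def resolve_root_old(repo_id: str, root: str | Path | None = None) -> Path:
    """Behaviour before this PR: a provided root is used as-is."""
    return Path(root) if root is not None else HF_LEROBOT_HOME / repo_id


def resolve_root_new(repo_id: str, root: str | Path | None = None) -> Path:
    """Behaviour after this PR: repo_id is appended to a provided root."""
    return Path(root) / repo_id if root is not None else HF_LEROBOT_HOME / repo_id


if __name__ == "__main__":
    print(resolve_root_old("lerobot/pusht", "my_local_data_dir"))
    print(resolve_root_new("lerobot/pusht", "my_local_data_dir"))
    print(resolve_root_new("lerobot/pusht"))
```

With `--root my_local_data_dir` and `--repo-id lerobot/pusht`, the old rule resolves to `my_local_data_dir` while the new rule resolves to `my_local_data_dir/lerobot/pusht`; when no root is given, both fall back to `HF_LEROBOT_HOME / repo_id`.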

Updated Files for Change 2:

  • lerobot/common/datasets/lerobot_dataset.py

How It Was Tested

Preparing the pusht Dataset Locally

  1. cd into the lerobot repository directory.
  2. Download the pusht dataset by following the instructions in README.md, which also verifies that visualization from the Hugging Face Hub works:
    python lerobot/scripts/visualize_dataset.py \
        --repo-id lerobot/pusht \
        --episode-index 0
    • The dataset is stored at ~/.cache/huggingface/lerobot/lerobot/pusht by default.
  3. Move the dataset to a newly created local directory (my_local_data_dir) for testing:
    mkdir my_local_data_dir
    mv ~/.cache/huggingface/lerobot/lerobot my_local_data_dir
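
The move in step 3 can also be scripted. The sketch below relocates one namespace directory from a cache directory into a local root using `shutil.move`; the helper name `relocate_namespace` is hypothetical and is not part of lerobot.

```python
import shutil
from pathlib import Path


def relocate_namespace(cache_home: Path, local_root: Path, namespace: str) -> Path:
    """Move cache_home/namespace to local_root/namespace and return the new path."""
    source = cache_home / namespace
    if not source.is_dir():
        raise FileNotFoundError(f"No cached datasets found at {source}")

    # Create the local root if needed, but refuse to overwrite an existing copy.
    local_root.mkdir(parents=True, exist_ok=True)
    destination = local_root / namespace
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")

    shutil.move(str(source), str(destination))
    return destination
```

For the layout used here, calling `relocate_namespace(Path("~/.cache/huggingface/lerobot").expanduser(), Path("my_local_data_dir"), "lerobot")` would leave the dataset at `my_local_data_dir/lerobot/pusht`, which is the root/repo_id location the tests below rely on.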

Test 1: Visualize Local Dataset with visualize_dataset.py

  • Verified visualization of the local dataset in my_local_data_dir using:
    python lerobot/scripts/visualize_dataset.py \
        --repo-id lerobot/pusht \
        --root my_local_data_dir \
        --episode-index 0

Test 2: Visualize Local Dataset with visualize_dataset_html.py

  • Verified visualization of the local dataset in my_local_data_dir using:
    python lerobot/scripts/visualize_dataset_html.py \
        --repo-id lerobot/pusht \
        --root my_local_data_dir \
        --episodes 0

Test 3: Train with Local Dataset

  • Tested training with the local dataset:
    python lerobot/scripts/train.py \
        --dataset.repo_id=lerobot/pusht \
        --dataset.root=my_local_data_dir \
        --policy.type=act \
        --policy.device=cuda \
        --env.type=pusht \
        --output_dir=outputs/train/act_pusht_test \
        --job_name=act_pusht_test \
        --wandb.enable=false \
        --steps=1

Test 4: Create LeRobotDatasetMetadata and LeRobotDataset instances using simple Python code

  • The following Python code was used for testing:
    from lerobot.common.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
    
    dataset_metadata = LeRobotDatasetMetadata(
       repo_id = "lerobot/pusht",
       root = "my_local_data_dir"
    )
    print(dataset_metadata)
    
    dataset = LeRobotDataset(
       repo_id = "lerobot/pusht",
       root = "my_local_data_dir"
    )
    print(dataset)

Test 5: Functional Testing with Physical Robot and Local Dataset

  • Record a dataset and save it to my_local_data_dir/lerobot_user/koch_test (e.g., using the koch arm).
    • Note: Replace the --robot.port and --teleop.port values with your actual USB ports.
    python -m lerobot.record \
        --robot.type=koch_follower \
        --robot.port=/dev/ttyUSB0 \
        --robot.id=my_awesome_follower_arm \
        --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
        --teleop.type=koch_leader \
        --teleop.port=/dev/ttyACM0 \
        --teleop.id=my_awesome_leader_arm \
        --display_data=true \
        --dataset.repo_id=lerobot_user/koch_test \
        --dataset.root=my_local_data_dir \
        --dataset.num_episodes=1 \
        --dataset.episode_time_s=10 \
        --dataset.reset_time_s=10 \
        --dataset.single_task="Recording test" \
        --dataset.push_to_hub=False
  • Replay the dataset saved at my_local_data_dir/lerobot_user/koch_test (e.g., using the koch arm).
    • Note: Replace the --robot.port value with your actual USB port.
    python -m lerobot.replay \
        --robot.type=koch_follower \
        --robot.port=/dev/ttyUSB0 \
        --dataset.repo_id=lerobot_user/koch_test \
        --dataset.root=my_local_data_dir \
        --dataset.episode=0
  • Visualize the dataset saved at my_local_data_dir/lerobot_user/koch_test using the Rerun UI.
    python lerobot/scripts/visualize_dataset.py \
        --repo-id lerobot_user/koch_test \
        --root my_local_data_dir \
        --episode-index 0
  • Visualize the dataset saved at my_local_data_dir/lerobot_user/koch_test using the HTML UI.
    python lerobot/scripts/visualize_dataset_html.py \
        --repo-id lerobot_user/koch_test \
        --root my_local_data_dir \
        --episodes 0
  • Train a policy using the dataset saved at my_local_data_dir/lerobot_user/koch_test.
    python lerobot/scripts/train.py \
        --dataset.repo_id=lerobot_user/koch_test \
        --dataset.root=my_local_data_dir \
        --policy.type=act \
        --policy.device=cuda \
        --output_dir=outputs/train/act_koch_test \
        --job_name=act_koch_test \
        --wandb.enable=false \
        --steps=1

How to Review & Test

  1. Review the updated lines in README.md.
  2. Follow the testing steps in the "How It Was Tested" section. (For Test 5, change --robot.type=koch_follower and --teleop.type=koch_leader if you use a robot arm other than koch.)

This PR is related to issue #963.

The `--local-files-only` argument and its related instructions were removed
from the following documentation files to reflect changes introduced in
commit 3354d91 "LeRobotDataset v2.1 (huggingface#711)", which removed the `--local-files-only`
argument from `lerobot/scripts/visualize_dataset.py` and
`lerobot/scripts/visualize_dataset_html.py`.

Updated files:
1. README.md
2. examples/7_get_started_with_real_robot.md
3. examples/10_use_so100.md
4. examples/11_use_lekiwi.md
5. examples/11_use_moss.md
6. examples/12_use_so101.md
Modified the `LeRobotDataset.__init__` method to append the `repo_id`
to the specified `root` path.

Before change:
- If `root` was None => `HF_LEROBOT_HOME / repo_id`
- If `root` was provided => `Path(root)`

After change:
- If `root` was None => `HF_LEROBOT_HOME / repo_id`
- If `root` was provided => `Path(root) / repo_id`

This change ensures that a local dataset is stored in a subdirectory named after its repo_id, aligning with the LeRobotDataset docstring: "Locally, the dataset will be stored under root/repo_id."
@tc-huang tc-huang marked this pull request as draft May 2, 2025 17:26
tc-huang added 2 commits May 3, 2025 15:05
**Dataset**: Modified `LeRobotDatasetMetadata.create` to append
`repo_id` to `root` path.
- Before: `root = HF_LEROBOT_HOME / repo_id` (if None) or
  `root = Path(root)` (if provided).
- Now: `root = HF_LEROBOT_HOME / repo_id` (if None) or
 `root = Path(root) / repo_id` (if provided).

**Tests**: Updated `test_record_and_replay_and_policy` in
`tests/robots/test_control_robot.py` to remove `repo_id` from
`root` path.
- Before: `root = tmp_path / "data" / repo_id` (i.e.,
  `tmp_path/data/lerobot_test/debug`).
- Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`).
Updated test cases in `tests/robots/test_control_robot.py` to remove `repo_id` or `eval_repo_id` from the `root` path used for test data storage.

**Changes**:
  1. `test_record_without_cameras`:
     - Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`)
     - Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`)
  2. `test_record_and_replay_and_policy`:
     - Before: `eval_root = tmp_path / "data" / eval_repo_id` (i.e., `tmp_path/data/lerobot/eval_debug`)
     - Now: `eval_root = tmp_path / "data"` (i.e., `tmp_path/data`)
  3. `test_resume_record`:
     - Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`)
     - Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`)
  4. `test_record_with_event_rerecord_episode`:
     - Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`)
     - Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`)
  5. `test_record_with_event_exit_early`:
     - Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`)
     - Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`)
  6. `test_record_with_event_stop_recording`:
     - Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`)
     - Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`)
@tc-huang tc-huang changed the title fix: Remove --local-files-only from docs and append repo_id to root in LeRobotDataset.__init__ fix: Remove --local-files-only from docs and update root/repo_id path in lerobot_dataset.py May 3, 2025
tc-huang and others added 3 commits May 7, 2025 16:22
- Updated `LeRobotDatasetMetadata.__init__` to append `repo_id` to `root`:
  - Old: `self.root = HF_LEROBOT_HOME / repo_id` (if `root` is None) or
         `self.root = Path(root)` (if `root` is provided).
  - New: `self.root = HF_LEROBOT_HOME / repo_id` (if `root` is None) or
         `self.root = Path(root) / repo_id` (if `root` is provided).

- Fixed `LeRobotDatasetMetadata` init call in `LeRobotDataset.__init__`:
  - Old: `self.meta = LeRobotDatasetMetadata(...self.root,...)`.
  - New: `self.meta = LeRobotDatasetMetadata(...root,...)`.
  - Prevents duplicate `repo_id` appending in dataset path.
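
Why the inner call has to receive `root` rather than `self.root` can be seen with two toy classes. They are stand-ins written for this illustration: `Meta`, `Dataset`, and the `buggy` flag are hypothetical, the `HF_LEROBOT_HOME` value is assumed from the docstring default, and only the path handling is modelled.

```python
from __future__ import annotations

from pathlib import Path

# Assumed default location, used only when no root is provided.
HF_LEROBOT_HOME = Path("~/.cache/huggingface/lerobot").expanduser()


class Meta:
    """Toy stand-in for the metadata class: appends repo_id to a provided root."""

    def __init__(self, repo_id: str, root: str | Path | None = None):
        self.root = Path(root) / repo_id if root is not None else HF_LEROBOT_HOME / repo_id


class Dataset:
    """Toy stand-in for the dataset class, which builds its own Meta."""

    def __init__(self, repo_id: str, root: str | Path | None = None, buggy: bool = False):
        self.root = Path(root) / repo_id if root is not None else HF_LEROBOT_HOME / repo_id
        # Passing the already-resolved self.root makes Meta append repo_id again;
        # passing the caller's original root keeps both objects on the same path.
        self.meta = Meta(repo_id, self.root if buggy else root)


if __name__ == "__main__":
    print(Dataset("lerobot/pusht", "my_local_data_dir", buggy=True).meta.root)
    print(Dataset("lerobot/pusht", "my_local_data_dir").meta.root)
```

In the `buggy=True` case the metadata path ends in `lerobot/pusht/lerobot/pusht`, whereas forwarding the original `root` keeps `meta.root` equal to the dataset's own `root`.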
@tc-huang tc-huang marked this pull request as ready for review May 8, 2025 07:18
@imstevenpmwork imstevenpmwork added the bug (Something isn’t working correctly) and documentation (Improvements or fixes to the project’s docs) labels Jun 6, 2025
@tc-huang tc-huang marked this pull request as draft June 11, 2025 08:24
@tc-huang tc-huang force-pushed the fix/local-files-only-arg-in-md branch from e81d2dc to 0e54b2c Compare June 13, 2025 17:41
@tc-huang tc-huang changed the title fix: Remove --local-files-only from docs and update root/repo_id path in lerobot_dataset.py fix: Remove --local-files-only from the README and update root/repo_id path in lerobot_dataset.py Jun 15, 2025
@tc-huang tc-huang marked this pull request as ready for review June 15, 2025 07:50
2 participants