@gdlg gdlg commented Aug 6, 2025

Summary

This PR implements the conversion from the legacy to the experimental Dataset class. I will implement the conversion back to the legacy class in a separate PR.

Misc fixes:

  • Also implements __len__, __delitem__ and __iter__ in the Dataset class.
  • Fix bug when fetching ann_types() before the cache is initialised.
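As a rough illustration of the container protocol and the cache fix listed above, here is a minimal sketch. It is not Datumaro's actual Dataset: the class `TinyDataset`, its `_samples` list, the `_ann_types_cache` attribute and the dict-shaped samples are hypothetical names chosen for this example. The lazy cache in `ann_types()` shows one plausible way to avoid a failure when the cache has not been built yet.

```python
from typing import Any, Dict, Iterator, List, Optional, Set


class TinyDataset:
    """Hypothetical stand-in for a dataset container (not the Datumaro API)."""

    def __init__(self) -> None:
        self._samples: List[Dict[str, Any]] = []
        # The cache starts unset and is built on first use.
        self._ann_types_cache: Optional[Set[str]] = None

    def append(self, sample: Dict[str, Any]) -> None:
        self._samples.append(sample)
        self._ann_types_cache = None  # invalidate after any mutation

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        return self._samples[index]

    def __delitem__(self, index: int) -> None:
        del self._samples[index]
        self._ann_types_cache = None

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        return iter(self._samples)

    def ann_types(self) -> Set[str]:
        # Build the cache lazily so that calling this method before the
        # cache has been initialised does not fail.
        if self._ann_types_cache is None:
            self._ann_types_cache = {
                key for sample in self._samples for key in sample.get("annotations", {})
            }
        return self._ann_types_cache


ds = TinyDataset()
fresh_types = ds.ann_types()  # safe on a fresh dataset with no cache yet
ds.append({"annotations": {"bbox": [0, 0, 1, 1]}})
ds.append({"annotations": {"label": 3}})
del ds[0]
print(len(ds), [sorted(s["annotations"]) for s in ds], ds.ann_types())
```

Invalidating the cache on every mutation keeps `ann_types()` consistent with the samples at the cost of a rescan; a real implementation may update the cache incrementally instead.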

The conversion works in two steps: the first step analyses the existing dataset and generates the schema for the new dataset; the second step converts the data.
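The two-step flow can be sketched as follows. This is only an illustration under stated assumptions: `analyse_schema`, `convert_samples` and the dict-based legacy items are hypothetical and do not reflect the PR's real functions or data model.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical legacy item shape: {"media": ..., "annotations": [{"kind": ..., "value": ...}]}
LegacyItem = Dict[str, Any]


def analyse_schema(items: Iterable[LegacyItem]) -> Dict[str, type]:
    """Step 1: scan the legacy items and derive a field -> type schema."""
    schema: Dict[str, type] = {}
    for item in items:
        schema.setdefault("media", type(item["media"]))
        for ann in item["annotations"]:
            # One schema field per annotation kind seen anywhere in the dataset.
            schema.setdefault(ann["kind"], type(ann["value"]))
    return schema


def convert_samples(items: Iterable[LegacyItem], schema: Dict[str, type]) -> List[Dict[str, Any]]:
    """Step 2: emit one record per item, with every schema field present."""
    converted = []
    for item in items:
        record: Dict[str, Any] = {name: None for name in schema}
        record["media"] = item["media"]
        for ann in item["annotations"]:
            record[ann["kind"]] = ann["value"]
        converted.append(record)
    return converted


legacy = [
    {"media": "a.png", "annotations": [{"kind": "bbox", "value": [0.1, 0.2, 0.3, 0.4]}]},
    {"media": "b.png", "annotations": [{"kind": "label", "value": 2}]},
]
schema = analyse_schema(legacy)
samples = convert_samples(legacy, schema)
print(schema)
print(samples)
```

In this sketch the analysis pass has to finish before any record is written, because a field that first appears in the last item must still exist (as `None`) in the first converted record.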

I have defined MediaConverter and AnnotationConverter base classes, which can be extended to support new media/annotation types. This PR implements the conversion logic, but the conversion for specific media/annotation types will be implemented later.
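One way an extensible converter base class could look is sketched below. The names `AnnotationConverterSketch`, `can_convert`, `convert`, `BBoxConverter` and the registry mechanism are assumptions made for illustration; the real MediaConverter and AnnotationConverter interfaces in this PR may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type


class AnnotationConverterSketch(ABC):
    """Hypothetical base class; the real AnnotationConverter may differ."""

    # Every subclass is recorded here when it is defined.
    registry: List[Type["AnnotationConverterSketch"]] = []

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        AnnotationConverterSketch.registry.append(cls)

    @abstractmethod
    def can_convert(self, annotation: Dict[str, Any]) -> bool:
        """Return True if this converter handles the given annotation."""

    @abstractmethod
    def convert(self, annotation: Dict[str, Any]) -> Dict[str, Any]:
        """Return the converted annotation."""


class BBoxConverter(AnnotationConverterSketch):
    """Illustrative extension: converts xywh boxes to xyxy."""

    def can_convert(self, annotation: Dict[str, Any]) -> bool:
        return annotation.get("kind") == "bbox"

    def convert(self, annotation: Dict[str, Any]) -> Dict[str, Any]:
        x, y, w, h = annotation["value"]
        return {"bbox": [x, y, x + w, y + h]}


def convert_annotation(annotation: Dict[str, Any]) -> Dict[str, Any]:
    # Try each registered converter in definition order.
    for converter_cls in AnnotationConverterSketch.registry:
        converter = converter_cls()
        if converter.can_convert(annotation):
            return converter.convert(annotation)
    raise NotImplementedError(f"No converter for {annotation.get('kind')!r}")


result = convert_annotation({"kind": "bbox", "value": [1, 2, 3, 4]})
print(result)
```

With this layout, supporting a new annotation type only requires defining another subclass; the generic conversion logic does not change.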

Part of #1789

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have added the description of my changes into CHANGELOG.
  • I have updated the documentation accordingly.

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2025 Intel Corporation
#
# SPDX-License-Identifier: MIT

* Also implements __len__ and __iter__ in the Dataset class.
* Fix bug when fetching ann_types() before the cache is initialised.
@gdlg gdlg requested a review from AlbertvanHouten August 6, 2025 13:36
Comment on lines 358 to 365
# Add third sample
sample3 = TestSample(
    image=np.array([[[128, 64, 192]], [[96, 160, 32]]], dtype=np.uint8),
    bbox=np.array([[0.9, 0.8, 0.7, 0.6]], dtype=np.float32),
    image_info=ImageInfo(width=1, height=2),
)
dataset.append(sample3)
assert len(dataset) == 3
Contributor

Adding this third sample seems redundant after two appends have already been tested. It would make more sense to remove a sample here and check that len() still works properly.

Contributor Author

Done, I have also implemented __delitem__ for that.

@gdlg gdlg merged commit 561f0d2 into develop Aug 7, 2025
15 checks passed
gdlg added a commit that referenced this pull request Aug 7, 2025
The approach is similar to #1810. The conversion works in two steps: the
first step analyses the existing dataset and generates the media type,
annotation type and categories for the new dataset. The second step
converts the data.

I have defined BackwardMediaConverter and BackwardAnnotationConverter
base classes, which can be extended to support new media/annotation types.
This PR implements the conversion logic, but the conversion for specific
media/annotation types will be implemented later.
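
The analysis step of the backward direction could be pictured along these lines. This is a sketch under assumptions, not the commit's code: `analyse_for_legacy`, the `media_type`, `type` and `label` keys, and the dict-based samples are all hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple


def analyse_for_legacy(samples: List[Dict[str, Any]]) -> Tuple[str, Set[str], List[str]]:
    """Hypothetical first step: derive media type, annotation types and categories."""
    media_types = {s["media_type"] for s in samples}
    if len(media_types) > 1:
        raise ValueError(f"Mixed media types are not handled in this sketch: {sorted(media_types)}")
    media_type = media_types.pop() if media_types else "unknown"

    ann_types: Set[str] = set()
    categories: Set[str] = set()
    for sample in samples:
        for ann in sample["annotations"]:
            ann_types.add(ann["type"])
            if "label" in ann:
                categories.add(ann["label"])
    # Sort the categories so that label indices are deterministic.
    return media_type, ann_types, sorted(categories)


new_samples = [
    {"media_type": "image", "annotations": [{"type": "bbox", "label": "cat"}]},
    {"media_type": "image", "annotations": [{"type": "label", "label": "dog"}]},
]
media_type, ann_types, categories = analyse_for_legacy(new_samples)
print(media_type, sorted(ann_types), categories)
```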

Follow-up from #1810. Fixes #1789

### Summary

### How to test

### Checklist
- [x] I have added unit tests to cover my changes.
- [x] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into
[CHANGELOG](https://github.com/open-edge-platform/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the
[documentation](https://github.com/open-edge-platform/datumaro/tree/develop/docs)
accordingly

### License

- [ ] I submit _my code changes_ under the same [MIT
License](https://github.com/open-edge-platform/datumaro/blob/develop/LICENSE)
that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example
below).

```python
# Copyright (C) 2025 Intel Corporation
#
# SPDX-License-Identifier: MIT
```
@gdlg gdlg deleted the gppayend/legacy-conversion branch August 18, 2025 08:31