Contributor

@gdlg gdlg commented Sep 8, 2025

This pull request introduces polygon and mask annotations. It also extends the label annotation to allow a list of labels, which can be used alongside polygons or bounding boxes. The PR adds conversion between polygons and masks, as well as conversion between legacy polygon datasets and the new dataset class; however, I haven’t implemented the legacy conversion for masks yet.

Polygon and Mask Annotation Support

  • Added PolygonField and MaskField classes to fields.py, enabling robust representation and conversion of polygon and mask data, including normalization, format specification, and efficient serialization to Polars dataframes.
  • Implemented PolygonToMaskConverter in converters.py, allowing conversion of polygon annotations to rasterized masks using OpenCV.

Label Handling Improvements

  • Enhanced LabelField to support both multi-label and list semantics via new is_list property, and updated serialization logic to accommodate these cases. The label_field factory now accepts is_list as a parameter.

Converter Registration and Instantiation Refactor

  • Refactored the annotation converter registry in legacy.py to store converter classes instead of instances, and introduced logic to instantiate converters based on dataset categories using a new create_from_categories class method.
  • Updated the ForwardBboxAnnotationConverter to use the new instantiation pattern, including conditional support for label categories and improved schema attribute handling.
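
The registry pattern described above (store classes, instantiate them from the dataset categories) might look roughly like the following. Only the `create_from_categories` name comes from the PR description; the registry list, the decorator, the `ExampleBboxConverter` class and the dictionary-shaped `categories` argument are hypothetical stand-ins, not the code in legacy.py.

```python
from typing import Dict, List, Optional

# Hypothetical registry: it holds converter classes, not instances.
_CONVERTER_CLASSES: List[type] = []


def register_converter(cls: type) -> type:
    """Class decorator that records the converter class in the registry."""
    _CONVERTER_CLASSES.append(cls)
    return cls


@register_converter
class ExampleBboxConverter:
    """Illustrative converter; label support depends on the categories."""

    def __init__(self, label_names: Optional[List[str]]):
        self.label_names = label_names

    @classmethod
    def create_from_categories(cls, categories: Dict[str, List[str]]):
        # Labels are optional: without label categories the converter
        # is still created, but without label support.
        labels = categories.get("label")
        return cls(list(labels) if labels is not None else None)


def instantiate_converters(categories: Dict[str, List[str]]) -> list:
    """Build one converter instance per registered class for this dataset."""
    instances = []
    for cls in _CONVERTER_CLASSES:
        converter = cls.create_from_categories(categories)
        if converter is not None:
            instances.append(converter)
    return instances


with_labels = instantiate_converters({"label": ["cat", "dog"]})
without_labels = instantiate_converters({})
```

Deferring instantiation this way lets each converter be configured per dataset instead of being shared as a single global instance.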

Legacy Compatibility and Imports

  • Added Polygon to legacy annotation imports and updated usage to reflect new field and converter types for seamless integration with legacy datasets.

Checklist

  • I have added tests to cover my changes or documented any manual tests.
  • I have added the description of my changes into CHANGELOG.
  • I have updated the documentation accordingly

codecov-commenter commented Sep 8, 2025

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 77.34375% with 58 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/datumaro/experimental/converters.py | 48.93% | 24 Missing ⚠️ |
| src/datumaro/experimental/type_registry.py | 31.03% | 14 Missing and 6 partials ⚠️ |
| src/datumaro/experimental/fields.py | 82.35% | 9 Missing ⚠️ |
| src/datumaro/experimental/legacy.py | 96.12% | 2 Missing and 3 partials ⚠️ |

@gdlg gdlg force-pushed the gppayend/polygons-and-masks branch 2 times, most recently from 4d78d51 to 94d177e Compare September 8, 2025 10:24
Signed-off-by: Grégoire Payen de La Garanderie <[email protected]>
@gdlg gdlg force-pushed the gppayend/polygons-and-masks branch from 94d177e to fb3f606 Compare September 8, 2025 13:36
Signed-off-by: Grégoire Payen de La Garanderie <[email protected]>
@gdlg gdlg requested a review from AlbertvanHouten September 8, 2025 14:06
Signed-off-by: Grégoire Payen de La Garanderie <[email protected]>
@gdlg gdlg marked this pull request as ready for review September 9, 2025 10:58
[6 7 8 9]
]
"""
if data.dtype == "O":
Contributor

Please define a variable for this magic string and also reuse it throughout this PR

Contributor Author

I have replaced it with object to avoid the string. It’s equivalent and more explicit.
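
A small check of the equivalence mentioned in this reply, using only NumPy: the string code `"O"` and the Python type `object` name the same dtype. The array contents are made up for the example.

```python
import numpy as np

# An object array holding two polygons of different lengths.
data = np.empty(2, dtype=object)
data[0] = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
data[1] = np.array([2.0, 2.0, 3.0, 2.0, 3.0, 3.0, 2.0, 3.0])

# Both comparisons test for the same object dtype.
uses_string_code = data.dtype == "O"
uses_python_type = data.dtype == object
```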

Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces comprehensive polygon and mask annotation support to the experimental dataset framework, enabling conversion between polygon, mask, and legacy annotation formats. It refactors the converter registration system to use classes instead of instances for better flexibility and category-based instantiation.

Key changes:

  • Added PolygonField and MaskField classes with their factory functions for representing polygon coordinates and segmentation masks
  • Enhanced LabelField to support list semantics via is_list parameter for better multi-label handling
  • Implemented PolygonToMaskConverter for converting polygon annotations to rasterized masks using OpenCV
  • Refactored annotation converter registry to store converter classes and instantiate them based on dataset categories

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| src/datumaro/experimental/fields.py | Added PolygonField, MaskField, and enhanced LabelField with is_list support |
| src/datumaro/experimental/converters.py | Added PolygonToMaskConverter for polygon-to-mask conversion |
| src/datumaro/experimental/legacy.py | Refactored converter registry and added ForwardPolygonAnnotationConverter/BackwardPolygonAnnotationConverter |
| src/datumaro/experimental/type_registry.py | Added polars_to_numpy_dtype utility function and improved dtype conversion |
| tests/unit/experimental/test_schema.py | Added comprehensive tests for PolygonField functionality |
| tests/unit/experimental/test_legacy.py | Updated tests for new converter pattern and added polygon conversion tests |
| tests/unit/experimental/test_converters.py | Added tests for PolygonToMaskConverter |
| CHANGELOG.md | Updated to reference this PR |


# When using np.array, there is a corner case for the case where len(polygons) == 1 where
# Numpy creates a 2D array of objects instead of a 1D array of objects.
# We may be able to solve this in the upcoming version of Numpy with the argument ndmax.
Copilot AI Sep 9, 2025

The comment mentions 'ndmax' which should be 'ndmin' (minimum number of dimensions).

Suggested change
- # We may be able to solve this in the upcoming version of Numpy with the argument ndmax.
+ # We may be able to solve this in the upcoming version of Numpy with the argument ndmin (minimum number of dimensions).

Contributor Author

Not correct, I am indeed referring to ndmax, not ndmin, since we would like to limit the number of dimensions to 1. See https://numpy.org/devdocs//reference/generated/numpy.array.html
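
The corner case under discussion can be reproduced with a short sketch. The helper `to_object_array` is a hypothetical workaround written for this example, not the code from the PR, and the `ndmax` argument is deliberately not used because its availability depends on the NumPy version.

```python
import numpy as np


def to_object_array(polygons):
    """Build a 1D object array with one polygon per element (hypothetical helper)."""
    out = np.empty(len(polygons), dtype=object)
    for i, polygon in enumerate(polygons):
        # Element-wise assignment stores each array as a single object.
        out[i] = polygon
    return out


single = [np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])]

# np.array descends into the nested array and produces a 2D object array.
naive = np.array(single, dtype=object)

# Preallocating the 1D container keeps one polygon per element.
fixed = to_object_array(single)
```

Here `naive` has two dimensions while `fixed` has one, which is why capping the inferred depth at one dimension (rather than setting a minimum) is the relevant control.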

gdlg added 2 commits September 9, 2025 16:06
Signed-off-by: Grégoire Payen de La Garanderie <[email protected]>
Signed-off-by: Grégoire Payen de La Garanderie <[email protected]>
Signed-off-by: Grégoire Payen de La Garanderie <[email protected]>
@gdlg gdlg merged commit 6d4f896 into develop Sep 10, 2025
16 checks passed
@gdlg gdlg deleted the gppayend/polygons-and-masks branch September 25, 2025 11:42
4 participants