-
Notifications
You must be signed in to change notification settings - Fork 154
Additional fields and type inferring for union #1834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request introduces comprehensive support for Python Union types in Datumaro's experimental type registry and dataset schema inference. It enables seamless conversion between multiple candidate types using both modern (A | B) and legacy (typing.Union) syntax, with fallback logic when conversions fail.
Key changes include:
- Union type support in the type registry with fallback behavior
- Enhanced schema inference for string annotations and Union types
- New image type conversion capabilities
- Expanded API with additional fields and converters
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
tests/unit/experimental/test_type_registry.py |
New comprehensive test suite for type registry functionality |
tests/unit/experimental/test_dataset.py |
Added Union type handling tests for dataset samples |
src/datumaro/experimental/type_registry.py |
Core Union type support and image conversion functionality |
src/datumaro/experimental/legacy.py |
Minor type annotation fix |
src/datumaro/experimental/fields.py |
New annotation field classes and helper functions |
src/datumaro/experimental/dataset.py |
Enhanced schema inference and type resolution |
src/datumaro/experimental/converters.py |
New ImageTypeConverter for image format transformations |
src/datumaro/experimental/__init__.py |
Updated public API exports |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
gdlg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Albert! First round of review, I still need to review the part on the type union.
src/datumaro/experimental/dataset.py
Outdated
| DType = TypeVar("DType", bound=Sample, default=Sample) | ||
| DTargetType = TypeVar("DTargetType", bound=Sample, default=Sample) | ||
| DType = TypeVar("DType") | ||
| DTargetType = TypeVar("DTargetType") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment in the code explaining why removing them. Mention the issue with Dataset(schema) too.
src/datumaro/experimental/fields.py
Outdated
| if isinstance(value, np.ndarray): | ||
| value_list = value.tolist() | ||
| elif isinstance(value, list): | ||
| value_list = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert to Numpy instead of to a list. You can use the to_numpy function from type_registry.py
src/datumaro/experimental/fields.py
Outdated
| return {name: pl.Series(name, [value_list], dtype=pl.List(self.dtype))} | ||
|
|
||
| # Handle single integer value | ||
| elif isinstance(value, (int, np.integer)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| elif isinstance(value, (int, np.integer)): | |
| else: |
In this case, you can assume that we are working with a single label. No need to check the input type.
src/datumaro/experimental/fields.py
Outdated
| if target_type == np.ndarray or target_type is np.ndarray: | ||
| return np.array(data, dtype=np.int64) | ||
| elif target_type is list: | ||
| return data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, use from_polars_data from type_registry.py
gdlg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Albert!
Summary
This pull request introduces robust support for Python
Uniontypes in the experimental Datumaro type registry and dataset schema inference. It enables seamless conversion between multiple candidate types (including bothtyping.Unionand modernA | Bsyntax), with fallback logic and comprehensive test coverage. The changes also improve image type conversion and schema inference for datasets, making the system more flexible and reliable.Type registry and conversion improvements
Uniontypes in the type registry: bothtyping.Unionand Python 3.10+A | Bsyntax are now handled, with fallback to subsequent types if the first conversion fails. This includes updated logic infrom_polars_dataand new tests for ordering, error handling, and fallback behavior. [1] [2] [3]Dataset and schema inference enhancements
Datasetto resolve string annotations to actual type objects, supporting cases wherefrom __future__ import annotationsis used, and added correct handling forUniontypes to preserve the original annotation. [1] [2]dataset.pyfor clarity and correctness, and removed unnecessary imports. [1] [2] [3]API and import improvements
Test coverage
These changes significantly improve the flexibility and reliability of type conversion and schema inference in Datumaro’s experimental pipeline.
How to test
Checklist
License
Feel free to contact the maintainers if that's a concern.