Commit 6417998
Additional fields and type inferring for union (#1834)
### Summary
This pull request adds support for Python `Union` types to the
experimental Datumaro type registry and to dataset schema inference. A
value can now be converted against several candidate types, declared
with either `typing.Union` or the `A | B` syntax; candidates are tried
in order, falling back to the next one when a conversion fails. The
changes also improve image type conversion and schema inference for
datasets, and add tests for the new behavior.
### Type registry and conversion improvements
* Added support for `Union` types in the type registry: both
`typing.Union` and the Python 3.10+ `A | B` syntax are handled, and
conversion falls back to the next candidate type if an earlier one
fails. This includes updated logic in `from_polars_data` and new tests
for ordering, error handling, and fallback behavior.
[[1]](diffhunk://#diff-e324261812079d99ca2989612441e5df1dd15dabde37fb2e5e8c0c1b639dac0dR122-R154)
[[2]](diffhunk://#diff-e324261812079d99ca2989612441e5df1dd15dabde37fb2e5e8c0c1b639dac0dR170-R269)
[[3]](diffhunk://#diff-30f23b2869128577a39c918ed25c78229a30cb96578c33728d45e5ebce740ac2R1-R162)
* Added comprehensive tests for type registry conversions, including
basic types, union types, error cases, ordering, and converter
functionality for numpy and torch tensors.
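
The ordered fallback described above can be illustrated with a minimal sketch. It is not the code from this PR: `CONVERTERS`, `is_union`, and `convert` are hypothetical names, and the converter table is a stand-in for the real registry. Only the standard `typing` and `types` modules are used.

```python
import types
from typing import Any, Union, get_args, get_origin

# `types.UnionType` (the runtime type of `A | B`) only exists on Python 3.10+,
# so fall back to `typing.Union` when it is missing.
_UNION_ORIGINS = {Union, getattr(types, "UnionType", Union)}

# Hypothetical converter table standing in for a real type registry.
CONVERTERS = {int: int, float: float, str: str}


def is_union(annotation: Any) -> bool:
    """Return True for both `Union[A, B]` and `A | B` annotations."""
    return get_origin(annotation) in _UNION_ORIGINS


def convert(value: Any, target: Any) -> Any:
    """Convert `value` to `target`, trying union members in declaration order."""
    if is_union(target):
        errors = []
        for candidate in get_args(target):
            try:
                return convert(value, candidate)
            except (TypeError, ValueError) as exc:
                # Remember why this candidate failed, then try the next one.
                errors.append(f"{candidate!r}: {exc}")
        raise TypeError(
            f"no member of {target!r} accepted {value!r}: " + "; ".join(errors)
        )
    converter = CONVERTERS.get(target)
    if converter is None:
        raise TypeError(f"no converter registered for {target!r}")
    return converter(value)


first_wins = convert("42", Union[int, str])   # int succeeds first: 42
fallback = convert("abc", Union[int, str])    # int fails, str accepts: "abc"
reordered = convert("42", Union[str, int])    # str is tried first: "42"
```

Because candidates are tried in declaration order, `Union[int, str]` and `Union[str, int]` can produce different results for the same input, which is presumably why the PR tests ordering explicitly.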
### Dataset and schema inference enhancements
* Improved schema inference in `Dataset`: string annotations are now
resolved to actual type objects, so modules that use `from __future__
import annotations` are supported, and `Union` annotations are
preserved as written.
[[1]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fR65-R80)
[[2]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fR94-R110)
* Updated type variable definitions and method signatures in
`dataset.py` for clarity and correctness, and removed unnecessary
imports.
[[1]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fR19-R25)
[[2]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fL105-R128)
[[3]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fL134-R157)
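
As a rough illustration of the string-annotation problem, the sketch below resolves annotations with the standard `typing.get_type_hints`. Whether the PR uses this function is an assumption; `Sample` and `resolve_annotations` are hypothetical names. The annotations are written as explicit strings to mimic what `from __future__ import annotations` stores at runtime.

```python
from typing import Any, Dict, Union, get_args, get_origin, get_type_hints


class Sample:
    # Explicit strings mimic what `from __future__ import annotations`
    # leaves in `__annotations__`: unevaluated source text.
    image_path: "str"
    label: "Union[int, str]"


def resolve_annotations(cls: type) -> Dict[str, Any]:
    """Evaluate string annotations into real type objects."""
    # Names inside the strings are looked up in this module's globals.
    return get_type_hints(cls, globalns=globals())


raw = Sample.__annotations__          # values are still plain strings
resolved = resolve_annotations(Sample)

# The union survives resolution intact, so a schema can keep every member.
label_members = get_args(resolved["label"])   # (int, str)
```

Without this step, schema inference would see the string `"Union[int, str]"` rather than a type it can inspect with `get_origin` and `get_args`.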
### API and import improvements
* Updated the experimental module’s public API to expose new converters,
dataset classes, fields, schema types, and registry functions.
### Test coverage
* Added targeted tests for union type handling in dataset samples,
ensuring both modern and legacy union syntax are supported.
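
A check along these lines could confirm that both spellings expose the same members. This is a sketch using only the standard library, not one of the tests added in this PR; `union_members` is a hypothetical helper.

```python
import sys
from typing import Any, Tuple, Union, get_args


def union_members(annotation: Any) -> Tuple[Any, ...]:
    """Return the member types of a union annotation, in declaration order."""
    return get_args(annotation)


legacy = Union[int, str]
legacy_members = union_members(legacy)

# The `A | B` spelling only evaluates on Python 3.10+, so guard it.
if sys.version_info >= (3, 10):
    modern = int | str
    modern_members = union_members(modern)
else:
    modern_members = legacy_members
```

Both spellings yield the same ordered member tuple, so code that walks the members does not need to care which syntax the sample class used.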
These changes significantly improve the flexibility and reliability of
type conversion and schema inference in Datumaro’s experimental
pipeline.
### How to test
<!-- Describe the testing procedure for reviewers, if changes are
not fully covered by unit tests or manual testing can be complicated.
-->
### Checklist
<!-- Put an 'x' in all the boxes that apply -->
- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [ ] I have added the description of my changes into
[CHANGELOG](https://github.com/open-edge-platform/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the
[documentation](https://github.com/open-edge-platform/datumaro/tree/develop/docs)
accordingly
### License
- [ ] I submit _my code changes_ under the same [MIT
License](https://github.com/open-edge-platform/datumaro/blob/develop/LICENSE)
that covers the project.
Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example
below).
```python
# Copyright (C) 2025 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

File tree: 7 files changed, +420 -11 lines changed
- src/datumaro/experimental
- tests/unit/experimental