fix: enforce nullable vector compact data invariants#49924
Conversation
|
@marcelo-cjl Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco. |
|
[ci-v2-notice] To rerun ci-v2 checks, comment with:
If you have any questions or requests, please contact @zhikunyao. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #49924 +/- ##
===========================================
- Coverage 84.54% 78.58% -5.96%
===========================================
Files 668 2219 +1551
Lines 113722 385930 +272208
===========================================
+ Hits 96145 303282 +207137
- Misses 17525 73402 +55877
- Partials 52 9246 +9194
🚀 New features to boost your workflow:
|
eaccc20 to
db48a91
Compare
❌ CI Loop Results
|
| Stage | Result | Duration | Tests |
|---|---|---|---|
| ✅ Build | SUCCESS | 11.9min | - |
| ❌ Code-Check | FAILURE | 1.7min | - |
| ❌ UT-CPP-Cov | FAILURE | 44.5min | 6046 run, 1 failed |
Total: 59min | Pipeline | Artifacts
Diff Coverage: CPP 79.0% (529 hit, 141 miss, 670 measurable lines)
Total Patch Coverage: 79.0% (529/670 measurable lines)
Failed Test Logs:
- UT-CPP-Cov: view log
db48a91 to
cbaf5b7
Compare
❌ CI Loop Results
|
| Stage | Result | Duration | Tests |
|---|---|---|---|
| ✅ Build | SUCCESS | 9.0min | - |
| ✅ Code-Check | SUCCESS | 7.6min | - |
| ❌ UT-GO | FAILURE | 23.1min | 1012 passed |
| ✅ UT-Integration | SUCCESS | 23.8min | 46 passed |
| ✅ UT-CPP-Cov | SUCCESS | 38.1min | 7731 passed |
Total: 66min | Pipeline | Artifacts
Diff Coverage: CPP 89.7% (594 hit, 68 miss, 662 measurable lines)
Total Patch Coverage: 89.7% (594/662 measurable lines)
Failed Test Logs:
- UT-GO: view log
❌ CI Loop Results
|
| Stage | Result | Duration | Tests |
|---|---|---|---|
| ✅ Build | SUCCESS | 11.6min | - |
| ✅ Code-Check | SUCCESS | 7.2min | - |
| ❌ UT-GO | FAILURE | 25.4min | 1012 passed |
| ✅ UT-Integration | SUCCESS | 24.0min | 46 passed |
| ❌ UT-CPP-Cov | FAILURE | 44.4min | 7733 run, 2 failed |
Total: 71min | Pipeline | Artifacts
Diff Coverage: CPP 87.2% (699 hit, 103 miss, 802 measurable lines)
Total Patch Coverage: 87.2% (699/802 measurable lines)
Failed Test Logs:
3df125c to
7c07e1e
Compare
7c07e1e to
e2a85e0
Compare
❌ CI Loop Results
|
| Stage | Result | Duration | Tests |
|---|---|---|---|
| ✅ Build | SUCCESS | 9.8min | - |
| ❌ Code-Check | FAILURE | 1.7min | - |
| ✅ UT-CPP-Cov | SUCCESS | 35.6min | 7735 passed |
Total: 51min | Pipeline | Artifacts
Diff Coverage: CPP 88.4% (860 hit, 113 miss, 973 measurable lines)
Total Patch Coverage: 88.4% (860/973 measurable lines)
| return (count + 7) / 8; | ||
| } | ||
|
|
||
| inline size_t |
There was a problem hiding this comment.
The valid-data count is serialized as size_t, which varies between 32-bit and 64-bit platforms. Since this is persisted index metadata, use uint64_t for a stable wire format. Additionally, validate both the count blob size (count_ptr->size == sizeof(uint64_t)) and the bitmap blob size (data_ptr->size >= (count+7)/8) before memcpy/bitmap reads to prevent out-of-bounds access on corrupt or truncated index data.
There was a problem hiding this comment.
Done. Count wire format is uint64_t now. Load also validates the count blob size and bitmap blob size before reading.
| } | ||
|
|
||
| valid_data.found = true; | ||
| local_chunk_manager->Read( |
There was a problem hiding this comment.
Same size_t persistence issue as in VectorIndexValidDataUtils.h. The disk index load and build paths serialize/deserialize the valid-data count as size_t. This must match the fixed uint64_t type across all persisted metadata paths. The VectorDiskAnnIndex::Load all-null branch that skips Knowhere deserialization also lacks a direct round-trip test (see separate testing issue).
There was a problem hiding this comment.
Done. Disk valid-data uses the same uint64_t wire count and file-size checks. Added all-null disk build/load coverage as well.
|
|
||
| for _, field := range schema.GetFields() { | ||
| if field.GetDataType() != schemapb.DataType_Array && field.GetDataType() != schemapb.DataType_ArrayOfVector { | ||
| msg := fmt.Sprintf("fields in StructArrayField can only be array or array of vector, but field %s is %s", field.Name, field.DataType.String()) |
There was a problem hiding this comment.
This PR architecturally rejects nullable ArrayOfVector at schema validation time (checkStructArrayFieldSchema). If this validation runs during collection metadata replay at rootcoord startup (via etcd), existing collections with nullable ArrayOfVector schemas will fail to load — a catastrophic upgrade failure. If it doesn't run on replay, runtime queries against those collections will hit new error branches. The PR author must confirm: does metadata loaded from etcd at rootcoord boot get revalidated through this function? Either answer has consequences that need explicit migration/deprecation handling.
There was a problem hiding this comment.
Confirmed. RootCoord metadata boot/replay does not call checkStructArrayFieldSchema; it is used on create collection and alter add struct field. Existing metadata still loads. New schemas/import/proxy validation reject nullable ArrayOfVector. Added this to the PR notes.
| if err := ValidateFieldsInStruct(subField, schema); err != nil { | ||
| return err | ||
| } | ||
| if subField.GetDataType() == schemapb.DataType_ArrayOfVector && (structArrayField.GetNullable() || subField.GetNullable()) { |
There was a problem hiding this comment.
Proxy validation now also rejects nullable ArrayOfVector. Combined with rootcoord rejection, this closes off the type at both CreateCollection and query time. The PR deletes prior proxy tests for nullable struct/vector cases — this is evidence of a product behavior change, not just test cleanup. The deletion should be justified in the PR description.
There was a problem hiding this comment.
Agree. This is intentional: ArrayOfVector has no nullable compact-storage contract in this PR. The old nullable cases were invalid under that contract; explicit rejection coverage and PR notes were added.
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
b9463bf to
af862cb
Compare
✅ CI Loop Results
|
| Stage | Result | Duration | Tests |
|---|---|---|---|
| ✅ Build | SUCCESS | 9.6min | - |
| ✅ Code-Check | SUCCESS | 7.5min | - |
| ✅ UT-GO | SUCCESS | 22.6min | 1012 passed |
| ✅ UT-Integration | SUCCESS | 23.8min | 46 passed |
| ✅ UT-CPP-Cov | SUCCESS | 37.7min | 7744 passed |
Total: 68min | Pipeline | Artifacts
Overall Coverage: 70.7%
Diff Coverage: CPP 89.9% (927 hit, 104 miss, 1031 measurable lines) | Go 85.0% (781 hit, 138 miss, 919 measurable lines)
Total Patch Coverage: 87.6% (1708/1950 measurable lines)
✅ CI Loop Results
|
| Stage | Result | Duration | Tests |
|---|---|---|---|
| ✅ Build | SUCCESS | 9.9min | - |
| ✅ Code-Check | SUCCESS | 6.9min | - |
| ✅ UT-GO | SUCCESS | 22.3min | 1012 passed |
| ✅ UT-Integration | SUCCESS | 23.4min | 46 passed |
| ✅ UT-CPP-Cov | SUCCESS | 37.7min | 7744 passed |
Total: 67min | Pipeline | Artifacts
Overall Coverage: 70.7%
Diff Coverage: CPP 89.9% (927 hit, 104 miss, 1031 measurable lines) | Go 85.0% (781 hit, 138 miss, 919 measurable lines)
Total Patch Coverage: 87.6% (1708/1950 measurable lines)
✅ CI Loop Results
|
| Stage | Result | Duration | Tests |
|---|---|---|---|
| ✅ Build | SUCCESS | 13.3min | - |
| ✅ Code-Check | SUCCESS | 6.8min | - |
| ✅ UT-GO | SUCCESS | 22.7min | 1012 passed |
| ✅ UT-Integration | SUCCESS | 24.5min | 46 passed |
| ✅ UT-CPP-Cov | SUCCESS | 53.5min | 7744 passed |
Total: 72min | Pipeline | Artifacts
Overall Coverage: 67.3%
Diff Coverage: CPP 89.9% (927 hit, 104 miss, 1031 measurable lines) | Go 75.0% (778 hit, 259 miss, 1037 measurable lines)
Total Patch Coverage: 82.4% (1705/2068 measurable lines)
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: liliu-z, marcelo-cjl The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
## Summary - Cherrypick nullable vector compact-storage invariant fixes from #49924 to 2.6. - Enforce the compact nullable vector contract: `ValidData` tracks logical rows, while vector payloads contain only valid physical rows. - Cover the affected storage, import, proxy, index build, and segcore paths with regression tests. pr: #49924 issue: #49881 ## Test plan - [x] `git diff --check HEAD` - [x] `cd pkg && go test ./util/funcutil ./util/typeutil ./mq/msgstream` targeted nullable compact vector tests Local full build/static checks were not completed in this workspace: - `make build-cpp` was blocked by generated v3 proto imports and an existing CMake generator-cache mismatch. - `make static-check` was blocked by stale/generated canalyzer C header mismatch. Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
- Enforce the nullable vector compact-storage contract across query, storage, import, segcore, and index build paths. - Keep `ValidData` as the logical row source and only read/write physical vector payload for valid rows. - Preserve master nullable `ArrayOfVector` behavior: it is supported as row-dense vector-array data with null-row placeholders, but it is intentionally excluded from compact nullable-vector helpers. issue: milvus-io#49881 issue: milvus-io#49974 issue: milvus-io#49992 - Add shared nullable vector helpers for supported compact vector types and logical-to-physical compact row mapping. - Fix proxy search result reorder/order_by when output fields include nullable compact vectors. - Fix queryutil row count, row-size accounting, slice/order/merge paths for nullable compact vector payloads. - Fix storage payload reader/writer, row serde, record builder metadata, data sorter, and insert-data merge for nullable compact vectors. - Fix parquet sparse vector import and numpy nullable dense vector `ValidData` generation. - Fix C++ growing flush, external take output, result merge, and bitset transform for nullable compact vector rows. - Skip 0-row Knowhere build for all-null nullable vectors while preserving valid-data-only index metadata. - Keep nullable `ArrayOfVector` on master on its existing row-dense path; only V1 storage rejects nullable `ArrayOfVector`. - Add regression coverage in Go unit tests, C++ gtests, and go-client nullable vector e2e scenarios. - Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no longer rejects it at schema, proxy, JSON/CSV/Parquet import, or add-field validation. `ArrayOfVector` remains excluded from `IsSupportedNullableVectorType` because that helper means compact nullable vector payloads; `ArrayOfVector` uses row-dense data with empty placeholder rows for null logical rows. - Review 10 - nullable sparse vector paths were audited around the compact layout: `ValidData` is logical-row sized, while sparse `Contents` only stores valid physical rows. Reader/writer/serde/import/query reorder paths derive logical-to-physical offsets from `ValidData` instead of indexing sparse contents by logical row. - Storage V1 note - nullable `ArrayOfVector` is still rejected only in V1 storage-format paths. That does not remove master V2 nullable `ArrayOfVector` support. - [x] `git diff --check HEAD` - [x] `make static-check` - [x] `cmake --build cmake_build --target all_tests -j8` - [x] C++ `all_tests` targeted nullable vector / nullable VectorArray cases - [x] `go test -count=1 ./internal/util/queryutil` - [x] `cd pkg && go test -count=1 ./util/typeutil` - [x] `cd pkg && go test -count=1 ./util/funcutil` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/util/importutilv2/parquet` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/proxy -run 'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/datanode/importv2` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/storage -run '<nullable-vector targeted tests>'` Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com> (cherry picked from commit c4154af) Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
- Enforce the nullable vector compact-storage contract across query, storage, import, segcore, and index build paths. - Keep `ValidData` as the logical row source and only read/write physical vector payload for valid rows. - Preserve master nullable `ArrayOfVector` behavior: it is supported as row-dense vector-array data with null-row placeholders, but it is intentionally excluded from compact nullable-vector helpers. issue: milvus-io#49881 issue: milvus-io#49974 issue: milvus-io#49992 - Add shared nullable vector helpers for supported compact vector types and logical-to-physical compact row mapping. - Fix proxy search result reorder/order_by when output fields include nullable compact vectors. - Fix queryutil row count, row-size accounting, slice/order/merge paths for nullable compact vector payloads. - Fix storage payload reader/writer, row serde, record builder metadata, data sorter, and insert-data merge for nullable compact vectors. - Fix parquet sparse vector import and numpy nullable dense vector `ValidData` generation. - Fix C++ growing flush, external take output, result merge, and bitset transform for nullable compact vector rows. - Skip 0-row Knowhere build for all-null nullable vectors while preserving valid-data-only index metadata. - Keep nullable `ArrayOfVector` on master on its existing row-dense path; only V1 storage rejects nullable `ArrayOfVector`. - Add regression coverage in Go unit tests, C++ gtests, and go-client nullable vector e2e scenarios. - Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no longer rejects it at schema, proxy, JSON/CSV/Parquet import, or add-field validation. `ArrayOfVector` remains excluded from `IsSupportedNullableVectorType` because that helper means compact nullable vector payloads; `ArrayOfVector` uses row-dense data with empty placeholder rows for null logical rows. - Review 10 - nullable sparse vector paths were audited around the compact layout: `ValidData` is logical-row sized, while sparse `Contents` only stores valid physical rows. Reader/writer/serde/import/query reorder paths derive logical-to-physical offsets from `ValidData` instead of indexing sparse contents by logical row. - Storage V1 note - nullable `ArrayOfVector` is still rejected only in V1 storage-format paths. That does not remove master V2 nullable `ArrayOfVector` support. - [x] `git diff --check HEAD` - [x] `make static-check` - [x] `cmake --build cmake_build --target all_tests -j8` - [x] C++ `all_tests` targeted nullable vector / nullable VectorArray cases - [x] `go test -count=1 ./internal/util/queryutil` - [x] `cd pkg && go test -count=1 ./util/typeutil` - [x] `cd pkg && go test -count=1 ./util/funcutil` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/util/importutilv2/parquet` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/proxy -run 'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/datanode/importv2` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/storage -run '<nullable-vector targeted tests>'` Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com> (cherry picked from commit c4154af) Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
## Summary - Enforce the nullable vector compact-storage contract across query, storage, import, segcore, and index build paths. - Keep `ValidData` as the logical row source and only read/write physical vector payload for valid rows. - Preserve master nullable `ArrayOfVector` behavior: it is supported as row-dense vector-array data with null-row placeholders, but it is intentionally excluded from compact nullable-vector helpers. issue: milvus-io#49881 issue: milvus-io#49974 issue: milvus-io#49992 ## What changed - Add shared nullable vector helpers for supported compact vector types and logical-to-physical compact row mapping. - Fix proxy search result reorder/order_by when output fields include nullable compact vectors. - Fix queryutil row count, row-size accounting, slice/order/merge paths for nullable compact vector payloads. - Fix storage payload reader/writer, row serde, record builder metadata, data sorter, and insert-data merge for nullable compact vectors. - Fix parquet sparse vector import and numpy nullable dense vector `ValidData` generation. - Fix C++ growing flush, external take output, result merge, and bitset transform for nullable compact vector rows. - Skip 0-row Knowhere build for all-null nullable vectors while preserving valid-data-only index metadata. - Keep nullable `ArrayOfVector` on master on its existing row-dense path; only V1 storage rejects nullable `ArrayOfVector`. - Add regression coverage in Go unit tests, C++ gtests, and go-client nullable vector e2e scenarios. ## Review notes - Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no longer rejects it at schema, proxy, JSON/CSV/Parquet import, or add-field validation. `ArrayOfVector` remains excluded from `IsSupportedNullableVectorType` because that helper means compact nullable vector payloads; `ArrayOfVector` uses row-dense data with empty placeholder rows for null logical rows. - Review 10 - nullable sparse vector paths were audited around the compact layout: `ValidData` is logical-row sized, while sparse `Contents` only stores valid physical rows. Reader/writer/serde/import/query reorder paths derive logical-to-physical offsets from `ValidData` instead of indexing sparse contents by logical row. - Storage V1 note - nullable `ArrayOfVector` is still rejected only in V1 storage-format paths. That does not remove master V2 nullable `ArrayOfVector` support. ## Test plan - [x] `git diff --check HEAD` - [x] `make static-check` - [x] `cmake --build cmake_build --target all_tests -j8` - [x] C++ `all_tests` targeted nullable vector / nullable VectorArray cases - [x] `go test -count=1 ./internal/util/queryutil` - [x] `cd pkg && go test -count=1 ./util/typeutil` - [x] `cd pkg && go test -count=1 ./util/funcutil` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/util/importutilv2/parquet` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/proxy -run 'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/datanode/importv2` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/storage -run '<nullable-vector targeted tests>'` Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com> Signed-off-by: chyezh <chyezh@outlook.com>
- Enforce the nullable vector compact-storage contract across query, storage, import, segcore, and index build paths. - Keep `ValidData` as the logical row source and only read/write physical vector payload for valid rows. - Preserve master nullable `ArrayOfVector` behavior: it is supported as row-dense vector-array data with null-row placeholders, but it is intentionally excluded from compact nullable-vector helpers. issue: milvus-io#49881 issue: milvus-io#49974 issue: milvus-io#49992 - Add shared nullable vector helpers for supported compact vector types and logical-to-physical compact row mapping. - Fix proxy search result reorder/order_by when output fields include nullable compact vectors. - Fix queryutil row count, row-size accounting, slice/order/merge paths for nullable compact vector payloads. - Fix storage payload reader/writer, row serde, record builder metadata, data sorter, and insert-data merge for nullable compact vectors. - Fix parquet sparse vector import and numpy nullable dense vector `ValidData` generation. - Fix C++ growing flush, external take output, result merge, and bitset transform for nullable compact vector rows. - Skip 0-row Knowhere build for all-null nullable vectors while preserving valid-data-only index metadata. - Keep nullable `ArrayOfVector` on master on its existing row-dense path; only V1 storage rejects nullable `ArrayOfVector`. - Add regression coverage in Go unit tests, C++ gtests, and go-client nullable vector e2e scenarios. - Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no longer rejects it at schema, proxy, JSON/CSV/Parquet import, or add-field validation. `ArrayOfVector` remains excluded from `IsSupportedNullableVectorType` because that helper means compact nullable vector payloads; `ArrayOfVector` uses row-dense data with empty placeholder rows for null logical rows. - Review 10 - nullable sparse vector paths were audited around the compact layout: `ValidData` is logical-row sized, while sparse `Contents` only stores valid physical rows. Reader/writer/serde/import/query reorder paths derive logical-to-physical offsets from `ValidData` instead of indexing sparse contents by logical row. - Storage V1 note - nullable `ArrayOfVector` is still rejected only in V1 storage-format paths. That does not remove master V2 nullable `ArrayOfVector` support. - [x] `git diff --check HEAD` - [x] `make static-check` - [x] `cmake --build cmake_build --target all_tests -j8` - [x] C++ `all_tests` targeted nullable vector / nullable VectorArray cases - [x] `go test -count=1 ./internal/util/queryutil` - [x] `cd pkg && go test -count=1 ./util/typeutil` - [x] `cd pkg && go test -count=1 ./util/funcutil` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/util/importutilv2/parquet` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/proxy -run 'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/datanode/importv2` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/storage -run '<nullable-vector targeted tests>'` Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com> (cherry picked from commit c4154af) Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
- Enforce the nullable vector compact-storage contract across query, storage, import, segcore, and index build paths. - Keep `ValidData` as the logical row source and only read/write physical vector payload for valid rows. - Preserve master nullable `ArrayOfVector` behavior: it is supported as row-dense vector-array data with null-row placeholders, but it is intentionally excluded from compact nullable-vector helpers. issue: milvus-io#49881 issue: milvus-io#49974 issue: milvus-io#49992 - Add shared nullable vector helpers for supported compact vector types and logical-to-physical compact row mapping. - Fix proxy search result reorder/order_by when output fields include nullable compact vectors. - Fix queryutil row count, row-size accounting, slice/order/merge paths for nullable compact vector payloads. - Fix storage payload reader/writer, row serde, record builder metadata, data sorter, and insert-data merge for nullable compact vectors. - Fix parquet sparse vector import and numpy nullable dense vector `ValidData` generation. - Fix C++ growing flush, external take output, result merge, and bitset transform for nullable compact vector rows. - Skip 0-row Knowhere build for all-null nullable vectors while preserving valid-data-only index metadata. - Keep nullable `ArrayOfVector` on master on its existing row-dense path; only V1 storage rejects nullable `ArrayOfVector`. - Add regression coverage in Go unit tests, C++ gtests, and go-client nullable vector e2e scenarios. - Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no longer rejects it at schema, proxy, JSON/CSV/Parquet import, or add-field validation. `ArrayOfVector` remains excluded from `IsSupportedNullableVectorType` because that helper means compact nullable vector payloads; `ArrayOfVector` uses row-dense data with empty placeholder rows for null logical rows. - Review 10 - nullable sparse vector paths were audited around the compact layout: `ValidData` is logical-row sized, while sparse `Contents` only stores valid physical rows. Reader/writer/serde/import/query reorder paths derive logical-to-physical offsets from `ValidData` instead of indexing sparse contents by logical row. - Storage V1 note - nullable `ArrayOfVector` is still rejected only in V1 storage-format paths. That does not remove master V2 nullable `ArrayOfVector` support. - [x] `git diff --check HEAD` - [x] `make static-check` - [x] `cmake --build cmake_build --target all_tests -j8` - [x] C++ `all_tests` targeted nullable vector / nullable VectorArray cases - [x] `go test -count=1 ./internal/util/queryutil` - [x] `cd pkg && go test -count=1 ./util/typeutil` - [x] `cd pkg && go test -count=1 ./util/funcutil` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/util/importutilv2/parquet` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/proxy -run 'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/datanode/importv2` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/storage -run '<nullable-vector targeted tests>'` Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com> (cherry picked from commit c4154af) Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
## Summary - Enforce the nullable vector compact-storage contract across query, storage, import, segcore, and index build paths. - Keep `ValidData` as the logical row source and only read/write physical vector payload for valid rows. - Preserve master nullable `ArrayOfVector` behavior: it is supported as row-dense vector-array data with null-row placeholders, but it is intentionally excluded from compact nullable-vector helpers. issue: milvus-io#49881 issue: milvus-io#49974 issue: milvus-io#49992 ## What changed - Add shared nullable vector helpers for supported compact vector types and logical-to-physical compact row mapping. - Fix proxy search result reorder/order_by when output fields include nullable compact vectors. - Fix queryutil row count, row-size accounting, slice/order/merge paths for nullable compact vector payloads. - Fix storage payload reader/writer, row serde, record builder metadata, data sorter, and insert-data merge for nullable compact vectors. - Fix parquet sparse vector import and numpy nullable dense vector `ValidData` generation. - Fix C++ growing flush, external take output, result merge, and bitset transform for nullable compact vector rows. - Skip 0-row Knowhere build for all-null nullable vectors while preserving valid-data-only index metadata. - Keep nullable `ArrayOfVector` on master on its existing row-dense path; only V1 storage rejects nullable `ArrayOfVector`. - Add regression coverage in Go unit tests, C++ gtests, and go-client nullable vector e2e scenarios. ## Review notes - Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no longer rejects it at schema, proxy, JSON/CSV/Parquet import, or add-field validation. `ArrayOfVector` remains excluded from `IsSupportedNullableVectorType` because that helper means compact nullable vector payloads; `ArrayOfVector` uses row-dense data with empty placeholder rows for null logical rows. - Review 10 - nullable sparse vector paths were audited around the compact layout: `ValidData` is logical-row sized, while sparse `Contents` only stores valid physical rows. Reader/writer/serde/import/query reorder paths derive logical-to-physical offsets from `ValidData` instead of indexing sparse contents by logical row. - Storage V1 note - nullable `ArrayOfVector` is still rejected only in V1 storage-format paths. That does not remove master V2 nullable `ArrayOfVector` support. ## Test plan - [x] `git diff --check HEAD` - [x] `make static-check` - [x] `cmake --build cmake_build --target all_tests -j8` - [x] C++ `all_tests` targeted nullable vector / nullable VectorArray cases - [x] `go test -count=1 ./internal/util/queryutil` - [x] `cd pkg && go test -count=1 ./util/typeutil` - [x] `cd pkg && go test -count=1 ./util/funcutil` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/util/importutilv2/parquet` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/proxy -run 'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/datanode/importv2` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/storage -run '<nullable-vector targeted tests>'` Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
## Summary - Enforce the nullable vector compact-storage contract across query, storage, import, segcore, and index build paths. - Keep `ValidData` as the logical row source and only read/write physical vector payload for valid rows. - Preserve master nullable `ArrayOfVector` behavior: it is supported as row-dense vector-array data with null-row placeholders, but it is intentionally excluded from compact nullable-vector helpers. issue: milvus-io#49881 issue: milvus-io#49974 issue: milvus-io#49992 ## What changed - Add shared nullable vector helpers for supported compact vector types and logical-to-physical compact row mapping. - Fix proxy search result reorder/order_by when output fields include nullable compact vectors. - Fix queryutil row count, row-size accounting, slice/order/merge paths for nullable compact vector payloads. - Fix storage payload reader/writer, row serde, record builder metadata, data sorter, and insert-data merge for nullable compact vectors. - Fix parquet sparse vector import and numpy nullable dense vector `ValidData` generation. - Fix C++ growing flush, external take output, result merge, and bitset transform for nullable compact vector rows. - Skip 0-row Knowhere build for all-null nullable vectors while preserving valid-data-only index metadata. - Keep nullable `ArrayOfVector` on master on its existing row-dense path; only V1 storage rejects nullable `ArrayOfVector`. - Add regression coverage in Go unit tests, C++ gtests, and go-client nullable vector e2e scenarios. ## Review notes - Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no longer rejects it at schema, proxy, JSON/CSV/Parquet import, or add-field validation. `ArrayOfVector` remains excluded from `IsSupportedNullableVectorType` because that helper means compact nullable vector payloads; `ArrayOfVector` uses row-dense data with empty placeholder rows for null logical rows. - Review 10 - nullable sparse vector paths were audited around the compact layout: `ValidData` is logical-row sized, while sparse `Contents` only stores valid physical rows. Reader/writer/serde/import/query reorder paths derive logical-to-physical offsets from `ValidData` instead of indexing sparse contents by logical row. - Storage V1 note - nullable `ArrayOfVector` is still rejected only in V1 storage-format paths. That does not remove master V2 nullable `ArrayOfVector` support. ## Test plan - [x] `git diff --check HEAD` - [x] `make static-check` - [x] `cmake --build cmake_build --target all_tests -j8` - [x] C++ `all_tests` targeted nullable vector / nullable VectorArray cases - [x] `go test -count=1 ./internal/util/queryutil` - [x] `cd pkg && go test -count=1 ./util/typeutil` - [x] `cd pkg && go test -count=1 ./util/funcutil` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/util/importutilv2/parquet` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/proxy -run 'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/datanode/importv2` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/storage -run '<nullable-vector targeted tests>'` Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
## Summary - Enforce the nullable vector compact-storage contract across query, storage, import, segcore, and index build paths. - Keep `ValidData` as the logical row source and only read/write physical vector payload for valid rows. - Preserve master nullable `ArrayOfVector` behavior: it is supported as row-dense vector-array data with null-row placeholders, but it is intentionally excluded from compact nullable-vector helpers. issue: milvus-io#49881 issue: milvus-io#49974 issue: milvus-io#49992 ## What changed - Add shared nullable vector helpers for supported compact vector types and logical-to-physical compact row mapping. - Fix proxy search result reorder/order_by when output fields include nullable compact vectors. - Fix queryutil row count, row-size accounting, slice/order/merge paths for nullable compact vector payloads. - Fix storage payload reader/writer, row serde, record builder metadata, data sorter, and insert-data merge for nullable compact vectors. - Fix parquet sparse vector import and numpy nullable dense vector `ValidData` generation. - Fix C++ growing flush, external take output, result merge, and bitset transform for nullable compact vector rows. - Skip 0-row Knowhere build for all-null nullable vectors while preserving valid-data-only index metadata. - Keep nullable `ArrayOfVector` on master on its existing row-dense path; only V1 storage rejects nullable `ArrayOfVector`. - Add regression coverage in Go unit tests, C++ gtests, and go-client nullable vector e2e scenarios. ## Review notes - Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no longer rejects it at schema, proxy, JSON/CSV/Parquet import, or add-field validation. `ArrayOfVector` remains excluded from `IsSupportedNullableVectorType` because that helper means compact nullable vector payloads; `ArrayOfVector` uses row-dense data with empty placeholder rows for null logical rows. - Review 10 - nullable sparse vector paths were audited around the compact layout: `ValidData` is logical-row sized, while sparse `Contents` only stores valid physical rows. Reader/writer/serde/import/query reorder paths derive logical-to-physical offsets from `ValidData` instead of indexing sparse contents by logical row. - Storage V1 note - nullable `ArrayOfVector` is still rejected only in V1 storage-format paths. That does not remove master V2 nullable `ArrayOfVector` support. ## Test plan - [x] `git diff --check HEAD` - [x] `make static-check` - [x] `cmake --build cmake_build --target all_tests -j8` - [x] C++ `all_tests` targeted nullable vector / nullable VectorArray cases - [x] `go test -count=1 ./internal/util/queryutil` - [x] `cd pkg && go test -count=1 ./util/typeutil` - [x] `cd pkg && go test -count=1 ./util/funcutil` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/util/importutilv2/parquet` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/proxy -run 'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/datanode/importv2` - [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/storage -run '<nullable-vector targeted tests>'` Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com> Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
issue: milvus-io#45881 pr: milvus-io#49924 The 3.0 external take path normalizes dense byte-vector fields into Arrow binary arrays before converting them back to DataArray. ArrowToDataArray only handled FloatVector, so BinaryVector, Float16Vector, BFloat16Vector, and Int8Vector outputs lost their payload when callers took the non-fallback path. This backport adds the dense byte-vector conversion cases from the master fix and keeps nullable payloads compact by copying only valid physical rows while preserving logical valid_data. The 3.0 branch does not include the full master change set; sparse-vector support is left out because this PR only restores the dense external-table take output needed by the 3.0 feature branch. Validation: - source ~/.profile && make build-cpp-with-unittest - all_tests with six ExternalTakeTest nullable vector take cases - git diff --check - reset Milvus and run the focused Go client e2e: MINIO_ADDRESS=localhost:9000 MINIO_ACCESS_KEY=minioadmin MINIO_SECRET_KEY=minioadmin MINIO_BUCKET=a-bucket MINIO_ROOT_PATH=files go test -tags dynamic,test ./testcases -run '^TestExternalCollectionFloat16VectorTakeOutput$' -addr=http://localhost:19530 -timeout 10m -count=1 -v Result: PASS ok github.com/milvus-io/milvus/tests/go_client/testcases 6.332s Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: milvus-io#45881 pr: milvus-io#49924 The 3.0 external take path normalizes dense byte-vector fields into Arrow binary arrays before converting them back to DataArray. ArrowToDataArray only handled FloatVector, so BinaryVector, Float16Vector, BFloat16Vector, and Int8Vector outputs lost their payload when callers took the non-fallback path. This backport adds the dense byte-vector conversion cases from the master fix and keeps nullable payloads compact by copying only valid physical rows while preserving logical valid_data. The 3.0 branch does not include the full master change set; sparse-vector support is left out because this PR only restores the dense external-table take output needed by the 3.0 feature branch. Validation: - source ~/.profile && make build-cpp-with-unittest - all_tests with six ExternalTakeTest nullable vector take cases - git diff --check - reset Milvus and run the focused Go client e2e: MINIO_ADDRESS=localhost:9000 MINIO_ACCESS_KEY=minioadmin MINIO_SECRET_KEY=minioadmin MINIO_BUCKET=a-bucket MINIO_ROOT_PATH=files go test -tags dynamic,test ./testcases -run '^TestExternalCollectionFloat16VectorTakeOutput$' -addr=http://localhost:19530 -timeout 10m -count=1 -v Result: PASS ok github.com/milvus-io/milvus/tests/go_client/testcases 6.332s Signed-off-by: Wei Liu <wei.liu@zilliz.com>
pr: milvus-io#49924 MergeDataArray builds search output fields row by row after search reduce. On 3.0, byte-backed vector payloads still used assign(), so each valid row replaced the buffer written by earlier rows. A multi-row search result could therefore return valid_data for several rows while keeping only the last row payload. Backport the assign-to-append merge fix from PR milvus-io#49924 to the 3.0 branch. The test block is kept aligned with the upstream regression case; the broader unrelated nullable-vector changes from the original PR are omitted because this PR only needs the search merge fix for byte-backed vector outputs. Validation: - git diff --check - CLANG_FORMAT=/opt/homebrew/opt/llvm@15/bin/clang-format make cppcheck in a temporary clean worktree with the candidate diff committed - make build-cpp-with-unittest and ninja -C cmake_build all_tests were attempted, but local C++ builds fail before this test at Abseil std::partial_ordering/std::weak_ordering errors under C++17 Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #45881 pr: #49924 ## What changed Backport only the `ArrowToDataArray` dense byte-vector conversion subset from the master take fix [#49924](#49924). This is not a full backport of that PR; it copies the minimal part needed on 3.0 to fix external-table take output when the take path reads binary vector payloads. `ArrowToDataArray` now converts `BinaryVector`, `Float16Vector`, `BFloat16Vector`, and `Int8Vector` Arrow binary payloads back into `DataArray`. A focused Go client e2e test now covers FP16 vector output from an external collection through both query and search result paths. ## Root cause The external take path normalizes dense byte-vector fields into Arrow binary arrays. On 3.0, `ArrowToDataArray` only handled float vector output, so binary-vector style payloads could enter the optimized take output path without being converted back into the expected Milvus vector `DataArray`. For nullable fields, the fix copies only valid physical rows into the protobuf vector payload while preserving the logical `valid_data` bitmap. This PR intentionally excludes the unrelated parts of the master PR, including sparse vector support and other master-only changes. ## Validation - `source ~/.profile && make build-cpp-with-unittest` - `DYLD_LIBRARY_PATH=... ./internal/core/output/unittest/all_tests --gtest_filter='ExternalTakeTest.Nullable*VectorTakeUsesCompactData:ExternalTakeTest.TryTakeForRetrieve_NullableVectorUsesCompactData:ExternalTakeTest.TryTakeForSearch_NullableVectorUsesCompactData'` - `git diff --check` - Reset local Milvus, started standalone with `./bin/milvus run standalone --run-with-subprocess`, then ran: ```bash cd tests/go_client source ~/.profile source ../../scripts/setenv.sh MINIO_ADDRESS=localhost:9000 \ MINIO_ACCESS_KEY=minioadmin \ MINIO_SECRET_KEY=minioadmin \ MINIO_BUCKET=a-bucket \ MINIO_ROOT_PATH=files \ go test -tags dynamic,test ./testcases \ -run '^TestExternalCollectionFloat16VectorTakeOutput$' \ -addr=http://localhost:19530 \ -timeout 10m \ -count=1 -v ``` Result: ```text Job 466897675154097263: state=RefreshCompleted progress=100% Vector index created on embedding Vector index created on fp16_vec Collection loaded Response method=Query includes fp16_vec Float16Vector "KissLS4vMDE=" Response method=Search includes fp16_vec Float16Vector "KissLS4vMDE=" --- PASS: TestExternalCollectionFloat16VectorTakeOutput (5.15s) PASS ok github.com/milvus-io/milvus/tests/go_client/testcases 6.332s ``` --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Summary
ValidDataas the logical row source and only read/write physical vector payload for valid rows.ArrayOfVectorbehavior: it is supported as row-dense vector-array data with null-row placeholders, but it is intentionally excluded from compact nullable-vector helpers.issue: #49881
issue: #49974
issue: #49992
What changed
ValidDatageneration.ArrayOfVectoron master on its existing row-dense path; only V1 storage rejects nullableArrayOfVector.Review notes
ArrayOfVector. This PR no longer rejects it at schema, proxy, JSON/CSV/Parquet import, or add-field validation.ArrayOfVectorremains excluded fromIsSupportedNullableVectorTypebecause that helper means compact nullable vector payloads;ArrayOfVectoruses row-dense data with empty placeholder rows for null logical rows.ValidDatais logical-row sized, while sparseContentsonly stores valid physical rows. Reader/writer/serde/import/query reorder paths derive logical-to-physical offsets fromValidDatainstead of indexing sparse contents by logical row.ArrayOfVectoris still rejected only in V1 storage-format paths. That does not remove master V2 nullableArrayOfVectorsupport.Test plan
git diff --check HEADmake static-checkcmake --build cmake_build --target all_tests -j8all_teststargeted nullable vector / nullable VectorArray casesgo test -count=1 ./internal/util/queryutilcd pkg && go test -count=1 ./util/typeutilcd pkg && go test -count=1 ./util/funcutilgo test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/util/importutilv2/parquetgo test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/proxy -run 'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/datanode/importv2go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/storage -run '<nullable-vector targeted tests>'