Skip to content

fix: enforce nullable vector compact data invariants#49924

Merged
sre-ci-robot merged 1 commit into
milvus-io:masterfrom
marcelo-cjl:fix/nullable-vector-compact-paths
May 25, 2026
Merged

fix: enforce nullable vector compact data invariants#49924
sre-ci-robot merged 1 commit into
milvus-io:masterfrom
marcelo-cjl:fix/nullable-vector-compact-paths

Conversation

@marcelo-cjl

@marcelo-cjl marcelo-cjl commented May 19, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Enforce the nullable vector compact-storage contract across query, storage, import, segcore, and index build paths.
  • Keep ValidData as the logical row source and only read/write physical vector payload for valid rows.
  • Preserve master nullable ArrayOfVector behavior: it is supported as row-dense vector-array data with null-row placeholders, but it is intentionally excluded from compact nullable-vector helpers.

issue: #49881
issue: #49974
issue: #49992

What changed

  • Add shared nullable vector helpers for supported compact vector types and logical-to-physical compact row mapping.
  • Fix proxy search result reorder/order_by when output fields include nullable compact vectors.
  • Fix queryutil row count, row-size accounting, slice/order/merge paths for nullable compact vector payloads.
  • Fix storage payload reader/writer, row serde, record builder metadata, data sorter, and insert-data merge for nullable compact vectors.
  • Fix parquet sparse vector import and numpy nullable dense vector ValidData generation.
  • Fix C++ growing flush, external take output, result merge, and bitset transform for nullable compact vector rows.
  • Skip 0-row Knowhere build for all-null nullable vectors while preserving valid-data-only index metadata.
  • Keep nullable ArrayOfVector on master on its existing row-dense path; only V1 storage rejects nullable ArrayOfVector.
  • Add regression coverage in Go unit tests, C++ gtests, and go-client nullable vector e2e scenarios.

Review notes

  • Review 3 / 4 - master supports nullable ArrayOfVector. This PR no longer rejects it at schema, proxy, JSON/CSV/Parquet import, or add-field validation. ArrayOfVector remains excluded from IsSupportedNullableVectorType because that helper means compact nullable vector payloads; ArrayOfVector uses row-dense data with empty placeholder rows for null logical rows.
  • Review 10 - nullable sparse vector paths were audited around the compact layout: ValidData is logical-row sized, while sparse Contents only stores valid physical rows. Reader/writer/serde/import/query reorder paths derive logical-to-physical offsets from ValidData instead of indexing sparse contents by logical row.
  • Storage V1 note - nullable ArrayOfVector is still rejected only in V1 storage-format paths. That does not remove master V2 nullable ArrayOfVector support.

Test plan

  • git diff --check HEAD
  • make static-check
  • cmake --build cmake_build --target all_tests -j8
  • C++ all_tests targeted nullable vector / nullable VectorArray cases
  • go test -count=1 ./internal/util/queryutil
  • cd pkg && go test -count=1 ./util/typeutil
  • cd pkg && go test -count=1 ./util/funcutil
  • go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/util/importutilv2/parquet
  • go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/proxy -run 'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'
  • go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/datanode/importv2
  • go test -tags dynamic,test -gcflags="all=-N -l" -count=1 ./internal/storage -run '<nullable-vector targeted tests>'

@sre-ci-robot sre-ci-robot requested review from bigsheeper and czs007 May 19, 2026 13:53
@sre-ci-robot sre-ci-robot added area/test size/XXL Denotes a PR that changes 1000+ lines. sig/testing labels May 19, 2026
@mergify

mergify Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

@marcelo-cjl Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.

@mergify mergify Bot added needs-dco DCO is missing in this pull request. kind/bug Issues or changes related a bug labels May 19, 2026
@sre-ci-robot

Copy link
Copy Markdown
Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-build-all // for ci-v2/build-all (multi-arch builds)
  • /ci-rerun-buildenv // for ci-v2/build-env (build milvus-env builder images)
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-default // for ci-v2/e2e-default
  • /ci-rerun-e2e-amd // for ci-v2/e2e-amd (e2e pool dispatcher)
  • /ci-rerun-build-ut-cov // for ci-v2/build-ut-cov (build + unit tests in one pipeline)
  • /ci-rerun-gosdk // for ci-v2/go-sdk (Go SDK E2E tests, ARM)

If you have any questions or requests, please contact @zhikunyao.

@sre-ci-robot

Copy link
Copy Markdown
Contributor

❌ CI Loop Results eaccc20

Stage Result Duration Tests
✅ Build SUCCESS 9.6min -
❌ Code-Check FAILURE 5.7min -
✅ UT-CPP-Cov SUCCESS 36.6min 7718 passed

Total: 52min | Pipeline | Artifacts

@codecov

codecov Bot commented May 19, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 81.16827% with 432 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.58%. Comparing base (3851df6) to head (af862cb).
⚠️ Report is 15 commits behind head on master.

Files with missing lines Patch % Lines
pkg/util/typeutil/schema.go 50.44% 96 Missing and 15 partials ⚠️
internal/storage/insert_data.go 54.88% 41 Missing and 19 partials ⚠️
internal/storage/data_sorter.go 78.68% 37 Missing and 5 partials ⚠️
pkg/util/funcutil/func.go 60.00% 29 Missing and 9 partials ⚠️
internal/proxy/search_pipeline.go 69.67% 24 Missing and 13 partials ⚠️
internal/core/src/segcore/segment_c.cpp 77.27% 25 Missing ⚠️
...nternal/core/src/index/VectorIndexValidDataUtils.h 56.00% 22 Missing ⚠️
internal/util/importutilv2/parquet/field_reader.go 85.71% 14 Missing and 7 partials ⚠️
internal/core/src/index/VectorMemIndex.cpp 61.76% 13 Missing ⚠️
internal/util/indexcgowrapper/index.go 76.78% 10 Missing and 3 partials ⚠️
... and 11 more
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #49924       +/-   ##
===========================================
- Coverage   84.54%   78.58%    -5.96%     
===========================================
  Files         668     2219     +1551     
  Lines      113722   385930   +272208     
===========================================
+ Hits        96145   303282   +207137     
- Misses      17525    73402    +55877     
- Partials       52     9246     +9194     
Components Coverage Δ
Client 80.59% <ø> (∅)
Core 85.43% <91.28%> (+0.14%) ⬆️
Go 76.53% <72.72%> (∅)
Files with missing lines Coverage Δ
internal/core/src/common/FieldData.cpp 75.68% <100.00%> (+0.12%) ⬆️
internal/core/src/query/SearchOnGrowing.cpp 96.85% <100.00%> (ø)
internal/core/src/query/SearchOnIndex.cpp 80.64% <100.00%> (+0.64%) ⬆️
internal/core/src/query/SearchOnSealed.cpp 95.55% <100.00%> (+0.06%) ⬆️
...rc/query/SearchOnSealedIndexBitsetLifetimeTest.cpp 98.65% <100.00%> (+0.51%) ⬆️
internal/core/src/query/Utils.h 82.53% <100.00%> (+0.57%) ⬆️
...rc/segcore/ChunkedSegmentSealedBinlogIndexTest.cpp 98.34% <100.00%> (+0.01%) ⬆️
internal/core/src/segcore/FieldIndexing.cpp 61.32% <100.00%> (+0.27%) ⬆️
...ernal/core/src/segcore/SegmentGrowingIndexTest.cpp 98.85% <100.00%> (+0.29%) ⬆️
internal/core/src/segcore/Utils.cpp 70.10% <100.00%> (+1.34%) ⬆️
... and 33 more

... and 1534 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sre-ci-robot sre-ci-robot added the low-code-coverage add test-label from zhikun, diff coverage > 80% label May 19, 2026
@marcelo-cjl marcelo-cjl force-pushed the fix/nullable-vector-compact-paths branch from eaccc20 to db48a91 Compare May 20, 2026 15:36
@marcelo-cjl marcelo-cjl changed the title fix: handle nullable vector compact data paths fix: enforce nullable vector compact data invariants May 20, 2026
@sre-ci-robot

Copy link
Copy Markdown
Contributor

❌ CI Loop Results db48a91

Stage Result Duration Tests
✅ Build SUCCESS 11.9min -
❌ Code-Check FAILURE 1.7min -
❌ UT-CPP-Cov FAILURE 44.5min 6046 run, 1 failed

Total: 59min | Pipeline | Artifacts
Diff Coverage: CPP 79.0% (529 hit, 141 miss, 670 measurable lines)
Total Patch Coverage: 79.0% (529/670 measurable lines)

Failed Test Logs:

@marcelo-cjl marcelo-cjl force-pushed the fix/nullable-vector-compact-paths branch from db48a91 to cbaf5b7 Compare May 20, 2026 23:28
@mergify mergify Bot added dco-passed DCO check passed. and removed needs-dco DCO is missing in this pull request. labels May 20, 2026
@sre-ci-robot

Copy link
Copy Markdown
Contributor

❌ CI Loop Results cbaf5b7

Stage Result Duration Tests
✅ Build SUCCESS 9.0min -
✅ Code-Check SUCCESS 7.6min -
❌ UT-GO FAILURE 23.1min 1012 passed
✅ UT-Integration SUCCESS 23.8min 46 passed
✅ UT-CPP-Cov SUCCESS 38.1min 7731 passed

Total: 66min | Pipeline | Artifacts
Diff Coverage: CPP 89.7% (594 hit, 68 miss, 662 measurable lines)
Total Patch Coverage: 89.7% (594/662 measurable lines)

Failed Test Logs:

@sre-ci-robot

Copy link
Copy Markdown
Contributor

❌ CI Loop Results 3df125c

Stage Result Duration Tests
✅ Build SUCCESS 11.6min -
✅ Code-Check SUCCESS 7.2min -
❌ UT-GO FAILURE 25.4min 1012 passed
✅ UT-Integration SUCCESS 24.0min 46 passed
❌ UT-CPP-Cov FAILURE 44.4min 7733 run, 2 failed

Total: 71min | Pipeline | Artifacts
Diff Coverage: CPP 87.2% (699 hit, 103 miss, 802 measurable lines)
Total Patch Coverage: 87.2% (699/802 measurable lines)

Failed Test Logs:

@marcelo-cjl marcelo-cjl force-pushed the fix/nullable-vector-compact-paths branch from 3df125c to 7c07e1e Compare May 21, 2026 10:44
@sre-ci-robot sre-ci-robot added the area/dependency Pull requests that update a dependency file label May 21, 2026
@marcelo-cjl marcelo-cjl force-pushed the fix/nullable-vector-compact-paths branch from 7c07e1e to e2a85e0 Compare May 21, 2026 11:37
@sre-ci-robot

Copy link
Copy Markdown
Contributor

❌ CI Loop Results 7c07e1e

Stage Result Duration Tests
✅ Build SUCCESS 9.8min -
❌ Code-Check FAILURE 1.7min -
✅ UT-CPP-Cov SUCCESS 35.6min 7735 passed

Total: 51min | Pipeline | Artifacts
Diff Coverage: CPP 88.4% (860 hit, 113 miss, 973 measurable lines)
Total Patch Coverage: 88.4% (860/973 measurable lines)

return (count + 7) / 8;
}

inline size_t

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The valid-data count is serialized as size_t, which varies between 32-bit and 64-bit platforms. Since this is persisted index metadata, use uint64_t for a stable wire format. Additionally, validate both the count blob size (count_ptr->size == sizeof(uint64_t)) and the bitmap blob size (data_ptr->size >= (count+7)/8) before memcpy/bitmap reads to prevent out-of-bounds access on corrupt or truncated index data.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Count wire format is uint64_t now. Load also validates the count blob size and bitmap blob size before reading.

}

valid_data.found = true;
local_chunk_manager->Read(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same size_t persistence issue as in VectorIndexValidDataUtils.h. The disk index load and build paths serialize/deserialize the valid-data count as size_t. This must match the fixed uint64_t type across all persisted metadata paths. The VectorDiskAnnIndex::Load all-null branch that skips Knowhere deserialization also lacks a direct round-trip test (see separate testing issue).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Disk valid-data uses the same uint64_t wire count and file-size checks. Added all-null disk build/load coverage as well.


for _, field := range schema.GetFields() {
if field.GetDataType() != schemapb.DataType_Array && field.GetDataType() != schemapb.DataType_ArrayOfVector {
msg := fmt.Sprintf("fields in StructArrayField can only be array or array of vector, but field %s is %s", field.Name, field.DataType.String())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR architecturally rejects nullable ArrayOfVector at schema validation time (checkStructArrayFieldSchema). If this validation runs during collection metadata replay at rootcoord startup (via etcd), existing collections with nullable ArrayOfVector schemas will fail to load — a catastrophic upgrade failure. If it doesn't run on replay, runtime queries against those collections will hit new error branches. The PR author must confirm: does metadata loaded from etcd at rootcoord boot get revalidated through this function? Either answer has consequences that need explicit migration/deprecation handling.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed. RootCoord metadata boot/replay does not call checkStructArrayFieldSchema; it is used on create collection and alter add struct field. Existing metadata still loads. New schemas/import/proxy validation reject nullable ArrayOfVector. Added this to the PR notes.

Comment thread internal/proxy/util.go Outdated
if err := ValidateFieldsInStruct(subField, schema); err != nil {
return err
}
if subField.GetDataType() == schemapb.DataType_ArrayOfVector && (structArrayField.GetNullable() || subField.GetNullable()) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proxy validation now also rejects nullable ArrayOfVector. Combined with rootcoord rejection, this closes off the type at both CreateCollection and query time. The PR deletes prior proxy tests for nullable struct/vector cases — this is evidence of a product behavior change, not just test cleanup. The deletion should be justified in the PR description.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. This is intentional: ArrayOfVector has no nullable compact-storage contract in this PR. The old nullable cases were invalid under that contract; explicit rejection coverage and PR notes were added.

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
@marcelo-cjl marcelo-cjl force-pushed the fix/nullable-vector-compact-paths branch from b9463bf to af862cb Compare May 22, 2026 06:10
@sre-ci-robot

Copy link
Copy Markdown
Contributor

✅ CI Loop Results cb87177

Stage Result Duration Tests
✅ Build SUCCESS 9.6min -
✅ Code-Check SUCCESS 7.5min -
✅ UT-GO SUCCESS 22.6min 1012 passed
✅ UT-Integration SUCCESS 23.8min 46 passed
✅ UT-CPP-Cov SUCCESS 37.7min 7744 passed

Total: 68min | Pipeline | Artifacts

Overall Coverage: 70.7%
Diff Coverage: CPP 89.9% (927 hit, 104 miss, 1031 measurable lines) | Go 85.0% (781 hit, 138 miss, 919 measurable lines)
Total Patch Coverage: 87.6% (1708/1950 measurable lines)

@sre-ci-robot

Copy link
Copy Markdown
Contributor

✅ CI Loop Results b9463bf

Stage Result Duration Tests
✅ Build SUCCESS 9.9min -
✅ Code-Check SUCCESS 6.9min -
✅ UT-GO SUCCESS 22.3min 1012 passed
✅ UT-Integration SUCCESS 23.4min 46 passed
✅ UT-CPP-Cov SUCCESS 37.7min 7744 passed

Total: 67min | Pipeline | Artifacts

Overall Coverage: 70.7%
Diff Coverage: CPP 89.9% (927 hit, 104 miss, 1031 measurable lines) | Go 85.0% (781 hit, 138 miss, 919 measurable lines)
Total Patch Coverage: 87.6% (1708/1950 measurable lines)

@sre-ci-robot

Copy link
Copy Markdown
Contributor

✅ CI Loop Results af862cb

Stage Result Duration Tests
✅ Build SUCCESS 13.3min -
✅ Code-Check SUCCESS 6.8min -
✅ UT-GO SUCCESS 22.7min 1012 passed
✅ UT-Integration SUCCESS 24.5min 46 passed
✅ UT-CPP-Cov SUCCESS 53.5min 7744 passed

Total: 72min | Pipeline | Artifacts

Overall Coverage: 67.3%
Diff Coverage: CPP 89.9% (927 hit, 104 miss, 1031 measurable lines) | Go 75.0% (778 hit, 259 miss, 1037 measurable lines)
Total Patch Coverage: 82.4% (1705/2068 measurable lines)

@sre-ci-robot sre-ci-robot added the low-code-coverage add test-label from zhikun, diff coverage > 80% label May 22, 2026
@mergify mergify Bot added the ci-passed label May 22, 2026
@sre-ci-robot sre-ci-robot removed the low-code-coverage add test-label from zhikun, diff coverage > 80% label May 22, 2026
@liliu-z

liliu-z commented May 25, 2026

Copy link
Copy Markdown
Member

/lgtm
/approve

@sre-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liliu-z, marcelo-cjl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit c4154af into milvus-io:master May 25, 2026
18 of 19 checks passed
sre-ci-robot pushed a commit that referenced this pull request May 26, 2026
## Summary
- Cherrypick nullable vector compact-storage invariant fixes from #49924
to 2.6.
- Enforce the compact nullable vector contract: `ValidData` tracks
logical rows, while vector payloads contain only valid physical rows.
- Cover the affected storage, import, proxy, index build, and segcore
paths with regression tests.

pr: #49924
issue: #49881

## Test plan
- [x] `git diff --check HEAD`
- [x] `cd pkg && go test ./util/funcutil ./util/typeutil ./mq/msgstream`
targeted nullable compact vector tests

Local full build/static checks were not completed in this workspace:
- `make build-cpp` was blocked by generated v3 proto imports and an
existing CMake generator-cache mismatch.
- `make static-check` was blocked by stale/generated canalyzer C header
mismatch.

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
marcelo-cjl added a commit to marcelo-cjl/milvus that referenced this pull request May 26, 2026
- Enforce the nullable vector compact-storage contract across query,
storage, import, segcore, and index build paths.
- Keep `ValidData` as the logical row source and only read/write
physical vector payload for valid rows.
- Preserve master nullable `ArrayOfVector` behavior: it is supported as
row-dense vector-array data with null-row placeholders, but it is
intentionally excluded from compact nullable-vector helpers.

issue: milvus-io#49881
issue: milvus-io#49974
issue: milvus-io#49992

- Add shared nullable vector helpers for supported compact vector types
and logical-to-physical compact row mapping.
- Fix proxy search result reorder/order_by when output fields include
nullable compact vectors.
- Fix queryutil row count, row-size accounting, slice/order/merge paths
for nullable compact vector payloads.
- Fix storage payload reader/writer, row serde, record builder metadata,
data sorter, and insert-data merge for nullable compact vectors.
- Fix parquet sparse vector import and numpy nullable dense vector
`ValidData` generation.
- Fix C++ growing flush, external take output, result merge, and bitset
transform for nullable compact vector rows.
- Skip 0-row Knowhere build for all-null nullable vectors while
preserving valid-data-only index metadata.
- Keep nullable `ArrayOfVector` on master on its existing row-dense
path; only V1 storage rejects nullable `ArrayOfVector`.
- Add regression coverage in Go unit tests, C++ gtests, and go-client
nullable vector e2e scenarios.

- Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no
longer rejects it at schema, proxy, JSON/CSV/Parquet import, or
add-field validation. `ArrayOfVector` remains excluded from
`IsSupportedNullableVectorType` because that helper means compact
nullable vector payloads; `ArrayOfVector` uses row-dense data with empty
placeholder rows for null logical rows.
- Review 10 - nullable sparse vector paths were audited around the
compact layout: `ValidData` is logical-row sized, while sparse
`Contents` only stores valid physical rows.
Reader/writer/serde/import/query reorder paths derive
logical-to-physical offsets from `ValidData` instead of indexing sparse
contents by logical row.
- Storage V1 note - nullable `ArrayOfVector` is still rejected only in
V1 storage-format paths. That does not remove master V2 nullable
`ArrayOfVector` support.

- [x] `git diff --check HEAD`
- [x] `make static-check`
- [x] `cmake --build cmake_build --target all_tests -j8`
- [x] C++ `all_tests` targeted nullable vector / nullable VectorArray
cases
- [x] `go test -count=1 ./internal/util/queryutil`
- [x] `cd pkg && go test -count=1 ./util/typeutil`
- [x] `cd pkg && go test -count=1 ./util/funcutil`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/util/importutilv2/parquet`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/proxy -run
'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/datanode/importv2`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/storage -run '<nullable-vector targeted tests>'`

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
(cherry picked from commit c4154af)
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
marcelo-cjl added a commit to marcelo-cjl/milvus that referenced this pull request May 26, 2026
- Enforce the nullable vector compact-storage contract across query,
storage, import, segcore, and index build paths.
- Keep `ValidData` as the logical row source and only read/write
physical vector payload for valid rows.
- Preserve master nullable `ArrayOfVector` behavior: it is supported as
row-dense vector-array data with null-row placeholders, but it is
intentionally excluded from compact nullable-vector helpers.

issue: milvus-io#49881
issue: milvus-io#49974
issue: milvus-io#49992

- Add shared nullable vector helpers for supported compact vector types
and logical-to-physical compact row mapping.
- Fix proxy search result reorder/order_by when output fields include
nullable compact vectors.
- Fix queryutil row count, row-size accounting, slice/order/merge paths
for nullable compact vector payloads.
- Fix storage payload reader/writer, row serde, record builder metadata,
data sorter, and insert-data merge for nullable compact vectors.
- Fix parquet sparse vector import and numpy nullable dense vector
`ValidData` generation.
- Fix C++ growing flush, external take output, result merge, and bitset
transform for nullable compact vector rows.
- Skip 0-row Knowhere build for all-null nullable vectors while
preserving valid-data-only index metadata.
- Keep nullable `ArrayOfVector` on master on its existing row-dense
path; only V1 storage rejects nullable `ArrayOfVector`.
- Add regression coverage in Go unit tests, C++ gtests, and go-client
nullable vector e2e scenarios.

- Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no
longer rejects it at schema, proxy, JSON/CSV/Parquet import, or
add-field validation. `ArrayOfVector` remains excluded from
`IsSupportedNullableVectorType` because that helper means compact
nullable vector payloads; `ArrayOfVector` uses row-dense data with empty
placeholder rows for null logical rows.
- Review 10 - nullable sparse vector paths were audited around the
compact layout: `ValidData` is logical-row sized, while sparse
`Contents` only stores valid physical rows.
Reader/writer/serde/import/query reorder paths derive
logical-to-physical offsets from `ValidData` instead of indexing sparse
contents by logical row.
- Storage V1 note - nullable `ArrayOfVector` is still rejected only in
V1 storage-format paths. That does not remove master V2 nullable
`ArrayOfVector` support.

- [x] `git diff --check HEAD`
- [x] `make static-check`
- [x] `cmake --build cmake_build --target all_tests -j8`
- [x] C++ `all_tests` targeted nullable vector / nullable VectorArray
cases
- [x] `go test -count=1 ./internal/util/queryutil`
- [x] `cd pkg && go test -count=1 ./util/typeutil`
- [x] `cd pkg && go test -count=1 ./util/funcutil`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/util/importutilv2/parquet`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/proxy -run
'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/datanode/importv2`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/storage -run '<nullable-vector targeted tests>'`

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
(cherry picked from commit c4154af)
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
chyezh pushed a commit to chyezh/milvus that referenced this pull request May 26, 2026
## Summary

- Enforce the nullable vector compact-storage contract across query,
storage, import, segcore, and index build paths.
- Keep `ValidData` as the logical row source and only read/write
physical vector payload for valid rows.
- Preserve master nullable `ArrayOfVector` behavior: it is supported as
row-dense vector-array data with null-row placeholders, but it is
intentionally excluded from compact nullable-vector helpers.

issue: milvus-io#49881
issue: milvus-io#49974
issue: milvus-io#49992

## What changed

- Add shared nullable vector helpers for supported compact vector types
and logical-to-physical compact row mapping.
- Fix proxy search result reorder/order_by when output fields include
nullable compact vectors.
- Fix queryutil row count, row-size accounting, slice/order/merge paths
for nullable compact vector payloads.
- Fix storage payload reader/writer, row serde, record builder metadata,
data sorter, and insert-data merge for nullable compact vectors.
- Fix parquet sparse vector import and numpy nullable dense vector
`ValidData` generation.
- Fix C++ growing flush, external take output, result merge, and bitset
transform for nullable compact vector rows.
- Skip 0-row Knowhere build for all-null nullable vectors while
preserving valid-data-only index metadata.
- Keep nullable `ArrayOfVector` on master on its existing row-dense
path; only V1 storage rejects nullable `ArrayOfVector`.
- Add regression coverage in Go unit tests, C++ gtests, and go-client
nullable vector e2e scenarios.

## Review notes

- Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no
longer rejects it at schema, proxy, JSON/CSV/Parquet import, or
add-field validation. `ArrayOfVector` remains excluded from
`IsSupportedNullableVectorType` because that helper means compact
nullable vector payloads; `ArrayOfVector` uses row-dense data with empty
placeholder rows for null logical rows.
- Review 10 - nullable sparse vector paths were audited around the
compact layout: `ValidData` is logical-row sized, while sparse
`Contents` only stores valid physical rows.
Reader/writer/serde/import/query reorder paths derive
logical-to-physical offsets from `ValidData` instead of indexing sparse
contents by logical row.
- Storage V1 note - nullable `ArrayOfVector` is still rejected only in
V1 storage-format paths. That does not remove master V2 nullable
`ArrayOfVector` support.

## Test plan

- [x] `git diff --check HEAD`
- [x] `make static-check`
- [x] `cmake --build cmake_build --target all_tests -j8`
- [x] C++ `all_tests` targeted nullable vector / nullable VectorArray
cases
- [x] `go test -count=1 ./internal/util/queryutil`
- [x] `cd pkg && go test -count=1 ./util/typeutil`
- [x] `cd pkg && go test -count=1 ./util/funcutil`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/util/importutilv2/parquet`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/proxy -run
'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/datanode/importv2`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/storage -run '<nullable-vector targeted tests>'`

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
marcelo-cjl added a commit to marcelo-cjl/milvus that referenced this pull request May 26, 2026
- Enforce the nullable vector compact-storage contract across query,
storage, import, segcore, and index build paths.
- Keep `ValidData` as the logical row source and only read/write
physical vector payload for valid rows.
- Preserve master nullable `ArrayOfVector` behavior: it is supported as
row-dense vector-array data with null-row placeholders, but it is
intentionally excluded from compact nullable-vector helpers.

issue: milvus-io#49881
issue: milvus-io#49974
issue: milvus-io#49992

- Add shared nullable vector helpers for supported compact vector types
and logical-to-physical compact row mapping.
- Fix proxy search result reorder/order_by when output fields include
nullable compact vectors.
- Fix queryutil row count, row-size accounting, slice/order/merge paths
for nullable compact vector payloads.
- Fix storage payload reader/writer, row serde, record builder metadata,
data sorter, and insert-data merge for nullable compact vectors.
- Fix parquet sparse vector import and numpy nullable dense vector
`ValidData` generation.
- Fix C++ growing flush, external take output, result merge, and bitset
transform for nullable compact vector rows.
- Skip 0-row Knowhere build for all-null nullable vectors while
preserving valid-data-only index metadata.
- Keep nullable `ArrayOfVector` on master on its existing row-dense
path; only V1 storage rejects nullable `ArrayOfVector`.
- Add regression coverage in Go unit tests, C++ gtests, and go-client
nullable vector e2e scenarios.

- Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no
longer rejects it at schema, proxy, JSON/CSV/Parquet import, or
add-field validation. `ArrayOfVector` remains excluded from
`IsSupportedNullableVectorType` because that helper means compact
nullable vector payloads; `ArrayOfVector` uses row-dense data with empty
placeholder rows for null logical rows.
- Review 10 - nullable sparse vector paths were audited around the
compact layout: `ValidData` is logical-row sized, while sparse
`Contents` only stores valid physical rows.
Reader/writer/serde/import/query reorder paths derive
logical-to-physical offsets from `ValidData` instead of indexing sparse
contents by logical row.
- Storage V1 note - nullable `ArrayOfVector` is still rejected only in
V1 storage-format paths. That does not remove master V2 nullable
`ArrayOfVector` support.

- [x] `git diff --check HEAD`
- [x] `make static-check`
- [x] `cmake --build cmake_build --target all_tests -j8`
- [x] C++ `all_tests` targeted nullable vector / nullable VectorArray
cases
- [x] `go test -count=1 ./internal/util/queryutil`
- [x] `cd pkg && go test -count=1 ./util/typeutil`
- [x] `cd pkg && go test -count=1 ./util/funcutil`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/util/importutilv2/parquet`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/proxy -run
'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/datanode/importv2`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/storage -run '<nullable-vector targeted tests>'`

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
(cherry picked from commit c4154af)
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
marcelo-cjl added a commit to marcelo-cjl/milvus that referenced this pull request May 26, 2026
- Enforce the nullable vector compact-storage contract across query,
storage, import, segcore, and index build paths.
- Keep `ValidData` as the logical row source and only read/write
physical vector payload for valid rows.
- Preserve master nullable `ArrayOfVector` behavior: it is supported as
row-dense vector-array data with null-row placeholders, but it is
intentionally excluded from compact nullable-vector helpers.

issue: milvus-io#49881
issue: milvus-io#49974
issue: milvus-io#49992

- Add shared nullable vector helpers for supported compact vector types
and logical-to-physical compact row mapping.
- Fix proxy search result reorder/order_by when output fields include
nullable compact vectors.
- Fix queryutil row count, row-size accounting, slice/order/merge paths
for nullable compact vector payloads.
- Fix storage payload reader/writer, row serde, record builder metadata,
data sorter, and insert-data merge for nullable compact vectors.
- Fix parquet sparse vector import and numpy nullable dense vector
`ValidData` generation.
- Fix C++ growing flush, external take output, result merge, and bitset
transform for nullable compact vector rows.
- Skip 0-row Knowhere build for all-null nullable vectors while
preserving valid-data-only index metadata.
- Keep nullable `ArrayOfVector` on master on its existing row-dense
path; only V1 storage rejects nullable `ArrayOfVector`.
- Add regression coverage in Go unit tests, C++ gtests, and go-client
nullable vector e2e scenarios.

- Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no
longer rejects it at schema, proxy, JSON/CSV/Parquet import, or
add-field validation. `ArrayOfVector` remains excluded from
`IsSupportedNullableVectorType` because that helper means compact
nullable vector payloads; `ArrayOfVector` uses row-dense data with empty
placeholder rows for null logical rows.
- Review 10 - nullable sparse vector paths were audited around the
compact layout: `ValidData` is logical-row sized, while sparse
`Contents` only stores valid physical rows.
Reader/writer/serde/import/query reorder paths derive
logical-to-physical offsets from `ValidData` instead of indexing sparse
contents by logical row.
- Storage V1 note - nullable `ArrayOfVector` is still rejected only in
V1 storage-format paths. That does not remove master V2 nullable
`ArrayOfVector` support.

- [x] `git diff --check HEAD`
- [x] `make static-check`
- [x] `cmake --build cmake_build --target all_tests -j8`
- [x] C++ `all_tests` targeted nullable vector / nullable VectorArray
cases
- [x] `go test -count=1 ./internal/util/queryutil`
- [x] `cd pkg && go test -count=1 ./util/typeutil`
- [x] `cd pkg && go test -count=1 ./util/funcutil`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/util/importutilv2/parquet`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/proxy -run
'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/datanode/importv2`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/storage -run '<nullable-vector targeted tests>'`

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
(cherry picked from commit c4154af)
Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
liorf95 pushed a commit to liorf95/milvus_23_jan that referenced this pull request May 27, 2026
## Summary

- Enforce the nullable vector compact-storage contract across query,
storage, import, segcore, and index build paths.
- Keep `ValidData` as the logical row source and only read/write
physical vector payload for valid rows.
- Preserve master nullable `ArrayOfVector` behavior: it is supported as
row-dense vector-array data with null-row placeholders, but it is
intentionally excluded from compact nullable-vector helpers.

issue: milvus-io#49881
issue: milvus-io#49974
issue: milvus-io#49992

## What changed

- Add shared nullable vector helpers for supported compact vector types
and logical-to-physical compact row mapping.
- Fix proxy search result reorder/order_by when output fields include
nullable compact vectors.
- Fix queryutil row count, row-size accounting, slice/order/merge paths
for nullable compact vector payloads.
- Fix storage payload reader/writer, row serde, record builder metadata,
data sorter, and insert-data merge for nullable compact vectors.
- Fix parquet sparse vector import and numpy nullable dense vector
`ValidData` generation.
- Fix C++ growing flush, external take output, result merge, and bitset
transform for nullable compact vector rows.
- Skip 0-row Knowhere build for all-null nullable vectors while
preserving valid-data-only index metadata.
- Keep nullable `ArrayOfVector` on master on its existing row-dense
path; only V1 storage rejects nullable `ArrayOfVector`.
- Add regression coverage in Go unit tests, C++ gtests, and go-client
nullable vector e2e scenarios.

## Review notes

- Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no
longer rejects it at schema, proxy, JSON/CSV/Parquet import, or
add-field validation. `ArrayOfVector` remains excluded from
`IsSupportedNullableVectorType` because that helper means compact
nullable vector payloads; `ArrayOfVector` uses row-dense data with empty
placeholder rows for null logical rows.
- Review 10 - nullable sparse vector paths were audited around the
compact layout: `ValidData` is logical-row sized, while sparse
`Contents` only stores valid physical rows.
Reader/writer/serde/import/query reorder paths derive
logical-to-physical offsets from `ValidData` instead of indexing sparse
contents by logical row.
- Storage V1 note - nullable `ArrayOfVector` is still rejected only in
V1 storage-format paths. That does not remove master V2 nullable
`ArrayOfVector` support.

## Test plan

- [x] `git diff --check HEAD`
- [x] `make static-check`
- [x] `cmake --build cmake_build --target all_tests -j8`
- [x] C++ `all_tests` targeted nullable vector / nullable VectorArray
cases
- [x] `go test -count=1 ./internal/util/queryutil`
- [x] `cd pkg && go test -count=1 ./util/typeutil`
- [x] `cd pkg && go test -count=1 ./util/funcutil`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/util/importutilv2/parquet`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/proxy -run
'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/datanode/importv2`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/storage -run '<nullable-vector targeted tests>'`

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
liorf95 pushed a commit to liorf95/milvus_23_jan that referenced this pull request May 28, 2026
## Summary

- Enforce the nullable vector compact-storage contract across query,
storage, import, segcore, and index build paths.
- Keep `ValidData` as the logical row source and only read/write
physical vector payload for valid rows.
- Preserve master nullable `ArrayOfVector` behavior: it is supported as
row-dense vector-array data with null-row placeholders, but it is
intentionally excluded from compact nullable-vector helpers.

issue: milvus-io#49881
issue: milvus-io#49974
issue: milvus-io#49992

## What changed

- Add shared nullable vector helpers for supported compact vector types
and logical-to-physical compact row mapping.
- Fix proxy search result reorder/order_by when output fields include
nullable compact vectors.
- Fix queryutil row count, row-size accounting, slice/order/merge paths
for nullable compact vector payloads.
- Fix storage payload reader/writer, row serde, record builder metadata,
data sorter, and insert-data merge for nullable compact vectors.
- Fix parquet sparse vector import and numpy nullable dense vector
`ValidData` generation.
- Fix C++ growing flush, external take output, result merge, and bitset
transform for nullable compact vector rows.
- Skip 0-row Knowhere build for all-null nullable vectors while
preserving valid-data-only index metadata.
- Keep nullable `ArrayOfVector` on master on its existing row-dense
path; only V1 storage rejects nullable `ArrayOfVector`.
- Add regression coverage in Go unit tests, C++ gtests, and go-client
nullable vector e2e scenarios.

## Review notes

- Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no
longer rejects it at schema, proxy, JSON/CSV/Parquet import, or
add-field validation. `ArrayOfVector` remains excluded from
`IsSupportedNullableVectorType` because that helper means compact
nullable vector payloads; `ArrayOfVector` uses row-dense data with empty
placeholder rows for null logical rows.
- Review 10 - nullable sparse vector paths were audited around the
compact layout: `ValidData` is logical-row sized, while sparse
`Contents` only stores valid physical rows.
Reader/writer/serde/import/query reorder paths derive
logical-to-physical offsets from `ValidData` instead of indexing sparse
contents by logical row.
- Storage V1 note - nullable `ArrayOfVector` is still rejected only in
V1 storage-format paths. That does not remove master V2 nullable
`ArrayOfVector` support.

## Test plan

- [x] `git diff --check HEAD`
- [x] `make static-check`
- [x] `cmake --build cmake_build --target all_tests -j8`
- [x] C++ `all_tests` targeted nullable vector / nullable VectorArray
cases
- [x] `go test -count=1 ./internal/util/queryutil`
- [x] `cd pkg && go test -count=1 ./util/typeutil`
- [x] `cd pkg && go test -count=1 ./util/funcutil`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/util/importutilv2/parquet`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/proxy -run
'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/datanode/importv2`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/storage -run '<nullable-vector targeted tests>'`

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
liorf95 pushed a commit to liorf95/milvus_23_jan that referenced this pull request May 28, 2026
## Summary

- Enforce the nullable vector compact-storage contract across query,
storage, import, segcore, and index build paths.
- Keep `ValidData` as the logical row source and only read/write
physical vector payload for valid rows.
- Preserve master nullable `ArrayOfVector` behavior: it is supported as
row-dense vector-array data with null-row placeholders, but it is
intentionally excluded from compact nullable-vector helpers.

issue: milvus-io#49881
issue: milvus-io#49974
issue: milvus-io#49992

## What changed

- Add shared nullable vector helpers for supported compact vector types
and logical-to-physical compact row mapping.
- Fix proxy search result reorder/order_by when output fields include
nullable compact vectors.
- Fix queryutil row count, row-size accounting, slice/order/merge paths
for nullable compact vector payloads.
- Fix storage payload reader/writer, row serde, record builder metadata,
data sorter, and insert-data merge for nullable compact vectors.
- Fix parquet sparse vector import and numpy nullable dense vector
`ValidData` generation.
- Fix C++ growing flush, external take output, result merge, and bitset
transform for nullable compact vector rows.
- Skip 0-row Knowhere build for all-null nullable vectors while
preserving valid-data-only index metadata.
- Keep nullable `ArrayOfVector` on master on its existing row-dense
path; only V1 storage rejects nullable `ArrayOfVector`.
- Add regression coverage in Go unit tests, C++ gtests, and go-client
nullable vector e2e scenarios.

## Review notes

- Review 3 / 4 - master supports nullable `ArrayOfVector`. This PR no
longer rejects it at schema, proxy, JSON/CSV/Parquet import, or
add-field validation. `ArrayOfVector` remains excluded from
`IsSupportedNullableVectorType` because that helper means compact
nullable vector payloads; `ArrayOfVector` uses row-dense data with empty
placeholder rows for null logical rows.
- Review 10 - nullable sparse vector paths were audited around the
compact layout: `ValidData` is logical-row sized, while sparse
`Contents` only stores valid physical rows.
Reader/writer/serde/import/query reorder paths derive
logical-to-physical offsets from `ValidData` instead of indexing sparse
contents by logical row.
- Storage V1 note - nullable `ArrayOfVector` is still rejected only in
V1 storage-format paths. That does not remove master V2 nullable
`ArrayOfVector` support.

## Test plan

- [x] `git diff --check HEAD`
- [x] `make static-check`
- [x] `cmake --build cmake_build --target all_tests -j8`
- [x] C++ `all_tests` targeted nullable vector / nullable VectorArray
cases
- [x] `go test -count=1 ./internal/util/queryutil`
- [x] `cd pkg && go test -count=1 ./util/typeutil`
- [x] `cd pkg && go test -count=1 ./util/funcutil`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/util/importutilv2/parquet`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/proxy -run
'Test_validateUtil_checkArrayOfVectorFieldData|TestFillWithNullValue_Geometry|Test_validateUtil_checkAligned|Test_validateUtil_Validate'`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/datanode/importv2`
- [x] `go test -tags dynamic,test -gcflags="all=-N -l" -count=1
./internal/storage -run '<nullable-vector targeted tests>'`

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Jun 10, 2026
issue: milvus-io#45881
pr: milvus-io#49924

The 3.0 external take path normalizes dense byte-vector fields
into Arrow binary arrays before converting them back to DataArray.
ArrowToDataArray only handled FloatVector, so BinaryVector,
Float16Vector, BFloat16Vector, and Int8Vector outputs lost their
payload when callers took the non-fallback path.

This backport adds the dense byte-vector conversion cases from the
master fix and keeps nullable payloads compact by copying only valid
physical rows while preserving logical valid_data. The 3.0 branch does
not include the full master change set; sparse-vector support is left
out because this PR only restores the dense external-table take output
needed by the 3.0 feature branch.

Validation:
- source ~/.profile && make build-cpp-with-unittest
- all_tests with six ExternalTakeTest nullable vector take cases
- git diff --check
- reset Milvus and run the focused Go client e2e:
  MINIO_ADDRESS=localhost:9000 MINIO_ACCESS_KEY=minioadmin
  MINIO_SECRET_KEY=minioadmin MINIO_BUCKET=a-bucket
  MINIO_ROOT_PATH=files go test -tags dynamic,test ./testcases
  -run '^TestExternalCollectionFloat16VectorTakeOutput$'
  -addr=http://localhost:19530 -timeout 10m -count=1 -v
  Result: PASS
  ok github.com/milvus-io/milvus/tests/go_client/testcases 6.332s

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Jun 11, 2026
issue: milvus-io#45881
pr: milvus-io#49924

The 3.0 external take path normalizes dense byte-vector fields
into Arrow binary arrays before converting them back to DataArray.
ArrowToDataArray only handled FloatVector, so BinaryVector,
Float16Vector, BFloat16Vector, and Int8Vector outputs lost their
payload when callers took the non-fallback path.

This backport adds the dense byte-vector conversion cases from the
master fix and keeps nullable payloads compact by copying only valid
physical rows while preserving logical valid_data. The 3.0 branch does
not include the full master change set; sparse-vector support is left
out because this PR only restores the dense external-table take output
needed by the 3.0 feature branch.

Validation:
- source ~/.profile && make build-cpp-with-unittest
- all_tests with six ExternalTakeTest nullable vector take cases
- git diff --check
- reset Milvus and run the focused Go client e2e:
  MINIO_ADDRESS=localhost:9000 MINIO_ACCESS_KEY=minioadmin
  MINIO_SECRET_KEY=minioadmin MINIO_BUCKET=a-bucket
  MINIO_ROOT_PATH=files go test -tags dynamic,test ./testcases
  -run '^TestExternalCollectionFloat16VectorTakeOutput$'
  -addr=http://localhost:19530 -timeout 10m -count=1 -v
  Result: PASS
  ok github.com/milvus-io/milvus/tests/go_client/testcases 6.332s

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Jun 11, 2026
pr: milvus-io#49924

MergeDataArray builds search output fields row by row after search
reduce. On 3.0, byte-backed vector payloads still used assign(), so
each valid row replaced the buffer written by earlier rows. A multi-row
search result could therefore return valid_data for several rows while
keeping only the last row payload.

Backport the assign-to-append merge fix from PR milvus-io#49924 to the 3.0
branch. The test block is kept aligned with the upstream regression
case; the broader unrelated nullable-vector changes from the original
PR are omitted because this PR only needs the search merge fix for
byte-backed vector outputs.

Validation:
- git diff --check
- CLANG_FORMAT=/opt/homebrew/opt/llvm@15/bin/clang-format make cppcheck
  in a temporary clean worktree with the candidate diff committed
- make build-cpp-with-unittest and ninja -C cmake_build all_tests were
  attempted, but local C++ builds fail before this test at Abseil
  std::partial_ordering/std::weak_ordering errors under C++17

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
sre-ci-robot pushed a commit that referenced this pull request Jun 12, 2026
issue: #45881
pr: #49924

## What changed

Backport only the `ArrowToDataArray` dense byte-vector conversion subset
from the master take fix
[#49924](#49924).
This is not a full backport of that PR; it copies the minimal part
needed
on 3.0 to fix external-table take output when the take path reads binary
vector payloads.

`ArrowToDataArray` now converts `BinaryVector`, `Float16Vector`,
`BFloat16Vector`, and `Int8Vector` Arrow binary payloads back into
`DataArray`.

A focused Go client e2e test now covers FP16 vector output from an
external collection through both query and search result paths.

## Root cause

The external take path normalizes dense byte-vector fields into Arrow
binary arrays. On 3.0, `ArrowToDataArray` only handled float vector
output, so binary-vector style payloads could enter the optimized take
output path without being converted back into the expected Milvus vector
`DataArray`.

For nullable fields, the fix copies only valid physical rows into the
protobuf vector payload while preserving the logical `valid_data`
bitmap.

This PR intentionally excludes the unrelated parts of the master PR,
including sparse vector support and other master-only changes.

## Validation

- `source ~/.profile && make build-cpp-with-unittest`
- `DYLD_LIBRARY_PATH=... ./internal/core/output/unittest/all_tests
--gtest_filter='ExternalTakeTest.Nullable*VectorTakeUsesCompactData:ExternalTakeTest.TryTakeForRetrieve_NullableVectorUsesCompactData:ExternalTakeTest.TryTakeForSearch_NullableVectorUsesCompactData'`
- `git diff --check`
- Reset local Milvus, started standalone with `./bin/milvus run
standalone --run-with-subprocess`, then ran:

```bash
cd tests/go_client
source ~/.profile
source ../../scripts/setenv.sh
MINIO_ADDRESS=localhost:9000 \
MINIO_ACCESS_KEY=minioadmin \
MINIO_SECRET_KEY=minioadmin \
MINIO_BUCKET=a-bucket \
MINIO_ROOT_PATH=files \
go test -tags dynamic,test ./testcases \
  -run '^TestExternalCollectionFloat16VectorTakeOutput$' \
  -addr=http://localhost:19530 \
  -timeout 10m \
  -count=1 -v
```

Result:

```text
Job 466897675154097263: state=RefreshCompleted progress=100%
Vector index created on embedding
Vector index created on fp16_vec
Collection loaded
Response method=Query includes fp16_vec Float16Vector "KissLS4vMDE="
Response method=Search includes fp16_vec Float16Vector "KissLS4vMDE="
--- PASS: TestExternalCollectionFloat16VectorTakeOutput (5.15s)
PASS
ok github.com/milvus-io/milvus/tests/go_client/testcases 6.332s
```

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved area/dependency Pull requests that update a dependency file area/test ci-passed dco-passed DCO check passed. kind/bug Issues or changes related a bug lgtm sig/testing size/XXL Denotes a PR that changes 1000+ lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants