
RFC: index-resident metadata prefilter for vchordrq via INCLUDE columns #457

@spenc-r


Background

We run vector search over a ~6.2 M-row corpus of vector(1024)
candidates on PostgreSQL 18 with VectorChord. Most of our production
queries are cohort-shaped: a vector ORDER BY combined with several
mandatory equality / range / bitmask predicates over fixed-width
columns (feed, status, visibility, geo, time bucket, etc.). The
predicates typically cut the working set to 5–30 % of rows, but the
ORDER BY still needs ANN over the survivors.

Stock vchordrq.prefilter (the existing prefilter from #247, thanks!)
already helps by letting us scan the index instead of falling back to
a full sort. But it's a heap-side prefilter: the IVF scan returns
candidates, the index fetches each candidate's full f32 vector, and
only then is the heap row checked, so every rejected candidate's
vector read is wasted I/O.

For us this matters most on memory-constrained instances. With a 33 GB
vchord index that doesn't fit in shared_buffers, every wasted vector
fetch is a random page fault on the index, and at the cohort
selectivities we see (~10 %), that means 90 % of the IVF candidates
get read, scored, and discarded.

Proposed feature

Allow INCLUDE columns of bigint type on vchordrq indexes, store their
values in the H0 leaf tuples, and add a candidate-filter callback that
runs before the vector page fetch. Introduce a GUC
vchordrq.metadata_prefilter with three modes: off, reject_only
(definitely-false predicates short-circuit further work), and
covered_skip_heap (lossless predicates can also skip the heap
recheck).

In the IVF scan: for each H0 candidate, evaluate the compiled metadata
predicates against the candidate's INCLUDE values; if any predicate
proves "definitely false", skip the candidate without faulting in its
vector. Otherwise proceed as today (vector fetch + rerank + heap
recheck if the predicate is lossy).
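
To make the proposed flow concrete, here is a minimal Python sketch of
the `reject_only` path for a single H0 candidate. This is illustrative
only, not the Rust prototype: `Candidate`, `scan_candidate`,
`fetch_vector`, and `rerank` are hypothetical names, and a compiled
predicate is modeled as a callable that returns `False` only when it
can prove "definitely false" from the INCLUDE values alone.

```python
# Hedged sketch of the reject_only candidate flow; all names here are
# illustrative, not VectorChord APIs.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Candidate:
    heap_tid: int
    # bigint INCLUDE values stored alongside the candidate in the H0 leaf tuple
    include_values: tuple[int, ...]

# A compiled metadata predicate: False means "definitely false" (safe to
# reject from index-resident data alone); True means "maybe matches".
MetaPredicate = Callable[[tuple[int, ...]], bool]

def scan_candidate(
    cand: Candidate,
    predicates: list[MetaPredicate],
    fetch_vector: Callable[[Candidate], list[float]],
    rerank: Callable[[list[float]], float],
) -> Optional[float]:
    # Evaluate the cheap metadata predicates first: any definitely-false
    # result skips the candidate without faulting in its vector page.
    for pred in predicates:
        if not pred(cand.include_values):
            return None  # rejected before any vector I/O
    # Otherwise proceed as today: vector fetch + rerank (a heap recheck
    # would still follow if the predicate is lossy).
    return rerank(fetch_vector(cand))
```

The point of the ordering is that the predicate evaluation touches only
the H0 leaf tuple the scan is already holding, so a reject costs no
extra page reads.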

Prototype + numbers

We have a working prototype in a private fork (it carries some
company-specific tooling and configuration in its history that we can't
publish). It implements all of the above. Happy to share the repo
privately with a maintainer if that would help review; we'd rather not
upstream-PR off that branch directly anyway, since it hardcodes our
domain's column names and would need to be generalized first.

Two boxes, two bottleneck regimes, identity recall verified vs the
no-metadata path on 20 random queries:

Clean attribution test (32 GiB / 4 vCPU, CPU-bound).
Stock 1.0.0 vs the prototype's metadata path on the same hardware,
with vchordrq.prefilter=on, enable_sort=off, and read_stream IO
applied to both runs (the only delta is the metadata feature itself
plus the wide-INCLUDE index):

| config | peak QPS | p95 at peak |
| --- | --- | --- |
| stock 1.0.0, plain index, GUCs tuned | 5.68 (c=8) | 2 324 ms |
| prototype, wide-INCLUDE + `metadata_prefilter=reject_only` | 6.84 (c=4) | 836 ms |

That's +20 % peak QPS, −64 % p95. The QPS gain is modest because
this box is CPU-bound at c≥4 with only 4 vCPUs. The latency gain is
where the metadata reject earns its keep: sub-second p95 at the same
concurrency where stock-tuned is still over 2 seconds.

Memory-bound regime (16 vCPU host, 16 GiB cgroup cap). Same
prototype, toggling whether the metadata predicates are emitted:

| config | peak QPS @ c=16 | p95 @ c=16 |
| --- | --- | --- |
| prototype, no metadata predicates emitted | 11.86 | 2 635 ms |
| prototype, wide-metadata reject | 37.79 | 732 ms |

+218 % peak QPS, −72 % p95 under memory pressure. The gain is much
larger here because rejecting a candidate before its f32 vector page
faults in is exactly the right thing to do when shared_buffers can't
hold the index. The 33 GB index on this corpus is dominated by per-row
f32 vector copies on VectorTuple pages. Those copies exist solely so
rerank can avoid a heap fetch, and they're exactly what the metadata
reject lets us avoid reading in the first place.

A side note from the attribution work: the prototype's non-metadata
improvements (read_stream IO, prefetcher batching, search-loop
refactor) produced essentially zero throughput delta over stock 1.0.0
with the same GUCs applied. The metadata-prefilter feature is the only
novel contribution worth upstreaming.

Open questions for the maintainers

A couple of decisions are upstream-shaped and we'd value your steer
before writing the public PR:

  1. How users mark "lossless" vs "lossy" INCLUDE columns.
    Some metadata predicates prove the heap row matches (e.g. a
    stored bitmask passing (col & mask) = mask is exact); some only
    accelerate the reject case (e.g. a 64-bit hash where a non-match
    is conclusive but a match is "probably matches, recheck the heap").
    Lossless predicates let us skip the heap recheck; lossy ones can't.
    The most natural way to surface this in PG would be per-column
    opclass options. We'd define two bigint opclasses,
    bigint_lossless_ops and bigint_lossy_ops, and use them on the
    INCLUDE columns:

    CREATE INDEX … USING vchordrq (vec vector_ip_ops)
      INCLUDE (flags_col bigint_lossless_ops,
               hash_col  bigint_lossy_ops);

    That keeps the SQL declarative and uses standard PG plumbing. Any
    pushback on that approach, or a different shape you'd prefer?

  2. Tuple format change. The prototype adds a versioned metadata
    tail to FrozenTuple and AppendableTuple
    (METADATA_TAIL_VERSION = 2). For indexes built without INCLUDE
    columns, attr_count = 0 ⇒ no tail emitted ⇒ existing 1.x indexes
    read identically. Is that backward-compat shape acceptable, or
    would you rather the entire feature be gated behind a new index
    reloption / opclass so the on-disk format is unchanged for existing
    users?

  3. Tests. We'd plan to add ≥6 sqllogictest files in
    tests/vchordrq/ covering eq / IN / range / bitmask / NULL / mixed
    with ORDER BY, plus an identity-recall test against the
    no-metadata path. Anything else you'd want before review?
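
The lossless/lossy split in question 1 can be sketched as a tri-state
predicate result: a hedged illustration only, with hypothetical names
(`Verdict`, the two predicate functions), not the prototype's actual
representation.

```python
# Hedged sketch of the lossless vs lossy distinction from question 1.
from enum import Enum

class Verdict(Enum):
    REJECT = 0  # definitely false: skip the candidate entirely
    ACCEPT = 1  # proven true: lossless predicate, heap recheck skippable
    MAYBE = 2   # probably true: lossy predicate, heap recheck required

def bitmask_pred_lossless(stored: int, mask: int) -> Verdict:
    # (col & mask) = mask is exact in both directions: the stored bits
    # fully decide the predicate, so a match can skip the heap recheck.
    return Verdict.ACCEPT if (stored & mask) == mask else Verdict.REJECT

def hash_pred_lossy(stored_hash: int, query_hash: int) -> Verdict:
    # A 64-bit hash mismatch is conclusive, but a match may be a
    # collision, so it only accelerates the reject case.
    return Verdict.MAYBE if stored_hash == query_hash else Verdict.REJECT
```

Under `reject_only`, both verdicts other than `REJECT` fall through to
today's path; `covered_skip_heap` would additionally let `ACCEPT` skip
the heap recheck.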

If this is on your roadmap or you'd rather take a different shape,
please let us know. We'd rather align early than rebuild later.

Acknowledgements

The prototype was implemented by @SamORichards (cc'd). I (@spenc-r) am
happy to do the upstream-generalization work and own the public PR;
planning to add Sam as `Co-authored-by:` on each commit.
