`variant_schema`: Initial implementation by sdf-jkl · Pull Request #24 · datafusion-contrib/datafusion-variant

sdf-jkl · 2025-12-22T18:05:15Z

Which issue does this PR close?

Closes variant_schema: Initial implementation #5.

Rationale for this change

This is an attempt to combine schema_of_variant and schema_of_variant_agg into a single UDF, variant_schema.

What changes are included in this PR?

Adds a new scalar UDF, variant_schema, that extracts the aggregate schema from a scalar Variant or a VariantArray.

Are these changes tested?

Tested scalar values of most types, and arrays with both conflicting and valid schemas.
Also added some sqllogictests.

sdf-jkl

A quick run-through after submitting the PR.

sdf-jkl · 2025-12-22T18:19:07Z

src/variant_schema.rs

+///   - A field becomes VARIANT if its values are incompatible
+///
+#[derive(Debug, PartialEq, Eq, Clone)]
+enum VariantSchema {


Because Variant is not a first-class type in Arrow, I had to use enums to represent the extracted types.
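To make the idea concrete, here is a minimal sketch of what an enum-based schema representation could look like, together with the "a field becomes VARIANT if its values are incompatible" rule from the doc comment. The name `SchemaSketch`, its variants, and the `merge` function are hypothetical; the diff shown here does not reveal the actual variants of `VariantSchema`. Primitive types are plain strings for brevity, and keeping a field that appears on only one side is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the PR's `VariantSchema` enum.
#[derive(Debug, PartialEq, Eq, Clone)]
enum SchemaSketch {
    /// A primitive type, named by a plain string for brevity.
    Primitive(String),
    /// An object with named fields.
    Object(BTreeMap<String, SchemaSketch>),
    /// A list with a single element schema.
    List(Box<SchemaSketch>),
    /// Fallback used when values are incompatible.
    Variant,
}

/// Merge two schemas; incompatible shapes collapse to `Variant`.
fn merge(a: SchemaSketch, b: SchemaSketch) -> SchemaSketch {
    use SchemaSketch::*;
    match (a, b) {
        // Identical primitives stay as they are.
        (Primitive(x), Primitive(y)) if x == y => Primitive(x),
        // Lists merge their element schemas.
        (List(x), List(y)) => List(Box::new(merge(*x, *y))),
        // Objects merge field by field.
        (Object(mut x), Object(y)) => {
            for (name, ty) in y {
                let merged = match x.remove(&name) {
                    Some(existing) => merge(existing, ty),
                    // Assumption: a field seen on only one side is kept.
                    None => ty,
                };
                x.insert(name, merged);
            }
            Object(x)
        }
        // Everything else (including anything already Variant) is incompatible.
        _ => Variant,
    }
}
```

With this sketch, merging two objects that disagree on the type of one field yields an object whose conflicting field is `Variant`, while the other fields keep their types.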

sdf-jkl · 2025-12-22T18:19:42Z

src/variant_schema.rs

+        Variant::TimestampNtzMicros(_) => DataType::Timestamp(TimeUnit::Microsecond, None),
+        Variant::TimestampNanos(_) => DataType::Timestamp(TimeUnit::Nanosecond, Some("utc".into())),
+        Variant::TimestampNtzNanos(_) => DataType::Timestamp(TimeUnit::Nanosecond, None),
+        _ => unreachable!("Should be only applied to Primitive Variant, not Object or List"),


Should probably remove unreachable!
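One possible way to drop the `unreachable!` is to return an error for non-primitive inputs instead of panicking. The sketch below uses a toy enum, `ToyVariant`, and a hypothetical helper, `primitive_type_name`; neither is part of the PR or of the real `Variant` type, and the real function would return an Arrow `DataType` rather than a string.

```rust
/// Toy stand-in for a variant value; not the real `Variant` type.
#[derive(Debug)]
enum ToyVariant {
    Int64(i64),
    Utf8(String),
    Object,
    List,
}

/// Map a primitive value to a type name, or report an error for
/// Object/List instead of panicking via `unreachable!`.
fn primitive_type_name(v: &ToyVariant) -> Result<&'static str, String> {
    match v {
        ToyVariant::Int64(_) => Ok("Int64"),
        ToyVariant::Utf8(_) => Ok("Utf8"),
        // Non-primitive input becomes a recoverable error.
        other => Err(format!("expected a primitive variant, got {other:?}")),
    }
}
```

The caller can then propagate the error with `?`, so a malformed or unexpected value fails the query rather than aborting the process.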

sdf-jkl · 2026-02-27T21:31:26Z

@friendlymatthew I feel like this is such a mess.

I should split this into multiple PRs handling just the initial implementations of variant_schema and variant_schema_agg functions. Later add extras on top of them.

The order should be variant_schema -> variant_schema_agg -> type widening, etc.

friendlymatthew · 2026-03-02T10:03:34Z

> @friendlymatthew I feel like this is such a mess.
>
> I should split this into multiple PRs handling just the initial implementations of variant_schema and variant_schema_agg functions. Later add extras on top of them.
>
> The order should be variant_schema -> variant_schema_agg -> type widening, etc.

Hi, can you explain why we need type widening?

sdf-jkl · 2026-03-02T14:24:51Z

From the Databricks docs for schema_of_variant_agg:

> When two fields with the same name have a different type across records, Databricks uses the least common type. When no such type exists, the type is derived as a VARIANT. For example, INT and DOUBLE become DOUBLE, while TIMESTAMP and STRING become VARIANT.

We also want to do the same thing for VariantList in the scalar version variant_schema.
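The widening rule quoted above can be sketched as a small function. This is an illustration only: the `Ty` enum, `least_common_type`, and `widen_all` are hypothetical names, and the only widening pair encoded is the INT/DOUBLE example from the docs; any further widening rules the PR supports are not reproduced here.

```rust
/// Simplified set of types, covering only those named in the quoted docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    Int,
    Double,
    Timestamp,
    String,
    Variant,
}

/// Least common type of two types, or `Variant` when none exists.
fn least_common_type(a: Ty, b: Ty) -> Ty {
    use Ty::*;
    match (a, b) {
        // Equal types need no widening.
        (x, y) if x == y => x,
        // INT and DOUBLE become DOUBLE.
        (Int, Double) | (Double, Int) => Double,
        // No common type, e.g. TIMESTAMP and STRING.
        _ => Variant,
    }
}

/// Fold a sequence of types into one; `None` for an empty input.
fn widen_all(types: &[Ty]) -> Option<Ty> {
    let mut iter = types.iter().copied();
    let mut acc = iter.next()?;
    for t in iter {
        // Once the result is Variant it cannot change, so stop early.
        if acc == Ty::Variant {
            break;
        }
        acc = least_common_type(acc, t);
    }
    Some(acc)
}
```

The same fold could serve both cases discussed here: across records for the aggregate function, and across the elements of a list for the scalar one.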

friendlymatthew · 2026-03-02T14:43:06Z

> From the Databricks docs for schema_of_variant_agg:
>
> > When two fields with the same name have a different type across records, Databricks uses the least common type. When no such type exists, the type is derived as a VARIANT. For example, INT and DOUBLE become DOUBLE, while TIMESTAMP and STRING become VARIANT.
>
> We also want to do the same thing for VariantList in the scalar version variant_schema.

Makes sense to me. Let me know how you want to proceed. I'm fine with breaking this up into smaller PRs: pushing the scalar version first, then working on variant_agg and type widening later.

sdf-jkl · 2026-03-02T14:49:14Z

I'll do that. Otherwise it's too much code to review.

sdf-jkl added 9 commits December 16, 2025 16:16

Single value schema works

22aaac0

ehh...

8eea36a

sort of works

2563ce8

replaced primitive enum with arrow DataType

4bb2f5a

Fully getting rid of custom structs/enums

7a2188f

replaced primitive enum with arrow DataType

09e0b92

sort of works

1a10082

Somewhat working code

800c0ba

cargo fmt

fdf831e

sdf-jkl mentioned this pull request Dec 22, 2025

variant_schema: Initial implementation #5

Open

sdf-jkl added 18 commits December 28, 2025 23:34

add sqllogictests

3ee27ca

Splitting the function in two

3ff9f1a

Split functions work + sqllogictests

539fe1b

cargo fmt

97b0ddb

Redo variant_schema func

a78b9a6

Add variant_schema array test

6d461ac

Encode state VariantSchema to bytes state for AUDF

b5c3ec2

agg function quick stop if schema is Variant

bdf59b0

early fold for Variant in list schemas

30c70c4

cargo fmt

b593aea

tests cleanup

3db8535

Merge branch 'main' of https://github.com/datafusion-contrib/datafusion-variant into schema

f5eb78d

update dependency

a041c8f

Move date tests to slt

f395e40

Merge branch 'main' of https://github.com/datafusion-contrib/datafusion-variant into schema

a6ee1d3

more type widening support

6e9e73c

cargo fmt

3599850

use consts for decimal precision

825dd1b

sdf-jkl added 2 commits February 27, 2026 16:41

Avoid clones in udaf

8f96d0a

fmt

1dff2b6

sdf-jkl closed this Mar 2, 2026

sdf-jkl reopened this Mar 2, 2026

sdf-jkl closed this Mar 2, 2026

sdf-jkl wants to merge 29 commits into datafusion-contrib:main from sdf-jkl:schema