variant_schema: Initial implementation #24
sdf-jkl wants to merge 29 commits into datafusion-contrib:main
sdf-jkl
left a comment
A quick run after submitting the PR
src/variant_schema.rs
Outdated
/// - A field becomes VARIANT if its values are incompatible
///
#[derive(Debug, PartialEq, Eq, Clone)]
enum VariantSchema {
Because Variant is not a first-class type in Arrow, I had to use an enum to represent the extracted types.
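For readers following along, here is a minimal sketch of what an enum-based schema representation could look like, with the "incompatible values become VARIANT" rule from the doc comment applied during a merge. `SchemaSketch` and `merge` are hypothetical names, primitives are named by plain strings rather than Arrow `DataType`s, and type widening is deliberately left out; this is not the PR's actual code.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the PR's `VariantSchema` enum.
#[derive(Debug, PartialEq, Eq, Clone)]
enum SchemaSketch {
    /// A primitive type, named by a string here instead of an Arrow `DataType`.
    Primitive(String),
    /// An object with named fields, each carrying its own schema.
    Object(BTreeMap<String, SchemaSketch>),
    /// A list whose elements share one schema.
    List(Box<SchemaSketch>),
    /// Fallback used when the observed values are incompatible.
    Variant,
}

/// Merge two observed schemas into one (assumed semantics, no widening).
fn merge(a: SchemaSketch, b: SchemaSketch) -> SchemaSketch {
    use SchemaSketch::*;
    match (a, b) {
        // Identical primitives stay as they are.
        (Primitive(x), Primitive(y)) if x == y => Primitive(x),
        // Lists merge their element schemas.
        (List(x), List(y)) => List(Box::new(merge(*x, *y))),
        // Objects merge field by field; fields seen on one side only are kept.
        (Object(mut x), Object(y)) => {
            for (name, schema) in y {
                let merged = match x.remove(&name) {
                    Some(existing) => merge(existing, schema),
                    None => schema,
                };
                x.insert(name, merged);
            }
            Object(x)
        }
        // Anything else is incompatible and collapses to VARIANT.
        _ => Variant,
    }
}
```

Merging `{id: int64, tag: string}` with `{id: string, extra: bool}` under these assumptions gives `{extra: bool, id: VARIANT, tag: string}`: the conflicting `id` field falls back to VARIANT while the non-conflicting fields keep their types.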
Variant::TimestampNtzMicros(_) => DataType::Timestamp(TimeUnit::Microsecond, None),
Variant::TimestampNanos(_) => DataType::Timestamp(TimeUnit::Nanosecond, Some("utc".into())),
Variant::TimestampNtzNanos(_) => DataType::Timestamp(TimeUnit::Nanosecond, None),
_ => unreachable!("Should be only applied to Primitive Variant, not Object or List"),
Should probably remove unreachable!
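One possible way to drop the panic is to make the conversion fallible and return an error for non-primitive inputs. The sketch below uses hypothetical stand-ins (`ValueSketch`, `SchemaError`, `primitive_type_name`) rather than the real `Variant` and `DataType` types, so treat it as an illustration of the shape, not a drop-in patch.

```rust
/// Hypothetical, cut-down stand-in for the variant value type matched in the PR.
#[derive(Debug)]
enum ValueSketch {
    Int64(i64),
    Utf8(String),
    Object,
    List,
}

/// Hypothetical error type for inputs the primitive mapping cannot handle.
#[derive(Debug, PartialEq, Eq)]
enum SchemaError {
    NotPrimitive(&'static str),
}

/// Map a primitive value to a type name, reporting an error instead of
/// panicking when an object or list is passed in.
fn primitive_type_name(value: &ValueSketch) -> Result<&'static str, SchemaError> {
    match value {
        ValueSketch::Int64(_) => Ok("int64"),
        ValueSketch::Utf8(_) => Ok("utf8"),
        // These arms replace the catch-all `unreachable!` branch.
        ValueSketch::Object => Err(SchemaError::NotPrimitive("object")),
        ValueSketch::List => Err(SchemaError::NotPrimitive("list")),
    }
}
```

Listing the non-primitive cases explicitly, instead of using a `_` arm, also means the compiler flags the match when a new variant is added.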
@friendlymatthew I feel like this is such a mess. I should split this into multiple PRs handling just the initial implementations of The order should be
Hi, can you explain why we need type widening?
From Databricks docs for
We also want to do the same thing
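To make the widening idea concrete, here is a small sketch of a "least common type" rule: when the same field shows up with two different primitive types, pick the narrowest type that can hold both, and fall back to VARIANT when none exists. The `Kind` enum, the `Int32 < Int64 < Float64` chain, and the `widen` function are assumptions for illustration only; the exact rules the PR should follow are not spelled out in this thread.

```rust
/// Hypothetical primitive kinds; the real code maps values to Arrow `DataType`s.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Kind {
    Int32,
    Int64,
    Float64,
    Utf8,
    Timestamp,
    /// Fallback when no common type exists.
    Variant,
}

/// Position of a numeric kind in the assumed chain Int32 < Int64 < Float64.
fn numeric_rank(kind: Kind) -> Option<u8> {
    match kind {
        Kind::Int32 => Some(0),
        Kind::Int64 => Some(1),
        Kind::Float64 => Some(2),
        _ => None,
    }
}

/// Return the least common type of two kinds under the assumed rules.
fn widen(a: Kind, b: Kind) -> Kind {
    if a == b {
        return a;
    }
    match (numeric_rank(a), numeric_rank(b)) {
        // Both numeric: keep the wider of the two.
        (Some(ra), Some(rb)) => {
            if ra >= rb {
                a
            } else {
                b
            }
        }
        // No common type: fall back to VARIANT.
        _ => Kind::Variant,
    }
}
```

Under these assumptions, `widen(Kind::Int32, Kind::Float64)` yields `Kind::Float64`, while `widen(Kind::Timestamp, Kind::Utf8)` yields `Kind::Variant`. Note that treating `Float64` as wider than `Int64` can lose precision for large integers, so that step is a design choice rather than a given.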
Makes sense to me. Let me know how you want to proceed. I'm fine with breaking this up into smaller PRs, pushing the scalar version first, then working on
I'll do that. Otherwise it's too much code to review.
Which issue does this PR close?
variant_schema: Initial implementation #5.
Rationale for this change
Tried to implement schema_of_variant and schema_of_variant_agg as a single UDF - variant_schema
What changes are included in this PR?
Adding a new ScalarUdf variant_schema to extract an aggregate schema from a scalar Variant or VariantArray
Are these changes tested?