Skip to content

Allow merging primitive dictionary values in concat and interleave kernels #7518

Closed
@asubiotto

Description

@asubiotto

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, dictionary values are merged only if they are Utf8, Binary, LargeUtf8, and LargeBinary because the pointer equality closure construction in should_merge_dictionaries returns false to the outer function in the default arm:

pub fn should_merge_dictionary_values<K: ArrowDictionaryKeyType>(
dictionaries: &[&DictionaryArray<K>],
len: usize,
) -> bool {
use DataType::*;
let first_values = dictionaries[0].values().as_ref();
let ptr_eq: Box<PtrEq> = match first_values.data_type() {
Utf8 => Box::new(bytes_ptr_eq::<Utf8Type>),
LargeUtf8 => Box::new(bytes_ptr_eq::<LargeUtf8Type>),
Binary => Box::new(bytes_ptr_eq::<BinaryType>),
LargeBinary => Box::new(bytes_ptr_eq::<LargeBinaryType>),
_ => return false,

We've observed unnecessarily high memory usage when concatenating dictionaries with primitive values.

Describe the solution you'd like

should_merge_dictionaries should return true for other types as well.

Metadata

Metadata

Assignees

No one assigned

    Labels

    arrowChanges to the arrow crateenhancementAny new improvement worthy of a entry in the changelog

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions