fix: Re-implement some Parquet decode methods without `copy_nonoverlapping` #558

andygrove · 2024-06-11T19:40:11Z

Which issue does this PR close?

Part of #557

Rationale for this change

Parquet decoding when converting between different integral types was using copy_nonoverlapping without meeting the precondition that both pointers were properly aligned.

What changes are included in this PR?

Rewrite without using copy_nonoverlapping
Added unit tests in Rust

How are these changes tested?

New tests in Rust
Existing tests in Scala

andygrove · 2024-06-11T21:39:23Z

Comparison of safe code vs using unsafe read/write aligned:

pub fn copy_i32_to_i16(src: &[u8], dst: &mut [u8], num: usize) {
    debug_assert!(src.len() >= num * 4, "Source slice is too small");
    debug_assert!(dst.len() >= num * 2, "Destination slice is too small");

    for i in 0..num {
        let i32_value =
            i32::from_le_bytes([src[i * 4], src[i * 4 + 1], src[i * 4 + 2], src[i * 4 + 3]]);

        // Downcast to i16, potentially losing data
        let i16_value = i32_value as i16;
        let i16_bytes = i16_value.to_le_bytes();

        dst[i * 2] = i16_bytes[0];
        dst[i * 2 + 1] = i16_bytes[1];
    }
}

pub fn copy_i32_to_i16_unsafe(src: &[u8], dst: &mut [u8], num: usize) {
    debug_assert!(src.len() >= num * 4, "Source slice is too small");
    debug_assert!(dst.len() >= num * 2, "Destination slice is too small");

    let src_ptr = src.as_ptr() as *const i32;
    let dst_ptr = dst.as_mut_ptr() as *mut i16;
    unsafe {
        for i in 0..num {
            dst_ptr
                .add(i)
                .write_unaligned(src_ptr.add(i).read_unaligned() as i16);
        }
    }
}

parquet_decode/decode_i32_to_i16_safe
                        time:   [839.29 ns 839.57 ns 839.93 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
parquet_decode/decode_i32_to_i16_unsafe
                        time:   [68.269 ps 68.400 ps 68.593 ps]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

I am going to reimplement with the unaligned approach and see if that has any safety issues

parthchandra · 2024-06-11T21:49:35Z

Great benchmark Andy!
Is that really 839 nanoseconds versus 68 picoseconds or am I interpreting the units incorrectly?

parthchandra · 2024-06-11T21:52:41Z

How would one even measure picoseconds. I expect that is a pseudo µ ? So micro not pico ?

andygrove · 2024-06-11T23:21:33Z

Is that really 839 nanoseconds versus 68 picoseconds or am I interpreting the units incorrectly?

Yes, it really is. Here is the criterion code for displaying the time units.

pub fn time(ns: f64) -> String {
    if ns < 1.0 {
        format!("{:>6} ps", short(ns * 1e3))
    } else if ns < 10f64.powi(3) {
        format!("{:>6} ns", short(ns))
    } else if ns < 10f64.powi(6) {
        format!("{:>6} µs", short(ns / 1e3))
    } else if ns < 10f64.powi(9) {
        format!("{:>6} ms", short(ns / 1e6))
    } else {
        format!("{:>6} s", short(ns / 1e9))
    }
}

andygrove · 2024-06-11T23:35:10Z

Is that really 839 nanoseconds versus 68 picoseconds or am I interpreting the units incorrectly?

I am questioning the earlier results now. Latest benchmark comparing safe version of decode_i32_to_u16 vs unsafe version of decode_i32_to_i16. Now seeing 67 ns not 68 ps. Curious.

parquet_decode/decode_i32_to_i16
                        time:   [66.709 ns 67.213 ns 67.711 ns]
                        change: [-2.4520% -1.0484% +0.3895%] (p = 0.15 > 0.05)
                        No change in performance detected.
parquet_decode/decode_i32_to_u16
                        time:   [589.31 ns 590.62 ns 592.21 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

parthchandra · 2024-06-12T01:00:52Z

Now seeing 67 ns not 68 ps. Curious

That makes more sense to me. It's hard to measure in picoseconds if the system clock only provides nanosecond granularity.

andygrove · 2024-06-12T01:20:48Z

Once tests pass, I'll see if I can turn this back into macros to avoid so much boilerplate

andygrove · 2024-06-12T01:48:08Z

That makes more sense to me. It's hard to measure in picoseconds if the system clock only provides nanosecond granularity.

That time was based on running ~75 million iterations and taking the average time

andygrove · 2024-06-12T13:03:57Z

@parthchandra This is ready for review now

andygrove · 2024-06-12T14:24:57Z

core/src/parquet/read/values.rs

@@ -182,25 +182,6 @@ make_plain_dict_impl! { Int8Type, UInt8Type, Int16Type, UInt16Type, Int32Type, U
 make_plain_dict_impl! { Int32DateType, Int64Type, FloatType, FLBAType }
 make_plain_dict_impl! { DoubleType, Int64TimestampMillisType, Int64TimestampMicrosType }

-impl PlainDecoding for Int32To64Type {


The updated macro is based on this impl and has replaced this impl

andygrove · 2024-06-12T14:25:35Z

core/src/parquet/read/values.rs

-                    dst_offset += $type_size;
-                }
+                let dst_offset = dst.num_values * $type_width;
+                $copy_fn(&src.data[src.offset..], &mut dst_slice[dst_offset..], num);


note that we are calling a macro-generated function here

viirya · 2024-06-13T05:36:00Z

core/src/parquet/read/values.rs

+
+// unsigned type require double the width and zeroes are written for the second half
+// perhaps because they are implemented as the next size up signed type?
+impl_plain_decoding_int!(UInt8Type, copy_i32_to_u8, 2);


Hmm, why? I thought Int8 and UInt8 both are 8-bit width, no?

https://doc.rust-lang.org/book/ch03-02-data-types.html

This confused me as well, but I am using the same type widths as the original code:

make_int_variant_impl!(Int8Type, i8, 1); make_int_variant_impl!(UInt8Type, u8, 2); make_int_variant_impl!(Int16Type, i16, 2); make_int_variant_impl!(UInt16Type, u16, 4); make_int_variant_impl!(UInt32Type, u32, 8);

This seems to be specific to the way Parquet represents unsigned types. I think Parquet only supports signed types so u8 becomes i16, u16 becomes i32, and so on

Ah, I see. I forgot it. As Java doesn't support unsigned integer types, we need to use wider type to support it. For example, UInt8Array is read as Short, UInt32Array is read as Long, etc.

viirya · 2024-06-13T05:43:08Z

core/src/parquet/read/values.rs

+generate_cast_to_unsigned!(copy_i32_to_u32, i32, u32, 0_u32);
+
+macro_rules! generate_cast_to_signed {
+    ($name: ident, $src_type:ty, $dst_type:ty, $zero_value:expr) => {


Seems generate_cast_to_signed doesn't need zero_value.

…ain_decoding_int to the original name of make_int_variant_impl

viirya · 2024-06-13T21:15:45Z

I am questioning the earlier results now. Latest benchmark comparing safe version of decode_i32_to_u16 vs unsafe version of decode_i32_to_i16. Now seeing 67 ns not 68 ps. Curious.

So unsafe version is slower?

EDIT: I compared it with wrong item. It should compare with parquet_decode/decode_i32_to_i16_safe.

viirya · 2024-06-13T21:18:31Z

Parquet decoding when converting between different integral types was using copy_nonoverlapping without meeting the precondition that both pointers were properly aligned.

The newer Rust version requires alignments on copy_nonoverlapping call?

EDIT: I saw it now: https://doc.rust-lang.org/std/ptr/fn.copy_nonoverlapping.html

Behavior is undefined if any of the following conditions are violated:

Both src and dst must be properly aligned.

viirya · 2024-06-13T21:20:56Z

core/src/parquet/read/values.rs

+            unsafe {
+                for i in 0..num {
+                    dst_ptr
+                        .add(2 * i)
+                        .write_unaligned(src_ptr.add(i).read_unaligned() as $dst_type);
+                    // write zeroes
+                    dst_ptr.add(2 * i + 1).write_unaligned($zero_value);
+                }
            }


Is any possible we make source and destination aligned? So we can still use unsafe copy_nonoverlapping?

I think that would involve making an extra copy?

viirya

For addressing the issue of copy_nonoverlapping with unaligned allocations, this looks good to me. Although I'm wondering if we can enhance alignment of these allocations.

andygrove · 2024-06-13T21:23:40Z

The newer Rust version requires alignments on copy_nonoverlapping call?

Just to be clear, this was always the requirement, but Rust 1.78 added debug assertions to catch violations.

andygrove · 2024-06-13T21:38:36Z

Thanks for the review @viirya

…pping` (apache#558) * Re-implement int decode methods using safe code * fix * fix tests * more tests * fix * fix * re-implement using unsafe read/write unaligned and add benchmark * lint * macros * more macros * combine macros * replace another impl with the macro * fix a regression * remove zero_value arg from generate_cast_to_signed and rename impl_plain_decoding_int to the original name of make_int_variant_impl

andygrove added 4 commits June 11, 2024 12:48

Re-implement int decode methods using safe code

6513443

fix

6ce15ef

fix tests

6ca4c6b

more tests

378aabd

andygrove changed the title ~~fix: Re-implement int decode methods using safe code~~ fix: Re-implement some Parquet decode methods using safe code Jun 11, 2024

andygrove added 2 commits June 11, 2024 13:50

fix

328c947

fix

4006317

andygrove changed the title ~~fix: Re-implement some Parquet decode methods using safe code~~ fix: Re-implement some Parquet decode methods without copy_nonoverlapping Jun 11, 2024

re-implement using unsafe read/write unaligned and add benchmark

cdf644c

andygrove marked this pull request as ready for review June 12, 2024 00:38

lint

2b6fe84

andygrove added 5 commits June 11, 2024 20:00

macros

1c6c521

more macros

5ffe1ca

combine macros

f3725a1

replace another impl with the macro

b48730c

fix a regression

32557f6

andygrove commented Jun 12, 2024

View reviewed changes

andygrove requested review from viirya and sunchao June 12, 2024 15:20

andygrove mentioned this pull request Jun 12, 2024

chore: Upgrade to Rust 1.78 and fix UB issues in unsafe code #546

Merged

andygrove requested review from kazuyukitanimura and huaxingao June 12, 2024 21:02

viirya reviewed Jun 13, 2024

View reviewed changes

remove zero_value arg from generate_cast_to_signed and rename impl_pl…

f0828ae

…ain_decoding_int to the original name of make_int_variant_impl

viirya reviewed Jun 13, 2024

View reviewed changes

viirya approved these changes Jun 13, 2024

View reviewed changes

andygrove merged commit 2fc45a2 into apache:main Jun 13, 2024
43 checks passed

andygrove deleted the parquet-safe-decode branch June 13, 2024 21:38

andygrove mentioned this pull request Jun 14, 2024

Remove unsafe uses of copy_nonoverlapping #557

Closed

fix: Re-implement some Parquet decode methods without copy_nonoverlapping #558

fix: Re-implement some Parquet decode methods without copy_nonoverlapping #558

Uh oh!

Conversation

andygrove commented Jun 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

andygrove commented Jun 11, 2024

Uh oh!

parthchandra commented Jun 11, 2024

Uh oh!

parthchandra commented Jun 11, 2024

Uh oh!

andygrove commented Jun 11, 2024

Uh oh!

andygrove commented Jun 11, 2024

Uh oh!

parthchandra commented Jun 12, 2024

Uh oh!

andygrove commented Jun 12, 2024

Uh oh!

andygrove commented Jun 12, 2024

Uh oh!

andygrove commented Jun 12, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

viirya commented Jun 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

viirya commented Jun 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

viirya left a comment

Choose a reason for hiding this comment

Uh oh!

andygrove commented Jun 13, 2024

Uh oh!

andygrove commented Jun 13, 2024

Uh oh!

Uh oh!

Uh oh!

fix: Re-implement some Parquet decode methods without `copy_nonoverlapping` #558

fix: Re-implement some Parquet decode methods without `copy_nonoverlapping` #558

andygrove commented Jun 11, 2024 •

edited

Loading

viirya commented Jun 13, 2024 •

edited

Loading

viirya commented Jun 13, 2024 •

edited

Loading