fix: Re-implement some Parquet decode methods without copy_nonoverlapping
#558
Comparison of safe code vs. using unsafe read/write unaligned:
pub fn copy_i32_to_i16(src: &[u8], dst: &mut [u8], num: usize) {
    debug_assert!(src.len() >= num * 4, "Source slice is too small");
    debug_assert!(dst.len() >= num * 2, "Destination slice is too small");
    for i in 0..num {
        let i32_value =
            i32::from_le_bytes([src[i * 4], src[i * 4 + 1], src[i * 4 + 2], src[i * 4 + 3]]);
        // Downcast to i16, potentially losing data
        let i16_value = i32_value as i16;
        let i16_bytes = i16_value.to_le_bytes();
        dst[i * 2] = i16_bytes[0];
        dst[i * 2 + 1] = i16_bytes[1];
    }
}
pub fn copy_i32_to_i16_unsafe(src: &[u8], dst: &mut [u8], num: usize) {
    debug_assert!(src.len() >= num * 4, "Source slice is too small");
    debug_assert!(dst.len() >= num * 2, "Destination slice is too small");
    let src_ptr = src.as_ptr() as *const i32;
    let dst_ptr = dst.as_mut_ptr() as *mut i16;
    unsafe {
        for i in 0..num {
            dst_ptr
                .add(i)
                .write_unaligned(src_ptr.add(i).read_unaligned() as i16);
        }
    }
}
I am going to reimplement with the unaligned approach and see if that has any safety issues.
Great benchmark Andy!
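One way to gain some confidence in the unaligned version is to run both functions over a deliberately misaligned source slice and compare the outputs. The following is a minimal sketch, not code from the PR: the two functions are adapted from the comment above (with `assert!` instead of `debug_assert!`, and explicit `from_le`/`to_le` conversions added so the comparison also holds on big-endian targets), and `both_agree` with its `offset` parameter is a hypothetical helper.

```rust
/// Safe version: decode little-endian i32 values and truncate each to i16.
pub fn copy_i32_to_i16(src: &[u8], dst: &mut [u8], num: usize) {
    assert!(src.len() >= num * 4, "Source slice is too small");
    assert!(dst.len() >= num * 2, "Destination slice is too small");
    for i in 0..num {
        let v = i32::from_le_bytes([src[i * 4], src[i * 4 + 1], src[i * 4 + 2], src[i * 4 + 3]]);
        dst[i * 2..i * 2 + 2].copy_from_slice(&(v as i16).to_le_bytes());
    }
}

/// Unsafe version: unaligned pointer reads and writes, no alignment requirement.
pub fn copy_i32_to_i16_unsafe(src: &[u8], dst: &mut [u8], num: usize) {
    // Hard asserts (an assumption of this sketch) keep the pointer
    // arithmetic below in bounds even in release builds.
    assert!(src.len() >= num * 4, "Source slice is too small");
    assert!(dst.len() >= num * 2, "Destination slice is too small");
    let src_ptr = src.as_ptr() as *const i32;
    let dst_ptr = dst.as_mut_ptr() as *mut i16;
    unsafe {
        for i in 0..num {
            // from_le/to_le are no-ops on little-endian targets.
            let v = i32::from_le(src_ptr.add(i).read_unaligned());
            dst_ptr.add(i).write_unaligned((v as i16).to_le());
        }
    }
}

/// Hypothetical helper: shift the source by `offset` bytes so it is
/// (usually) misaligned for i32, then check that both versions agree.
pub fn both_agree(values: &[i32], offset: usize) -> bool {
    let mut buf = vec![0u8; offset];
    for v in values {
        buf.extend_from_slice(&v.to_le_bytes());
    }
    let src = &buf[offset..];
    let num = values.len();
    let mut safe_out = vec![0u8; num * 2];
    let mut unsafe_out = vec![0u8; num * 2];
    copy_i32_to_i16(src, &mut safe_out, num);
    copy_i32_to_i16_unsafe(src, &mut unsafe_out, num);
    safe_out == unsafe_out
}
```

Running a check like this under Miri would additionally flag undefined behaviour that a plain output comparison cannot see.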
How would one even measure picoseconds. I expect that is a pseudo µ ? So |
copy_nonoverlapping
Yes, it really is. Here is the criterion code for displaying the time units.
pub fn time(ns: f64) -> String {
    if ns < 1.0 {
        format!("{:>6} ps", short(ns * 1e3))
    } else if ns < 10f64.powi(3) {
        format!("{:>6} ns", short(ns))
    } else if ns < 10f64.powi(6) {
        format!("{:>6} µs", short(ns / 1e3))
    } else if ns < 10f64.powi(9) {
        format!("{:>6} ms", short(ns / 1e6))
    } else {
        format!("{:>6} s", short(ns / 1e9))
    }
}
I am questioning the earlier results now. Latest benchmark comparing safe version of
|
That makes more sense to me. It's hard to measure in picoseconds if the system clock only provides nanosecond granularity.
Once tests pass, I'll see if I can turn this back into macros to avoid so much boilerplate.
That time was based on running ~75 million iterations and taking the average time.
@parthchandra This is ready for review now
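Averaging is how a sub-nanosecond figure can come out of a clock with nanosecond granularity: time a whole batch of calls and divide by the count. A rough sketch of that idea follows. It is not criterion's actual measurement code, which does considerably more statistical work, and `average_ns` is a hypothetical name.

```rust
use std::time::Instant;

/// Hypothetical helper: average wall-clock nanoseconds per call of `f`,
/// measured across `iters` back-to-back calls.
pub fn average_ns<F: FnMut()>(iters: u64, mut f: F) -> f64 {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    // The total is measured in whole nanoseconds; the per-call average
    // can still be fractional, which is where picosecond readings come from.
    start.elapsed().as_nanos() as f64 / iters as f64
}
```

A very small average can also mean the compiler optimized the measured work away, so wrapping inputs and results in `std::hint::black_box` is worth considering before trusting such numbers.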
@@ -182,25 +182,6 @@ make_plain_dict_impl! { Int8Type, UInt8Type, Int16Type, UInt16Type, Int32Type, U
make_plain_dict_impl! { Int32DateType, Int64Type, FloatType, FLBAType }
make_plain_dict_impl! { DoubleType, Int64TimestampMillisType, Int64TimestampMicrosType }

impl PlainDecoding for Int32To64Type {
The updated macro is based on this impl and has replaced this impl
dst_offset += $type_size;
}
let dst_offset = dst.num_values * $type_width;
$copy_fn(&src.data[src.offset..], &mut dst_slice[dst_offset..], num);
note that we are calling a macro-generated function here
core/src/parquet/read/values.rs
Outdated
// unsigned type require double the width and zeroes are written for the second half
// perhaps because they are implemented as the next size up signed type?
impl_plain_decoding_int!(UInt8Type, copy_i32_to_u8, 2);
Hmm, why? I thought Int8 and UInt8 both are 8-bit width, no?
This confused me as well, but I am using the same type widths as the original code:
make_int_variant_impl!(Int8Type, i8, 1);
make_int_variant_impl!(UInt8Type, u8, 2);
make_int_variant_impl!(Int16Type, i16, 2);
make_int_variant_impl!(UInt16Type, u16, 4);
make_int_variant_impl!(UInt32Type, u32, 8);
This seems to be specific to the way Parquet represents unsigned types. I think Parquet only supports signed types so u8 becomes i16, u16 becomes i32, and so on
Ah, I see. I forgot about that. As Java doesn't support unsigned integer types, we need to use a wider type to support them. For example, UInt8Array is read as Short, UInt32Array is read as Long, etc.
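The type widths quoted in this thread follow from that widening: each unsigned value occupies the slot of the next larger signed type, with the upper half zeroed. Below is a safe sketch of what a `copy_i32_to_u8`-style function would do under that layout, assuming little-endian byte order. The function name `copy_i32_to_u8_widened` and its body are illustrative only and are not the macro-generated code from the PR.

```rust
/// Illustrative only: narrow each little-endian i32 in `src` to a u8 and
/// store it in a 2-byte slot in `dst`, writing zero to the second byte.
pub fn copy_i32_to_u8_widened(src: &[u8], dst: &mut [u8], num: usize) {
    assert!(src.len() >= num * 4, "Source slice is too small");
    assert!(dst.len() >= num * 2, "Destination slice is too small");
    for i in 0..num {
        let v = i32::from_le_bytes([src[i * 4], src[i * 4 + 1], src[i * 4 + 2], src[i * 4 + 3]]);
        dst[i * 2] = v as u8; // value in the low byte
        dst[i * 2 + 1] = 0; // zero in the high byte
    }
}
```

Read back as a little-endian `i16`, each 2-byte slot then holds the unsigned value as a non-negative number, which is what a consumer without unsigned types can work with.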
core/src/parquet/read/values.rs
Outdated
generate_cast_to_unsigned!(copy_i32_to_u32, i32, u32, 0_u32);

macro_rules! generate_cast_to_signed {
    ($name: ident, $src_type:ty, $dst_type:ty, $zero_value:expr) => {
Seems generate_cast_to_signed doesn't need zero_value.
Fixed
…ain_decoding_int to the original name of make_int_variant_impl
So unsafe version is slower? EDIT: I compared it with wrong item. It should compare with |
The newer Rust version requires alignments on EDIT: I saw it now: https://doc.rust-lang.org/std/ptr/fn.copy_nonoverlapping.html
unsafe {
    for i in 0..num {
        dst_ptr
            .add(2 * i)
            .write_unaligned(src_ptr.add(i).read_unaligned() as $dst_type);
        // write zeroes
        dst_ptr.add(2 * i + 1).write_unaligned($zero_value);
    }
}
Is it possible to make the source and destination aligned, so we can still use the unsafe copy_nonoverlapping?
I think that would involve making an extra copy?
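To illustrate the trade-off, one way to obtain an aligned source from an arbitrary byte slice is to decode it into a freshly allocated `Vec<i32>`, which is exactly the extra copy in question. A minimal sketch, assuming little-endian input; `to_aligned_i32` is a hypothetical helper, not something in the PR.

```rust
/// Hypothetical helper: decode little-endian i32 values from `src` into a
/// new Vec<i32>. The Vec's buffer is correctly aligned for i32, but getting
/// there costs a full pass over the data plus an allocation.
pub fn to_aligned_i32(src: &[u8]) -> Vec<i32> {
    src.chunks_exact(4) // trailing bytes that do not fill a chunk are ignored
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```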
For addressing the issue of copy_nonoverlapping with unaligned allocations, this looks good to me, although I'm wondering if we can improve the alignment of these allocations.
Just to be clear, this was always the requirement, but Rust 1.78 added debug assertions to catch violations.
Thanks for the review @viirya
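The precondition at issue can be checked by hand: a pointer is aligned for `T` when its address is a multiple of `align_of::<T>()`. The sketch below shows how a byte offset into an otherwise aligned buffer breaks that. Both `is_aligned_for` and `misaligned_example` are hypothetical names, and the example assumes a target where `i32` has an alignment greater than 1 (4 on common platforms).

```rust
use std::mem::align_of;

/// Hypothetical helper: true if `ptr` is suitably aligned for a `T`.
pub fn is_aligned_for<T>(ptr: *const u8) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}

/// Returns alignment results for the start of a u32 array and for the
/// address one byte past the start.
pub fn misaligned_example() -> (bool, bool) {
    let words = [0u32; 4]; // u32 storage shares i32's alignment
    let base = words.as_ptr() as *const u8;
    // wrapping_add only computes an address; nothing is dereferenced.
    let shifted = base.wrapping_add(1);
    (is_aligned_for::<i32>(base), is_aligned_for::<i32>(shifted))
}
```

A slice such as `&src.data[src.offset..]` is in the same position as `shifted` whenever the offset is not a multiple of the element alignment, which is why `read_unaligned` and `write_unaligned` are used in this PR.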
…pping` (apache#558)
* Re-implement int decode methods using safe code
* fix
* fix tests
* more tests
* fix
* fix
* re-implement using unsafe read/write unaligned and add benchmark
* lint
* macros
* more macros
* combine macros
* replace another impl with the macro
* fix a regression
* remove zero_value arg from generate_cast_to_signed and rename impl_plain_decoding_int to the original name of make_int_variant_impl
Which issue does this PR close?
Part of #557
Rationale for this change
Parquet decoding when converting between different integral types was using copy_nonoverlapping without meeting the precondition that both pointers were properly aligned.
What changes are included in this PR?
copy_nonoverlapping
How are these changes tested?