Merged
11 changes: 9 additions & 2 deletions arrow-buffer/src/buffer/mutable.rs
@@ -254,7 +254,10 @@ impl MutableBuffer {
     // exits.
     #[inline(always)]
     pub fn reserve(&mut self, additional: usize) {
-        let required_cap = self.len + additional;
+        let required_cap = self
+            .len
+            .checked_add(additional)
+            .expect("buffer length overflow");
         if required_cap > self.layout.size() {
             let new_capacity = bit_util::round_upto_multiple_of_64(required_cap);
             let new_capacity = std::cmp::max(new_capacity, self.layout.size() * 2);
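To illustrate why the unchecked sum matters: in a Rust build without overflow checks (the default for release profiles), `len + additional` wraps around instead of failing, so the capacity comparison can pass with a tiny value and no reallocation happens. The sketch below contrasts the two behaviours on bare integers; the function names are hypothetical and not part of arrow-buffer.

```rust
/// What an unchecked `len + additional` evaluates to when overflow checks are
/// disabled: the sum silently wraps around.
pub fn required_cap_wrapping(len: usize, additional: usize) -> usize {
    len.wrapping_add(additional)
}

/// The checked form used in the patch: `None` signals that the sum does not
/// fit in `usize`, which the patch turns into a panic via `expect`.
pub fn required_cap_checked(len: usize, additional: usize) -> Option<usize> {
    len.checked_add(additional)
}
```

With `len = 1` and `additional = usize::MAX`, the wrapping version yields `0`, which is never greater than the current capacity, while the checked version yields `None`.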
@@ -583,7 +586,11 @@ impl MutableBuffer {
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
     pub fn extend_zeros(&mut self, additional: usize) {
-        self.resize(self.len + additional, 0);
+        let new_len = self
+            .len
+            .checked_add(additional)
+            .expect("buffer length overflow");

Contributor

Introducing internal panics is not ideal, though clearly better than UB. How should library users code defensively to avoid the panic, and how can we make their life easier?

Contributor Author

I agree panics are not good and should be avoided when possible

We had a bit of a philosophical debate about this earlier (when to panic vs Error) and the conclusion we came to got codified in this doc, which I think is relevant here: https://github.com/apache/arrow-rs#guidelines-for-panic-vs-result

Contributor Author

Basically, do we force all downstream consumers to check for an error that will very likely never happen? I think the answer is a matter of opinion.

Contributor

Yeah... arguably the guidelines say we should be returning an error here (because asking for too many entries is a form of invalid input), but it certainly complicates the API. I don't know if there's a way to provide a fallible version of this API, for paranoid consumers to use?

I do agree it should be a very rare error, but I've also been unpleasantly surprised at how often 32-bit StringArray offsets blow up in practice.

Contributor

These are 64-bit values right? Probably ok to leave it as a panic because last I knew most hardware cannot physically index more than 48 bits of virtual memory and most operating systems cap the size of any one memory mapping to a few TB of contiguous virtual address space (even one not backed by memory).

Contributor Author

I think usize is 64 bits on 64-bit architectures and 32 bits on 32-bit architectures
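As a rough illustration of why the pointer width matters here: on a 32-bit target a byte length above `u32::MAX` cannot be represented at all, so the overflow is far easier to reach than on a 64-bit one. The sketch below simulates the 32-bit case using `u64` arithmetic; both function names are hypothetical.

```rust
/// Width of `usize` on the target this code is compiled for (32 or 64 on
/// common platforms).
pub fn pointer_width_bits() -> u32 {
    usize::BITS
}

/// Simulates a 32-bit target: returns true only if `len + additional` would
/// still fit in a 32-bit `usize`.
pub fn fits_in_32_bit_usize(len: u64, additional: u64) -> bool {
    match len.checked_add(additional) {
        Some(total) => total <= u64::from(u32::MAX),
        None => false,
    }
}
```

For example, a 3 GiB buffer extended by 2 GiB does not fit on the simulated 32-bit target, while 1 GiB plus 1 GiB does.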

Contributor Author

> I do agree it should be a very rare error, but I've also been unpleasantly surprised at how often 32-bit StringArray offsets blow up in practice.

Yeah, overflowing i32 offsets (more than 2 GB of string data in one array) is shockingly common
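For context on the offset overflow mentioned above: a string array with 32-bit offsets stores cumulative byte positions as `i32`, so the total string data in one array cannot exceed `i32::MAX` bytes. A minimal sketch of a checked offset computation follows; `build_i32_offsets` is a hypothetical helper, not an arrow-rs API.

```rust
/// Builds cumulative `i32` offsets from per-value byte lengths.
/// Returns `None` as soon as the running total no longer fits in `i32`.
pub fn build_i32_offsets(lengths: &[usize]) -> Option<Vec<i32>> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    let mut total: usize = 0;
    offsets.push(0i32);
    for &len in lengths {
        // Guard the running sum itself against `usize` overflow.
        total = total.checked_add(len)?;
        // Then guard the narrowing to the 32-bit offset type.
        if total > i32::MAX as usize {
            return None;
        }
        offsets.push(total as i32);
    }
    Some(offsets)
}
```

Lengths `[3, 0, 5]` produce offsets `[0, 3, 3, 8]`, while any input whose total exceeds `i32::MAX` bytes is rejected.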

Contributor Author

> I don't know if there's a way to provide a fallible version of this API, for paranoid consumers to use?

I mean we could add a try_extend_zeros or something 🤔
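One possible shape for such a fallible variant, sketched on a toy buffer backed by `Vec<u8>` rather than on `MutableBuffer` itself. `ToyBuffer`, `LengthOverflow`, and the `try_extend_zeros` signature are all assumptions made for illustration; the PR does not define them.

```rust
/// Hypothetical error type for this sketch; not part of arrow-buffer.
#[derive(Debug, PartialEq, Eq)]
pub struct LengthOverflow;

/// Toy stand-in for a growable byte buffer; NOT arrow-buffer's `MutableBuffer`.
#[derive(Default)]
pub struct ToyBuffer {
    data: Vec<u8>,
}

impl ToyBuffer {
    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// Fallible variant: reports length overflow to the caller instead of panicking.
    /// Only the arithmetic overflow is covered; allocation failure is out of scope.
    pub fn try_extend_zeros(&mut self, additional: usize) -> Result<(), LengthOverflow> {
        let new_len = self
            .data
            .len()
            .checked_add(additional)
            .ok_or(LengthOverflow)?;
        self.data.resize(new_len, 0);
        Ok(())
    }

    /// Panicking variant, expressed in terms of the fallible one.
    pub fn extend_zeros(&mut self, additional: usize) {
        self.try_extend_zeros(additional)
            .expect("buffer length overflow");
    }
}
```

Layering the panicking method on top of the fallible one keeps a single overflow check, and existing callers keep the same panic message.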

Contributor

Yeah, I agree it might be nice to add some try_ functions and then document that the current versions might panic.

Contributor Author

I filed a ticket to consider adding new try_ variants

I also pushed a bunch of documentation updates to this PR to document that the APIs panic in certain cases

+        self.resize(new_len, 0);
     }

     /// # Safety
24 changes: 24 additions & 0 deletions arrow-buffer/src/builder/mod.rs
@@ -439,4 +439,28 @@ mod tests {
         let slice = builder.as_slice_mut();
         assert_eq!(slice.len(), 222);
     }
+
+    #[test]
+    #[should_panic(expected = "buffer length overflow")]
+    fn reserve_length_overflow() {
+        let mut builder = BufferBuilder::<u8>::new(1);
+        builder.append(0);
+        builder.reserve(usize::MAX);
+    }
+
+    #[test]
+    #[should_panic(expected = "buffer length overflow")]
+    fn append_n_zeroed_length_overflow() {
+        let mut builder = BufferBuilder::<u64>::new(1);
+        builder.append_n_zeroed(1);
+        builder.append_n_zeroed(usize::MAX / mem::size_of::<u64>());
+    }
+
+    #[test]
+    #[should_panic(expected = "buffer length overflow")]
+    fn advance_length_overflow() {
+        let mut builder = BufferBuilder::<u64>::new(1);
+        builder.advance(1);
+        builder.advance(usize::MAX / mem::size_of::<u64>());
+    }
 }
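The `u64` tests choose `usize::MAX / mem::size_of::<u64>()` so that the element-to-byte multiplication still fits in `usize` and only the subsequent addition to the existing length overflows. The sketch below reproduces that arithmetic on plain integers, assuming the builder converts an element count to bytes as `n * size_of::<T>()` before extending the underlying buffer; `bytes_after_append` is a hypothetical helper.

```rust
use std::mem;

/// Byte length after appending `n` more `u64` values to a buffer that already
/// holds `len_bytes` bytes, or `None` if any step overflows `usize`.
pub fn bytes_after_append(len_bytes: usize, n: usize) -> Option<usize> {
    // Element count to byte count; fits for n = usize::MAX / 8.
    let additional = n.checked_mul(mem::size_of::<u64>())?;
    // Existing length plus the new bytes; this is the step the tests overflow.
    len_bytes.checked_add(additional)
}
```

With 8 bytes already present, adding `usize::MAX / 8` elements asks for `usize::MAX - 7` more bytes, and `8 + (usize::MAX - 7)` exceeds `usize::MAX`, so the result is `None`.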