Skip to content

Tracking Issue for bufreader_peek #128405

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
2 of 4 tasks
lolbinarycat opened this issue Jul 30, 2024 · 19 comments
Open
2 of 4 tasks

Tracking Issue for bufreader_peek #128405

lolbinarycat opened this issue Jul 30, 2024 · 19 comments
Assignees
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC F-bufreader_peek `#![feature(bufreader_peek)]` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@lolbinarycat
Copy link
Contributor

lolbinarycat commented Jul 30, 2024

Feature gate: #![feature(bufreader_peek)]

This is a tracking issue for the BufReader::peek method.

Public API

// std::io

impl BufReader {
    fn peek(&mut self, n: usize) -> io::Result<&[u8]>
}

Steps / History

Unresolved Questions

  • what should happen if n > capacity? options are: short slice, error, or panic.
  • should the buffer be completely filled, or should it only read the number of bytes requested?

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

@lolbinarycat lolbinarycat added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jul 30, 2024
@lolbinarycat lolbinarycat changed the title Tracking Issue for XXX Tracking Issue for bufreader_peek Jul 30, 2024
@lolbinarycat
Copy link
Contributor Author

@rustbot claim

starting work on implementation

@jieyouxu jieyouxu added the F-bufreader_peek `#![feature(bufreader_peek)]` label Jul 31, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 6, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 6, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 6, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Aug 7, 2024
Rollup merge of rust-lang#128406 - lolbinarycat:bufreader_peek, r=Mark-Simulacrum

implement BufReader::peek

Part of rust-lang#128405
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Aug 7, 2024
@lolbinarycat
Copy link
Contributor Author

calling peek should allow you to call consume with the same or a smaller n value. currently consume is only guaranteed to be well-behaved when calling it after fill_buf.

workingjubilee added a commit to workingjubilee/rustc that referenced this issue Aug 29, 2024
…d, r=workingjubilee

allow BufReader::peek to be called on unsized types

rust-lang#128405
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Aug 29, 2024
…d, r=workingjubilee

allow BufReader::peek to be called on unsized types

rust-lang#128405
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Aug 30, 2024
…d, r=workingjubilee

allow BufReader::peek to be called on unsized types

rust-lang#128405
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Aug 30, 2024
…d, r=workingjubilee

allow BufReader::peek to be called on unsized types

rust-lang#128405
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Aug 31, 2024
…d, r=workingjubilee

allow BufReader::peek to be called on unsized types

rust-lang#128405
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Aug 31, 2024
…d, r=workingjubilee

allow BufReader::peek to be called on unsized types

rust-lang#128405
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 31, 2024
…d, r=workingjubilee

allow BufReader::peek to be called on unsized types

rust-lang#128405
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 31, 2024
…d, r=workingjubilee

allow BufReader::peek to be called on unsized types

rust-lang#128405
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Aug 31, 2024
Rollup merge of rust-lang#129675 - lolbinarycat:bufreader_peek_unsized, r=workingjubilee

allow BufReader::peek to be called on unsized types

rust-lang#128405
jieyouxu added a commit to jieyouxu/rust that referenced this issue Mar 5, 2025
Fix crash in BufReader::peek()

`bufreader_peek` tracking issue: rust-lang#128405

This fixes a logic error in `Buffer::read_more()` that would make `BufReader::peek()` expose uninitialized data and/or segfault if `read_more()` was called with a partially-full buffer and a non-empty inner reader.
jieyouxu added a commit to jieyouxu/rust that referenced this issue Mar 5, 2025
Fix crash in BufReader::peek()

`bufreader_peek` tracking issue: rust-lang#128405

This fixes a logic error in `Buffer::read_more()` that would make `BufReader::peek()` expose uninitialized data and/or segfault if `read_more()` was called with a partially-full buffer and a non-empty inner reader.
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Mar 7, 2025
Fix crash in BufReader::peek()

`bufreader_peek` tracking issue: rust-lang#128405

This fixes a logic error in `Buffer::read_more()` that would make `BufReader::peek()` expose uninitialized data and/or segfault if `read_more()` was called with a partially-full buffer and a non-empty inner reader.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 7, 2025
Fix crash in BufReader::peek()

`bufreader_peek` tracking issue: rust-lang#128405

This fixes a logic error in `Buffer::read_more()` that would make `BufReader::peek()` expose uninitialized data and/or segfault if `read_more()` was called with a partially-full buffer and a non-empty inner reader.
jhpratt added a commit to jhpratt/rust that referenced this issue Mar 7, 2025
Fix crash in BufReader::peek()

`bufreader_peek` tracking issue: rust-lang#128405

This fixes a logic error in `Buffer::read_more()` that would make `BufReader::peek()` expose uninitialized data and/or segfault if `read_more()` was called with a partially-full buffer and a non-empty inner reader.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Mar 7, 2025
Rollup merge of rust-lang#137832 - wgwoods:fix-bufreader-peek, r=joboet

Fix crash in BufReader::peek()

`bufreader_peek` tracking issue: rust-lang#128405

This fixes a logic error in `Buffer::read_more()` that would make `BufReader::peek()` expose uninitialized data and/or segfault if `read_more()` was called with a partially-full buffer and a non-empty inner reader.
github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this issue Mar 14, 2025
Fix crash in BufReader::peek()

`bufreader_peek` tracking issue: rust-lang#128405

This fixes a logic error in `Buffer::read_more()` that would make `BufReader::peek()` expose uninitialized data and/or segfault if `read_more()` was called with a partially-full buffer and a non-empty inner reader.
jieyouxu added a commit to jieyouxu/rust that referenced this issue Mar 16, 2025
…onsume, r=Mark-Simulacrum

doc: clarify that consume can be called after BufReader::peek

tracking issue rust-lang#128405
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Mar 16, 2025
Rollup merge of rust-lang#137890 - lolbinarycat:docs-bufreader-peek-consume, r=Mark-Simulacrum

doc: clarify that consume can be called after BufReader::peek

tracking issue rust-lang#128405
github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this issue Mar 19, 2025
…onsume, r=Mark-Simulacrum

doc: clarify that consume can be called after BufReader::peek

tracking issue rust-lang#128405
@Kixunil
Copy link
Contributor

Kixunil commented Mar 29, 2025

This came up when decoding base64 (and same thing with hex or anything similar). Some insight:

  • purpose: if the data in buffer happens to be shorter than what can be decoded then errors may cause data loss and avoiding it makes code very complex, error-prone, possibly slow and it's still imperfect - the decoder needs to keep a few bytes in an internal buffer and dropping the decoder loses it; thus a method like this is very much needed
  • yes, abstracting is needed. It's not great to make internal copies when &[u8], io::Cursor, or some kind of custom buffer is used
  • I'm not sure about panicking in peek but there should be a way to know the buffer size upfront, so the decoder constructor could panic upfront rather than in peek; note that &[u8] has infinite buffer because returning fewer bytes than requested is EOF (and can be handled by base64)

@lolbinarycat
Copy link
Contributor Author

@Kixunil you can retrieve the buffer size by calling BufReader::capacity.

I'm not sure if you're providing feedback on the feature as implemented or are advocating for its eventual stabilization.

@Kixunil
Copy link
Contributor

Kixunil commented Mar 30, 2025

I know, but I can't do that for any BufRead implementor.

I'm advocating for abstracting it into its own trait and implementing it for other types. I'm also saying "this is not useless".

@lolbinarycat
Copy link
Contributor Author

The problem is BufRead doesn't have any idea of "capacity". By making this an inherent method on BufReader we can provide stricter guarantees.

@Kixunil
Copy link
Contributor

Kixunil commented Mar 30, 2025

Yes, we would need something like:

pub trait Peek: BufRead {
    /// Returns the maximum number of bytes that can be held in an internal buffer.
    ///
    /// Returns `None` for "infinity". That doesn't mean there will be infinite bytes available. `peek` can still return fewer than requested amount of bytes if the stream is at end. It only means that any requested amount is valid.
    fn capacity(&self) -> Option<usize>;

    /// Fills the internal buffer to have *at least* `requested_bytes` bytes if possible.
    ///
    /// This will return a slice that has length of at least `requested_bytes` unless the stream reached the end or the internal buffer is full.
    /// If the internal buffer is longer the reader may request more bytes than requested an return them too.
    /// Call the consume() method to inform the reader about how many bytes were consumed.
    fn peek(&mut self, requested_bytes: usize) -> io::Result<&[u8]>;
}

// impl Peek for &[u8], Cursor, BufReader

Then the decoders would use this bound instead of purely BufRead.

@lolbinarycat
Copy link
Contributor Author

You would have to open a new ACP for that.

@fintelia
Copy link
Contributor

Sorry if this is the wrong place to ask, but have alternative names been considered?

Currently, the fill_buf method ensures that the internal buffer contains at least one byte, but notably doesn't guarantee that the buffer ends up full.

Meanwhile peek sounds like it does even less work. But actually it is a more aggressive version of fill_buf that fills the internal buffer even if there are still some bytes left in it.

I'm not sure what the most intuitive name would be, but perhaps something like fill_buf_all or fill_buf_exact?

@lolbinarycat
Copy link
Contributor Author

The difference is that fill_buf is a low-level primitive, meant as a building block for other operations, while peek is a useful function by itself.

The fact that peek technically "does more work" sometimes is irrelevant IMO, since it only does that work if it needs to in order to fulfill its semantics.

While their implementations are similar, their semantics and intended usecase are not.

Using the term "peek" for "look ahead in a stream without advancing it" is a well-established programming convention. At a quick glance i found peek functions in Go and C++ with nearly identical semantics, and I'm sure there's many more.

Rust typically follows existing naming conbentions where applicable, it's why print isn't called write_formatted.

@Kixunil
Copy link
Contributor

Kixunil commented Mar 31, 2025

I don't want to get too philosophical but since you can implement fill_buf by simply passing 1 to peek and then calling buffer I'd argue fill_buf is higher-level than peek.

Anyway, the name peek does make sense. But it's not the name I expected to mean "fill at least n bytes". I did find it by looking for exactly what it can do but it looked confusing and I had to click through and carefully read the doc to realize it's doing what I need.

But that can be improved by changing the short doc to say "Attempts to fill the buffer with at least n bytes".

@lolbinarycat
Copy link
Contributor Author

I don't want to get too philosophical but since you can implement fill_buf by simply passing 1 to peek and then calling buffer I'd argue fill_buf is higher-level than peek.

Actually a lot of low-level functions can be implemented trivially if you already have a higher-level function. If you already have an implementation for Write::write_all, that is also a valid implementation for Write::write. The important distinction is that the lower level function provides less guarantees.

Anyway, the name peek does make sense. But it's not the name I expected to mean "fill at least n bytes". I did find it by looking for exactly what it can do but it looked confusing and I had to click through and carefully read the doc to realize it's doing what I need.

I would argue this is an X/Y problem. Filling the buffer is not itself a useful operation, but is instead the means to an end. I'm curious why you would want to fill the buffer if not to look ahead in the stream?

@fintelia
Copy link
Contributor

fintelia commented Mar 31, 2025

I would argue this is an X/Y problem. Filling the buffer is not itself a useful operation, but is instead the means to an end. I'm curious why you would want to fill the buffer if not to look ahead in the stream?

It is a question of consistent terminology. The existing fill_buf method could totally have been called peek or peek_one instead given that it already returns the underlying buffer and the prior art from other languages. But knowing that there's a method that only partially fills the buffer, it is natural to look for a similarly named method that completely fills the buffer. Either peek_one/peek or fill_buf/fill_buf_all would make sense to me. But only the latter can be adopted in a backwards compatible way

@Kixunil
Copy link
Contributor

Kixunil commented Mar 31, 2025

Write::write_all, that is also a valid implementation for Write::write

Only if you're writing one byte at a time which is quite less performant compared to my peek example. Anyway, you're right and a better example is one can implement subtraction as addition with some bit operations but also addition as subtraction with some bit operations.

why you would want to fill the buffer if not to look ahead in the stream?

To have at least 4 bytes of base64 to pass into a decoding function. (The decoding function can take more but taking less would confuse it.)

The existing fill_buf method could totally have been called peek or peek_one instead given that it already returns the underlying buffer and the prior art from other languages. But knowing that there's a method that only partially fills the buffer, it is natural to look for a similarly named method that completely fills the buffer. Either peek_one/peek or fill_buf/fill_buf_all would make sense to me.

Exactly. peek makes it look like the operation is very different but they are in fact very similar.

@lolbinarycat
Copy link
Contributor Author

It is a question of consistent terminology. The existing fill_buf method could totally have been called peek or peek_one instead given that it already returns the underlying buffer and the prior art from other languages. But knowing that there's a method that only partially fills the buffer, it is natural to look for a similarly named method that completely fills the buffer. Either peek_one/peek or fill_buf/fill_buf_all would make sense to me. But only the latter can be adopted in a backwards compatible way

peek does not neccarily completely fill the buffer. Even if it did, that fact would be an internal implementation detail.

@lolbinarycat
Copy link
Contributor Author

Write::write_all, that is also a valid implementation for Write::write

Only if you're writing one byte at a time which is quite less performant compared to my peek example. Anyway, you're right and a better example is one can implement subtraction as addition with some bit operations but also addition as subtraction with some bit operations.

What I mean is that if you have a writer that never performs partial writes, you can implement write_all and write identically. Indeed, this is how the Vec<u8> impl works.

why you would want to fill the buffer if not to look ahead in the stream?

To have at least 4 bytes of base64 to pass into a decoding function. (The decoding function can take more but taking less would confuse it.)

Why not just read 4 bytes at a time?

The existing fill_buf method could totally have been called peek or peek_one instead given that it already returns the underlying buffer and the prior art from other languages. But knowing that there's a method that only partially fills the buffer, it is natural to look for a similarly named method that completely fills the buffer. Either peek_one/peek or fill_buf/fill_buf_all would make sense to me.

Exactly. peek makes it look like the operation is very different but they are in fact very similar.

Aren't most operations on buffered streams very similar? peek could just as easily be called read_no_consume, and that would be just as correct as fill_buf_at_least. Should we also rename read to fill_buf_and_consume, since that's what it's equivalent to?

Even if fill_buf_at_least was a better name (which I don't believe it is), is it so much better that it is worth it to back through the ACP process, rename the feature, and break all code that is currently using peek?

Even if I wanted it renamed, I have no authority in that decision.

The only productive options you have at this point are to either to suggest a documentation change that clarifies this while also matching the semantics of peek, or to file an ACP.

@Kixunil
Copy link
Contributor

Kixunil commented Apr 1, 2025

Why not just read 4 bytes at a time?

  • Issues around zero-initializing the buffer
  • Intermediate copy even when slice is used (might be optimizable, IDK)
  • Cannot SIMD
  • If the reader is dyn then each chunk incurs the cost of dynamic dispatch

Even if fill_buf_at_least was a better name (which I don't believe it is), is it so much better that it is worth it to back through the ACP process, rename the feature, and break all code that is currently using peek?

I'm not excited about either of those but I could imagine doing so if there is any better.

@the8472
Copy link
Member

the8472 commented Apr 29, 2025

ACP 569 which proposes to generalize this to a trait also proposes a bit more general API in that the peek method returns "at least n" rather than "exactly n" bytes.
It seems like peek could get that generalization too.

@lolbinarycat
Copy link
Contributor Author

@the8472 yeah, I didn't consider stuff like slices when I was writing my original proposal, eliminating that copy within generic code does seem worthwhile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC F-bufreader_peek `#![feature(bufreader_peek)]` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants