proposal: os: add iterator variant of File.ReadDir #70084

bradfitz opened this issue Oct 28, 2024 · 39 comments
@bradfitz
Contributor

Proposal Details

Today we have:

https://pkg.go.dev/os#File.ReadDir etc

func (f *File) ReadDir(n int) ([]DirEntry, error)

That n feels so antiquated now that we have iterators! I propose that we add an iterator-based variant. 😄

/cc @neild

@gopherbot gopherbot added this to the Proposal milestone Oct 28, 2024

@bradfitz
Contributor Author

(Note that this is complementary to #64341, which goes recursively)

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Oct 28, 2024
@ianlancetaylor
Member

You left out the most interesting part of the proposal: what should the new method be called?

Also, should it return iter.Seq2[DirEntry, error] or should it return (iter.Seq[DirEntry], error) ?

@mateusz834
Member

mateusz834 commented Oct 29, 2024

Also, should it return iter.Seq2[DirEntry, error] or should it return (iter.Seq[DirEntry], error) ?

Or even (iter.Seq2[DirEntry, error], error)

EDIT: I was thinking about the global os.ReadDir, this probably matters only there.

@gazerro
Contributor

gazerro commented Oct 29, 2024

You should be able to iterate over entries in both directory order and sorted by filename.

Having an iterator return an error can help you remember to handle errors, but it can also make the code a bit messier if error handling isn’t just about returning the error.

That said, here’s my proposal:

// DirEntries returns a collection of the contents in the directory associated
// with file. It always returns a non-nil collection. If f is nil or not a
// directory, the iterator methods return an empty sequence, and a call to Err
// will return the previously occurred error.
func (f *File) DirEntries() *DirEntries

type DirEntries struct {
	// contains filtered or unexported fields
}

func (d *DirEntries) All() iter.Seq[DirEntry]

func (d *DirEntries) Sorted() iter.Seq[DirEntry]

func (d *DirEntries) Names() iter.Seq[string]

func (d *DirEntries) SortedNames() iter.Seq[string]

func (d *DirEntries) Err() error

and used as shown below:

entries := f.DirEntries()
for entry := range entries.All() {
        // ...
}
if err := entries.Err(); err != nil {
        // handle the error
}

@mateusz834
Member

I personally do not like having a separate Err method, it is easy to forget to use it.

@earthboundkid
Contributor

I personally do not like having a separate Err method, it is easy to forget to use it.

That is a very fair criticism, but I am writing my own iterating wrapper around fs.WalkFunc/filepath.WalkFunc, and I have found there's not a good alternative. My current API takes an error handler callback and has a method to return the last error, but it's not totally satisfying. It's also a fairly fiddly API with a lot of decision points, and the more I work on it, the less sure I am that it makes sense to move it into the standard library as opposed to just letting third parties handle it.

@jimmyfrasche
Member

iter.Seq2[T, error] makes sense when there can be an error per iteration and you can continue the loop after any such error. Otherwise, the error is only non-nil at most once and you need to write the error handling code that is logically after the loop inside the loop.

There are many places where it's possible to forget to check an error in Go. For example, if you have an iter.Seq2[T, error] you can write for t := range seq.

@mateusz834
Member

There are many places where it's possible to forget to check an error in Go. For example, if you have an iter.Seq2[T, error] you can write for t := range seq.

I thought that the conclusion of #65236 was to introduce a vet check for this, but it does not seem to be added.

#65236 (comment)

@neild
Contributor

neild commented Oct 29, 2024

We have entirely too many ways to list a directory's contents:

  • os.ReadDir
  • os.File.ReadDir
  • os.File.Readdir
  • os.File.Readdirnames
  • io/fs.ReadDir
  • io/fs.ReadDirFile.ReadDir
  • io/fs.ReadDirFS.ReadDir
  • io/fs.WalkDir
  • path/filepath.Walk
  • path/filepath.WalkDir

Inconsistencies abound: Some functions return the full directory listing, some are iterative and return a chunk, some sort results, some don't.

And yet we don't seem to have enough functions, because there are gaps: Walk is less efficient than WalkDir because it calls lstat on each file; WalkDir is less efficient than Walk because it needs to read each directory into memory to sort its contents. Walk/WalkDir do a preorder traversal, but some operations (like RemoveAll) require a postorder traversal.
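As a small illustration of the existing behavior described above, the following sketch walks an in-memory file system with io/fs.WalkDir and records the visit order. The file names are invented for the example; the point is only that each directory is visited before its children (preorder) and that siblings come back in sorted order.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// walkOrder walks a small in-memory tree and returns the paths in the
// order WalkDir visits them.
func walkOrder() ([]string, error) {
	// Hypothetical contents; MapFS synthesizes the directory "a".
	fsys := fstest.MapFS{
		"b.txt":   {Data: []byte("b")},
		"a/y.txt": {Data: []byte("y")},
		"a/x.txt": {Data: []byte("x")},
	}
	var visited []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		visited = append(visited, p)
		return nil
	})
	return visited, err
}

func main() {
	visited, err := walkOrder()
	fmt.Println(visited, err)
}
```

Directory "a" appears before "a/x.txt" and "a/y.txt", and "b.txt" comes last, so an operation that must see children before their parent cannot be expressed directly with this callback.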

This isn't necessarily an argument against adding an iterator variant of ReadDir, but I think that we need to have a clear understanding on how new directory listing functions fit into the existing mess.

(It's incredibly tempting to try to propose One New API that subsumes all the existing ones--flat directory listing, tree walking, pre- or post-order traversal, sorted or unsorted, a traversal-resistant way to stat or open files, room for expansion with whatever we forgot.)

@mateusz834
Member

iter.Seq2[T, error] makes sense when there can be an error per iteration and you can continue the loop after any such error. Otherwise, the error is only non-nil at most once and you need to write the error handling code that is logically after the loop inside the loop.

That is a very fair criticism, but I am writing my own iterating wrapper around fs.WalkFunc/filepath.WalkFunc, and I have found there's not a good alternative. My current API takes an error handler callback and has a method to return the last error, but it's not totally satisfying.

An idea that comes to my mind, this kind of helper can be added to the iter/errors package:

func ErrorAbsorber[T any](seq iter.Seq2[T, error], out *error) iter.Seq[T] {
	return func(yield func(T) bool) {
		for v, err := range seq {
			if err != nil {
				// Record the first error and end the sequence.
				*out = err
				return
			}
			if !yield(v) {
				return
			}
		}
	}
}

Usage:

var err error
for _ = range ErrorAbsorber(ErrorIter(), &err) {
        // logic
}
if err != nil {
        // error handling logic
}

This might help in such cases to move the error handling logic easily from the loop body.

@bradfitz
Contributor Author

Also, should it return iter.Seq2[DirEntry, error] or should it return (iter.Seq[DirEntry], error)?

I think it'd have to be closer to the former. Because the implementation may involve multiple system calls at runtime in the middle of the stream, the file system might hit corruption and error out in the middle (after the latter signature already returned an iterator and a nil error), and you'd need some way to yield that to the iterating caller.

That's assuming it's unsorted. And I am kinda assuming it'd need to be unsorted to be interesting because we already have https://pkg.go.dev/os#ReadDir which buffers it all up to sort. People can use that today if that's what they want.

@jonlundy

This might help in such cases to move the error handling logic easily from the loop body.

This brings you back to:

for _ = range it.Iter() {
        // logic
}
if err := it.Err(); err != nil {
        // error handling logic
}

Which follows the convention we have already in scanners.

@bradfitz
Contributor Author

Per chat with some Go folk today, a few of us didn't like the general pattern of iter.Seq2[T, error] because it makes it too easy for callers to ignore errors.

In this particular case, iter.Seq2[DirEntry, error] is safer because DirEntry is an interface and a caller ignoring the error would panic using the nil interface, but if we adopted that pattern more broadly it's not safe, as many zero values are valid or easily ignorable. e.g. imagine a stream of integers in iter.Seq2[int, error] ... if the caller did for n := range seq, they'd get a zero at the end.... is that a real zero, or a zero with an error they discarded?

So we should probably do something like iter.Seq[SomeStructWithErrorOrValue[DirEntry]] ,

type SomeStructWithErrorOrValue[T any] struct {
    Err error // exactly one of Err or V is set
    V   T     // only valid if Err is nil
}

... and then we went off into naming tangents and where such a type would live, etc.

@mateusz834
Member

Per chat with some Go folk today, a few of us didn't like the general pattern of iter.Seq2[T, error] because it makes it too easy for callers to ignore errors.

This can be solved by a vet check, see
#70084 (comment)

bradfitz added a commit to tailscale/tailscale that referenced this issue Nov 5, 2024
This adds a new generic result type (motivated by golang/go#70084)
to try it out, and uses it in the lineread package, changing that
package to return iterators: sometimes over []byte (when the input is
all in memory), but sometimes iterators over results of []byte, if
errors might happen at runtime.

Updates #12912
Updates golang/go#70084

Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <[email protected]>
@bradfitz
Contributor Author

bradfitz commented Nov 5, 2024

We ended up with a "result" type (named after Rust/Swift's) in a result package: https://go.dev/play/p/WzWDeZV42Qr

Then we can write code that produces errors in the middle of an iterator.

e.g. here's an iterator over lines of a file:

// File returns an iterator that reads lines from the named file.
func File(name string) iter.Seq[result.Of[[]byte]] {
	f, err := os.Open(name)
	return func(yield func(result.Of[[]byte]) bool) {
		if err != nil {
			yield(result.Error[[]byte](err))
			return
		}
		defer f.Close()
		bs := bufio.NewScanner(f)
		for bs.Scan() {
			if !yield(result.Value(bs.Bytes())) {
				return
			}
		}
		if err := bs.Err(); err != nil {
			yield(result.Error[[]byte](err))
		}
	}
}

And now callers can't so easily ignore errors, as is common with people using bufio.Scanner or database/sql.Rows, etc.

At least ignoring errors is obvious on the page now.

bradfitz added a commit to tailscale/tailscale that referenced this issue Nov 5, 2024
This adds a new generic result type (motivated by golang/go#70084) to
try it out, and uses it in the new lineutil package (replacing the old
lineread package), changing that package to return iterators:
sometimes over []byte (when the input is all in memory), but sometimes
iterators over results of []byte, if errors might happen at runtime.

Updates #12912
Updates golang/go#70084

Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <[email protected]>
@jba
Contributor

jba commented Nov 6, 2024

There is another solution:

entries, errf := f.Dirs()
for x := range entries {
    ...
}
if err := errf(); err != nil {
    return err
}

This is strictly better than putting an Err method on the iterator, because you can't forget about errf: the compiler reports an unused variable as an error. For more detail, see #65236 (comment).

@jaloren

jaloren commented Nov 6, 2024

@jba I suspect that a lot of APIs are going to return an iterator directly so that consumers can range with just a single function call inlined in the for range statement. Since it was decided that it's not a compiler error to ignore the second value with Seq2, we are back where we started.

I do lean more towards something like Brad's result type because it seems more straightforward and Go-like.

@mateusz834
Member

@jba Personally I would prefer making this behavior more explicit, with something like: #70084 (comment) (it could also return a func() error instead).

@jba
Contributor

jba commented Nov 6, 2024

Since it was decided that it's not a compiler error to ignore the second value with Seq2, we are back where we started.

I don't understand this. File.Dirs returns (iter.Seq[T], func() error). You can't put that after range. Same with the return types (iter.Seq2[K, V], func() error).

@jba
Contributor

jba commented Nov 6, 2024

Personally I would prefer making this behavior more explicit

In the code

var err error
for _ = range ErrorAbsorber(ErrorIter(), &err) {
        // logic
}
if err != nil {
        // error handling logic
}

Passing a pointer to err counts as using it, so the compiler won't give you an error if you omit the check at the bottom. My suggestion doesn't have that problem.

You could have a function that takes an iter.Seq2[T, error] and returns (iter.Seq[T], func() error), making it closer to my suggestion.

That still leaves the problem of what to do if your iterator returns two values and an error.

@AnomalRoil

Would having it be an io.Closer be an option?
We should already be used to checking the error on Close, and it could use the extra go vet tooling to check we do. 😋
It does sound a bit weird since this isn't technically io, but it still is tangential to it and I don't feel it surprising to have to Close a directory...

That being said, I feel that the code above using a Result is not something I'd point someone to and say "this is idiomatic Go".
The idea of returning an errf as @jba proposed is appealing, as it looks slightly more idiomatic, and prevents misuse by not encouraging people to "just plug it in a for i := range statement", which is nice.

The above example of an ErrorAbsorber also feels weird. First, passing it an error by reference just feels wrong, and from its name I wouldn't expect it to stop at the first error. Next, even if the name were errors.UntilNext, its usage seems to involve some sort of loop, where error handling for these errors happens anyway and then execution resumes... So I'm not convinced it would actually simplify the said handling compared to just using an iter.Seq2[T, error], in which you could always choose to wrap all errors with errors.Join if you care about eventual errors only, or you could choose to do more complex handling...

As a user doing some coding and lots of code review, I'd be happy with iter.Seq2[T, error] being an io.Closer, with that being called out in its documentation, or maybe having an Err method, as long as we get some sort of go vet tooling around it... because an unused Err method is not so easy to spot in code review and not as typical as a missing Close.
But having a (iter.Seq2[T, error], func() error) feels nicer since this would be very easy to catch in code review, while leaving a lot of flexibility for error handling to the users.

@seankhliao
Member

We can force errors to be handled / ignored explicitly with iter.Seq2[error, DirEntry]

@mateusz834
Member

We can force errors to be handled / ignored explicitly with iter.Seq2[error, DirEntry]

Not in every case: 😄

for range iter2 {
}

@deitrix

deitrix commented Dec 14, 2024

We ended up with a "result" type (named after Rust/Swift's) in a result package: https://go.dev/play/p/WzWDeZV42Qr

One use case this doesn't address is the adaptation of fallible sequences to infallible sequences, such that they can be consumed by something expecting iter.Seq[T].

Being able to collect up any error that occurs in a fallible sequence to then later check, like with bufio.Scanner, seems like the only reasonable approach to me (the alternative being to declare two variants of every function I have that accepts an iterator: one for iter.Seq[T], and another for iter.Seq2[T, error]/iter.Seq[result.Of[T]])

The API for such a pattern could be:

package result

type Seq[T any] = iter.Seq[Of[T]]

func UnwrapSeq[T any](dst *error, seq Seq[T]) iter.Seq[T]

Or alternatively, keeping with the idiom set out by types like bufio.Scanner:

func UnwrapSeq[T any](seq Seq[T]) ErrorSeq[T]

type ErrorSeq[T any] interface {
	All() iter.Seq[T]
	Err() error
}

Both of which would collect/join any error that occurs during iteration, to be later checked once iteration has finished.

@apparentlymart

There is some parallel discussion about this over in #70631, FWIW. In that case the proposal is about a fallible version of slices.Collect, which of course overlaps with deciding what the pattern is for fallible sequences in the first place.

The two patterns that seem to keep coming up in these discussions are a sequence where each item can potentially include an error or an object that holds both a seemingly-infallible sequence and an error describing whether something went wrong at any point in reading the sequence.

For the sake of discussion I'm going to give them some names and shapes here:

package iter

type ErrSeq[T any] = Seq2[T, error]

type SeqWithErr[T any] interface {
    Items() Seq[T]
    Err() error
}

(I picked these two specific shapes for the sake of example. I acknowledge that there are multiple ways to design both of these discussed above -- a new result.Of instead of a Seq2, returning *error or func() error instead of using an interface, ... -- but the main point of this comment is the relationship and interop between the "each item has an error" and "the overall sequence has an error" models, rather than the exact representation of each.)

Since the error-tracking models for these are slightly different, I think it's only possible to adapt between them by making some assumptions:

  1. A SeqWithErr can be adapted into an ErrSeq by assuming either that once an error has been reported it should be re-reported on every subsequent item until the sequence ends or that the sequence should artificially end at the first error.
  2. An ErrSeq can be adapted into a SeqWithErr by assuming that the individual item errors are descriptive enough to still make sense when reported all together as part of a single errors.Join at the end, or by artificially ending at the first error to return only that one.

In particular I note that adapting a SeqWithErr into an ErrSeq and then adapting that ErrSeq back into SeqWithErr (or vice-versa) does not get you back where you started: Since SeqWithErr does not track the correlation between items and errors, that transformation is lossy and so it isn't possible to recover the original ErrSeq from it.

Nonetheless here are some potential implementations of such adapters, just to be clear about what assumptions I'm making in the above.

package iter

import "errors"

// The following function names are terrible, but bikeshedding doesn't seem
// necessary at this stage.

func ErrSeqToSeqWithErr[T any](errSeq ErrSeq[T]) SeqWithErr[T] {
    // Return a pointer so that Items and Err share the same err field.
    return &seqWithErrAdapter[T]{seq: errSeq}
}

func SeqWithErrToErrSeq[T any](seqWithErr SeqWithErr[T]) ErrSeq[T] {
    return func(yield func(T, error) bool) {
        for item := range seqWithErr.Items() {
            // Once an error has been recorded, it is re-reported with
            // every subsequent item.
            err := seqWithErr.Err()
            if !yield(item, err) {
                return
            }
        }
    }
}

type seqWithErrAdapter[T any] struct {
    seq ErrSeq[T]
    err error
}

func (s *seqWithErrAdapter[T]) Items() Seq[T] {
    // (Calling this method more than once would cause some
    // interesting results, and the resulting sequences are not
    // concurrency-safe.)
    return func(yield func(T) bool) {
        for item, err := range s.seq {
            // errors.Join discards nil values, so only failures accumulate.
            s.err = errors.Join(s.err, err)
            if !yield(item) {
                return
            }
        }
    }
}

func (s *seqWithErrAdapter[T]) Err() error {
    return s.err
}
Having both of these seems unfortunate, and the lossiness when adapting between them even more so. But nonetheless I can't help but think that even if the stdlib only provided and used one the other would still be used in some third-party libraries, because both of these models are valid/convenient for certain shapes of problem. 😖

@apparentlymart

apparentlymart commented Dec 14, 2024

My long comment above notwithstanding, all of these tricks to find some way to stuff error handling into something where it doesn't necessarily fit take me back to where I was in the original iterators discussion where I asserted that iter.Seq/iter.Seq2 seem tailored for convenient iteration over infallible sequences but don't seem nearly as good a fit for anything that is fallible at each step.

I suspect this won't be a popular opinion, but I think I'd prefer to skip trying to stuff error handling into the sequence model altogether and instead move to generalize another pattern that's already common in Go: io.Reader.

package iter

import "errors"

type SeqReader[T any] interface {
    ReadSeq() (T, error)
}

// EndOfSequence is analogous to io.EOF for marking that the end of a
// [SeqReader] sequence has been reached.
var EndOfSequence = errors.New("end of sequence")

The caller's use of this would have a very similar shape to reading from io.Reader, including error handling at every step:

r := os.ReadDir() // returns iter.SeqReader[os.DirEntry]
for {
    entry, err := r.ReadSeq()
    if err == iter.EndOfSequence {
        break
    }
    if err != nil {
        // handle err, then return or continue
    }
    // handle entry
}

To me, this feels completely clear as to what is going on: it's repeated calls to a single fallible function in a loop, with each iteration either handling an error or handling an item, just like we're accustomed to with io.Reader and other similar designs. I will concede that the EndOfSequence error is somewhat clunky, but no more so than io.EOF when working with io.Reader.

The slices.CollectError proposed in #70631 would then be analogous to io.ReadAll, for convenient consumption of everything up until the first error:

package slices

func CollectError[T any](r iter.SeqReader[T]) ([]T, error) {
    var s []T
    for {
        item, err := r.ReadSeq()
        if err == iter.EndOfSequence {
            break
        }
        if err != nil {
            return s, err
        }
        s = append(s, item)
    }
    return s, nil
}

It should also be trivial to adapt an iter.Seq[T] into an iter.SeqReader[T] that just returns no errors until the sequence is exhausted and then returns iter.EndOfSequence. The opposite is trickier due to being lossy in a similar way to what I was discussing in my previous comment, but still possible if you really need to use an iter.SeqReader with an API that only supports iter.Seq and you're willing to make some assumptions about how errors ought to be treated.
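One way the Seq-to-SeqReader direction could be sketched is with iter.Pull from the standard library. The SeqReader interface below copies the shape from this comment into package main; fromSeq, pullReader, errEndOfSequence, and drainAll are hypothetical names for this example and are not taken from the seqreader package mentioned later.

```go
package main

import (
	"errors"
	"fmt"
	"iter"
	"slices"
)

// SeqReader mirrors the interface sketched in this comment.
type SeqReader[T any] interface {
	ReadSeq() (T, error)
}

// errEndOfSequence plays the role of the EndOfSequence sentinel.
var errEndOfSequence = errors.New("end of sequence")

// pullReader adapts the next/stop pair from iter.Pull to SeqReader.
type pullReader[T any] struct {
	next func() (T, bool)
	stop func()
}

func (p *pullReader[T]) ReadSeq() (T, error) {
	v, ok := p.next()
	if !ok {
		p.stop() // release the iterator; safe to call more than once
		var zero T
		return zero, errEndOfSequence
	}
	return v, nil
}

// fromSeq wraps an infallible sequence; it never returns any error other
// than errEndOfSequence.
func fromSeq[T any](seq iter.Seq[T]) SeqReader[T] {
	next, stop := iter.Pull(seq)
	return &pullReader[T]{next: next, stop: stop}
}

// drainAll reads a two-element sequence to completion.
func drainAll() ([]string, error) {
	r := fromSeq(slices.Values([]string{"a", "b"}))
	var got []string
	for {
		v, err := r.ReadSeq()
		if errors.Is(err, errEndOfSequence) {
			return got, nil
		}
		if err != nil {
			return got, err
		}
		got = append(got, v)
	}
}

func main() {
	fmt.Println(drainAll())
}
```

A caller that abandons the reader before reaching the end never triggers stop in this sketch, so a fuller version would probably also expose a Close method.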

I'm honestly not really convinced that it's worth solving this problem with iter.Seq at all, given how familiar the above examples are from experience with io.Reader.


Update: I wrote a small package which includes SeqReader and a set of adapter functions between that and iter.Seq/iter.Seq2, and also io.Reader: github.com/apparentlymart/go-seqreader/seqreader

My main goal in writing this was to show that SeqReader could be used as a direct representation of the idea of a sequence where reads can fail, and then most of the other patterns discussed in this issue can be implemented generically in terms of it, whether in stdlib or elsewhere.

(There are examples in the rendered documentation for this package so you can see what these different adapters look like in use.)


Independent Second Update: Over in #71203 there is a proposal for a new specialized syntax for error-handling control flow.

If something like that were adopted, I think it would go some way to improving the ergonomics of this SeqReader design:

for {
    item := seqReader.ReadSeq() ? {
        if err == ErrEndOfSeq {
            break
        }
        // handle err
    }
    // handle item
}

It's not a huge change, but it makes the ErrEndOfSeq situation part of the single error handler block rather than a separate independent if statement (at least, how I'd written it in the example in my prototype).

If it incorporated my additional proposal in #71203 (comment) then it would also be a compile-time error to implicitly fall out of the error-handling block into the main loop body, making it easier to mentally ignore the content of the error-handling block and trust that the code that follows can only run when item is valid.

@jba
Contributor

jba commented Dec 20, 2024

The two patterns that seem to keep coming up in these discussions are a sequence where each item can potentially include an error or an object that holds both a seemingly-infallible sequence and an error describing whether something went wrong at any point in reading the sequence.

There is also a third pattern:

iter, errf := NewFallibleIterator(...)
for x := range iter {
    ...
}
if err := errf(); err != nil {
    return err
}

If you have that, you don't need slices.CollectError. You can just write:

iter, errf := NewFallibleIterator(...)
s := slices.Collect(iter)
if err := errf(); err != nil {
    return err
}

@earthboundkid
Contributor

earthboundkid commented Dec 20, 2024

The errf is clever in that it forces the checking of the error via the unused variable mechanism, but it feels a little awkward to me.

There's a philosophical question around what is the difference between a one method interface and a concrete function type. Why accept an io.Reader instead of func([]byte) (int, error)? For that matter, why accept io.ReadCloser instead of accepting two callbacks, func([]byte) (int, error) and func() error?

I think the practical difference is that even if you only need one method from an interface, using an interface instead of a concrete lets you do type promotions, like checking for io.WriterTo or accepting a Reader but checking for Close as well. Less of a practical difference and more of a DX difference is that the Read and the Close method are likely to be tightly linked (but not always, or else we wouldn't have io.NopCloser!), so if you pass one, you probably also want to pass the other in a bundle.
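The "type promotion" point can be illustrated with a short sketch. The drain function and the closedFlags demo are invented for this example; the technique shown is the ordinary type assertion to an optional interface, which a bare func([]byte) (int, error) value could not support.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// drain reads r to the end. If r also happens to implement io.Closer, it
// closes r and reports that it did so.
func drain(r io.Reader) (n int64, closed bool, err error) {
	n, err = io.Copy(io.Discard, r)
	if c, ok := r.(io.Closer); ok {
		closed = true
		if cerr := c.Close(); err == nil {
			err = cerr
		}
	}
	return n, closed, err
}

// closedFlags runs drain on a plain reader and on the same kind of reader
// wrapped with io.NopCloser.
func closedFlags() (plain, wrapped bool, err error) {
	if _, plain, err = drain(strings.NewReader("hello")); err != nil {
		return
	}
	_, wrapped, err = drain(io.NopCloser(strings.NewReader("hello")))
	return
}

func main() {
	fmt.Println(closedFlags())
}
```

The same function accepts both values, and only the one that carries a Close method gets closed.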

So, to me, it feels like it would be more natural to do:

obj := NewFallibleIterator(...)
for x := range obj.All() {
    ...
}
if err := obj.Err(); err != nil {
    return err
}

But if you do it that way, you don't get the guarantee of checking for errors. However, that's just a tooling problem. If we make up a new type that we want to always be checked, we won't have false positive problems with vet, and we can just do a vet check, no?

@DeedleFake

But if you do it that way, you don't get the guarantee of checking for errors. However, that's just a tooling problem. If we make up a new type that we want to always be checked, we won't have false positive problems with vet, and we can just do a vet check, no?

I'm not sure how vet would check for that. What if you had a situation like this?

iterator := NewIterator()
for v := range iterator.All() {
  // ...
}
err := extractAndModifyErrorForSomeReason(iterator)
if err != nil {
  // ...
}

How would vet know that the error was, in fact, being checked?

@apparentlymart

@earthboundkid indeed it is true that my SeqReader could instead have been written this way:

type SeqReader[T any] func () (T, error)

I must admit that I started with an interface only because I was starting with io.Reader as the inspiration, but when I later wrote that package to prototype it I did end up also declaring SeqReadCloser and so the benefit of being able to implement multiple interfaces at once quickly became useful. So now, retroactively, I will adopt your reasoning for it even though I wasn't thinking about it that way at the time. 😀


I don't suppose it matters too much at this point but I also want to clarify that in my most recent earlier comments I was considering it essentially equivalent to either return a single value that can produce an iter.Seq and an error or to return each of them as separate results like in the iter, errf or iter, errPtr variants: for what I was talking about in that comment in particular the important distinction was whether each iteration step can return its own error or if there is a single error covering the entire iteration. (Notwithstanding that it might be an errors.Join result or similar: the errors are still not each associated with specific individual iteration steps.)

However, I do concede that there are other reasons why those options are different, including the forcing of error handling. My intent with the SeqReader interface was both to allow each iteration step to have its own error and more strongly encourage properly handling it, by following the same patterns we follow for error handling elsewhere instead of trying to invent a new iterator-specific error handling pattern.

@apparentlymart

Another consequence of the decision for whether the error is a per-iteration-step thing or a whole-iteration thing is when sequences are used with helper functions like the xiter.Filter/xiter.Filter2 described in #61898.

If each iteration step has its own error then a user of Filter2 can decide for themselves whether to pass through an error or to ignore it. But if errors are collected out-of-band in some other place and retrieved afterwards then an xiter.Filter user doesn't have the option of ignoring errors because the errors are invisible to anyone consuming the sequence.

(Of course, my alternative of introducing a new SeqReader interface has its own variation of that consequence: it would presumably encourage yet another helper function for filtering SeqReader-based sequences: func FilterReader[V any](keep func(V, error) bool, seq iter.SeqReader[V]) iter.SeqReader[V]. It's essentially just Filter2 again but with the second type forced to be error, which is annoying.)

@earthboundkid
Contributor

Move the check to a runtime finalizer that panics if Err() was never called? 😆 Okay, that's probably not a good idea. 😄 I'm not sure what the best solution is here. I just don't like the idea of doing something conceptually ugly just to make the tool happy, but maybe that's the simplest solution?

@jba
Contributor

jba commented Dec 23, 2024

that's just a tooling problem

We have a lot of experience with bufio.Scanner that convinced us that a vet check won't solve the problem. For example, sometimes the construction of the iterator and the error check are far removed from each other. Consider an abstraction over bufio.Scanner: the constructor calls bufio.NewScanner, and some other cleanup method calls Err.

@earthboundkid
Contributor

some other cleanup method calls Err

Wouldn't an errf function suffer from the same problem unless you do the runtime finalizer panic thing? Once you pass it to another place, it's hard to ensure it will ever be called.

@DeedleFake

Technically vet doesn't need to ensure it. It's just supposed to help flag common cases. You could define "using" it as either calling it or passing it somewhere else.


@jba
Contributor

jba commented Dec 29, 2024

some other cleanup method calls Err

Wouldn't an errf function suffer from the same problem unless you do the runtime finalizer panic thing? Once you pass it to another place, it's hard to ensure it will ever be called.

That's true. But the argument just repeats one level up: it's more likely that the abstraction author will remember to write a Close method if there is a second return value. That is, with a FallibleIterator.Err method, the constructor for the abstraction is going to look like this:

func NewWrapper(args) *Wrapper {
    return &Wrapper{iter: NewFallibleIterator(...)}
}

Here it is easy to forget to write a Close method that calls iter.Err().

With an error function, if you write

func NewWrapper(args) *Wrapper {
    iter, errf := NewFallibleIterator(...)
    return &Wrapper{iter: iter}
}

then the code won't compile. The author is forced to deal with the error function, and the only obvious choice is to store it too and call it somewhere else.
