proposal: os: add iterator variant of File.ReadDir #70084
(Note that this is complementary to #64341, which goes recursively) |
You left out the most interesting part of the proposal: what should the new method be called? Also, should it return |
Or even EDIT: I was thinking about the global |
You should be able to iterate over entries both in directory order and sorted by filename. Having an iterator return an error can help you remember to handle errors, but it can also make the code a bit messier when error handling is more than just returning the error. That said, here's my proposal:

    // DirEntries returns a collection of the contents of the directory
    // associated with f. It always returns a non-nil collection. If f is nil
    // or not a directory, the iterator methods return an empty sequence, and
    // a call to Err returns the error that occurred.
    func (f *File) DirEntries() *DirEntries

    type DirEntries struct {
        // contains filtered or unexported fields
    }

    func (d *DirEntries) All() iter.Seq[DirEntry]
    func (d *DirEntries) Sorted() iter.Seq[DirEntry]
    func (d *DirEntries) Names() iter.Seq[string]
    func (d *DirEntries) SortedNames() iter.Seq[string]
    func (d *DirEntries) Err() error

It would be used like this:

    entries := f.DirEntries()
    for entry := range entries.All() {
        // ...
    }
    if err := entries.Err(); err != nil {
        // handle the error
    }
|
I personally do not like having a separate |
That is a very fair criticism, but I am writing my own iterating wrapper around fs.WalkFunc/filepath.WalkFunc, and I have found there's not a good alternative. My current API takes an error handler callback and has a method to return the last error, but it's not totally satisfying. It's also a fairly fiddly API with a lot of decision points, and the more I work on it, the less sure I am that it makes sense to move it into the standard library rather than just letting third parties handle it. |
There are many places where it's possible to forget to check an error in Go. For example, if you have a |
I thought that the conclusion of #65236 was to introduce a vet check for this, but it does not seem to have been added. |
We have entirely too many ways to list a directory's contents:
Inconsistencies abound: some functions return the full directory listing, some are iterative and return a chunk at a time, some sort results, and some don't. And yet we don't seem to have enough functions, because there are gaps: Walk is less efficient than WalkDir because it calls lstat on each file; WalkDir is less efficient than Walk because it needs to read each directory into memory to sort its contents. Walk and WalkDir do a preorder traversal, but some operations (like RemoveAll) require a postorder traversal.

This isn't necessarily an argument against adding an iterator variant of ReadDir, but I think we need a clear understanding of how new directory-listing functions fit into the existing mess. (It's incredibly tempting to try to propose One New API that subsumes all the existing ones: flat directory listing, tree walking, pre- or post-order traversal, sorted or unsorted, a traversal-resistant way to stat or open files, and room for expansion with whatever we forgot.) |
An idea that comes to my mind: this kind of helper could be added to the

    // ErrorAbsorber adapts a fallible sequence into an infallible one. On the
    // first non-nil error it stores the error in *out and stops iterating.
    func ErrorAbsorber[T any](seq iter.Seq2[T, error], out *error) iter.Seq[T] {
        return func(yield func(T) bool) {
            for v, err := range seq {
                if err != nil {
                    *out = err
                    return
                }
                if !yield(v) {
                    return
                }
            }
        }
    }

Usage:

    var err error
    for range ErrorAbsorber(ErrorIter(), &err) {
        // logic
    }
    if err != nil {
        // error handling logic
    }

This might make it easy to move the error-handling logic out of the loop body. |
I think it'd have to be closer to the former. Because the implementation may involve multiple system calls at runtime in the middle of the stream, the file system might hit corruption and error out in the middle (after the latter signature already returned an iterator and a nil error), and you'd need some way to yield that to the iterating caller. That's assuming it's unsorted. And I am kinda assuming it'd need to be unsorted to be interesting because we already have https://pkg.go.dev/os#ReadDir which buffers it all up to sort. People can use that today if that's what they want. |
This brings you back to:

    for range it.Iter() {
        // logic
    }
    if err := it.Err(); err != nil {
        // error handling logic
    }

which follows the convention we already have in scanners. |
Per chat with some Go folk today, a few of us didn't like the general pattern of In this particular case, So we should probably do something like

    type SomeStructWithErrorOrValue[T any] struct {
        Err error // exactly one of Err or V is set
        V   T     // only valid if Err is nil
    }

... and then we went off into naming tangents and where such a type would live, etc. |
This can be solved by a vet check, see |
This adds a new generic result type (motivated by golang/go#70084) to try it out, and uses it in the lineread package, changing that package to return iterators: sometimes over []byte (when the input is all in memory), but sometimes iterators over results of []byte, if errors might happen at runtime. Updates #12912 Updates golang/go#70084 Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83 Signed-off-by: Brad Fitzpatrick
We ended up with a "result" type (named after Rust's and Swift's) in a

Then we can write code that produces errors in the middle of an iterator. For example, here's an iterator over the lines of a file:

    // File returns an iterator that reads lines from the named file.
    // The file is opened when iteration starts and closed when it ends,
    // so an iterator that is never ranged over does not leak a descriptor.
    func File(name string) iter.Seq[result.Of[[]byte]] {
        return func(yield func(result.Of[[]byte]) bool) {
            f, err := os.Open(name)
            if err != nil {
                yield(result.Error[[]byte](err))
                return
            }
            defer f.Close()
            bs := bufio.NewScanner(f)
            for bs.Scan() {
                // bs.Bytes() may be overwritten by the next Scan, so callers
                // must copy a line if they keep it.
                if !yield(result.Value(bs.Bytes())) {
                    return
                }
            }
            if err := bs.Err(); err != nil {
                yield(result.Error[[]byte](err))
            }
        }
    }

And now callers can't so easily ignore errors, as is common with people using At least ignoring errors is obvious on the page now. |
This adds a new generic result type (motivated by golang/go#70084) to try it out, and uses it in the new lineutil package (replacing the old lineread package), changing that package to return iterators: sometimes over []byte (when the input is all in memory), but sometimes iterators over results of []byte, if errors might happen at runtime. Updates #12912 Updates golang/go#70084 Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83 Signed-off-by: Brad Fitzpatrick
There is another solution:
This is strictly better than putting an |
@jba I suspect that a lot of APIs are going to return an iterator directly so that consumers can range with just a single function call inlined in the for range statement. Since it was decided that it's not a compiler error to ignore the second value with Seq2, we are back where we started. I do lean more towards something like Brad's result type because it seems more straightforward and Go-like. |
@jba Personally I would prefer making this behavior more explicit, with something like: #70084 (comment) (it could also return an |
I don't understand this. |
In the code
Passing a pointer to You could have a function that takes an That still leaves the problem of what to do if your iterator returns two values and an error. |
Would having it be an That being said, I feel that the code above using a The above example of an As a user doing some coding and lots of code review, I'd be happy with |
We can force errors to be handled / ignored explicitly with |
Not in every case: 😄

    for range iter2 {
    }
|
One use case this doesn't address is the adaptation of fallible sequences to infallible sequences, such that they can be consumed by something expecting Being able to collect up any error that occurs in a fallible sequence to then later check, like with The API for such a pattern could be:

    package result

    type Seq[T any] = iter.Seq[Of[T]]

    func UnwrapSeq[T any](dst *error, seq Seq[T]) iter.Seq[T]

Or alternatively, keeping with the idiom set out by types like

    func UnwrapSeq[T any](seq Seq[T]) ErrorSeq[T]

    type ErrorSeq[T any] interface {
        All() iter.Seq[T]
        Err() error
    }

Both would collect (joining) any errors that occur during iteration, to be checked once iteration has finished. |
There is some parallel discussion about this over in #70631, FWIW. In that case the proposal is about a fallible version of The two patterns that seem to keep coming up in these discussions are a sequence where each item can potentially include an error or an object that holds both a seemingly-infallible sequence and an For the sake of discussion I'm going to give them some names and shapes here:

    package iter

    type ErrSeq[T any] = Seq2[T, error]

    type SeqWithErr[T any] interface {
        Items() Seq[T]
        Err() error
    }

(I picked these two specific shapes for the sake of example. I acknowledge that there are multiple ways to design both of these discussed above -- a new Since the error-tracking models for these are slightly different, I think it's only possible to adapt between them by making some assumptions:
In particular I note that adapting a Nonetheless here are some potential implementations of such adapters, just to be clear about what assumptions I'm making in the above.

    package iter

    import "errors"

    // The following function names are terrible, but bikeshedding doesn't seem
    // necessary at this stage.

    func ErrSeqToSeqWithErr[T any](errSeq ErrSeq[T]) SeqWithErr[T] {
        return &seqWithErrAdapter[T]{seq: errSeq}
    }

    func SeqWithErrToErrSeq[T any](seqWithErr SeqWithErr[T]) ErrSeq[T] {
        return func(yield func(T, error) bool) {
            for item := range seqWithErr.Items() {
                err := seqWithErr.Err()
                if !yield(item, err) {
                    return
                }
            }
        }
    }

    type seqWithErrAdapter[T any] struct {
        seq ErrSeq[T]
        err error
    }

    // Items and Err use pointer receivers so that the error recorded while
    // iterating is visible to a later call to Err. (Calling Items more than
    // once would cause some interesting results, and the resulting sequences
    // are not concurrency-safe.)
    func (s *seqWithErrAdapter[T]) Items() Seq[T] {
        return func(yield func(T) bool) {
            for item, err := range s.seq {
                s.err = errors.Join(s.err, err)
                if !yield(item) {
                    return
                }
            }
        }
    }

    func (s *seqWithErrAdapter[T]) Err() error {
        return s.err
    }

Having both of these seems unfortunate, and the lossiness when adapting between them even more so. But nonetheless I can't help but think that even if the stdlib only provided and used one, the other would still be used in some third-party libraries, because both of these models are valid/convenient for certain shapes of problem. 😖 |
My long comment above notwithstanding, all of these tricks to find some way to stuff error handling into something where it doesn't necessarily fit take me back to where I was in the original iterators discussion where I asserted that I suspect this won't be a popular opinion, but I think I'd prefer to skip trying to stuff error handling into the sequence model altogether and instead move to generalize another pattern that's already common in Go:

    package iter

    import "errors"

    type SeqReader[T any] interface {
        ReadSeq() (T, error)
    }

    // EndOfSequence is analogous to io.EOF for marking that the end of an
    // [iter.SeqReader] sequence has been reached. It must be non-nil so that
    // callers can compare against it.
    var EndOfSequence = errors.New("end of sequence")

The caller's use of this would have a very similar shape to reading from

    r := os.ReadDir() // hypothetical: returns iter.SeqReader[os.DirEntry]
    for {
        entry, err := r.ReadSeq()
        if err == iter.EndOfSequence {
            break
        }
        if err != nil {
            // handle err (for example, return it)
        }
        // handle entry
    }

To me, this feels completely clear as to what is going on: it's repeated calls to a single fallible function in a loop, with each iteration either handling an error or handling an item, just like we're accustomed to with The

    package slices

    func CollectError[T any](r iter.SeqReader[T]) ([]T, error) {
        var s []T
        for {
            item, err := r.ReadSeq()
            if err == iter.EndOfSequence {
                break
            }
            if err != nil {
                return s, err
            }
            s = append(s, item)
        }
        return s, nil
    }

It should also be trivial to adapt an

I'm honestly not really convinced that it's worth solving this problem with

Update: I wrote a small package which includes My main goal in writing this was to show that (There are examples in the rendered documentation for this package so you can see what these different adapters look like in use.)

Independent Second Update: Over in #71203 there is a proposal for a new specialized syntax for error-handling control flow. If something like that were adopted, I think it would go some way to improving the ergonomics of this

    for {
        item := seqReader.ReadSeq() ? {
            if err == ErrEndOfSeq {
                break
            }
            // handle err
        }
        // handle item
    }

It's not a huge change, but it makes the If it incorporated my additional proposal in #71203 (comment) then it would also be a compile-time error to implicitly fall out of the error-handling block into the main loop body, making it easier to mentally ignore the content of the error-handling block and trust that the code that follows can only run when |
There is also a third pattern:
If you have that, you don't need
|
The There's a philosophical question around what the difference is between a one-method interface and a concrete function type. Why accept an I think the practical difference is that even if you only need one method from an interface, using an interface instead of a concrete lets you do type promotions, like checking for So, to me, it feels like it would be more natural to do:

    obj := NewFallibleIterator(...)
    for x := range obj.All() {
        ...
    }
    if err := obj.Err(); err != nil {
        return err
    }

But if you do it that way, you don't get the guarantee of checking for errors. However, that's just a tooling problem. If we make up a new type that we want to always be checked, we won't have false-positive problems with vet, and we can just do a vet check, no? |
I'm not sure how vet would check for that. What if you had a situation like this?

    iterator := NewIterator()
    for v := range iterator.All() {
        // ...
    }
    err := extractAndModifyErrorForSomeReason(iterator)
    if err != nil {
        // ...
    }

How would vet know that the error was, in fact, being checked? |
@earthboundkid indeed it is true that my

    type SeqReader[T any] func() (T, error)

I must admit that I started with an interface only because I was starting with I don't suppose it matters too much at this point but I also want to clarify that in my most recent earlier comments I was considering it essentially equivalent to either return a single value that can produce an However, I do concede that there are other reasons why those options are different, including the forcing of error handling. My intent with the |
Another consequence of the decision for whether the error is a per-iteration-step thing or a whole-iteration thing is when sequences are used with helper functions like the If each iteration step has its own error then a user of (Of course, my alternative of introducing a new |
Move the check to a runtime finalizer that panics if Err() was never called? 😆 Okay, that's probably not a good idea. 😄 I'm not sure what the best solution is here. I just don't like the idea of doing something conceptually ugly just to make the tool happy, but maybe that's the simplest solution? |
We have a lot of experience with |
Wouldn't an errf function suffer from the same problem unless you do the runtime finalizer panic thing? Once you pass it to another place, it's hard to ensure it will ever be called. |
Technically vet doesn't need to ensure it. It's just supposed to help flag common cases. You could define "using" it as either calling it or passing it somewhere else. |
That's true. But the argument just repeats one level up: it's more likely that the abstraction author will remember to write a Close method if there is a second return value. That is, with a
Here it is easy to forget to write a Close method that calls With an error function, if you write
then the code won't compile. The author is forced to deal with the error function, and the only obvious choice is to store it too and call it somewhere else. |
Proposal Details
Today we have:
https://pkg.go.dev/os#File.ReadDir etc
That n parameter feels so antiquated now that we have iterators! I propose that we add an iterator-based variant. 😄
/cc @neild