-
Notifications
You must be signed in to change notification settings - Fork 166
Fixing HeaderIterDP's __len__ function #166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[ghstack-poisoned]
self.length = min(source_len, self.limit) | ||
return self.length | ||
except TypeError: | ||
warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is raising a warning and returning the best guess (i.e. self.limit
) the right decision here? It can be triggered by things like list(header_dp)
if header_dp
hasn't been fully traversed once yet, which might confuse the users. The alternative is to raise an exception and not return anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
# TODO(123): __len__ Test: returns the length of source when it is less than the limit | ||
# header_dp = source_dp.header(30) | ||
# self.assertEqual(20, len(header_dp)) | ||
# __len__ Test: returns the length of source when it is less than the limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These four tests here show what behaviors are expected
|
||
def __iter__(self) -> Iterator[T_co]: | ||
for i, value in enumerate(self.source_datapipe): | ||
if i < self.limit: | ||
yield value | ||
else: | ||
break | ||
self.length = min(i + 1, self.limit) # We know length with certainty when we reach here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i + 1
for the case when source_datapipe
has a length smaller than the limit, so the loop exits with i = len(source_datapipe) - 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a minor bug that i
would be unavailable if self.source_datapipe
is empty. Let's declare a variable before the iteration to represent i
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do it in a pythonic way
self.length = min(i + 1, self.limit) # We know length with certainty when we reach here | |
i = 0 | |
for value in self.source_datapipe: | |
i += 1 | |
if i <= self.limit: | |
yield value | |
else: | |
break | |
self.length = min(i, self.limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except one nit comment.
|
||
def __iter__(self) -> Iterator[T_co]: | ||
for i, value in enumerate(self.source_datapipe): | ||
if i < self.limit: | ||
yield value | ||
else: | ||
break | ||
self.length = min(i + 1, self.limit) # We know length with certainty when we reach here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a minor bug that i
would be unavailable if self.source_datapipe
is empty. Let's declare a variable before the iteration to represent i
.
self.length = min(source_len, self.limit) | ||
return self.length | ||
except TypeError: | ||
warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
@NivekT has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
The previous implementation simply return the limit as the length for `HeaderIterDataPipe`. This updated implementation takes it a step further to account for the different possible scenarios. Fixes #123 Fixes #134 Differential Revision: [D33589168](https://our.internmc.facebook.com/intern/diff/D33589168) [ghstack-poisoned]
@NivekT has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Stack from ghstack:
The previous implementation simply return the limit as the length for
HeaderIterDataPipe
. This updated implementation takes it a step further to account for the different possible scenarios.Fixes #123
Fixes #134
Differential Revision: D33589168