Skip to content

Fixing HeaderIterDP's __len__ function #166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

NivekT
Copy link
Contributor

@NivekT NivekT commented Jan 14, 2022

Stack from ghstack:

The previous implementation simply return the limit as the length for HeaderIterDataPipe. This updated implementation takes it a step further to account for the different possible scenarios.

Fixes #123
Fixes #134

Differential Revision: D33589168

NivekT added a commit that referenced this pull request Jan 14, 2022
ghstack-source-id: 908ae56
Pull Request resolved: #166
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 14, 2022
self.length = min(source_len, self.limit)
return self.length
except TypeError:
warn(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is raising a warning and returning the best guess (i.e. self.limit) the right decision here? It can be triggered by things like list(header_dp) if header_dp hasn't been fully traversed once yet, which might confuse the users. The alternative is to raise an exception and not return anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

# TODO(123): __len__ Test: returns the length of source when it is less than the limit
# header_dp = source_dp.header(30)
# self.assertEqual(20, len(header_dp))
# __len__ Test: returns the length of source when it is less than the limit
Copy link
Contributor Author

@NivekT NivekT Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These four tests here show what behaviors are expected


def __iter__(self) -> Iterator[T_co]:
for i, value in enumerate(self.source_datapipe):
if i < self.limit:
yield value
else:
break
self.length = min(i + 1, self.limit) # We know length with certainty when we reach here
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i + 1 for the case when source_datapipe has a length smaller than the limit, so the loop exits with i = len(source_datapipe) - 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a minor bug that i would be unavailable if self.source_datapipe is empty. Let's declare a variable before the iteration to represent i.

Copy link
Contributor

@ejguan ejguan Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do it in a pythonic way

Suggested change
self.length = min(i + 1, self.limit) # We know length with certainty when we reach here
i = 0
for value in self.source_datapipe:
i += 1
if i <= self.limit:
yield value
else:
break
self.length = min(i, self.limit)

Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one nit comment.


def __iter__(self) -> Iterator[T_co]:
for i, value in enumerate(self.source_datapipe):
if i < self.limit:
yield value
else:
break
self.length = min(i + 1, self.limit) # We know length with certainty when we reach here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a minor bug that i would be unavailable if self.source_datapipe is empty. Let's declare a variable before the iteration to represent i.

self.length = min(source_len, self.limit)
return self.length
except TypeError:
warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

The previous implementation simply return the limit as the length for `HeaderIterDataPipe`. This updated implementation takes it a step further to account for the different possible scenarios. 

Fixes #123
Fixes #134

[ghstack-poisoned]
NivekT added a commit that referenced this pull request Jan 14, 2022
ghstack-source-id: c8feaf7
Pull Request resolved: #166
@NivekT
Copy link
Contributor Author

NivekT commented Jan 14, 2022

@NivekT has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

The previous implementation simply return the limit as the length for `HeaderIterDataPipe`. This updated implementation takes it a step further to account for the different possible scenarios. 

Fixes #123
Fixes #134

Differential Revision: [D33589168](https://our.internmc.facebook.com/intern/diff/D33589168)

[ghstack-poisoned]
NivekT added a commit that referenced this pull request Jan 14, 2022
ghstack-source-id: fa36944
Pull Request resolved: #166
@NivekT
Copy link
Contributor Author

NivekT commented Jan 14, 2022

@NivekT has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot facebook-github-bot deleted the gh/NivekT/23/head branch January 22, 2022 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
3 participants