Skip to content

[executorch][flat_tensor] DataMap implementation #7900

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Feb 6, 2025

Conversation

lucylq
Copy link
Contributor

@lucylq lucylq commented Jan 23, 2025

Stack from ghstack (oldest at bottom):

DataMap implementation that

  • Loads a flat_tensor file
  • Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
  • Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.

  • If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: D67064580

DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
Copy link

pytorch-bot bot commented Jan 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7900

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit e1c396f with merge base 821a2fe (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 23, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Jan 23, 2025
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

ghstack-source-id: 262756077
Pull Request resolved: #7900
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Jan 23, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
ghstack-source-id: 262759344
@lucylq lucylq added the release notes: api Changes to public facing apis (any interfaces, pybinded runtime methods, etc.) label Jan 23, 2025
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Jan 23, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.
ghstack-source-id: 262759344

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Jan 24, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.
ghstack-source-id: 262941711

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
lucylq added a commit that referenced this pull request Jan 24, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {key: tensor} and {key: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.
ghstack-source-id: 262957949

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

executorch::runtime::FreeableBuffer _flat_tensor_data;

// Map of name to {segment index, offset, TensorLayout}.
std::unordered_map<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not super opinionated on this, but one thing to consider by having this state as part of the class is you force it to keep dangling around after all the method inits are done which is wasteful. I'm not really super sold on this being a huge latency win either rather then just constructing these things as they are needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this sounds good. Esp. for training case where there aren't many tensors and a linear search (or logn if tensors are sorted) isn't time consuming. If there are latency issues later down the track we can create an impl with a map.

int segment_size = s_data_segment->Get(0)->size();
assert(segment_size <= segment_data_size);

Result<FreeableBuffer> _data_ro = loader->load(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I was trying to figure out how I wanted to handle this for mutable state. Here is what Ive come up with.

Lazily load segments, but internally keep a list of all the loaded segments. If the first time you need to load a segment is through "get_data" then put it in the preserve lis. If the first time you need to load a segment is through load_into then instead you just call data_loader load into which doesnt keep it hanging around.

DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Jan 31, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {key: tensor} and {key: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.
ghstack-source-id: 264110292

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Feb 4, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Makes tensor information available via the named_data_map.h interface.

ghstack-source-id: 264501132

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Feb 5, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Makes tensor information available via the named_data_map.h interface.

TODO: in a later diff, update the ET runtime to hold onto the FreeableBuffers returned by the NDM. Then, the NDM will not persist the segment. T214294528

ghstack-source-id: 264501132

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Feb 5, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Makes tensor information available via the named_data_map.h interface.

TODO: in a later diff, update the ET runtime to hold onto the FreeableBuffers returned by the NDM. Then, the NDM will not persist the segment. T214294528

ghstack-source-id: 264850001

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Feb 5, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Makes tensor information available via the named_data_map.h interface.

TODO: in a later diff, update the ET runtime to hold onto the FreeableBuffers returned by the NDM. Then, the NDM will not persist the segment. T214294528

ghstack-source-id: 264871691

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Feb 5, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Makes tensor information available via the named_data_map.h interface.

TODO: in a later diff, update the ET runtime to hold onto the FreeableBuffers returned by the NDM. Then, the NDM will not persist the segment. T214294528

ghstack-source-id: 264905837

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
DataMap implementation that
* Loads a flat_tensor file
* Populates a map with {fqn: tensor} and {fqn: TensorLayout}.
* Makes tensor information available via the named_data_map.h interface.

For now, DataMap doesn't store the DataLoader.
- If/when tensors are in their own segments, DataMap should also store a DataLoader.

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67064580

lucylq added a commit that referenced this pull request Feb 5, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Makes tensor information available via the named_data_map.h interface.

TODO: in a later diff, update the ET runtime to hold onto the FreeableBuffers returned by the NDM. Then, the NDM will not persist the segment. T214294528

ghstack-source-id: 264916795

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)
@facebook-github-bot facebook-github-bot merged commit eb94973 into gh/lucylq/32/base Feb 6, 2025
46 of 47 checks passed
@facebook-github-bot facebook-github-bot deleted the gh/lucylq/32/head branch February 6, 2025 01:53
kirklandsign added a commit that referenced this pull request Feb 6, 2025
Pull Request resolved: #7900

DataMap implementation that
* Loads a flat_tensor file
* Makes tensor information available via the named_data_map.h interface.

TODO: in a later diff, update the ET runtime to hold onto the FreeableBuffers returned by the NDM. Then, the NDM will not persist the segment. T214294528

ghstack-source-id: 264916795

Differential Revision: [D67064580](https://our.internmc.facebook.com/intern/diff/D67064580/)

---------

Co-authored-by: lucylq <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported release notes: api Changes to public facing apis (any interfaces, pybinded runtime methods, etc.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants