
How does original PyTorch call XLA's ops? #1385


Closed
alanzhai219 opened this issue Nov 19, 2019 · 8 comments

@alanzhai219

❓ Questions and Help

Recently, I have been looking into the pytorch/xla code, but I am confused about a few things.

  • How does original PyTorch call XLA's ops?

Is there an internal PyTorch-XLA mechanism for this?

Any reply would be much appreciated. Thanks!

@dlibenzi (Collaborator)

Yes, we implement an XLA ATen backend.
A PyTorch op gets mapped to a PyTorch/XLA IR node; the IR graph is then compiled into an XLA computation and executed through the XRT API (which communicates with the TPU hardware).
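A minimal sketch of how that pipeline looks from the user side, assuming a torch_xla install with an XLA device (e.g. a TPU) available; `xm.mark_step()` is the point where the recorded IR graph is cut, compiled, and executed:

```python
# Minimal sketch: ops on an XLA tensor are recorded as PyTorch/XLA IR nodes;
# nothing runs on the device until the graph is compiled to an XLA
# computation and executed through XRT.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()            # e.g. a TPU core

a = torch.randn(2, 2, device=device)
b = torch.randn(2, 2, device=device)
c = torch.matmul(a, b) + a          # builds IR nodes only, no execution yet

xm.mark_step()                      # compile the recorded graph and run it
print(c)                            # fetching values also forces execution
```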

@taylanbil (Collaborator)

@alanzhai219, do you need any more info on this?

@alanzhai219 (Author) commented Nov 22, 2019

> @alanzhai219, do you need any more info on this?
@taylanbil @dlibenzi Thanks for your kind and helpful reply.
After going through the latest XLA code, I understand the op mapping between XLA and PyTorch, but I still have some questions.

  1. In a very early version, XLA called functions such as BuildComputationProgram/BuildComputation/... to map, build, and fuse. In the latest version, I can't figure out how this is done.
  2. If a model has an op that is unsupported in XLA, will the model graph be split into TPU->CPU->TPU segments? If so, how and where is such graph segmentation implemented?
  3. I see many TensorFlow calls in XLA. When does XLA call TensorFlow methods?
  4. What can be the input graph for XLA: a JIT-traced graph, a non-JIT-traced graph, or both? Why?

Thanks☺️

@ailzhang (Contributor)

@alanzhai219
For #1, in the early stage of the PyTorch/TPU integration we accepted a JIT-traced graph and built the XLA computation graph from it.
But we switched to being a tensor backend of PyTorch eager mode early this year. In other words, PyTorch/XLA now takes the user's Python code (the same code you would write for CPU/CUDA) and does the tracing and graph building on its own. That also gives a short answer to #4: we don't take a JIT-traced graph as input for now (this might change later); we build the graph out of PyTorch eager code.
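A small illustration of that eager-mode tracing; the `_get_xla_tensors_text` call below is an internal debug binding, so treat it as illustrative rather than a stable API:

```python
# Plain eager PyTorch code -- no torch.jit.trace -- is traced by PyTorch/XLA
# itself into an IR graph.
import torch
import torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.ones(3, 3, device=device)
y = x * 2 + 1                       # recorded lazily, not executed yet

# Internal debug helper: dumps the IR graph built from the eager ops above.
print(torch_xla._XLAC._get_xla_tensors_text([y]))
```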

For #2, unsupported ops in XLA are punted back to the CPU, and the results are then sent back to the TPU. This part of the logic is auto-generated; if you have a local build, you can find it in torch_xla/csrc/aten_xla_type_default.cpp.
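One way to observe that fallback from Python is through the debug metrics: ops routed through the CPU path show up as counters with an `aten::` prefix. The `svd` call below is only a guess at an op that may lack a native lowering:

```python
# Detecting CPU fallbacks via torch_xla's debug counters; each op punted to
# the CPU path bumps a counter named "aten::<op>".
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()
x = torch.randn(4, 4, device=device)
u, s, v = x.svd()                   # illustrative: assumed to lack a lowering
xm.mark_step()

fallbacks = [c for c in met.counter_names() if c.startswith('aten::')]
print(fallbacks)                    # non-empty => some ops ran on the CPU
```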

For #3, the TensorFlow calls in XLA are expected, since XLA provides some interfaces to TF, and PyTorch/XLA may reuse the same interfaces where applicable.

Hope this is helpful.

Also, I'm curious: are you looking at a particular part, or just trying to understand the codebase better? ;)

@alanzhai219 (Author)

@ailzhang Thanks for your reply. I'm just trying to understand the codebase and figure out the relationship between the PyTorch backend extension and XLA.😁

@alanzhai219 (Author)

@ailzhang @taylanbil @dlibenzi I have a question: why does an XLATensor have no storage, while a PyTorch Tensor does? Given that, how do ops deal with such a storage-less XLATensor? I have been thinking about it for two days and can't figure it out.
Thanks

@stale (bot) commented Dec 28, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@dlibenzi (Collaborator)

> why does an XLATensor have no storage, while a PyTorch Tensor does? Given that, how do ops deal with such a storage-less XLATensor?

Having a storage is not a requirement in PyTorch, as long as you intercept the proper ATen hooks and handle the ops.
We handle them in graph mode, which requires no storage.
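A quick way to see the storage-less behavior in practice (a sketch; the point is that a result only materializes when something forces it back to the host):

```python
# An XLA tensor holds a device data handle or an IR node, not a CPU-style
# storage; results stay symbolic until something forces execution.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
t = torch.zeros(1024, 1024, device=device)
u = t + 1                           # just another IR node; no host storage
host = u.cpu()                      # compiles/executes, copies result to host
print(host.sum())                   # now a regular CPU tensor with storage
```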
