
replace device param with bounds_check_warning of inputs_to_device function #3831


Closed

Conversation

tiankongdeguiji (Contributor) commented Mar 17, 2025

When using fbgemm-gpu version 1.1.0 or later, the FX-traced model is bound to the device cuda:0. As a result, the model cannot be deployed to a CPU or to a different device such as cuda:1. This can be reproduced with the case in #3830.

This PR fixes it: `fbgemm_gpu_split_table_batched_embeddings_ops_inference_inputs_to_device` no longer takes `device(type='cuda', index=0)` as an input.

···
    getitem_1 = _fx_trec_unwrap_kjt[0]
    getitem_2 = _fx_trec_unwrap_kjt[1];  _fx_trec_unwrap_kjt = None
    _tensor_constant0 = self._tensor_constant0
    inputs_to_device = fbgemm_gpu_split_table_batched_embeddings_ops_inference_inputs_to_device(getitem_1, getitem_2, None, _tensor_constant0);  getitem_1 = getitem_2 = _tensor_constant0 = None
    getitem_3 = inputs_to_device[0]
    getitem_4 = inputs_to_device[1]
    getitem_5 = inputs_to_device[2];  inputs_to_device = None
    _tensor_constant1 = self._tensor_constant1
    _tensor_constant0_1 = self._tensor_constant0
    bounds_check_indices = torch.ops.fbgemm.bounds_check_indices(_tensor_constant1, getitem_3, getitem_4, 1, _tensor_constant0_1, getitem_5);  _tensor_constant1 = _tensor_constant0_1 = bounds_check_indices = None
    _tensor_constant2 = self._tensor_constant2
    _tensor_constant3 = self._tensor_constant3
···
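For background on why a baked-in device breaks relocation: FX tracing embeds non-tensor constants, including `torch.device` objects, directly into the generated graph code. The sketch below (a hypothetical toy module, not the FBGEMM code) contrasts the device-bound pattern with a device-agnostic one that derives the target from an input tensor:

```python
import torch
import torch.fx


class DeviceBound(torch.nn.Module):
    """Stores a fixed device; tracing bakes cuda:0 into the graph code."""

    def __init__(self):
        super().__init__()
        self.target = torch.device("cuda:0")

    def forward(self, x):
        return x.to(self.target)


class DeviceAgnostic(torch.nn.Module):
    """Derives the target device from another input tensor, so the
    traced graph contains no literal device constant."""

    def forward(self, x, ref):
        return x.to(ref.device)


# Tracing is symbolic, so no CUDA device is needed to run this.
bound = torch.fx.symbolic_trace(DeviceBound())
agnostic = torch.fx.symbolic_trace(DeviceAgnostic())

print("cuda" in bound.code)     # True: device(type='cuda', index=0) appears in the code
print("cuda" in agnostic.code)  # False: the device is resolved at call time
```

Calling `bound.to("cpu")` would move parameters and buffers but leave the `device(type='cuda', index=0)` literal in the graph, which is the failure mode described above; deriving the device from an input (or passing `None`, as this PR does) keeps the traced module relocatable.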

netlify bot commented Mar 17, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 0a3db61
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67d81fd9ee95a000082b8493
😎 Deploy Preview: https://deploy-preview-3831--pytorch-fbgemm-docs.netlify.app

tiankongdeguiji (Contributor, Author):

Hi @842974287 @q10 @jiayisuse @aporialiao, could you take a look?

facebook-github-bot (Contributor):

@q10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

q10 (Contributor) commented Mar 18, 2025

Hi @tiankongdeguiji, thanks for opening this PR. We will look into this and get back to you if we have questions; if there are none, we will merge the PR.

tiankongdeguiji (Contributor, Author):

> Hi @tiankongdeguiji, thanks for opening this PR. We will look into this and get back to you if we have questions; if there are none, we will merge the PR.

thx!

facebook-github-bot (Contributor):

@q10 merged this pull request in c01a227.

liligwu pushed a commit to ROCm/FBGEMM that referenced this pull request Mar 19, 2025
…nction (pytorch#3831)

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/930

When using fbgemm-gpu version 1.1.0 or later, the FX-traced model is bound to the device cuda:0. As a result, the model cannot be deployed to a CPU or to a different device such as cuda:1. This can be reproduced with the case in pytorch#3830.

This fixes it: `fbgemm_gpu_split_table_batched_embeddings_ops_inference_inputs_to_device` no longer takes `device(type='cuda', index=0)` as an input.
```
···
    getitem_1 = _fx_trec_unwrap_kjt[0]
    getitem_2 = _fx_trec_unwrap_kjt[1];  _fx_trec_unwrap_kjt = None
    _tensor_constant0 = self._tensor_constant0
    inputs_to_device = fbgemm_gpu_split_table_batched_embeddings_ops_inference_inputs_to_device(getitem_1, getitem_2, None, _tensor_constant0);  getitem_1 = getitem_2 = _tensor_constant0 = None
    getitem_3 = inputs_to_device[0]
    getitem_4 = inputs_to_device[1]
    getitem_5 = inputs_to_device[2];  inputs_to_device = None
    _tensor_constant1 = self._tensor_constant1
    _tensor_constant0_1 = self._tensor_constant0
    bounds_check_indices = torch.ops.fbgemm.bounds_check_indices(_tensor_constant1, getitem_3, getitem_4, 1, _tensor_constant0_1, getitem_5);  _tensor_constant1 = _tensor_constant0_1 = bounds_check_indices = None
    _tensor_constant2 = self._tensor_constant2
    _tensor_constant3 = self._tensor_constant3
···
```

Pull Request resolved: pytorch#3831

Reviewed By: sryap

Differential Revision: D71370666

Pulled By: q10

fbshipit-source-id: e8f65a534bf8235534ff861d1f135497f4660820
q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
…nction (pytorch#930)

Summary:
Pull Request resolved: facebookresearch/FBGEMM#930


X-link: pytorch#3831

Reviewed By: sryap

Differential Revision: D71370666

Pulled By: q10

fbshipit-source-id: e8f65a534bf8235534ff861d1f135497f4660820