replace device param with bounds_check_warning of inputs_to_device function #3831
Conversation
Hi @842974287 @q10 @jiayisuse @aporialiao, could you take a look?

@q10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Hi @tiankongdeguiji, thanks for opening this PR. We will look into this and get back to you if we have questions; if there are no questions, we will merge the PR.

thx!
replace device param with bounds_check_warning of inputs_to_device function (pytorch#3831)

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/930

When using fbgemm-gpu version 1.1.0 or later, the FX-traced model is bound to device cuda:0. As a result, deploying the model to CPU or to a different device, such as cuda:1, is not possible. This can be reproduced with the case in pytorch#3830.

With this fix, `fbgemm_gpu_split_table_batched_embeddings_ops_inference_inputs_to_device` no longer needs `device(type='cuda', index=0)` as input:

```
···
getitem_1 = _fx_trec_unwrap_kjt[0]
getitem_2 = _fx_trec_unwrap_kjt[1];  _fx_trec_unwrap_kjt = None
_tensor_constant0 = self._tensor_constant0
inputs_to_device = fbgemm_gpu_split_table_batched_embeddings_ops_inference_inputs_to_device(getitem_1, getitem_2, None, _tensor_constant0);  getitem_1 = getitem_2 = _tensor_constant0 = None
getitem_3 = inputs_to_device[0]
getitem_4 = inputs_to_device[1]
getitem_5 = inputs_to_device[2];  inputs_to_device = None
_tensor_constant1 = self._tensor_constant1
_tensor_constant0_1 = self._tensor_constant0
bounds_check_indices = torch.ops.fbgemm.bounds_check_indices(_tensor_constant1, getitem_3, getitem_4, 1, _tensor_constant0_1, getitem_5);  _tensor_constant1 = _tensor_constant0_1 = bounds_check_indices = None
_tensor_constant2 = self._tensor_constant2
_tensor_constant3 = self._tensor_constant3
···
```

Pull Request resolved: pytorch#3831
Reviewed By: sryap
Differential Revision: D71370666
Pulled By: q10
fbshipit-source-id: e8f65a534bf8235534ff861d1f135497f4660820
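Reading the graph above: the `device` parameter of the `inputs_to_device` call has been replaced by `_tensor_constant0`, the `bounds_check_warning` tensor that `torch.ops.fbgemm.bounds_check_indices` also consumes a few nodes later. Because the target device can now be read off that tensor at run time, the traced graph no longer embeds `device(type='cuda', index=0)` and can be moved to CPU or another GPU.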
When using fbgemm-gpu version 1.1.0 or later, the FX-traced model is bound to device cuda:0. As a result, deploying the model to CPU or to a different device, such as cuda:1, is not possible. This can be reproduced with the case in #3830.

With this fix, `fbgemm_gpu_split_table_batched_embeddings_ops_inference_inputs_to_device` no longer needs `device(type='cuda', index=0)` as input.
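A minimal sketch of the before/after behavior (hypothetical, simplified signatures; the real function is `fbgemm_gpu_split_table_batched_embeddings_ops_inference_inputs_to_device`, and the third `None` argument in the traced graph is assumed to be optional per-sample weights):

```python
import torch

# Before (sketch): the target device is an explicit parameter. FX tracing
# records it as a constant such as device(type='cuda', index=0), so the
# traced graph is pinned to that device.
def inputs_to_device_old(indices, offsets, per_sample_weights, device):
    return (
        indices.to(device),
        offsets.to(device),
        per_sample_weights.to(device) if per_sample_weights is not None else None,
    )

# After (sketch): the device is read off the bounds_check_warning tensor,
# which lives on the module, so the device is resolved at run time.
def inputs_to_device_new(indices, offsets, per_sample_weights, bounds_check_warning):
    device = bounds_check_warning.device
    return (
        indices.to(device),
        offsets.to(device),
        per_sample_weights.to(device) if per_sample_weights is not None else None,
    )
```

With the second form, moving the traced module (e.g. `model.to("cpu")` or `model.to("cuda:1")`) carries `bounds_check_warning` along with it, and every `inputs_to_device` call site picks up the new device instead of the cuda:0 constant baked in at trace time.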