
Adding GPU acceleration to encode_jpeg #8375


Closed

Conversation

Contributor

@deekay42 deekay42 commented Apr 12, 2024

Summary:
I'm adding GPU support to the existing torchvision.io.encode_jpeg function. If the input tensors are on the GPU, the CUDA version is used; otherwise, the CPU version is used.
Additionally, I'm adding a new function torchvision.io.encode_jpegs (plural) which uses a fused kernel and may be faster than successive calls to the singular version, each of which incurs kernel launch overhead.
If it's alright, I'll be happy to refactor decode_jpeg to follow this convention in a follow-up PR.
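
For context, a minimal usage sketch of the proposed API (shapes and quality values are arbitrary; encode_jpegs is the new plural function described above):

import torch
from torchvision.io import encode_jpeg, encode_jpegs

# Single image: the CUDA path is selected automatically for GPU tensors.
img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8, device="cuda")
encoded = encode_jpeg(img, quality=75)  # encoded bytes stay on the GPU

# Batch: the fused-kernel variant avoids per-call launch overhead.
imgs = [torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8, device="cuda") for _ in range(8)]
encoded_list = encode_jpegs(imgs, quality=75)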

Test Plan:

  1. pytest test -vvv
  2. ufmt format torchvision
  3. flake8 torchvision

Reviewers:
@NicolasHug


pytorch-bot bot commented Apr 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8375

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 37 Pending

As of commit 6c1c9fe with merge base 5181a85 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Member

@NicolasHug NicolasHug left a comment

Thanks for submitting the PR @deekay42.

Before going deeper into the review, we should try to make sure the existing tests are running and passing.

It looks like there are some build failures in various jobs; you can see the CI job logs via the links above.

There are a lot of failures, some of which are more important than others. I made comments below which correspond to some of those failures.

The failure that seems the most concerning at this stage is the one that looks like

2024-04-15T09:56:21.6662102Z AttributeError: '_OpNamespace' 'image' object has no attribute '_jpeg_version'

It comes from here, and the failure suggests that the image extension (where you added the new encoder) doesn't build properly with this PR.

How have you been testing those changes so far? Was it on a devvm/fbcode (with buck etc.)? If yes, that might be why you haven't seen those failures locally yet.

return output


def encode_jpegs(inputs: list[torch.Tensor], quality: int = 75) -> list[torch.Tensor]:
Member

Here and everywhere else, this should be List; otherwise it causes errors like:

2024-04-15T10:06:23.0758306Z     def encode_jpegs(inputs: list[torch.Tensor], quality: int = 75) -> list[torch.Tensor]:
2024-04-15T10:06:23.0759045Z TypeError: 'type' object is not subscriptable
Suggested change
def encode_jpegs(inputs: list[torch.Tensor], quality: int = 75) -> list[torch.Tensor]:
def encode_jpegs(inputs: List[torch.Tensor], quality: int = 75) -> List[torch.Tensor]:

Contributor Author

Hey Nicolas, thanks for taking a look!
I used a devGPU and didn't use buck or any internal tools. The PR builds and tests fine locally without any errors. I'm guessing this is because I'm using Python 3.11, while this test uses 3.8 (subscripting standard containers was introduced in PEP 585, which landed in Python 3.9). Should be an easy fix.
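
For reference, a minimal sketch of the version dependency (hypothetical stub; only the annotation matters here):

from typing import List  # portable spelling, works on Python 3.8+

import torch

# Builtin generics like list[torch.Tensor] are only subscriptable at runtime
# on Python >= 3.9 (PEP 585); on 3.8 the def line itself raises
# TypeError: 'type' object is not subscriptable.
def encode_jpegs(inputs: List[torch.Tensor], quality: int = 75) -> List[torch.Tensor]:
    ...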

@deekay42 deekay42 marked this pull request as draft April 15, 2024 18:31
@deekay42 deekay42 force-pushed the main branch 2 times, most recently from 1033dc4 to 7181f4c on April 16, 2024 at 20:56
Member

@NicolasHug NicolasHug left a comment

Thanks @deekay42 , I made a first pass over the code. This looks pretty good already!

@@ -483,6 +559,53 @@ def test_encode_jpeg_errors():
with pytest.raises(RuntimeError, match="Input data should be a 3-dimensional tensor"):
encode_jpeg(torch.empty((100, 100), dtype=torch.uint8))

@needs_cuda
def test_encode_jpeg_cuda_errors():
Member

test_encode_jpeg_cuda_errors and test_encode_jpegs_cuda_errors are currently sandwiched between test_encode_jpeg_errors and test_encode_jpeg, which isn't great as it scatters related tests (we're not necessarily doing a great job at that in that file already).

But it might be best to keep test_encode_jpeg_cuda_errors and test_encode_jpegs_cuda_errors just below test_encode_jpeg[s]_cuda so that related tests are grouped together?

Comment on lines 91 to 100
# next we check to see if the image is encodeable by PIL and
# if it isn't we don't include it in the test
try:
decoded_image_pil = F.to_pil_image(decoded_image_tv)
buf = io.BytesIO()
decoded_image_pil.save(buf, format="JPEG", quality=75)
except Exception:
# If PIL can't encode the image we can't either
return

encoded_jpeg_cuda_tv = encode_fn(decoded_image_tv.cuda().contiguous(), quality=75)
decoded_jpeg_cuda_tv = decode_jpeg(encoded_jpeg_cuda_tv.cpu())
Member

Instead of using PIL for that, we can just directly check whether the image is CMYK, similar to what we're doing here:

vision/test/test_image.py

Lines 425 to 426 in 2ae6a6d

if "cmyk" in img_path:
pytest.xfail("Decoding a CMYK jpeg isn't supported")

Suggested change
# next we check to see if the image is encodeable by PIL and
# if it isn't we don't include it in the test
try:
decoded_image_pil = F.to_pil_image(decoded_image_tv)
buf = io.BytesIO()
decoded_image_pil.save(buf, format="JPEG", quality=75)
except Exception:
# If PIL can't encode the image we can't either
return
encoded_jpeg_cuda_tv = encode_fn(decoded_image_tv.cuda().contiguous(), quality=75)
decoded_jpeg_cuda_tv = decode_jpeg(encoded_jpeg_cuda_tv.cpu())
if "cmyk" in img_path:
pytest.xfail("Encoding a CMYK jpeg isn't supported")

create_status);
}
});
std::call_once(::nvjpeg_handle_creation_flag, nvjpeg_init);
Member

Noob Q: IIUC ::something means the name resolution starts at the global namespace rather than at the local namespace. But why is it needed here? There cannot be another nvjpeg_handle_creation_flag, right?

Contributor Author

It's not strictly needed here, but it makes it extra explicit that this flag lives in the global namespace.

cudaStream_t stream = at::cuda::getCurrentCUDAStream(device.index());

// Create global nvJPEG handle
std::call_once(::nvjpeg_handle_creation_flag, nvjpeg_init);
Member

Same Q as above about the need for :: here?

Comment on lines 166 to 170
// For some reason I couldn't get nvjpegEncodeImage to work for grayscale
// images but nvjpegEncodeYUV seems to work fine. Annoyingly however,
// nvjpegEncodeYUV requires the source image plane pitch to be divisible by 8
// so we must pad the image if needed:
// https://github.com/NVIDIA/cuda-samples/issues/23#issuecomment-559283013
Member

Thanks for these details. The problem with the padding is that the resulting encoded image is incorrect: it has undesirable padded values, and its shape is also different from the original image. E.g. this simple test would fail:

import torch
from torchvision.io import decode_jpeg, encode_jpeg

orig = torch.randint(0, 256, size=(1, 99, 99), dtype=torch.uint8).cuda()
decoded = decode_jpeg(encode_jpeg(orig).cpu())
assert decoded.shape == orig.shape  # Fails: torch.Size([1, 99, 104]) != torch.Size([1, 99, 99])

It's best not to go with that solution. We can try to find a way to support grayscale images, but in the meantime we can just raise a loud TORCH_CHECK error if the image is grayscale.
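
For illustration, a sketch of the behavior that recommendation would give (the exact error message is up to the TORCH_CHECK text, so it is left unasserted here):

import pytest
import torch
from torchvision.io import encode_jpeg

gray = torch.randint(0, 256, (1, 64, 64), dtype=torch.uint8).cuda()
with pytest.raises(RuntimeError):  # grayscale CUDA encoding rejected loudly
    encode_jpeg(gray)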

Contributor Author

Ok, I'll remove for now.

Comment on lines 77 to 81
// Due to the required input format of nvjpeg, tensors must be contiguous in
// memory. We could do the conversion to contiguous here, but that comes with
// a performance penalty that would be invisible to the user. Better to
// make this explicit and push the conversion to user code.
TORCH_CHECK(
image.is_contiguous(),
"All input tensors must be contiguous. Call tensor.contiguous() before calling this function.")
Member

I understand the intent, but pytorch ops tend to do that kind of implicit conversion on behalf of the user when there's no other alternative (example:

auto x1_t = dets.select(1, 0).contiguous();
auto y1_t = dets.select(1, 1).contiguous();
auto x2_t = dets.select(1, 2).contiguous();
auto y2_t = dets.select(1, 3).contiguous();
).

One way to look at it is: if the user has a non-contiguous image, there are only 3 scenarios:

  1. a loud error
  2. an implicit conversion (by us)
  3. an explicit conversion (by the user)

2. and 3. result in the exact same perf, so I'm not sure we're saving them from anything by raising a loud error. If anything, this might create a sub-optimal user experience.

So here I'd recommend to just do the conversion on behalf of the user.
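
For illustration, a minimal sketch of scenarios 2 and 3 (assumes the CUDA path from this PR and the implicit-conversion behavior recommended above; sizes are arbitrary):

import torch
from torchvision.io import encode_jpeg

# A permuted HWC -> CHW view is typically non-contiguous.
img = torch.randint(0, 256, (100, 100, 3), dtype=torch.uint8, device="cuda").permute(2, 0, 1)
assert not img.is_contiguous()

encoded = encode_jpeg(img.contiguous())  # scenario 3: explicit copy by the user
encoded = encode_jpeg(img)               # scenario 2: implicit copy inside the op; same cost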

Contributor Author

Makes sense.

"The number of channels should be 1 or 3, got: ", image.size(0));
}

torch::Device device = images[0].device();
Member

Should we assert that all images are on the same device as well?

Right now, if we have [img0, img1] respectively on cuda:0 and cuda:1, the encoded output for img1 will be allocated on cuda:0 which is probably not desirable?
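
A hypothetical repro (requires at least two CUDA devices; encode_jpegs is the plural function added in this PR):

import torch
from torchvision.io import encode_jpegs

img0 = torch.randint(0, 256, (3, 64, 64), dtype=torch.uint8, device="cuda:0")
img1 = torch.randint(0, 256, (3, 64, 64), dtype=torch.uint8, device="cuda:1")

# Without a same-device check, the encoded output for img1 would silently be
# allocated on cuda:0; a loud error here seems preferable.
encode_jpegs([img0, img1])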

Contributor Author

Added some validation and a test case.

Comment on lines 148 to 149
const cudaStream_t stream,
const torch::Device& device,
const nvjpegEncoderState_t nv_enc_state,
const nvjpegEncoderParams_t nv_enc_params) {
Member

Should we pass all these as const refs?

Contributor Author

Yes!

Comment on lines 156 to 155
nvjpegStatus_t samplingSetResult = nvjpegEncoderParamsSetSamplingFactors(
nv_enc_params, channels == 1 ? NVJPEG_CSS_GRAY : NVJPEG_CSS_444, stream);
Member

Curious about this call, as I don't see it in https://docs.nvidia.com/cuda/nvjpeg/index.html#jpeg-encoding-example. Why is it needed?

Contributor Author

This is to handle grayscale images, but it won't be needed now that grayscale is out of scope (for now).

Comment on lines 72 to 74
input: torch.Tensor,
mode: ImageReadMode = ImageReadMode.UNCHANGED,
apply_exif_orientation: bool = False,
Member

Nit: best to avoid this kind of unrelated formatting change (I assume those come from the internal linter?) as they distract from the review and also add noise to git blame.

You can keep them here as those aren't too disruptive, just letting you know for future PRs.

Contributor Author

Got it - yea, I think that was the linter.
