
VideoFrame.createImageBitmap returns an ImageBitmap with coded resolution rather than display resolution #119


Closed
jamespearce2006 opened this issue Dec 22, 2020 · 25 comments


@jamespearce2006

It seems the createImageBitmap method is returning an ImageBitmap with slightly incorrect dimensions.
For example, in my case, I'm getting video frames that have the following properties:

codedHeight: 368
codedWidth: 640
cropHeight: 360
cropLeft: 0
cropTop: 0
cropWidth: 640
displayHeight: 360
displayWidth: 640
duration: null
format: null
planes: null
timestamp: 1

I'd expect the resulting ImageBitmap to honor the crop data and be the display resolution - 640x360. In reality, I'm getting an ImageBitmap that is 640x368, and the last 8 lines aren't valid data.
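To make the mismatch concrete, here is a minimal sketch (plain JavaScript; `cropInfo` is a hypothetical helper, not part of the WebCodecs API) that derives the crop rectangle and the padding a coded-size ImageBitmap would include, using the frame properties above:

```javascript
// Sketch: given VideoFrame-style metadata, compute the crop rectangle and
// how many coded rows/columns are padding outside the visible region.
function cropInfo(frame) {
  return {
    cropRect: {
      x: frame.cropLeft,
      y: frame.cropTop,
      width: frame.cropWidth,
      height: frame.cropHeight,
    },
    paddingRows: frame.codedHeight - (frame.cropTop + frame.cropHeight),
    paddingCols: frame.codedWidth - (frame.cropLeft + frame.cropWidth),
  };
}

const exampleFrame = {
  codedWidth: 640, codedHeight: 368,
  cropLeft: 0, cropTop: 0, cropWidth: 640, cropHeight: 360,
};
cropInfo(exampleFrame); // paddingRows: 8 — the invalid rows described above
```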

@jchen10

jchen10 commented Dec 23, 2020

This is something to be clarified. Currently, for hardware-decoded video frames, Chrome uses the coded size for the created ImageBitmap, while it uses the crop size for software-decoded frames.

@jamespearce2006
Author

Presumably, that's a temporary thing? Having an API behavior vary depending on the implementation of the underlying decoder seems like a bad thing.
It seems to me that if an app developer is calling createImageBitmap, that means they intend to render the output. In that case, I imagine they'd always want the intended display size, not the coded size. If it's a performance concern, perhaps a parameter could be added to createImageBitmap so the caller can indicate whether they want the cropped or coded size.

In the meantime, what's the best way of cropping the ImageBitmap? I'm rendering using WebGL, and texSubImage2D doesn't seem to work (the call doesn't include crop dimensions, but fails with an ImageBitmap anyway).

@jchen10

jchen10 commented Dec 24, 2020

The WebCodecs spec should further clarify this. A parameter indicating whether callers want the cropped or coded size may be a good choice. @sandersdan @chcunningham

For the texSubImage2D problem, probably you need to use WebGL2 context to crop an ImageBitmap. Alternatively you can simply use texImage2D, and set the texture coordinates cropped.
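The texture-coordinate approach can be sketched as follows (`cropUV` is a hypothetical helper: upload the full coded-size image with texImage2D, then draw your quad with these UVs instead of the usual 0..1 range so the padding rows are never sampled):

```javascript
// Sketch: compute the UV sub-rectangle that covers only the cropped region
// of a coded-size texture.
function cropUV(codedWidth, codedHeight, cropWidth, cropHeight, cropLeft = 0, cropTop = 0) {
  return {
    uMin: cropLeft / codedWidth,
    vMin: cropTop / codedHeight,
    uMax: (cropLeft + cropWidth) / codedWidth,
    vMax: (cropTop + cropHeight) / codedHeight,
  };
}

// For the 640x368 coded / 640x360 cropped frame from this issue:
const uv = cropUV(640, 368, 640, 360);
// Use uv.{uMin..vMax} as the quad's texture coordinates so the 8 padding
// rows at the bottom of the coded image are excluded.
```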

For WebGL, there is another option: WebGL_webcodecs_video_frame (and its sample). You can skip the intermediate ImageBitmap step with it. It's still at an early stage, supporting only hardware video frames for now; software video frames will be added soon. Try it if you want to be an early bird. :)

@jamespearce2006
Author

Thanks for the info. I'd be strongly in favor of having some way of extracting the cropped frame as an imagebitmap. I would imagine >80% of other API consumers would too.

Regarding skipping the intermediate ImageBitmap: unfortunately, my use case requires me to buffer a second or more of decoded frames. It seems only a handful of undestroyed video frames are allowed before the decoder stalls, which isn't surprising. Obtaining the ImageBitmap seems the best approach for me.

@sandersdan
Contributor

We last discussed this in December and at that time the preference was to be as user-friendly as possible without sacrificing substantial performance. I believe that means:

  • Always apply the crop
  • Do not scale to display size
  • Do not apply color space conversion (not yet exposed by WebCodecs VideoFrame)
  • Do not apply rotation (not yet exposed by WebCodecs VideoFrame)

This does mean that an ImageBitmap created in this way can't be used reliably without reference to the VideoFrame's display size, color space, and rotation metadata. I would predict that many developers would assume 1:1 PAR, sRGB, and 0deg, at least one of which will usually be wrong.
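To illustrate the 1:1-PAR pitfall: under this proposal the consumer would still need to derive the pixel aspect ratio from the frame metadata themselves. A sketch, with `pixelAspectRatio` as a hypothetical helper:

```javascript
// Sketch: the pixel (sample) aspect ratio implied by crop size vs.
// display size — the scaling a crop-only ImageBitmap would NOT apply.
function pixelAspectRatio(cropWidth, cropHeight, displayWidth, displayHeight) {
  // Horizontal scale factor relative to vertical scale factor.
  return (displayWidth / cropWidth) / (displayHeight / cropHeight);
}

// Square pixels (the common case): 640x360 crop displayed at 640x360.
pixelAspectRatio(640, 360, 640, 360); // 1
// Anamorphic DVD-style content: 720x480 crop displayed at 640x480.
pixelAspectRatio(720, 480, 640, 480); // 8/9 — assuming 1:1 PAR would be wrong
```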

Any feedback on these points would be appreciated!

@chcunningham
Collaborator

chcunningham commented Jan 6, 2021

Having an API behavior vary depending on the implementation of the underlying decoder seems like a bad thing.

Definitely!

It seems to me that if an app developer is calling createImageBitmap, that means they intend to render the output. In that case, I imagine they'd always want the intended display size, not the coded size.

I like this reasoning.

I believe that means:

  • Always apply the crop

Sounds like everyone is in agreement to at least apply the crop. We'll fix Chrome's behavior w.r.t. GPU-backed frames, and I'm happy to also clarify the spec. For now, I lean toward not making it user-configurable and just doing the intuitive thing. LMK if I'm overlooking a use case for the alternative.

@jamespearce2006
Author

Sounds good to me!

@sandersdan
Contributor

We discussed this again today and there was more support for an easy-to-use API now that we have a separate WebGL path. In summary:

  • Always apply the crop.
  • Always scale to display size (this is a no-op for most VideoFrames anyway).
  • Rotation should be applied.
  • Color space is unclear. BT.709 -> sRGB would have to happen for most frames, but it would reduce confusion about black levels.

@dustinkerstein

Quick comment regarding:

  • Always scale to display size (this is a no-op for most VideoFrames anyway).

This is desirable most of the time, but I believe it may cause issues for WebGL rendering on mobile devices, which have a relatively low limit on the maximum texture size. For example, for equirectangular 360 video, which is 2:1 DAR, a 4K video's display width ends up being 4320, which is beyond the limits of most mobile devices. Would it be possible to consider making this scaling to display size a user-configurable option?
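A sketch of the fallback this concern implies (`chooseBitmapSize` is a hypothetical helper; 4096 is a typical mobile limit, queryable in a real app via gl.getParameter(gl.MAX_TEXTURE_SIZE)):

```javascript
// Sketch: pick a target size for the ImageBitmap, falling back from the
// display size to the crop size when the display size would exceed the
// device's maximum texture size.
function chooseBitmapSize(frame, maxTextureSize) {
  const fitsDisplay =
    frame.displayWidth <= maxTextureSize && frame.displayHeight <= maxTextureSize;
  return fitsDisplay
    ? { width: frame.displayWidth, height: frame.displayHeight }
    : { width: frame.cropWidth, height: frame.cropHeight };
}

// 2:1-DAR equirectangular content: a 3840x2160 coded frame displays at
// 4320x2160, which exceeds a 4096 MAX_TEXTURE_SIZE.
const equirectFrame = {
  cropWidth: 3840, cropHeight: 2160,
  displayWidth: 4320, displayHeight: 2160,
};
chooseBitmapSize(equirectFrame, 4096); // falls back to 3840x2160
```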

@sandersdan
Contributor

For equirectangular 360 video which is 2:1 DAR

Hmm, I forgot that that was a relatively common thing to do. Is there any reason to suspect that such uses won't prefer using WEBGL_webcodecs_video_frame anyway though?

Another alternative is to just lie in the VideoDecoderConfig and claim that the display size is exactly the crop size. That seems like it might be a good enough workaround to not need its own API?
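A sketch of this workaround (`squarePixelConfig` is hypothetical, and the field names are assumptions modeled on the VideoFrame attributes quoted earlier in this issue; early-draft VideoDecoderConfig dictionaries may differ):

```javascript
// Sketch: when configuring the decoder, claim the display size equals the
// crop size so no scaling to display size is implied downstream.
function squarePixelConfig(config) {
  return {
    ...config,
    displayWidth: config.cropWidth ?? config.codedWidth,
    displayHeight: config.cropHeight ?? config.codedHeight,
  };
}

const cfg = squarePixelConfig({
  codec: 'avc1.42E01E', // hypothetical example codec string
  codedWidth: 640, codedHeight: 368,
  cropWidth: 640, cropHeight: 360,
});
// cfg.displayWidth === 640, cfg.displayHeight === 360
```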

@dustinkerstein

Hmm, I forgot that that was a relatively common thing to do. Is there any reason to suspect that such uses won't prefer using WEBGL_webcodecs_video_frame anyway though?

Another alternative is to just lie in the VideoDecoderConfig and claim that the display size is exactly the crop size. That seems like it might be a good enough workaround to not need its own API?

Ok cool. WEBGL_webcodecs_video_frame does sound like it would handle my use-case as it provides direct access to the texture, and your proposed workaround seems like a good solution for folks that want the encoded texture size via createImageBitmap. Thanks!

@jamespearce2006
Author

I actually have an intermediate use case...
I don't want to use the video frame, as I need to do quite a lot of buffering, and typically need to hold onto more than a second's worth of frames.
However, I don't really want to use createImageBitmap either, as I already have shaders to do the color space conversion, etc.

Will there be a way of copying the YUV buffers from video frame without calling createImageBitmap? I think the API spec mentions this, but I didn't notice any mechanism currently implemented.

@sandersdan
Contributor

sandersdan commented Jan 8, 2021

The VideoFrame.planes mechanism is implemented in Chrome only for frames that are backed by mappable memory. We may be able to implement readback of texture-backed frames using the same mechanism, but readback can be complex and there is a risk that we could get the pixel format, plane offsets, etc. incorrect if we try to guess them before attempting the readback operation. I expect that over time Chrome will have fewer and fewer cases that are texture-backed, so it may not be prioritized, or we may propose a separate API for it.

As a workaround, it should be possible to use the WEBGL_webcodecs_video_frame extension to read back the planes as well, the same as for other textures.

as I need to do quite a lot of buffering, and typically need to hold onto more than a second's worth of frames.

FYI it's not actually guaranteed that images created by createImageBitmap() are separate from the underlying frames, it may be necessary to close() them to release the underlying frame. This isn't finalized yet but is required for a zero-copy implementation.
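Under that lifetime model, any buffering layer should close() whatever it evicts. A minimal sketch (`BitmapRing` is hypothetical and works for anything exposing a close() method):

```javascript
// Sketch: a fixed-capacity FIFO that close()es whatever it evicts, so a
// zero-copy ImageBitmap releases its underlying VideoFrame promptly.
class BitmapRing {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }
  push(bitmap) {
    if (this.items.length === this.capacity) {
      this.items.shift().close(); // release the evicted bitmap's resources
    }
    this.items.push(bitmap);
  }
  clear() {
    for (const b of this.items) b.close();
    this.items = [];
  }
}
```

Without the close() calls, a zero-copy implementation could leave many decoder frames pinned, which is exactly the stall described earlier in this thread.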

@jamespearce2006
Author

Thanks Dan. I have to confess, though, I'm not sure I fully understand the implications of your comment on createImageBitmap().

My use case requires me to buffer about 60 frames (although the resolution is typically quite low). My current implementation uses a wasm decoder and generates YUV planes as ArrayBuffers. These are buffered, and when/if needed will be rendered via WebGL, using a fragment shader to do colour space conversion.

I've successfully got this working with web-codecs. My initial thought was to just buffer the video frames that are output from the decoder. This didn't work though, as the decoder stalls after about 10 undestroyed video frames. I've switched to using createImageBitmap(), and now buffer the ImageBitmaps. This works, but I worry it's introducing unnecessary overhead by eagerly doing the colour space conversion, when there's no guarantee that a frame will end up being rendered. I'd rather keep the YUV and do the conversion in a shader when needed.

Is there any advice you can offer in this case, or is my approach already the best strategy?

@sandersdan
Contributor

Your approach may or may not work, depending on the details of the particular WebCodecs implementation you are using. In Chrome's current implementation, it varies by platform; in some cases a zero-copy path may be used, and the ImageBitmaps will hold a clone() of the VideoFrame until they are closed.

We could specify that createImageBitmap() always copies, but then we can't have a performant (zero-copy) implementation.

There isn't really a way out of copying; in most cases decoded frames are owned by the decoder and the only way to buffer many of them is to make a copy.

WebCodecs does not currently have an explicit copy or readback API. I am investigating and may propose one soon. The priority is relatively low given that there are workarounds available.

On the point of color space conversions, I've just noticed that CreateImageBitmapOptions already includes a colorSpaceConversion field, we should probably just make the behavior configurable using that.
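For reference, ImageBitmapOptions does define a colorSpaceConversion member taking "default" or "none"; whether the VideoFrame path honors it is exactly what's unsettled here. A sketch of how a caller doing YUV-to-RGB in their own shader might use it:

```javascript
// ImageBitmapOptions.colorSpaceConversion is part of the HTML spec; whether
// VideoFrame.createImageBitmap() respects it is an open question in this
// thread, so treat this as a sketch, not settled behavior.
const noConvert = { colorSpaceConversion: 'none' };   // keep pixels unconverted
const convert = { colorSpaceConversion: 'default' };  // implementation-defined conversion

// Hypothetical usage once the behavior is specified:
// const bitmap = await videoFrame.createImageBitmap(noConvert);
```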

@jamespearce2006
Author

So, should I be calling VideoFrame.clone() and then just hold onto the cloned video frames? The spec says the cloned video frames have a separate lifetime and are deep copies.
If not, what are the workarounds you mention?

@sandersdan
Contributor

sandersdan commented Jan 8, 2021

@chcunningham: This is not the expected behavior of clone(); I think the wording in the spec is incorrect. The Clone Frame algorithm does have a note that describes the expected behavior, but the steps are not clear and the destroy() steps don't match up.

Cloned frames do have their own lifetimes in the sense that destroy() on one does not destroy the other, but it is required to destroy() all clones.

Chrome's implementation of clone() never copies the underlying data.

@sandersdan
Contributor

sandersdan commented Jan 8, 2021

@jamespearce2006: The workarounds are using ImageBitmap or WebGL_webcodecs_video_frame to get the data into WebGL, then reading back the image data.

@jamespearce2006
Author

Would it be out of the question to add an option to createImageBitmap() to perform a copy, and guarantee it's not a clone of the VideoFrame?

@sandersdan
Contributor

sandersdan commented Jan 8, 2021

Would it be out of the question to add an option to createImageBitmap() to perform a copy, and guarantee it's not a clone of the VideoFrame?

I'm unsure. If it's on VideoFrame then that seems fine, but the original expectation was to deprecate that method and have the existing window.createImageBitmap() API take over the responsibility. It was simply easier to keep the spec and implementation self-contained to start.

If we implement an explicit copy API we resolve the core problem, but copy() + createImageBitmap() could be two copies. I'll bring this up in our next meeting.

@jamespearce2006
Author

Thanks. I can think of a lot of cases where getting access to the underlying buffer and/or holding the data for a period of time will be needed, so I'd like to lobby to increase the priority of the copy/readback API. ImageBitmaps are working for me currently, but I'm slightly concerned that the workaround is quite involved, if that approach doesn't work on other platforms.

@chcunningham
Collaborator

@chcunningham: This is not the expected behavior of clone(); I think that I think the wording in the spec is incorrect. The Clone Frame algorithm does have a note that describes the expected behavior, but the steps are not clear and the destroy() steps don't match up.

Agree, I mixed this up. I'll send a PR to sort it out (clone() isn't necessarily a deep copy, but it always guarantees a new lifetime).

@jamespearce2006
Author

Sorry to resurrect this thread again. I just wanted to make a few additional points after talking with colleagues.

Firstly, @sandersdan responded to one of my implementation questions above with "Your approach may or may not work, depending on...". I think it's important that the API is predictable. Either it should behave the same way regardless of the underlying decoder implementation, or it should allow me to query (or control) what the behavior will be. If I have a need to hold onto decoded frames for longer than a hardware decoder might allow, or readback the decoded data, not knowing whether I'll get a hardware or software decoder means I don't know whether my code will work. At the very least, there should be a way of querying this. Even better, I should be able to ask to enable this so that my code is guaranteed to work. (Quite possibly this results in a software decoder being used, but as long as I get consistency, that's fine).

Secondly, reading between the lines, it seems that linear playback workflows are being prioritized initially for development of the API. I base this on the fact that the readback API is currently a lower priority. I would argue that linear playback is already well served by existing APIs (MSE, HTMLVideoElement, etc.). Personally, I think it'd be better to prioritize the more exotic workflows that are not currently achievable (or that require WebAssembly).

Finally, in the above discussion, the proposed workaround for reading back a decoded frame is to render it, via WebGL, then read back the image data. Unfortunately, this isn't really a viable solution from a web-worker. It's not possible to dynamically create a canvas in a worker - one needs to be passed in from the main thread.

@sandersdan
Contributor

I agree that the behavior should be predictable. I'm personally on the side that ImageBitmaps shouldn't be able to stall a decoder, but it does have performance implications so it's being considered thoroughly.

linear playback workflows are being prioritized initially for development of the API

The priority right now is to ship a working MVP API. We'll cut features if they are not required to demonstrate the core goals of WebCodecs.

Don't worry too much about this specific issue though, ImageBitmap integration is a core feature.

It's not possible to dynamically create a canvas in a worker

You can create an OffscreenCanvas in a worker.

@chcunningham
Collaborator

@jamespearce2006 earlier wrote

Will there be a way of copying the YUV buffers from video frame without calling createImageBitmap? I think the API spec mentions this, but I didn't notice any mechanism currently implemented.

The latest thinking on this is to offer an API like:
let deepCopiedFrame = videoFrame.copy()

This is mentioned alongside other API changes for VideoFrame readback in #157. We are working now to get them implemented.

I've also created a separate #159 to hash out createImageBitmap semantics. Dan has taken the excellent feedback in this issue and made several proposals. Note that I think everyone in this thread is actually better suited to using the newer drawImage or texImage methods for painting (better performance and memory management). The current plan (in #159) is for createImageBitmap() to be a friendly API (it does conversions and rotations, and probably makes copies) for rendering the occasional frame. Please see the rationale in that bug.

With that, I think we've probably split out all the remaining work in this issue. Please re-open if I've missed anything. Thank you for the feedback!
