A review #38

Closed
jan-ivar opened this issue May 18, 2021 · 8 comments

@jan-ivar
Member

This is a review in response to a Call for Review expiring today. I was going to open issues as requested, only to find they mostly exist already — many predating the CFR even — but may be hard to find. So I'm writing this overview instead.

Outstanding design issues I see as blockers:

  1. Don’t expose raw realtime media on main thread, since there's a precedent that says this is bad ("Out-of-main-thread processing by default", #23 (comment))
  2. Limit to solving video for now, or explain why AudioWorklet is insufficient, to not reinvent it ("Is MediaStreamTrackProcessor for audio necessary?", #29)
  3. Lack of a threading model ("MediaStreamTrackProcessor threading model is underspecified", #27)
  4. Lack of algorithms ("Call the streams algorithms", #36)
  5. Controlling channel mechanism is unclear (#24)
  6. Unclear memory I/O design given that a VideoFrame may be in GPU memory ("Relationship to WebGPU", #34) (Funny hat: GPU→CPU→GPU?)
  7. Can one JS document hose a browser's camera capturing ability? (#37)

The other issue is that there seem to be massive missing areas in WebCodecs that we rely on:

  1. VideoFrame seems optimized for WebCodecs, not funny hats, re copies — Is there room to question immutable VideoFrames vs e.g. lockable mutable ones when frames are not in GPU?
  2. Pixel formats seem underspecified ("Add more complete descriptions for PixelFormats", webcodecs#165, and "Alpha support", webcodecs#200) vs. e.g. earlier discarded ideas.

There also appears to be a lack of documentation and working samples; it is not yet at a point where any web developer could pick this up and use it.

This makes it hard to see the big picture or to validate whether the overall design addresses the listed use cases (a list that, judging by the examples given, does not appear to require interfacing with WebCodecs — is that an omission?).

@chcunningham

Answering just the questions for WebCodecs. @sandersdan @dalecurtis

Unclear memory I/O design given that a VideoFrame may be in GPU memory ("Relationship to WebGPU", #34) (Funny hat: GPU→CPU→GPU?)

VideoFrame is (will be) a CanvasImageSource. GPU→CPU copies are not required to do funny hats (e.g. use canvas drawImage() or WebGL texImage2D()).

Similar to ImageBitmap, WebCodecs does not describe the precise location of VideoFrame backing memory (GPU, CPU, shared, ...). This should not be required to define its observable behavior.
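To make that concrete, a minimal funny-hat sketch along those lines might look as follows; it assumes VideoFrame is usable as a CanvasImageSource and constructible from a canvas, and hatImage is a hypothetical preloaded ImageBitmap:

```js
// Sketch only: overlay a "funny hat" on a VideoFrame without explicitly
// reading pixels back to the CPU. The drawing path lets the user agent pick
// the most efficient route for whatever memory backs the frame.
const canvas = new OffscreenCanvas(1280, 720);
const ctx = canvas.getContext('2d');

function addHat(frame, hatImage) {
  ctx.drawImage(frame, 0, 0, canvas.width, canvas.height); // frame as CanvasImageSource
  ctx.drawImage(hatImage, 500, 20, 280, 160);              // composite the hat on top
  const out = new VideoFrame(canvas, { timestamp: frame.timestamp });
  frame.close();                                           // release the source frame promptly
  return out;
}
```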

VideoFrame seems optimized for WebCodecs, not funny hats, re copies — Is there room to question immutable VideoFrames vs e.g. lockable mutable ones when frames are not in GPU?

Please see above.

Pixel formats seem underspecified ("Add more complete descriptions for PixelFormats", webcodecs#165, and "Alpha support", webcodecs#200) vs. e.g. earlier discarded ideas.

The proposed use of fourcc-style pixel formats is widely used across existing codec APIs (e.g. FFmpeg). See: https://www.fourcc.org/yuv.php. We will soon provide more verbose definitions for each format as described in the linked issues. We will additionally extend the list to include other formats (e.g. RGBA, NV12, ...).
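For cases that genuinely need CPU access, the readback shape would look roughly like the sketch below, assuming the format/allocationSize()/copyTo() API from current WebCodecs drafts (earlier drafts exposed a planes attribute instead):

```js
// Sketch: inspect a frame's pixel format and copy its planes into CPU memory.
async function readPixels(frame) {
  const format = frame.format;               // e.g. "I420", "NV12", "RGBA"
  const buffer = new Uint8Array(frame.allocationSize());
  const layout = await frame.copyTo(buffer); // resolves with per-plane { offset, stride }
  frame.close();                             // release the frame once copied
  return { format, buffer, layout };
}
```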

@youennf
Contributor

youennf commented May 18, 2021

Following my email on public-webrtc, before diving into API specifics, I think we should be discussing the following foundational topics:

  • threading model (worker, worklet...)
  • memory model (buffering, zero-copy, pixel format)
  • native transforms or not
  • WhatWG streams or not

@youennf
Contributor

youennf commented May 18, 2021

We will additionally extend the list to include other formats (e.g. RGBA, NV12, ...)

Agreed, this is an important topic; media pipelines try hard not to do any conversion.
w3c/mediacapture-extensions#13 might somehow help.
I am unclear whether implicit conversions (like painting a YUV frame on a canvas) are a good thing or not.

@dalecurtis

dalecurtis commented May 18, 2021

I am unclear whether implicit conversions (like painting a YUV frame on a canvas) are a good thing or not.

I agree we want to avoid implicit conversions in the course of codec I/O (encode/decode) and transfer (postMessage, transferable streams) operations for WebCodecs. However, draw operations should allow implicit conversion so the user agent can choose the most efficient platform specific rendering path.

E.g., drawImage(VideoFrame) works the same way it does for drawImage(HTMLVideoElement). In Chrome we will defer the conversion until compositing or may even just hand off the YUV data as an overlay to the platform.

@guidou
Contributor

guidou commented May 31, 2021

  • Don’t expose raw realtime media on main thread, since there's a precedent that says this is bad #23 (comment)

This has been replied to in #23. It seems that we have agreement that processing on main thread should not be impossible.
We still do not have consensus on how hard it should be (no one has replied to my proposal to have it off by default and require an explicit constructor parameter to enable it).
There are valid use cases for using the main thread: for example, easier migration from canvas capture, or any application where little processing occurs and a Worker would add no benefit other than extra resource consumption and a more complicated application.
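For concreteness, such a main-thread pipeline would look roughly like the sketch below, using the MediaStreamTrackProcessor and MediaStreamTrackGenerator interfaces from the draft under review; processFrame() is a hypothetical per-frame transform that returns a new VideoFrame and closes its input:

```js
// Sketch: a lightweight processing pipeline wired up on the main thread.
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getVideoTracks();

const processor = new MediaStreamTrackProcessor({ track });
const generator = new MediaStreamTrackGenerator({ kind: 'video' });

const transform = new TransformStream({
  async transform(frame, controller) {
    controller.enqueue(await processFrame(frame)); // hypothetical transform; closes `frame`
  },
});

processor.readable.pipeThrough(transform).pipeTo(generator.writable);

// The generator is itself a MediaStreamTrack, usable wherever tracks are expected.
const processedStream = new MediaStream([generator]);
```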

This has been responded to in several messages of #29, but here's a summary:

  • Makes it easier to develop applications that need to combine audio and video processing (e.g., https://arxiv.org/pdf/1804.03619.pdf)
  • Same API for audio and video, which makes things easier for developers especially if the processing is similar or the same
  • Different processing model from AudioWorklet: direct access to the data from the track, including its timestamps (see the sketch below).
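A rough sketch of that last point, assuming the processor yields WebCodecs audio chunks (AudioData in current WebCodecs drafts, AudioFrame in earlier ones) and that audioTrack is a hypothetical audio MediaStreamTrack:

```js
// Sketch: read audio data and its capture timestamps directly from a track.
async function logAudioTimestamps(audioTrack) {
  const processor = new MediaStreamTrackProcessor({ track: audioTrack });
  const reader = processor.readable.getReader();
  while (true) {
    const { value: audio, done } = await reader.read();
    if (done) break;
    console.log(audio.timestamp, audio.sampleRate, audio.numberOfFrames);
    audio.close(); // release each chunk once it has been inspected
  }
}
```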

It's not clear to me why this would be needed. I asked for further clarification in #27.

The algorithms are not really needed since the spec is defined in terms of the results the APIs should produce instead of the steps required to produce them. However, since the streams API recommends using them, I'll prepare a PR to use them (without any change in semantics).

Can you elaborate on what is unclear about it? I replied to #24 a while ago and haven't received any extra feedback since then.

Already replied to by Chris in #38 (comment)
Still unclear?

Not by default, but yes, if the document reads enough frames and does not close them, a camera capturer can suspend producing frames.
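In other words, the risk comes from keeping frames alive instead of closing them; a minimal sketch of the distinction (capture buffer pool sizes are implementation-specific):

```js
// Sketch: camera frames typically come from a small, fixed-size buffer pool.
// A frame that is never close()d keeps its pool entry alive until garbage
// collection, which may not happen in time to keep the capturer running.
async function consume(readable) {
  const reader = readable.getReader();
  while (true) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    // BAD: stashing `frame` somewhere and never calling close() keeps a
    // capture buffer alive; enough of these and the camera can stall.
    frame.close(); // GOOD: release the underlying buffer as soon as possible
  }
}
```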

@youennf
Contributor

youennf commented Jun 1, 2021

It seems that we have agreement that processing on main thread should not be impossible.

I would phrase it this way:

  • There is no consensus on a goal to provide an API that allows processing on main thread.
  • There is consensus on not having a goal to forbid main thread processing.

It's not clear to me why this would be needed. I asked for further clarification in #27.

I added some comments in the thread.

The algorithms are not really needed since the spec is defined in terms of the results the APIs

The current spec would be really hard to implement in an interoperable manner.
Some JS-exposed behaviours are actually not specified, for instance.
Describing these algorithms will allow reaching interoperability.

Can you elaborate on what is unclear about it? I replied to #24 a while ago and haven't received any extra feedback since then.

I added some comments in the thread.

Already replied to by Chris in #38 (comment)
Still unclear?

@aboba might clarify things here. My understanding is as follows:

  • We have MediaStreamTrack data that will often be backed by GPU data
  • We have a WebGPU API that will allow processing GPU data
  • Is this API able to do WebGPU processing on MediaStreamTrack data in an optimal manner, ideally with no conversion and no memory copies? If not, do we have a good idea of how to add that support at a later stage? (A possible shape is sketched below.)
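Purely for illustration, the zero-copy direction in question would look something like this, assuming WebGPU's importExternalTexture() accepts a VideoFrame as its source (which is precisely what #34 is asking about):

```js
// Illustrative only: hand a camera VideoFrame to WebGPU without an explicit
// GPU→CPU→GPU round trip.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

function importFrame(frame) {
  // Wraps the frame for sampling on the GPU (texture_external in WGSL).
  // The frame must stay open until the GPU work that samples it is submitted.
  return device.importExternalTexture({ source: frame });
}
```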

Not by default, but yes, if the document reads enough frames and does not close them, a camera capturer can suspend producing frames.

The general question is how much we can reduce this risk and whether APIs could be designed with this in mind.
For instance, the current API exposes a pattern that seems nice: when enqueueing a frame in a MediaStreamTrackGenerator, the frame gets consumed.
I would hope we can continue improving the design to further reduce these risks.

@guidou
Contributor

guidou commented Jun 1, 2021

It seems that we have agreement that processing on main thread should not be impossible.

I would phrase it this way:

  • There is no consensus on a goal to provide an API that allows processing on main thread.
  • There is consensus on not having a goal to forbid main thread processing.

I agree with this phrasing.

The current spec would be really hard to implement in an interoperable manner.

Are you referring to the streams algorithms or something else specifically?

Some JS-exposed behaviours are actually not specified, for instance.

Can you file specific bugs about those behaviors?

Describing these algorithms will allow reaching interoperability.

I'll prepare a PR to include those algorithms. Tracked on #36.

  • We have MediaStreamTrack data that will often be backed by GPU data
  • We have a WebGPU API that will allow processing GPU data
  • Is this API able to do WebGPU processing on MediaStreamTrack data in an optimal manner, ideally with no conversion and no memory copies? If not, do we have a good idea of how to add that support at a later stage?

I think we should continue this discussion in a separate bug, perhaps on the WebCodecs spec, if we agree that WebCodecs' VideoFrame is the right API to expose video frames coming from a MediaStreamTrack.

Not by default, but yes, if the document reads enough frames and does not close them, a camera capturer can suspend producing frames.

The general question is how much we can reduce this risk and whether APIs could be designed with this in mind.
For instance, the current API exposes a pattern that seems nice: when enqueueing a frame in a MediaStreamTrackGenerator, the frame gets consumed.
I would hope we can continue improving the design to further reduce these risks.

I agree that the goal should be risk reduction since completely eliminating this possibility would negate the goal of optimal WebGPU processing.

@guidou
Contributor

guidou commented Jun 18, 2021

  1. Don’t expose raw realtime media on main thread, since there's a precedent that says this is bad (#23 (comment))
    We have shown that there are valid use cases for using the API on the main thread, and I believe we have agreed that using the main thread is discouraged but that forbidding access on the main thread is not an explicit goal. Either way, we can continue the discussion in #23 ("Out-of-main-thread processing by default").
  2. Limit to solving video for now, or explain why AudioWorklet is insufficient, to not reinvent it ("Is MediaStreamTrackProcessor for audio necessary?", #29)
    Already explained, and the spec now mentions some use cases for which the proposed API is a better fit. We can continue the discussion in #29.
  3. Lack of a threading model ("MediaStreamTrackProcessor threading model is underspecified", #27)
    Closed. No threading model required unless a more specific need is raised.
  4. Lack of algorithms ("Call the streams algorithms", #36)
    Fixed.
  5. Controlling channel mechanism is unclear (#24)
    The mechanism has been removed.
  6. Unclear memory I/O design given that a VideoFrame may be in GPU memory ("Relationship to WebGPU", #34) (Funny hat: GPU→CPU→GPU?)
    Already answered. This would be a WebCodecs issue anyway.
  7. Can one JS document hose a browser's camera capturing ability? (#37)
    Already answered in #37.

The other issue is that there seem to be massive missing areas in WebCodecs that we rely on:

  1. VideoFrame seems optimized for WebCodecs, not funny hats, re copies — Is there room to question immutable VideoFrames vs e.g. lockable mutable ones when frames are not in GPU?
  2. Pixel formats seem underspecified, w3c/webcodecs#165 and w3c/webcodecs#200 vs. e.g. earlier discarded ideas.

Already answered.

There also appears to be a lack of documentation and working samples; it is not yet at a point where any web developer could pick this up and use it.

Some working samples:
https://webrtc.github.io/samples/src/content/insertable-streams/audio-processing/
https://webrtc.github.io/samples/src/content/insertable-streams/video-processing/

This makes it hard to see the big picture or to validate whether the overall design addresses the listed use cases (a list that, judging by the examples given, does not appear to require interfacing with WebCodecs — is that an omission?).

Not clear to me what this means. Can you file a specific issue with more details about what you mean?

Closing this, since only a handful of issues raised by this review remain open and it's easier to continue the discussion in their respective GitHub issues.

@guidou guidou closed this as completed Jun 18, 2021