Encoded frame IDLs share definitions with WebCodecs #99
@youennf: Metadata relating to the frame (such as type, dependencies, and TID/SID) is provided by WebCodecs.
From looking at the spec, here is the list of obvious things that differ right now:
The plan in this spec might be to:
To do so, we need WebCodecs to first fix the data attribute setter and add getMetadata(). If we directly use WebCodecs structures, we also need to ensure they stay aligned with our model and constraints here.
Hi @youennf, sorry for the delay. Encoded*Chunk currently exposes a readonly data attribute, but we were planning to remove it entirely in favor of a readInto() method that copies the data out (PR for this here). The reasoning is that we want all I/O types in WebCodecs to be immutable, to avoid surprising users and (critically) to avoid TOCTOU security issues. I saw your comment here.
Rather than setting the .data attribute, is it possible to instead construct a new chunk w/ the transformed data? We could design a copy constructor or similar to make this easy for authors.
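A minimal sketch of that copy-constructor idea, assuming the proposed readInto() copy-out method; none of these names are final:

```ts
// Sketch only: copy the payload out of the original chunk, transform it, and
// build a new chunk instead of mutating .data. readInto() is the proposed
// copy-out method discussed above, not a shipped WebCodecs API.
function rewriteChunk(
  chunk: EncodedVideoChunk,
  transformBytes: (bytes: Uint8Array) => Uint8Array
): EncodedVideoChunk {
  const payload = new Uint8Array(chunk.byteLength);
  (chunk as any).readInto(payload); // proposed copy-out API (assumption)
  return new EncodedVideoChunk({
    type: chunk.type,           // "key" | "delta"
    timestamp: chunk.timestamp, // microseconds, carried over unchanged
    data: transformBytes(payload),
  });
}
```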
We've had some other requests for metadata as well. I'd like to weigh the options of subclassing vs. wrapping vs. extending (with partial dictionaries) vs. having an open-ended metadata bucket (maybe a Map) on the Encoded*Chunk interfaces; I don't have an opinion just yet. For the metadata you're seeking, is it important that it be carried through decoding/encoding, such that a VideoFrame corresponding to an EncodedVideoChunk has all the same values?
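Purely hypothetical shapes for two of those options, just to make the trade-off concrete; nothing here is specified:

```ts
// Hypothetical option A: extend the chunk with a partial dictionary of RTC fields.
interface RtcChunkMetadata {
  dependencies?: number[]; // frame IDs this frame depends on
  temporalIndex?: number;  // TID
  spatialIndex?: number;   // SID
}

// Hypothetical option B: an open-ended bucket on the chunk itself, e.g.
//   chunk.metadata: Map<string, unknown>
// which keeps WebCodecs itself agnostic about RTC-specific keys.
```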
No objection.
I haven't looked at the readInto proposal. The typical flow is: I read a frame F from the encoder, I change F, and I write F to the packetizer.
For the typical RTC metadata we want, we do not need it exposed past the decoder. We might in the future have metadata that goes from MediaStreamTrack to the WebRTC transform through the encoder (or vice versa on the decoder side). That could be a separate bucket of metadata.
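For reference, a minimal worker-side sketch of that flow using this spec's script transform API and today's mutable data attribute:

```ts
// Sketch: read frames from the encoder, modify them, write them toward the
// packetizer. Uses the RTCRtpScriptTransformer readable/writable pair from
// this spec; audio frames flow the same way.
onrtctransform = (event: RTCTransformEvent) => {
  const { readable, writable } = event.transformer;
  readable
    .pipeThrough(
      new TransformStream({
        transform(
          frame: RTCEncodedVideoFrame,
          controller: TransformStreamDefaultController<RTCEncodedVideoFrame>
        ) {
          const bytes = new Uint8Array(frame.data); // view over the payload
          // ... modify bytes here (e.g., an end-to-end encryption pass) ...
          frame.data = bytes.buffer; // write back into frame F (current mutable model)
          controller.enqueue(frame); // hand F to the packetizer
        },
      })
    )
    .pipeTo(writable);
};
```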
For raw frames, we definitely do not want anyone to rely on garbage collection. Both AudioFrame and (more importantly) VideoFrame offer a close() method (similar to ImageBitmap) that releases the object's reference to underlying resources (and may immediately free them if that was the only reference). The spec encourages users to close() immediately when they no longer need the frame (and Chrome actually warns users if it GC's a frame that wasn't closed()). I'm adding further clarity to the ref-counting behavior in w3c/webcodecs#167. For encoded chunks, GC has been working fine since the data is relatively small, but I'd be open to adding a close() there as well.
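A short sketch of that model on the decode side, assuming a standard VideoDecoder setup:

```ts
// Sketch: release each decoded frame explicitly instead of relying on GC,
// per the close() model described above.
const decoder = new VideoDecoder({
  output(frame: VideoFrame) {
    try {
      // ... render or process the frame ...
    } finally {
      frame.close(); // promptly release the underlying media resource
    }
  },
  error: (e: DOMException) => console.error(e),
});
```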
Sorry, I didn't quite follow this. Can you say more?
With the Encoded*Chunk readInto() PR, this entails one copy when reading from the encoder output. I think it strikes a good balance between keeping I/O types immutable (making it easy to reason about TOCTOU issues) and not being too onerous.

Additionally, for VideoFrame, the pixel data will often reside in GPU memory. We don't want to copy to the CPU for the sake of making an ArrayBuffer unless the user explicitly requests that. We achieve this by again offering a readInto() API (this time making it async, to avoid requiring synchronization that hurts GPU pipeline performance). For folks who just want to go from MediaStreamTrack -> Canvas or from VideoDecoder -> Canvas, this API would never be called and everything stays on the GPU. This lazy async readback option isn't feasible with an interface that just offers an ArrayBuffer.

In the long run, we hope the web platform will introduce a type of read-only ArrayBuffer, which would let us do things like BYOB and expose buffer attributes (except for the GPU VideoFrame case mentioned above) without fear of TOCTOU issues. Extensive discussion on that in WICG/reducing-memory-copies#1.
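For what it's worth, a sketch of that lazy readback pattern using the async copy-out method (shipped in WebCodecs as copyTo(), which appears to correspond to the readInto() idea discussed here):

```ts
// Sketch: pixels stay wherever the decoder put them (often GPU memory) until
// the application explicitly asks for bytes; the copy-out call is async so it
// does not stall the GPU pipeline.
async function readPixels(frame: VideoFrame): Promise<Uint8Array> {
  const buffer = new Uint8Array(frame.allocationSize()); // CPU-side destination
  await frame.copyTo(buffer);                            // async readback
  return buffer;
}
```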
This is the kind of thing I am worried about. I think I agree with you on raw frames vs. encoded frames.
Making it easy to reason about TOCTOU issues is mostly a concern for browser implementors.
For the metadata issue, I filed w3c/webcodecs#245 to track this in the WebCodecs repo.
Discussed at the editors' meeting: it seems we could reuse WebCodecs metadata and extend it for RTC.
Given the deployment of the RTCEncoded* types, it seems more Web-compatible to define a well-documented way to turn a WebCodecs encoded frame into an RTC encoded frame and vice versa. Metadata is important.
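A purely illustrative sketch of such a conversion, using only members both sides already expose; how RTC-specific metadata and timestamps map across is exactly what would need to be specified:

```ts
// Illustrative only, not a specified conversion: wrap an RTC encoded frame's
// payload in a WebCodecs chunk. RTC-specific metadata (dependencies, TID/SID,
// SSRC, ...) from getMetadata() has no WebCodecs home yet.
function toWebCodecsChunk(frame: RTCEncodedVideoFrame): EncodedVideoChunk {
  return new EncodedVideoChunk({
    type: frame.type === "key" ? "key" : "delta", // RTC frames can also be "empty"
    timestamp: frame.timestamp,                   // RTP timestamp; exact mapping is TBD
    data: frame.data,
  });
}
```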
Some members of the audio/video encoded frames, like data or timestamp, are shared with WebCodecs.
Other members, like getMetadata(), may be specific to WebRTC encoded transform.
We should converge and explicitly mark what is WebRTC-specific and what is shared with WebCodecs.