Skip to content

Video encoders may drop frames #240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
youennf opened this issue May 11, 2021 · 16 comments
Closed

Video encoders may drop frames #240

youennf opened this issue May 11, 2021 · 16 comments
Labels
breaking Interface changes that would break current usage (producing errors or undesired behavior).

Comments

@youennf
Copy link
Contributor

youennf commented May 11, 2021

Some encoders, to preserve bitrate, might decide to drop encoding of some frames.
They might also do the same if too much frames are in the pipe.

It is somehow unclear whether either case is an error or not.
From my understanding of the spec, it seems that this case would not trigger an error but would not surface to the web page either. It is then up to the web page to identify which frame was dropped and the web page might not know why it was dropped.

I wonder whether this is something that should be surfaced (more precise error codes?) though the potential impact on fingerprinting should be studied.

@chcunningham
Copy link
Collaborator

chcunningham commented May 11, 2021

The assumption in the current spec is that no frames are dropped.

Some encoders, to preserve bitrate, might decide to drop encoding of some frames.
They might also do the same if too much frames are in the pipe.

Can you elaborate on which encoders? Is it a default behavior? Can it be disabled?

@chcunningham
Copy link
Collaborator

Triage note: tentatively marking 'editorial', as the current wording intends that this not be allowed, so perhaps the change is just to clarify that intent. Discussion still unfolding, so label may change.

In my basic understanding, encoders that offer the ability to drop as described can be configured not to do so, meeting the expectations of the spec.

@chcunningham chcunningham added the editorial changes to wording, grammar, etc that don't modify the intended behavior label May 12, 2021
@dalecurtis
Copy link
Contributor

@youennf
Copy link
Contributor Author

youennf commented May 17, 2021

VTB API exposes kVTEncodeInfo_FrameDropped for that reason, rate control is indeed one potential reason.
There might be other cases, related to error cases or not.

Similarly, on decoder side, some decoding results might be considered errors or not.
VTB sometimes may return a no-error state but return an empty buffer (say packet loss, a key frame fixing those issues).
Let me know if you would prefer a separate issue, in particular given errors seem terminal in the current API design.

@padenot padenot added breaking Interface changes that would break current usage (producing errors or undesired behavior). and removed editorial changes to wording, grammar, etc that don't modify the intended behavior labels May 17, 2021
@padenot
Copy link
Collaborator

padenot commented May 17, 2021

I'd tend to agree with #240 (comment), but if this is added, it would be breaking (there is simply no way to express this), so let's discuss first.

@dalecurtis
Copy link
Contributor

I don't think this is breaking, you can return an error in the failure cases, and there are some use cases where dropping frames must not be allowed (transcoding, etc) -- so user agents must support a non-dropping mode. As such, a setting for allowing dropping later is an extension.

@chcunningham
Copy link
Collaborator

As such, a setting for allowing dropping later is an extension.

+1. @padenot, please mark as 'extension' if we're all aligned.

Similarly, on decoder side, some decoding results might be considered errors or not.
VTB sometimes may return a no-error state but return an empty buffer (say packet loss, a key frame fixing those issues).
Let me know if you would prefer a separate issue, in particular given errors seem terminal in the current API design.

I think this may be covered by #220? If not, separate issue sg.

@youennf
Copy link
Contributor Author

youennf commented May 19, 2021

+1. @padenot, please mark as 'extension' if we're all aligned.

I do not think we are aligned yet, sorry ;-)

so user agents must support a non-dropping mode

Does it mean VTB cannot be used to implement WebCodecs? That would be sad.

Or that, whenever VTB drops a frame, UA will error the encoder/decoder, which is currently a terminal state?
This seems bad.

The current API defines a terminal error state. This makes sense in cases like configuration is now unsupported and it is unrecoverable (say external eGPU is removed or device enters low power mode and now refuses to use SW encoders).

On the other end, there are a lot of cases where errors are not really terminal.
AIUI, all the data in the pipe of the encoder/decoder will be completely dropped on the floor for instance.
We might end up having some random encoder/decoder failures in obscure circumstances, say slight CPU overuse.
I really fear applications will not be prepared to handle that correctly. And if the handling is storing video frames as long as they are not encoded, that would be terrible for perf.

Given WebCodecs is thought as a general purpose codec API, it seems better to design a good error model, with both terminal codec-wide errors and frame-specific errors.

As such, a setting for allowing dropping later is an extension.

Do you know of existing popular encoder/decoder APIs that defines such setting?

@dalecurtis
Copy link
Contributor

And if the handling is storing video frames as long as they are not encoded, that would be terrible for perf.

I think this is covered by #121?

so user agents must support a non-dropping mode

Does it mean VTB cannot be used to implement WebCodecs? That would be sad.

Do you know of existing popular encoder/decoder APIs that defines such setting?

AFAIK, VideoToolbox supports this when kVTCompressionPropertyKey_RealTime is set to false.

Real-time modes are fairly common and function under some set of specified constraints (frame rate, bit rate, cpu usage, etc). As such, we do think there's room for specifying a configurable real time mode. We haven't come to consensus on any shape there though. We'd welcome any proposals you have.

@chcunningham
Copy link
Collaborator

The current API defines a terminal error state. This makes sense in cases like configuration is now unsupported and it is unrecoverable (say external eGPU is removed or device enters low power mode and now refuses to use SW encoders).

Please see #242 (comment) for reasoning about why errors are made fatal. I think this is a separate discussion from dropping (we are not proposing new error paths here). Please file a separate issue if more to discuss on errors generally.

@youennf
Copy link
Contributor Author

youennf commented Jun 3, 2021

And if the handling is storing video frames as long as they are not encoded, that would be terrible for perf.

I think this is covered by #121?

How is it covered?
I understand the simplification from a UA point of view, but this is clearly suboptimal.
Say I enqueue very quickly four frames: I frame, P frame, I frame, P frame.
First I frame decoding is ok, first P frame is bad.
Second I frame would have been ok but is being dropped.
Web application will need to request another I frame :(

Real-time modes are fairly common and function under some set of specified constraints

I might be missing something but I understood low-latency MSE or VC-like applications were amongst the main use cases.
For those use cases, using a realtime encoder seems of prime importance.
If we need to make API changes to support this use case, we should do it now, and not as an after thought.

@bradisbell
Copy link

When changes are made to support realtime use cases, please do not force these changes to always be active. There are use cases where it is desirable to encode slowly, or transcode media. (For example, generating frame data programmatically, waiting for it to complete, and then saving the compressed result.)

@youennf
Copy link
Contributor Author

youennf commented Jun 3, 2021

@bradisbell, agreed.
I hope we can design a single processing API to handle both cases.
This seems feasible given past work on APIs such as VideoToolBox.

@dalecurtis
Copy link
Contributor

How is it covered?
I understand the simplification from a UA point of view, but this is clearly suboptimal.
Say I enqueue very quickly four frames: I frame, P frame, I frame, P frame.
First I frame decoding is ok, first P frame is bad.
Second I frame would have been ok but is being dropped.
Web application will need to request another I frame :(

Sorry, I don't understand the point you're trying to make here. Can you elaborate?

Real-time modes are fairly common and function under some set of specified constraints

I might be missing something but I understood low-latency MSE or VC-like applications were amongst the main use cases.
For those use cases, using a realtime encoder seems of prime importance.
If we need to make API changes to support this use case, we should do it now, and not as an after thought.

I agree we should add a realtime signal prior to shipping, @Djuffin is putting together a proposal from our side. As noted in my comment, we'd love a proposal from you given your experience here.

@Djuffin
Copy link
Contributor

Djuffin commented Jun 4, 2021

So far my thoughts are

  1. When in realtime mode, encoders might have to drop frames in order to meet the bandwidth or time budget, but it is not an error. They might indicate a dropped frame by providing an empty output (or maybe just no output at all).
  2. Encoders are responsible that even if a frame is dropped, the resulting stream can still be decoded as if nothing happened. Following encoded frames should not depend on the dropped frame.

@chcunningham
Copy link
Collaborator

I think this issue is now broken up into other issues.

  • A default configuration strategy is as proposed in Video encoder behavior is under specified #241. Frames are not allowed to be dropped by default.
  • A knob that would allow dropping is proposed in VideoEncoder knob for latency/realtime mode #269.
  • Dropping when disallowed would be an impl bug, not an error. If the codec implementation is incapable of meeting that requirement, a fallback implementation must be chosen. If a fallback cannot be provided, isConfigSupported(...) should indicate non-support and configure(...) should fail.

Closing this one, as I don't see any other work tracked in this issue. Please reopen if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Interface changes that would break current usage (producing errors or undesired behavior).
Projects
None yet
Development

No branches or pull requests

6 participants