index.html: 111 additions & 0 deletions
@@ -1972,5 +1972,116 @@ <h2>MediaStream in workers</h2>
};</pre>
</div>
</section>
<section>
<h2>Background segmentation mask</h2>
<p>Some platforms or User Agents may provide built-in support for background segmentation of video frames, in particular for camera video streams.
Web applications may want to control whether background segmentation is computed at the source level and to get access to the computed segmentation masks.
This allows the web application, for instance,
to do custom framing, background blurring, or background replacement
while leveraging platform-computed background segmentation.
It also allows the web application
to access the original, unmodified frame and
to fine-tune frame modifications as it sees fit.
For that reason, we extend {{MediaStreamTrack}} with the following properties and {{VideoFrameMetadata}} with the following attributes.
</p>
<pre class="idl">
partial dictionary MediaTrackSupportedConstraints {
boolean backgroundSegmentationMask = true;
};

partial dictionary MediaTrackConstraintSet {
ConstrainBoolean backgroundSegmentationMask;
Member:

Would it ever be interesting and feasible to tweak the parameters by which segmentation is done?

@riju (May 15, 2024):

At least on Windows, the platform model does not allow tweaking segmentation parameters today. Using TensorFlow.js with the BodyPix model for blur, I see there's at least a segmentationThreshold parameter. Maybe it's the same as foregroundThresholdProbability in the MediaPipeSelfieSegmentation model?

Did you have some other parameters in mind?

[Screenshot: mediapipe_parameters]
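For reference, a minimal sketch of where BodyPix exposes that parameter (assuming the @tensorflow-models/body-pix API; illustrative only):

```js
import * as bodyPix from '@tensorflow-models/body-pix';

const net = await bodyPix.load();
// Pixels whose foreground probability is at least 0.7 count as person.
const segmentation = await net.segmentPerson(videoElement, {
  segmentationThreshold: 0.7,
});
```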

Member:

> Did you have some other parameters in mind?

I am not knowledgeable enough about which parameters would be best to include. I was mostly wondering whether this is something we foresee extending from a boolean to a set of parameters, and if so, whether there is a viable path for such future extensions given the current API shape.

Contributor Author:

In the Media Capture API, the parameter space is flat, not hierarchical.

As an example, there is a constrainable property called whiteBalanceMode which can be constrained to manual. If one then wants to manually change the white balance, there is a constrainable property called colorTemperature which can be constrained separately in order to do that.

So if we would later like to add a numeric constrainable property called backgroundSegmentationThreshold (which could cause the segmentation mask to be pre-processed into a black-and-white mask according to the threshold, without shades of grey) or a string constrainable property called backgroundSegmentationModel (to select a particular AI model), we could certainly do that.
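For illustration, a sketch of how such additions could compose with this proposal; both extra constraints below are hypothetical and not part of this PR:

```js
await track.applyConstraints({
  backgroundSegmentationMask: {exact: true},
  backgroundSegmentationThreshold: 0.5,   // hypothetical future constraint
  backgroundSegmentationModel: 'selfie',  // hypothetical future constraint
});
```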

};

partial dictionary MediaTrackSettings {
boolean backgroundSegmentationMask;
};

partial dictionary MediaTrackCapabilities {
sequence&lt;boolean&gt; backgroundSegmentationMask;
};</pre>
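<p>For instance, a web application can feature-detect the constraint
before opening the camera. The following is a minimal, non-normative
sketch; per-camera support is reported separately by
<code>track.getCapabilities()</code>:</p>
<pre class="example">
// True if the user agent understands the constraint at all; a particular
// camera may still lack support, which getCapabilities() reveals per track.
const supported = !!navigator.mediaDevices.getSupportedConstraints()
    .backgroundSegmentationMask;</pre>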
<section>
<h3>{{VideoFrameMetadata}}</h3>
<pre class="idl">
partial dictionary VideoFrameMetadata {
ImageBitmap backgroundSegmentationMask;
};</pre>
<section class="notoc">
<h4>Members</h4>
<dl class="dictionary-members" data-link-for="VideoFrameMetadata" data-dfn-for="VideoFrameMetadata">
<dt><dfn><code>backgroundSegmentationMask</code></dfn> of type <code>{{ImageBitmap}}</code></dt>
<dd>
<p>A background segmentation mask with
white denoting certainly foreground,
black denoting certainly background, and
grey denoting uncertainty or ambiguity, with
light shades of grey denoting likely foreground and
dark shades of grey denoting likely background.
An application that needs a fully binary mask
can threshold the greyscale mask itself,
as sketched at the end of the example below.
Absence might indicate
that the frame is not from a camera, or
that the user agent does not support or
was not able to perform background segmentation.
</p>
</dd>
</dl>
</section>
</section>
<section>
<h3>Example</h3>
<pre class="example">
// main.js:
// Open camera.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [track] = stream.getVideoTracks();
// Do video processing in a worker.
const worker = new Worker('worker.js');
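// MediaStreamTrack is transferable: listing the track in the transfer list
// moves the live track to the worker instead of copying it.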
worker.postMessage({track}, [track]);
const {data} = await new Promise(resolve => worker.onmessage = resolve);
const videoElement = document.querySelector('video');
videoElement.srcObject = new MediaStream([data.track]);

// worker.js:
onmessage = async ({data: {track}}) => {
// Try to enable background segmentation mask.
const capabilities = track.getCapabilities();
if (capabilities.backgroundSegmentationMask?.includes(true)) {
await track.applyConstraints({backgroundSegmentationMask: {exact: true}});
} else {
// Background segmentation mask is not supported by the platform or
// by the camera. Consider falling back to some other method.
}
const trackGenerator = new VideoTrackGenerator();
self.postMessage({track: trackGenerator.track}, [trackGenerator.track]);
const {readable} = new MediaStreamTrackProcessor({track});

const canvas = new OffscreenCanvas(640, 480);
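// 'desynchronized' is a latency hint; the user agent may decouple the
// canvas from the event loop and compositor, or may ignore the hint.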
const context = canvas.getContext('2d', {desynchronized: true});

const transformer = new TransformStream({
async transform(frame, controller) {
const {backgroundSegmentationMask} = frame.metadata();
if (backgroundSegmentationMask) {
// Draw the video frame, scaled to the canvas dimensions.
context.globalCompositeOperation = 'copy';
context.drawImage(frame, 0, 0, canvas.width, canvas.height);
// Multiply with the mask, scaled likewise.
// The result is the foreground on black.
context.globalCompositeOperation = 'multiply';
context.drawImage(backgroundSegmentationMask, 0, 0, canvas.width, canvas.height);
} else {
// Everything is background. Fill with black.
context.globalCompositeOperation = 'copy';
context.fillStyle = 'black';
context.fillRect(0, 0, canvas.width, canvas.height);
}
controller.enqueue(new VideoFrame(canvas, {timestamp: frame.timestamp}));
frame.close();
}
});
await readable.pipeThrough(transformer).pipeTo(trackGenerator.writable);
};
</pre>
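<p>If a web application wants the black-and-white mask discussed above
rather than the greyscale one, it can threshold the mask itself. The
following is a minimal, non-normative sketch (the threshold of 128 is
an arbitrary choice):</p>
<pre class="example">
// Binarize a greyscale mask ImageBitmap at the given threshold.
function binarizeMask(mask, threshold = 128) {
  const canvas = new OffscreenCanvas(mask.width, mask.height);
  const context = canvas.getContext('2d');
  context.drawImage(mask, 0, 0);
  const image = context.getImageData(0, 0, mask.width, mask.height);
  const data = image.data;
  for (let i = 0; i &lt; data.length; i += 4) {
    // The mask is greyscale, so the red channel carries the value.
    const value = data[i] >= threshold ? 255 : 0;
    data[i] = data[i + 1] = data[i + 2] = value;
  }
  context.putImageData(image, 0, 0);
  return canvas.transferToImageBitmap();
}</pre>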
</section>
</section>
</body>
</html>