Feature/trim #50

mudar · 2019-12-20T21:00:11Z

Hi Mattia,
Thanks for all the work you've done on this library 👍
After some struggle with the media APIs, I think I have a TrimDataSource solution that works fine, 80% of the time 😅 Sometimes the audio is out of sync, but I am not sure whether that's a bug in the way I'm handling the feature.
I've based the solution on your answer on issue #37. Please let me know if you would like any changes to fit better into the library.
thanks,
mudar

- New component TrimDataSource, wrapping DataSource to be trimmed.
- MediaExtractorDataSource is an abstract class to limit visibility of MediaExtractor to package
- Updates to Engine to replace selectAudio/transcode/selectVideo/transcode sequence by selectAudio/selectVideo/transcode/transcode

natario1

My concern is that this does not seem to work for audio-only files. I think the implementation could be simpler if you don't differentiate between video and audio, and it would then work for both.

Thank you very much for working on this!

lib/src/main/java/com/otaliastudios/transcoder/engine/Engine.java

lib/src/main/java/com/otaliastudios/transcoder/source/MediaExtractorDataSource.java

lib/src/main/java/com/otaliastudios/transcoder/source/TrimDataSource.java

- In the original sequence selectAudio / transcodeAudio / selectVideo / transcodeVideo the first step (selectAudio) is intercepted by selectVideo + seekVideo and the 3rd step (selectVideo) is skipped. So it becomes selectVideo / seekVideo / selectAudio / transcodeAudio / transcodeVideo
- Also added throws IllegalArgumentException to TrimDataSource

natario1 · 2019-12-22T19:50:13Z

@mudar I'll answer here:

My solution was to intercept audio selection to select+seek video. If you're suggesting I delay audio selection until video selection+seek then I'm not sure how to do that.

In the end I guess what I am saying is that the solution should not rely on the presence of a video track, as a file might not have one at all. If this is fixed now, I'm fine!

I think I have a TrimDataSource solution that works fine, 80% of the time

Is this 100%-ish now? I couldn't test myself yet.

natario1 · 2019-12-22T20:04:40Z

If we're not 100% yet my suggestion would be to experiment more with selecting normally and seeking twice during canRead.

I don't think MediaExtractor is bad at seeking audio, rather it's bad at seeking. Often what happens is that the extractor frames are out of order:

1. audio pts=0
2. video pts=0
3. video pts=100us
4. video pts=200us
5. audio pts=50us
6. audio pts=100us
7. audio pts=150us
8. audio pts=200us
9. video pts=300us

When you select both tracks and call seekTo(200us), the extractor is not very smart and can stop at sample 4 (video pts=200us). That is incorrect, because as you keep going you will see audio frames that we should have dropped. One way to overcome this is to seek twice, once per track: this way, the extractor stops at sample 8 (audio pts=200us). You might lose a few frames but it's simple and clean. I think that SEEK_TO_CLOSEST might work better...
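The ordering problem described above can be reproduced with a toy model. The sketch below does not use MediaExtractor: it assumes a hypothetical flat list of interleaved samples and a naive seek that stops at the first sample, on any track, whose timestamp reaches the target. All class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: it does not call MediaExtractor, and every name is hypothetical.
final class InterleavedSeekSketch {

    static final class Sample {
        final String track;
        final long ptsUs;

        Sample(String track, long ptsUs) {
            this.track = track;
            this.ptsUs = ptsUs;
        }
    }

    // The interleaved sample order from the comment above.
    static List<Sample> exampleSamples() {
        String[] tracks = {"audio", "video", "video", "video", "audio", "audio", "audio", "audio", "video"};
        long[] ptsUs = {0, 0, 100, 200, 50, 100, 150, 200, 300};
        List<Sample> samples = new ArrayList<>();
        for (int i = 0; i < tracks.length; i++) {
            samples.add(new Sample(tracks[i], ptsUs[i]));
        }
        return samples;
    }

    // Naive seek: 0-based index of the first sample (any track) with pts >= target.
    static int naiveSeekIndex(List<Sample> samples, long targetUs) {
        for (int i = 0; i < samples.size(); i++) {
            if (samples.get(i).ptsUs >= targetUs) {
                return i;
            }
        }
        return samples.size();
    }

    // Counts samples at or after the index that are earlier than the target.
    static int staleSamplesFrom(List<Sample> samples, int index, long targetUs) {
        int stale = 0;
        for (int i = index; i < samples.size(); i++) {
            if (samples.get(i).ptsUs < targetUs) {
                stale++;
            }
        }
        return stale;
    }

    // First 0-based index from which no remaining sample is earlier than the target.
    static int safeSeekIndex(List<Sample> samples, long targetUs) {
        int index = samples.size();
        for (int i = samples.size() - 1; i >= 0; i--) {
            if (samples.get(i).ptsUs < targetUs) {
                break;
            }
            index = i;
        }
        return index;
    }

    public static void main(String[] args) {
        List<Sample> samples = exampleSamples();
        int naive = naiveSeekIndex(samples, 200);
        int safe = safeSeekIndex(samples, 200);
        System.out.println("naive index " + naive + ", stale after it: " + staleSamplesFrom(samples, naive, 200));
        System.out.println("safe index " + safe + ", stale after it: " + staleSamplesFrom(samples, safe, 200));
    }
}
```

Indices here are 0-based, so index 3 is the fourth sample in the list and index 7 is the eighth. Whether a real extractor behaves like the naive seek depends on the container and the device; the model only shows why a single stop position can leave earlier audio samples ahead of the read position.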

Another way might be to not seek at all and keep calling advance() until the timestamp is right, but that should be less efficient for large skips. Note also that the current comparison is not right: you can't compare extractor.getSampleTime() with our trimStartUs, because the extractor times are not guaranteed to start at 0. They can start at an arbitrary number like 12083192379. So in this case you should keep track of the first presentation time from the extractor. Let me know if this is not clear, and thanks for looking into this!
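One possible way to keep track of the first presentation time is sketched below. It is a minimal illustration, assuming a hypothetical helper class (RelativeClockSketch is not part of the library) that remembers the first timestamp it sees and reports later timestamps relative to it, so they can be compared with a 0-based value such as trimStartUs.

```java
// Hypothetical helper, not part of the library: it remembers the first
// timestamp it is given so later timestamps can be put on a 0-based scale.
final class RelativeClockSketch {
    private static final long UNSET = Long.MIN_VALUE;
    private long firstSampleTimeUs = UNSET;

    // Returns the timestamp relative to the first one ever passed in.
    long toRelativeUs(long sampleTimeUs) {
        if (firstSampleTimeUs == UNSET) {
            firstSampleTimeUs = sampleTimeUs;
        }
        return sampleTimeUs - firstSampleTimeUs;
    }

    // True while the sample is still earlier than the requested trim start.
    boolean isBeforeTrimStart(long sampleTimeUs, long trimStartUs) {
        return toRelativeUs(sampleTimeUs) < trimStartUs;
    }

    public static void main(String[] args) {
        RelativeClockSketch clock = new RelativeClockSketch();
        long first = 12083192379L; // arbitrary first extractor timestamp
        System.out.println(clock.toRelativeUs(first));
        System.out.println(clock.isBeforeTrimStart(first + 100L, 200L));
    }
}
```

The helper must see the very first sample before any seek happens, otherwise the recorded origin would already be shifted.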

mudar · 2019-12-23T14:32:00Z

The current solution does not rely on the presence of a video track, it does support audio-only files.

My tests are 💯 when trimming a single file, it's the concatenation of multiple files that has some issues, mainly off-sync audio. This could be related to the wrong comparison with extractor.getSampleTime().

the extractor times are not guaranteed to start at 0.

I was not aware of that! Will look into this and come back with questions if I don't manage.

my suggestion would be to experiment more with selecting normally and seeking twice during canRead.

I'll try that too. My understanding is that seeking twice must be done before any reading. If that's not the case, then I think I'll need some more explanations!

mudar · 2019-12-23T18:52:35Z

I think things are starting to look good 😅 I moved seekTo() to the first call of canReadTrack(). We do a single seek when the file has a single track. If the file has both audio+video, canReadTrack() returns false until a seek has been done for both tracks.

There are 2 sequences, depending on which track started first:

canRead video / seekVideo / returns false / canRead audio / seek audio / return true
canRead audio / seek audio / return false / canRead video / seekVideo / seekTo(sampleTime) / return true

In the second sequence, we have 3 calls to seekTo(). The hacky one is the third one extractor.seekTo(extractor.getSampleTime()) which fixes things for the audio track. I find that weird 🤔 but it does work.
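The "seek once per selected track, then allow reading" idea can be sketched as follows. This is a simplified illustration, not the code in the PR: SketchTrackType, SketchSource, CountingSource and TrimSeekSketch are hypothetical stand-ins for the library's types, and the extra seekTo(sampleTime) call from the second sequence is deliberately left out.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified stand-ins; the library's real TrackType and DataSource differ.
enum SketchTrackType { AUDIO, VIDEO }

interface SketchSource {
    void seekTo(long timestampUs);
    boolean canReadTrack(SketchTrackType type);
}

// Fake source that only counts how many times it was asked to seek.
final class CountingSource implements SketchSource {
    int seekCount = 0;

    @Override
    public void seekTo(long timestampUs) {
        seekCount++;
    }

    @Override
    public boolean canReadTrack(SketchTrackType type) {
        return true;
    }
}

final class TrimSeekSketch {
    private final SketchSource source;
    private final long trimStartUs;
    private final Set<SketchTrackType> selected = EnumSet.noneOf(SketchTrackType.class);
    private final Set<SketchTrackType> sought = EnumSet.noneOf(SketchTrackType.class);

    TrimSeekSketch(SketchSource source, long trimStartUs) {
        this.source = source;
        this.trimStartUs = trimStartUs;
    }

    void selectTrack(SketchTrackType type) {
        selected.add(type);
    }

    boolean canReadTrack(SketchTrackType type) {
        // Seek the first time a selected track asks to read.
        if (selected.contains(type) && sought.add(type)) {
            source.seekTo(trimStartUs);
        }
        // Hold back reads until every selected track has triggered its seek.
        if (!sought.containsAll(selected)) {
            return false;
        }
        // Ask the upstream source only after any seek has happened.
        return source.canReadTrack(type);
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource();
        TrimSeekSketch trim = new TrimSeekSketch(source, 1_000_000L);
        trim.selectTrack(SketchTrackType.AUDIO);
        trim.selectTrack(SketchTrackType.VIDEO);
        System.out.println(trim.canReadTrack(SketchTrackType.VIDEO));
        System.out.println(trim.canReadTrack(SketchTrackType.AUDIO));
        System.out.println("seeks: " + source.seekCount);
    }
}
```

With both tracks selected, the first canReadTrack() call returns false and the second returns true, matching the first sequence listed above. The sketch relies on all selections being made before the first canReadTrack() call.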

Anyways... one last thing: I called the method hasTrack() to keep the style of TrackTypeMap.has(trackType). I can invert it and call it isTrackAbsent() if you prefer.

natario1

Thanks @mudar ! I like it, looking elegant. I think we should fix a couple of things before merging though.

natario1 · 2019-12-23T21:58:06Z

lib/src/main/java/com/otaliastudios/transcoder/source/TrimDataSource.java

+    private boolean isAudioTrackReady;
+    private boolean isVideoTrackReady;


If it helps (not sure), you could use a TrackTypeMap here instead of these two. Just in case you haven't noticed

lib/src/main/java/com/otaliastudios/transcoder/source/TrimDataSource.java

natario1 · 2019-12-23T22:03:52Z

lib/src/main/java/com/otaliastudios/transcoder/source/TrimDataSource.java

+    @Override
+    public boolean canReadTrack(@NonNull TrackType type) {
+        if (source.canReadTrack(type)) {


In this method you call source.canReadTrack(type) and if true apply our logic. Can we do the opposite?

The upstream source might be on an AUDIO position, but if you seek later, the next position might be VIDEO instead. So it would be better to do our stuff first (which might seek) and then return source.canReadTrack(type).

natario1 · 2019-12-23T22:05:01Z

lib/src/main/java/com/otaliastudios/transcoder/source/TrimDataSource.java

+    private boolean hasTrack(@NonNull TrackType type) {
+        return source.getTrackFormat(type) != null;
+    }


I don't think this is needed as is. We don't care if the track exists but rather if the track is selected, and this can only be checked in canRead/read, not during the constructor. (If it doesn't exist, it will never be selected, so you can simply check selection.)

Ok... I think this is the only one that remains open 😅 multiple part answer!

We don't care if the track exist but rather if the track is selected

I agree 💯 The error in my code is that it assumes that all existing tracks will be selected. I now see that's not the case!
I might need to keep a list of tracks selected in selectTrack() to set non-selected tracks to ready at the beginning of canReadTrack()

and this can only be checked in canRead/read

That's the issue I was trying to resolve in Engine, now that I see my error, I think we need to do some changes in the following block

    if (!audioCompleted) {
        stepped |= getCurrentTrackTranscoder(TrackType.AUDIO, options).transcode(forceAudioEos);
    }
    if (!videoCompleted) {
        stepped |= getCurrentTrackTranscoder(TrackType.VIDEO, options).transcode(forceVideoEos);
    }

What this does is that getCurrentTrackTranscoder(AUDIO) calls selectTrack(audio) then transcode() will call canRead(audio). Then we'll seek and read audio, before calling getCurrentTrackTranscoder(VIDEO). At this point, we can still select the video track, but it's too late to call seekTo() because that will rewind the audio track.

I'm sorry if this is redundant, but I'm not sure if my explanation was clear the first time. Or maybe I'm just completely lost here 🤔

thanks again for your patience ✌️

natario1 · 2019-12-23T22:09:57Z

lib/src/main/java/com/otaliastudios/transcoder/source/TrimDataSource.java

+                        extractor.seekTo(trimStartUs, MediaExtractor.SEEK_TO_CLOSEST_SYNC);
+                        updateTrimValues(extractor.getSampleTime());


Suggestion - instead of calling extractor.seekTo() and then update trimStartUs... You can just leave trimStartUs as is (even make it final) and call extractor.seekTo(extractor.getSampleTime() + trimStartUs) anytime you must seek.

This second option would be better because we could get rid of the MediaExtractorDataSource interface! The default data source could simply implement seekTo(long) by doing what I suggested above (seeking to first sample time + desired value). So that seekTo(0) seeks to start. What do you think?

🤔 Let me know if you still prefer to get rid of updateTrimValues(). I have removed MediaExtractorDataSource, related to comment below https://github.com/natario1/Transcoder/pull/50#discussion_r361033517

natario1 · 2019-12-23T22:12:08Z

lib/src/main/java/com/otaliastudios/transcoder/source/TrimDataSource.java

+    @Override
+    public boolean isDrained() {
+        return source.isDrained();
+    }


Have you tested trimEnd values? I think it should be implemented here, returning true if getReadUs() >= getDurationUs(). So that reading stops.

done 👍 TrimEnd was auto-magically ok since the beginning, I thought that was because of

public long getDurationUs() { return trimDurationUs; }

but adding the condition to isDrained() does seem the right thing to do 🙃 I was also able to stop manually setting KEY_DURATION for the mediaFormat
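The isDrained() condition discussed here can be sketched as below. It is a minimal illustration under assumptions: SketchDrainSource, FakeDrainSource and TrimEndSketch are hypothetical names, and getReadUs() is assumed to report the amount read since the trim start, which may not match how the library measures it.

```java
// Hypothetical minimal view of a source; the library's DataSource has more methods.
interface SketchDrainSource {
    boolean isDrained();
    long getReadUs();
}

// Fake source whose state can be set directly, for demonstration only.
final class FakeDrainSource implements SketchDrainSource {
    boolean drained = false;
    long readUs = 0L;

    @Override
    public boolean isDrained() {
        return drained;
    }

    @Override
    public long getReadUs() {
        return readUs;
    }
}

final class TrimEndSketch {
    private final SketchDrainSource source;
    private final long trimDurationUs;

    TrimEndSketch(SketchDrainSource source, long trimDurationUs) {
        this.source = source;
        this.trimDurationUs = trimDurationUs;
    }

    long getDurationUs() {
        return trimDurationUs;
    }

    // Drained when upstream is drained, or once we have read past the trimmed duration.
    boolean isDrained() {
        return source.isDrained() || source.getReadUs() >= getDurationUs();
    }

    public static void main(String[] args) {
        FakeDrainSource source = new FakeDrainSource();
        TrimEndSketch trim = new TrimEndSketch(source, 5_000_000L);
        source.readUs = 4_999_999L;
        System.out.println(trim.isDrained());
        source.readUs = 5_000_000L;
        System.out.println(trim.isDrained());
    }
}
```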

natario1 · 2019-12-23T22:17:01Z

lib/src/main/java/com/otaliastudios/transcoder/source/MediaExtractorDataSource.java

+/**
+ * DataSource that allows access to its MediaExtractor.
+ */
+abstract class MediaExtractorDataSource implements DataSource {


I hope we get rid of this (see comments below) but if we don't, this should be made public because it's in the TrimDataSource constructor.

Done, 👍 I added

long seekTo(long timestampUs);

to the DataSource interface. The method also returns the new timestamp. This solves the problem where TrimDataSource could not call getSampleTime() anymore since we're removing its access to the extractor.

Signature is a bit similar to https://github.com/bcgit/bc-java/blob/master/core/src/main/java/org/bouncycastle/crypto/SkippingCipher.java#L23

natario1 · 2020-01-04T12:15:19Z

@mudar I'm sorry, I completely forgot about this!

Let's merge this soon. I understand now what you are saying: before the second track is selected, the first one has already been through a canRead() and read(). This is an issue indeed, and probably one of the reasons why seeking twice gives better results.

Now that we have fixed many details in the source implementation, I wonder if your original Engine idea would let us seek only once and simplify everything. I mean this change:

natario1@fb9a53d#diff-4bd91bf352680b43ed4066c95874c6b2L367-R373

Maybe it wasn't working due to other things that we have solved. In theory if we apply that change, the canRead() could be as simple as:

    @Override
    public boolean canReadTrack(@NonNull TrackType type) {
        if (!didSeekTracks) {
            source.seekTo(trimStartUs);
            didSeekTracks = true;
        }
        return source.canReadTrack(type);
    }

What do you think about this?

Note that, since the first extractor timestamp is not always 0, there are 2 time scales here: one goes from 0...video duration, the other goes from first extractor timestamp...last extractor timestamp. So the current seekTo implementation in DefaultDataSource is wrong because it takes a time from the first scale and gives it to extractor which thinks in terms of the second scale.

To solve this we could simply replace seekTo with seekBy, so we can pass a relative duration:

    @Override
    public void seekBy(long durationUs) {
        ensureExtractor();
        mExtractor.seekTo(mExtractor.getSampleTime() + durationUs, MediaExtractor.SEEK_TO_CLOSEST_SYNC);
    }

This method goes forward by durationUs. Not as powerful as seekTo(), but it is exactly what we need here so it would be great to simplify this way. If we ever need to seek back, we can always pass negative durations - this might be useful in the future.
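The difference between the two time scales can be seen with a toy extractor. The sketch below is not android.media.MediaExtractor: ToyExtractor is a hypothetical class, and its seek rule (land on the first sample at or after the target, or on the last sample if the target is past the end) is an assumption chosen to keep the example small.

```java
// Toy stand-in for an extractor; not android.media.MediaExtractor.
final class ToyExtractor {
    private final long[] sampleTimesUs;
    private int position = 0;

    ToyExtractor(long... sampleTimesUs) {
        if (sampleTimesUs.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        this.sampleTimesUs = sampleTimesUs.clone();
    }

    long getSampleTime() {
        return sampleTimesUs[position];
    }

    // Assumed rule: first sample at or after the target, else the last sample.
    void seekTo(long timeUs) {
        position = sampleTimesUs.length - 1;
        for (int i = 0; i < sampleTimesUs.length; i++) {
            if (sampleTimesUs[i] >= timeUs) {
                position = i;
                break;
            }
        }
    }

    // Relative seek: the target is an offset from the current sample time.
    void seekBy(long durationUs) {
        seekTo(getSampleTime() + durationUs);
    }

    public static void main(String[] args) {
        ToyExtractor absolute = new ToyExtractor(12_000L, 12_100L, 12_200L, 12_300L);
        absolute.seekTo(200L); // 0-based value handed to an extractor-scale API
        System.out.println(absolute.getSampleTime());

        ToyExtractor relative = new ToyExtractor(12_000L, 12_100L, 12_200L, 12_300L);
        relative.seekBy(200L); // offset from the current (first) sample
        System.out.println(relative.getSampleTime());
    }
}
```

In this model, passing a 0-based value to seekTo() leaves the position at the first sample, so nothing is trimmed, while seekBy() moves forward by the requested amount regardless of where the extractor's timestamps start.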

- Updates to Engine to replace selectAudio/transcode/selectVideo/transcode sequence by selectAudio/selectVideo/transcode/transcode
- remove unnecessary hasTrack()
- seekTo() is applied in canReadTrack(), once per selected track. This now works because all track selection operations are done before the first call to canReadTrack().
- When 2 tracks are selected, seekTo() is called twice and this helps the extractor with Audio sampleTime issues.

mudar · 2020-01-06T23:40:03Z

Hi @natario1,
As you suggested, I've restored the changes to the Engine class. This now works because all track selection operations are done before the first call to canReadTrack(). I've tried your code example, calling seekTo() once only. But sadly that doesn't work.
Still, I find the current solution as clear/simple as it gets: one call to seekTo() for each track selected. Audio does need that second call 😓 And this allows seek to work correctly if existing tracks were not selected (which was the missing feature in my code).
I will look into the seekBy() vs seekTo() tomorrow, but I would prefer to keep a seekTo() logic.

- reverted unnecessary changes to Engine class. Previous changes cannot guarantee that all calls to selectTracks() are done before the first canRead(). Latest bug was with merging multiple trimmed files.
- use seekBy() to better handle first extractor timestamp
- DefaultDataSource makes sure all available tracks are selected by extractor (without adding to mSelectedTracks array). Then seekTo() is called multiple times, using the resulting sampletimeUs for the later calls.

mudar · 2020-01-07T18:58:29Z

Well, another commit, hopefully the last one!
I noticed a bug in yesterday's version: when concatenating multiple trimmed files, the 2nd or 3rd file can select then read a track before selecting the other track. Probably related to the drain state and transcode() returning false 🤔 Today's changes:

work ok 💯
allow the non-selection of existing tracks
use seekBy(duration) instead of seekTo()
avoid changes to Engine
TrimDataSource is simple and clean.
The only bad news is that DefaultDataSource.seekBy() does not look that good 😓 Two required operations there: selecting all existing tracks before the first seekTo(), multiple calls to seekTo() where the 2nd one is seekTo(extractor.getSampleTime()) to fix the audio issues.

Any suggestions? I was not at ease messing with the drain state..

natario1 · 2020-01-07T19:17:15Z

Thanks for doing it! MediaExtractor is a pain, no wonder Google itself dropped it (thinking of ExoPlayer). If this works now I would merge it as is and maybe try to improve it later. If I do I'll ping you.

Can you just sync with master though? @mudar

mudar · 2020-01-07T19:54:29Z

done 👍

natario1

Thank you! I will try to release a new version soon, but if you need this now please fetch the library from JitPack.

mudar added 3 commits December 20, 2019 15:37

Added support for TrimDataSource into demo app

4c1a3af

Using 2 editText fields, default value is zero.

TranscoderOptions Builder support for TrimDataSource

6238570

Builder can add directly trim values for UriDataSource

natario1 reviewed Dec 21, 2019

mudar added 2 commits December 21, 2019 11:30

TrimDataSource updates following PR code review

8d8ea93

- fixed case where video track is absent - throw exceptions for invalid trim values

Moved seekTo() from selectTrack() to canReadTrack()

7ddc093

Cleaner simplified code :) The extractor needs a second call to seekTo() after reaching a video keyframe, to obtain better values for audio track. Otherwise, too many audio frames can be lost, causing visible off-sync.

Merge branch 'master' into feature/trim

1a70d9e

natario1 requested changes Dec 23, 2019

mudar added 6 commits December 23, 2019 17:43

Removed MediaExtractorDataSource

d596091

Replaced by adding seekTo() to DataSource interface

Removed timestamp adjustment

a9c4975

The rest of the lib already assumes that timestamps start from arbitrary values.

Use TrackTypeMap for readyTracks flags

8f45d8c

replacing two booleans

Handle trimEnd in isDrained()

b382f34

Stop reading when readUs is past duration. This removes the need to manually define KEY_DURATION in the mediaFormat

Fix seek vs canRead order

421f8af

to avoid possible bug where seekTo lands on a different track. Ex: The upstream source might be on AUDIO position, but if you seek later, the next position might VIDEO instead.

Merge branch 'feature/trim' of github.com:mudar/Transcoder into featu…

ed06a2a

…re/trim

Merge branch 'master' into feature/trim

5177e6d

natario1 approved these changes Jan 7, 2020

natario1 merged commit 63e6e30 into deepmedia:master Jan 7, 2020
