Skip to content

Feature/trim #50

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jan 7, 2020
Merged

Feature/trim #50

merged 16 commits into from
Jan 7, 2020

Conversation

mudar
Copy link
Collaborator

@mudar mudar commented Dec 20, 2019

Hi Mattia,
Thanks for all the work you've done on this library 👍
After some struggle with the media APIs, I think I have a TrimDataSource solution that works fine, 80% of the time 😅 sometimes audio is offsync, but I am not sure if there's a bug the way I'm trying to handle the feature.
I've based the solution on your answer on issue #37. Please let me know if you would like any changes to fit better into the library.
thanks,
mudar

- New component TrimDataSource, wrapping DataSource to be trimmed.
- MediaExtractorDataSource is an abstract class to limit visibility of
MediaExtractor to package
- Updates to Engine to replace
selectAudio/transcode/selectVideo/transcode sequence by
selectAudio/selectVideo/transcode/transcode
Using 2 editText fields, default value is zero.
Builder can add directly trim values for UriDataSource
Copy link
Member

@natario1 natario1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

M concern is that this does not seem to work for audio-only files? I think the implementation could be simpler if you don't differentiate between video and audio, and work for both.

Thank you very much for working on this!

- fixed case where video track is absent
- throw exceptions for invalid trim values
In the original sequence
selectAudio / transcodeAudio / selectVideo / transcodeVideo
the first step (selectAudio) is intercepted by selectVideo + seekVideo
and the 3rd step (selectVideo) is skipped. So it becomes
selectVideo / seekVideo / selectAudio / transcodeAudio / transcodeVideo
- Also added throws IllegalArgumentException to TrimDataSource
@natario1
Copy link
Member

@mudar I'll answer here:

My solution was to intercept audio selection to select+seek video. If you're suggesting I delay audio selection until video selection+seek then I'm not sure how to do that.

In the end I guess what I am saying is that the solution should not rely on the presence of a video track, as a file might not have one at all. If this is fixed now, I'm fine!

I think I have a TrimDataSource solution that works fine, 80% of the time

Is this 100%-ish now? I couldn't test myself yet.

@natario1
Copy link
Member

natario1 commented Dec 22, 2019

If we're not 100% yet my suggestion would be to experiment more with selecting normally and seeking twice during canRead.

I don't think MediaExtractor is bad at seeking audio, rather it's bad at seeking. Often what happens is that the extractor frames are out of order:

  1. audio pts=0
  2. video pts=0
  3. video pts=100us
  4. video pts=200us
  5. audio pts=50us
  6. audio pts=100us
  7. audio pts=150us
  8. audio pts=200us
  9. video pts=300us

When you select both and call seekTo(200us), the extractor is not very smart and can stop at 4. Which is incorrect, because as you keep going you will see audio frames that we should have dropped. One way to overcome this is to skip twice, once per track: this way, the extractor stops at 8. You might lose a few frames but it's simple and clean. I think that SEEK_TO_CLOSEST might work better...

Another way is maybe not skip at all and keep calling advance() until the timestamp is right, but it should be less efficient for large skips. Note also that you're not doing it right at the moment - you can't compare the extractor.getSampleTime() with our trimStartUs: the extractor times are not guaranteed to start at 0. They can start at a random number like 12083192379. So in this case you should keep track of the first presentation time from extractor. Let me know if this is not clear, and thanks for looking into this!

@mudar
Copy link
Collaborator Author

mudar commented Dec 23, 2019

The current solution does not rely on the presence of a video track, it does support audio-only files.

My tests are 💯 when trimming a single file, it's the concatenation of multiple files that has some issues, mainly off-sync audio. This could be related to the wrong comparison with extractor.getSampleTime().

the extractor times are not guaranteed to start at 0.

I was not aware of that! Will look into this and come back with questions if I don't manage.

my suggestion would be to experiment more with selecting normally and seeking twice during canRead.

I'll try that too. My understanding is that seeking twice must be done before any reading. If that's not the case, then I think I'll need some more explanations!

Cleaner simplified code :)
The extractor needs a second call to seekTo() after reaching a video
keyframe, to obtain better values for audio track. Otherwise, too many
audio frames can be lost, causing visible off-sync.
@mudar
Copy link
Collaborator Author

mudar commented Dec 23, 2019

I think things are starting to look good 😅 I moved seekTo() to the first call of canReadTrack(). We do a single seek when the file has a single track. If the file has both audio+video, canReadTrack() returns false until both tracks have seeked.

There are 2 sequences, depending on which track started first:

  • canRead video / seekVideo / returns false / canRead audio / seek audio / return true
  • canRead audio / seek audio / return false / canRead video / seekVideo / seekTo(sampleTime) / return true

In the second sequence, we have 3 calls to seekTo(). The hacky one is the third one extractor.seekTo(extractor.getSampleTime()) which fixes things for the audio track. I find that weird 🤔 but it does work.

Anyways... one last thing: I called the method hasTrack() to keep the style of TrackTypeMap.has(trackType). I can invert it and call it isTrackAbsent() if you prefer.

Copy link
Member

@natario1 natario1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mudar ! I like it, looking elegant. I think we should fix a couple of things before merging though.

Comment on lines 25 to 26
private boolean isAudioTrackReady;
private boolean isVideoTrackReady;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it helps (not sure), you could use a TrackTypeMap here instead of these two. Just in case you haven't noticed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 👍

Comment on lines 82 to 84
@Override
public boolean canReadTrack(@NonNull TrackType type) {
if (source.canReadTrack(type)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this method you call source.canReadTrack(type) and if true apply our logic. Can we do the opposite?

The upstream source might be on AUDIO position, but if you seek later, the next position might VIDEO instead. So it would be better to do our stuff first (which might seek) and then return source.canReadTrack(type).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 👍

Comment on lines 73 to 75
private boolean hasTrack(@NonNull TrackType type) {
return source.getTrackFormat(type) != null;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed as is. We don't care if the track exist but rather if the track is selected, and this can only be checked in canRead/read, not during the constructor. (if it doesn't exist, it will never be selected, so you can simply check selection)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok... I think this the only one that remains open 😅 multiple part answer!

We don't care if the track exist but rather if the track is selected

I agree 💯 The error in my code is that it assumes that all existing tracks will be selected. I now see that's not the case!
I might need to keep a list of tracks selected in selectTrack() to set non-selected tracks to ready at the beginning of canReadTrack()

and this can only be checked in canRead/read

That's the issue I was trying to resolve in Engine, now that I see my error, I think we need to do some changes in the following block

if (!audioCompleted) {
  stepped |= getCurrentTrackTranscoder(TrackType.AUDIO, options).transcode(forceAudioEos);
}
if (!videoCompleted) {
  stepped |= getCurrentTrackTranscoder(TrackType.VIDEO, options).transcode(forceVideoEos);
}

What this does is that getCurrentTrackTranscoder(AUDIO) calls selectTrack(audio) then transcode() will call canRead(audio). Then we'll seek and read audio, before calling getCurrentTrackTranscoder(VIDEO). At this point, we can still select the video track, but it's too late to call seekTo() because that will rewind the audio track.

I'm sorry if this is redundant, but I'm not sure if my explanation was clear the first time. Or maybe I'm just completely lost here 🤔

thanks again for your patience ✌️

Comment on lines 92 to 93
extractor.seekTo(trimStartUs, MediaExtractor.SEEK_TO_CLOSEST_SYNC);
updateTrimValues(extractor.getSampleTime());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion - instead of calling extractor.seekTo() and then update trimStartUs... You can just leave trimStartUs as is (even make it final) and call extractor.seekTo(extractor.getSampleTime() + trimStartUs) anytime you must seek.

This second option would be better because we could get rid of the MediaExtractorDataSource interface! The default data source could simply implement seekTo(long) by doing what I suggested above (seeking to first sample time + desired value). So that seekTo(0) seeks to start. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Let me know if you still prefer to get rid of updateTrimValues(). I have removed MediaExtractorDataSource, related to comment below https://github.com/natario1/Transcoder/pull/50#discussion_r361033517

Comment on lines 129 to 132
@Override
public boolean isDrained() {
return source.isDrained();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested trimEnd values? I think it should be implemented here, returning true if getReadUs() >= getDurationUs(). So that reading stops.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 👍 TrimEnd was auto-magically ok since the beginning, I thought that was because of

public long getDurationUs() {
    return trimDurationUs;
}

but adding the condition to isDrained() does seem the right thing to do 🙃 I was also able to stop manually setting KEY_DURATION for the mediaFormat

/**
* DataSource that allows access to its MediaExtractor.
*/
abstract class MediaExtractorDataSource implements DataSource {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we get rid of this (see comments below) but if we don't, this should be made public because it's in the TrimDataSource constructor.

Copy link
Collaborator Author

@mudar mudar Dec 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, 👍 I added

long seekTo(long timestampUs);

to the DataSource interface. The method also returns the new timestamp. This solves the problem where TrimDataSource could not call getSampleTime() anymore since we're removing its access to the extactor.

Signature is a bit similar to https://github.com/bcgit/bc-java/blob/master/core/src/main/java/org/bouncycastle/crypto/SkippingCipher.java#L23

Replaced by adding seekTo() to DataSource interface
The rest of the lib already assumes that timestamps start from arbitrary
values.
Stop reading when readUs is past duration. This removes the need to
manually define KEY_DURATION in the mediaFormat
to avoid possible bug where seekTo lands on a different track. Ex: The
upstream source might be on AUDIO position, but if you seek later, the
next position might VIDEO instead.
@natario1
Copy link
Member

natario1 commented Jan 4, 2020

@mudar I'm sorry, I completely forgot about this!

Let's merge this soon. I understand now what you are saying about the fact the before the second track is selected, the first one has already been through a canRead() and read(). This is an issue indeed. And probably one of the reason why seeking twice gives better results.

Now that we have fixed many details in the source implementation, I wonder if you original Engine idea would let us seek only once and simplify everything. I mean this change:

natario1@fb9a53d#diff-4bd91bf352680b43ed4066c95874c6b2L367-R373

Maybe it wasn't working due to other things that we have solved. In theory if we apply that change, the canRead() could be as simple as:

  @Override
   public boolean canReadTrack(@NonNull TrackType type) {
       if (!didSeekTracks) {
           source.seekTo(trimStartUs);
           didSeekTracks = true;
       }
       return source.canReadTrack(type);
   }

What do you think about this?

Note that, since the first extractor timestamp is not always 0, there are 2 time scales here: one goes from 0...video duration, the other goes from first extractor timestamp...last extractor timestamp. So the current seekTo implementation in DefaultDataSource is wrong because it takes a time from the first scale and gives it to extractor which thinks in terms of the second scale.

To solve this we could simply change seekTo with seekBy, so we can pass a relative duration:

    @Override
    public void seekBy(long durationUs) {
        ensureExtractor();
        mExtractor.seekTo(mExtractor.getSampleTime() + durationUs, MediaExtractor.SEEK_TO_CLOSEST_SYNC);
    }

This method goes forward by durationUs. Not as powerful as seekTo(), but it is exactly what we need here so it would be great to simplify this way. If we ever need to seek back, we can always pass negative durations - this might be useful in the future.

- Updates to Engine to replace
selectAudio/transcode/selectVideo/transcode sequence by
selectAudio/selectVideo/transcode/transcode
- remove unnecessary hasTrack()
- seekTo() is applied in canReadTrack(), once per selected track. This
now works because all track selection operations are done before the
first call to canReadTrack().
- When 2 tracks are selected, seekTo() is called twice and this helps
the extractor with Audio sampleTime issues.
@mudar
Copy link
Collaborator Author

mudar commented Jan 6, 2020

Hi @natario1,
As you suggested, I've restored the changes to the Engine class. This now works because all track selection operations are done before the first call to canReadTrack(). I've tried your code example, calling seekTo() once only. But sadly that doesn't work.
Still, I find the current solution as clear/simple as it gets: one call to seekTo() for each track selected. Audio does need that second call 😓 And this allows seek to work correctly if existing tracks were not selected (which was the missing feature in my code).
I will look into the seekBy() vs seekTo() tomorrow, but I would prefer to keep a seekTo() logic.

- reverted unnecessary changes to Engine class. Previous changes cannot
guarantee that all calls to selectTracks() are done before the first
canRead(). Latest bug was with merging multiple trimmed files.
- use seekBy() to better handle first extractor timestamp
- DefaultDataSource makes sure all available tracks are selected by
extractor (without adding to mSelectedTracks array). Then seekTo() is
called mutlitple times, using the resulting sampletimeUs for the later
calls.
@mudar
Copy link
Collaborator Author

mudar commented Jan 7, 2020

Well, another commit, hopefully the last one!
I noticed a bug in yesterday's version: when concatenating multiple trimmed files, the 2nd or 3rd file can select then read a track before selecting the other track. Probably related to the drain state and transcode() returning false 🤔 Today's changes:

  • work ok 💯
  • allow the non-selection of existing tracks
  • use seekBy(duration) instead of seekTo()
  • avoid changes to Engine
  • TrimDataSource is simple and clean.
  • The only bad news is that DefaultDataSource.seekBy() does not look that good 😓 Two required operations there: selecting all existing tracks before the first seekTo(), multiple calls to seekTo() where the 2nd one is seekTo(extractor.getSampleTime()) to fix the audio issues.

Any suggestions? I was not at ease messing with the drain state..

@natario1
Copy link
Member

natario1 commented Jan 7, 2020

Thanks for doing it! MediaExtractor is a pain, no wonder Google itself dropped it (thinking of ExoPlayer). If this works now I would merge it as is and maybe try to improve it later. If I do I'll ping you.

Can you just sync with master though? @mudar

@mudar
Copy link
Collaborator Author

mudar commented Jan 7, 2020

done 👍

Copy link
Member

@natario1 natario1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I will try to release a new version soon, but if you need this now please fetch the library from JitPack.

@natario1 natario1 merged commit 63e6e30 into deepmedia:master Jan 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants