This repository was archived by the owner on Oct 15, 2025. It is now read-only.

Conversation

@xfim
Contributor

xfim commented Mar 26, 2021

From my rudimentary tests, I believe normalization can improve the capacity to recognize patterns. But I am curious about what you think.

Another, more complex option would be to normalize each of the small chunks of audio, one by one. For that, I have not been able to find a suitable function in pyAudioAnalysis, but I am sure there must be a way to do it.
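To illustrate the per-chunk idea, here is a minimal sketch in plain Python. It is not a pyAudioAnalysis function: `peak_normalize` and `normalize_chunks` are hypothetical helpers, and the sketch assumes each chunk is a sequence of float samples in the range [-1.0, 1.0].

```python
def peak_normalize(chunk, target_db=-3.0):
    """Scale one chunk so its loudest sample sits at target_db dBFS.

    Hypothetical helper; assumes float samples in [-1.0, 1.0].
    """
    peak = max((abs(sample) for sample in chunk), default=0.0)
    if peak == 0.0:
        # Silent or empty chunk: nothing to scale.
        return list(chunk)
    # Convert the dB target to a linear amplitude, then derive the gain.
    gain = 10.0 ** (target_db / 20.0) / peak
    return [sample * gain for sample in chunk]


def normalize_chunks(chunks, target_db=-3.0):
    """Apply peak normalization to every chunk independently."""
    return [peak_normalize(chunk, target_db) for chunk in chunks]
```

One caveat with this approach: because every chunk is scaled independently, quiet chunks that contain mostly background noise get amplified as much as chunks with speech.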

Thank you very much for the tool.

@abhirooptalasila
Owner

Hi
Sox has a default norm value of -3.0. I think the value you've provided in the CLI argument works better for your use case.
How big of a difference is visible in your generated subtitles?
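For anyone comparing levels, a minimal sketch of calling SoX's `norm` effect with an explicit dB level from Python could look like this. The function names are hypothetical, and this is not necessarily how the merged change invokes SoX; it only assumes the `sox` binary is on the PATH.

```python
import subprocess


def build_sox_norm_command(src, dst, level_db=-3.0):
    """Build the argument list for `sox SRC DST norm LEVEL` (hypothetical helper)."""
    return ["sox", str(src), str(dst), "norm", str(level_db)]


def normalize_with_sox(src, dst, level_db=-3.0):
    """Run SoX to write a normalized copy of src to dst.

    Raises subprocess.CalledProcessError if SoX exits with an error.
    """
    subprocess.run(build_sox_norm_command(src, dst, level_db), check=True)
```

Calling `normalize_with_sox("input.wav", "output.wav", -0.1)` and then again with `-3.0` would give two files to run through the subtitle generator side by side.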

@xfim
Contributor Author

xfim commented Mar 29, 2021

Honestly, I have now tried it, and explicitly using -3.0 gives slightly (very slightly) better results than my proposal of -0.1. But I have to admit that I have used and adapted it not in the context of a movie with arranged audio, but with more domestic, low-quality videos with poorer audio.

Using -0.1 gives better results than the default (no normalization), but not as good as -3.0.

So I leave it to you to decide what to do with it. I am far from being an expert.

Add normalization
@abhirooptalasila abhirooptalasila merged commit 4942c5f into abhirooptalasila:master Mar 29, 2021
@abhirooptalasila
Owner

I think -3.0 works best for a variety of use-cases. Thank you!

@xfim
Contributor Author

xfim commented Mar 30, 2021

Thank you.


2 participants