This project is a video processing bot that extracts audio from video files, performs speech recognition, generates subtitles, and allows for text correction and dialect conversion. It utilizes various libraries for audio processing, speech recognition, and translation.
- Audio Extraction: Extracts audio from video files.
- Speech Recognition: Converts audio to text, using VAD (Voice Activity Detection) to split the audio into speech segments.
- Subtitle Generation: Creates SRT files for subtitles.
- Text Correction: Uses OpenAI's API to correct transcription errors.
- Dialect Conversion: Converts subtitles to specified dialects.
- Web Interface: A Flask-based web interface for user interaction.
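As a sketch of the subtitle-generation step: SRT output boils down to numbering each segment and formatting its start/end times as `HH:MM:SS,mmm`. The helper below is a minimal stdlib-only illustration, not the project's actual implementation:

```python
# Minimal sketch of SRT subtitle generation (illustrative only).
# Segments are (start_seconds, end_seconds, text) tuples.

def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render a list of timed text segments as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```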
- Python 3.x
- Required libraries:
  - flask
  - pyrogram
  - librosa
  - soundfile
  - numpy
  - deep_translator
  - transformers
  - openai
  - noisereduce
  - pyloudnorm
  - av
  - werkzeug
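These can be collected into a `requirements.txt` (versions left unpinned here; pin them as needed for your environment):

```
flask
pyrogram
librosa
soundfile
numpy
deep_translator
transformers
openai
noisereduce
pyloudnorm
av
werkzeug
```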
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd <repository-directory>
  ```
- Install the required dependencies.
- Set up your OpenAI API key in the `make_it_correct.py` file.
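One common pattern for this step is to read the key from an environment variable instead of hard-coding it. The sketch below is an assumption about how `make_it_correct.py` could be wired up; the environment variable name and prompt wording are illustrative, not taken from the project:

```python
# Hedged sketch: loading the OpenAI key and building a correction prompt.
# OPENAI_API_KEY and the prompt text are assumptions, not project code.
import os

def get_api_key() -> str:
    """Read the key from the environment rather than hard-coding it."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key

def build_correction_prompt(transcript: str) -> str:
    """Assemble the instruction sent to the model (hypothetical wording)."""
    return (
        "Correct any speech-recognition errors in the following transcript, "
        "preserving the original meaning:\n\n" + transcript
    )

# The key would then be passed to the OpenAI client before the API call,
# e.g. openai.api_key = get_api_key()
```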
- Create the necessary directories for input and output files:

  ```bash
  mkdir -p data/input_videos data/audio_outputs data/text_outputs data/subtitles data/chunks
  ```
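If you prefer to create these directories from Python (for example at startup), a minimal stdlib sketch that mirrors the `mkdir -p` command above:

```python
# Sketch: ensure the working directories exist at startup.
# Directory names are taken from the setup step above.
from pathlib import Path

DIRS = [
    "data/input_videos",
    "data/audio_outputs",
    "data/text_outputs",
    "data/subtitles",
    "data/chunks",
]

def ensure_dirs(base: str = ".") -> None:
    """Create each directory if missing, like mkdir -p."""
    for d in DIRS:
        Path(base, d).mkdir(parents=True, exist_ok=True)
```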
To run the bot, execute the following command:

```bash
python bot.py
```

To access the web interface, run:

```bash
python app.py
```

Then navigate to http://localhost:5000 in your web browser.
- /start: Start the bot and receive instructions.
- Upload a video: Send a video file to the bot for processing.
- Upload a video file.
- Select the original language of the video.
- Choose the target language for subtitles.
- Decide if you want to enhance audio quality.
- Choose whether to correct the subtitles.
- Select the desired dialect for the subtitles.
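The workflow above collects a handful of per-video choices. One way to model them is a small dataclass; the field names below are assumptions for illustration, not the project's actual code:

```python
# Hypothetical container for the choices collected in the workflow above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProcessingOptions:
    source_language: str            # original language of the video
    target_language: str            # language for the generated subtitles
    enhance_audio: bool             # denoise/normalize before recognition
    correct_subtitles: bool         # run OpenAI-based correction
    dialect: Optional[str] = None   # target dialect, if any
```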
Contributions are welcome! Please feel free to submit a pull request or open an issue for any suggestions or improvements.
This project is licensed under the MIT License. See the LICENSE file for details.
- OpenAI for providing the API for text correction.
- Hugging Face for the speech recognition models.
- Deep Translator for translation capabilities.