This is the official GitHub repository for the WACV'25 paper NarrAD: Automatic Generation of Audio Descriptions for Movies with Rich Narrative Context.
[https://ieeexplore.ieee.org/abstract/document/10944028]
Qualitative results of NarrAD are available below.

You can check out several demo videos here: [https://bit.ly/4aSwOTr].
You can check out full outputs on the MAD evaluation set here: [https://drive.google.com/drive/folders/1PIjL6qpZt4D2nxQwwMD7iZ9xmlfjRiuh?usp=drive_link].
- Download the MAD dataset from https://github.com/Soldelli/MAD and place the `mad-v2-ad-named.csv` file in the `datasets` directory, renaming it to `MAD_train.csv`.
- Movie frames are saved in the format `frame_000000.png` in the `videos/{movie}` directory. Due to file size constraints, only frames for selected samples are provided.
- Movie scripts used for AD creation can be found in the `scripts` directory. We have pre-parsed the movie scripts and generated `lines.csv`, `stage_directions.txt`, and `scenes.csv`. `transcribe.csv` has been pre-generated using the Google Cloud API.
- Prompts for using GPT can be found in the `prompts` directory, and the generated results are saved in the `results` directory.
`ROOT_DIR` refers to the root directory of the project, and `API_KEY` refers to your OpenAI API key.
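The setup steps above can be sketched as a short shell snippet. The directory names come from this README; the MAD download itself is stubbed here with a placeholder file, so replace the `touch` line with the real `mad-v2-ad-named.csv` from the MAD repository:

```shell
# Sketch of the expected project layout (placeholder file stands in for the
# real annotations downloaded from https://github.com/Soldelli/MAD).
ROOT_DIR="${ROOT_DIR:-$PWD/narrad_demo}"
mkdir -p "$ROOT_DIR/datasets" "$ROOT_DIR/videos" "$ROOT_DIR/scripts" \
         "$ROOT_DIR/prompts" "$ROOT_DIR/results"
touch mad-v2-ad-named.csv                 # placeholder for the downloaded CSV
mv mad-v2-ad-named.csv "$ROOT_DIR/datasets/MAD_train.csv"
```

Frames then go under `videos/{movie}` as `frame_000000.png`, `frame_000001.png`, and so on.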
```shell
python src/main.py --rootdir $ROOT_DIR --api_key $API_KEY --task synchronize
python src/main.py --rootdir $ROOT_DIR --api_key $API_KEY --task generate
python src/main.py --rootdir $ROOT_DIR --api_key $API_KEY --task curate
```
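Since the three stages run in a fixed order, a small wrapper can drive them in sequence. The helper `run_narrad` below is hypothetical (not part of the repository); it only prints each command, so drop the `echo` to actually execute the pipeline:

```shell
# Hypothetical wrapper: emits the three NarrAD stage commands in order.
# Usage: run_narrad <root_dir> <api_key>
run_narrad() {
  for task in synchronize generate curate; do
    echo python src/main.py --rootdir "$1" --api_key "$2" --task "$task"
  done
}
run_narrad /path/to/NarrAD "$OPENAI_API_KEY"
```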
Please cite NarrAD as:
```bibtex
@inproceedings{park2025narrad,
  title={NarrAD: Automatic Generation of Audio Descriptions for Movies with Rich Narrative Context},
  author={Park, Jaehyeong and Ye, Juncheol and Lee, Seungkook and Ka, Hyun W and Han, Dongsu},
  booktitle={2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  pages={409--419},
  year={2025},
  organization={IEEE}
}
```
