- Replicate results for Text Summarization task on Gigaword (see 'About')
- Getting started with Text Summarization using OpenNMT (src)
- Getting started with ROUGE scoring using files2rouge (src)
- Reference: http://opennmt.net//Models/#english-summarization
- Dataset: https://github.com/harvardnlp/sent-summary
- Expected results:
  - R1: 33.13
  - R2: 16.09
  - RL: 31.00
- OpenNMT v0.2.0 (precisely, the commit from January 4, 2017: 561994adcd147f9f77cc744a041152c3182a9300)
- files2rouge commit: 5397befa8397017964d21aa61a4e399dedd5c340
git clone https://github.com/OpenNMT/OpenNMT.git opennmt
git clone --recursive https://github.com/pltrdy/files2rouge.git files2rouge

Download the data from the dataset link above and extract it (tar -xzf summary.tar.gz) to ./data.
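To match the versions listed above, each clone can be moved to its pinned commit. The following is a minimal sketch, assuming the clones live in ./opennmt and ./files2rouge as in the commands above; `pin_commit` is a hypothetical helper name and is not part of either project.

```shell
#!/usr/bin/env bash
# pin_commit DIR COMMIT: check out COMMIT in the git clone at DIR.
# (Hypothetical helper for illustration.)
pin_commit() {
    local dir="$1" commit="$2"
    if [ ! -d "$dir/.git" ]; then
        echo "skip: $dir is not a git clone" >&2
        return 1
    fi
    git -C "$dir" checkout --quiet "$commit"
}

# Commit hashes copied from the version list above.
pin_commit opennmt 561994adcd147f9f77cc744a041152c3182a9300 || true
pin_commit files2rouge 5397befa8397017964d21aa61a4e399dedd5c340 || true
```

Because files2rouge is cloned with --recursive, running `git -C files2rouge submodule update --init --recursive` afterwards may be needed to bring its submodules in line with the pinned commit.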
We assume that your directory layout looks like this:
./
opennmt/
data/
files2rouge/
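Before going further, it can help to confirm that the layout above is in place. The following is a small sketch, assuming it is run from the top-level directory and that the directory names match the clone and extract commands above; `check_layout` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# check_layout ROOT: report which of the expected directories exist under ROOT.
# Returns non-zero if any of them is missing. (Hypothetical helper.)
check_layout() {
    local root="${1:-.}" missing=0 dir
    for dir in opennmt data files2rouge; do
        if [ -d "$root/$dir" ]; then
            echo "ok       $dir/"
        else
            echo "missing  $dir/"
            missing=1
        fi
    done
    return "$missing"
}

check_layout . || echo "layout incomplete: see the 'missing' lines above"
```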
Following the guide
# First, move to OpenNMT dir
cd opennmt

1) Preprocess
th preprocess.lua -train_src ../data/train/train.article.txt -train_tgt ../data/train/train.title.txt -valid_src ../data/train/valid.article.filter.txt -valid_tgt ../data/train/valid.title.filter.txt -save_data ../data/train/textsum

2) Train
th train.lua -data ../data/train/textsum-train.t7 -save_model textsum

or using GPU:

th train.lua -data ../data/train/textsum-train.t7 -save_model textsum -gpuid 1

3) Generate summary
th translate.lua -model textsum_final.t7 -src ../data/Giga/inputs.txt

(Add -gpuid 1 if you trained the model using a GPU.)
The output will be written to pred.txt.
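Scoring pairs each predicted summary with the reference on the same line, so a mismatch in line counts usually means something went wrong during generation. The following is a minimal sketch of that check, assuming it is run from the opennmt directory with the paths used in this guide; `same_line_count` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# same_line_count FILE_A FILE_B: succeed only if both files have the same
# number of lines; return 2 if either file cannot be read. (Hypothetical helper.)
same_line_count() {
    local a b
    a=$(wc -l < "$1") || return 2
    b=$(wc -l < "$2") || return 2
    [ "$a" -eq "$b" ]
}

# Paths as used in this guide, relative to the opennmt directory.
pred=pred.txt
ref=../data/Giga/task1_ref0.txt
if [ -f "$pred" ] && [ -f "$ref" ]; then
    if same_line_count "$pred" "$ref"; then
        echo "line counts match"
    else
        echo "line counts differ: check $pred before scoring"
    fi
fi
```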
cd ../files2rouge
./files2rouge --ref ../data/Giga/task1_ref0.txt --summ ../opennmt/pred.txt

| ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|
| 34.2 | 16.2 | 31.9 |
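
To see how the scores in the table above compare with the expected results listed at the top, the differences can be computed directly. The following is a minimal sketch using awk, with both sets of numbers copied from this document; `score_delta` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# score_delta EXPECTED OBTAINED: print OBTAINED minus EXPECTED, signed,
# with two decimals. (Hypothetical helper.)
score_delta() {
    awk -v e="$1" -v o="$2" 'BEGIN { printf "%+.2f\n", o - e }'
}

# First argument: expected score; second argument: score from the table.
echo "ROUGE-1: $(score_delta 33.13 34.2)"
echo "ROUGE-2: $(score_delta 16.09 16.2)"
echo "ROUGE-L: $(score_delta 31.00 31.9)"
```

With these numbers the differences come out as +1.07, +0.11 and +0.90, so the run scores slightly above the expected results on all three metrics.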