This repository contains the full code used by WVOQ for all three subtasks of SemEval-2021 Task 6.
The article "WVOQ at SemEval-2021 Task 6: BART for Span Detection and Classification" is part of the Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021).
The poster presented at ACL-IJCNLP 2021 / SemEval-2021 is at doc/bart_for_span_detection_poster.pdf.
Most likely you are interested only in the code for subtask 2: simultaneously detecting a span and classifying it.
That code has been extracted from this codebase, cleaned up, given installation instructions, and is available at https://github.com/ceesroele/span_model.
Improvements:
- Simpler to understand
- Re-usable for simultaneous span detection and classification tasks
- More likely to run on your machine too, not just on mine ...
This is the WVOQ team code for participation in SemEval 2021 Task 6: "Detection of Persuasion Techniques in Texts and Images".
WVOQ participated in all three subtasks. The most interesting contribution is to subtask 2, where a span is simultaneously detected and classified by a sequence-to-sequence model.
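To make the approach concrete, here is a minimal sketch (not the repository's actual code) of fine-tuning BART as such a sequence-to-sequence model with Simple Transformers; the example sentence, tag names, and hyperparameters are illustrative assumptions:

```python
# Minimal sketch, not the WVOQ code itself: fine-tune BART to reproduce
# the input sentence with the persuasion-technique span wrapped in tags.
import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

# Hypothetical training pair: the target repeats the input, with the
# span marked by technique-specific begin/end tags.
train_df = pd.DataFrame([{
    "input_text": "They want to destroy everything we stand for!",
    "target_text": "They want to <Loaded_Language> destroy everything "
                   "we stand for! </Loaded_Language>",
}])

model_args = Seq2SeqArgs()
model_args.num_train_epochs = 3

model = Seq2SeqModel("bart", "facebook/bart-base", args=model_args)
model.train_model(train_df)

# At prediction time the generated tags give both the span (their
# position) and its classification (their name).
print(model.predict(["Only a traitor would disagree with us."]))
```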
The system is built with Simple Transformers, which is a task-oriented framework on top of Hugging Face Transformers.
Configuration is in dev.yaml. It contains the configuration for the three tasks, plus options to run specific functionality, e.g. creating predictions from a trained model.
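For orientation, a configuration file of this kind could look roughly as follows; apart from cur_scenario, which is mentioned below, all key names here are guesses rather than the actual dev.yaml schema:

```yaml
# Illustrative shape only; all keys except cur_scenario are assumptions.
cur_scenario: task2_train

task2_train:
  task: 2
  steps: [load_data, train, eval]
  model_type: bart
  model_name: facebook/bart-base
```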
There are five major modules:
- load_data.py: load the data and convert it to a format using the Fragment dataclass (sketched below)
- train.py: train any type of model
- eval.py: evaluate any type of model
- postprocess.py: predict
- pipeline.py: wrap the different steps together
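As a rough idea of the data format, the Fragment dataclass could look like this; the field names are assumptions, not the repository's actual definition:

```python
# Hypothetical sketch of the Fragment dataclass used by load_data.py;
# the field names are assumptions, not the actual definition.
from dataclasses import dataclass

@dataclass
class Fragment:
    text: str       # article or sentence text
    start: int      # character offset where the span begins
    end: int        # character offset where the span ends
    technique: str  # persuasion-technique label for the span
```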
The system is started with: python pipeline.py, which executes the scenario selected by cur_scenario in dev.yaml.
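A minimal sketch of how pipeline.py might select the scenario, assuming the hypothetical config layout shown above (the actual dispatch logic may differ):

```python
# Sketch of scenario selection in pipeline.py; the real dispatch
# logic in the repository may differ.
import yaml

with open("dev.yaml") as f:
    config = yaml.safe_load(f)

scenario_name = config["cur_scenario"]
scenario = config[scenario_name]
print(f"Running scenario: {scenario_name} -> {scenario}")
```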
Still to do: deal with the two main causes of systematic errors:
- Begin and end tags do not match
- Words or characters appear in the generated sentence that were not in the input
Ideas are:
- Train with half-masked sentences consisting only of begin and end tags (pre-training for tags)
- Add functionality to the generation code in Transformers to prevent output tokens that are neither tags nor present in the input (see the constrained-decoding sketch below).
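The second idea amounts to constrained decoding. Hugging Face Transformers already exposes a prefix_allowed_tokens_fn hook in generate(), which could restrict generation to input tokens plus the tag tokens. A minimal sketch, assuming the tags have been added to the tokenizer as special tokens:

```python
# Sketch of constrained decoding with Hugging Face Transformers:
# restrict generated tokens to those in the input plus special tokens
# (which would include the begin/end tags, added as special tokens).
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

text = "They want to destroy everything we stand for!"
inputs = tokenizer(text, return_tensors="pt")
allowed = set(inputs["input_ids"][0].tolist())
allowed.update(tokenizer.all_special_ids)  # BOS/EOS/pad and tag tokens

def prefix_allowed_tokens_fn(batch_id, input_ids):
    # Called at every decoding step; returning a list of token IDs
    # masks out all other tokens before sampling.
    return list(allowed)

output = model.generate(
    inputs["input_ids"],
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
    max_length=64,
)
print(tokenizer.decode(output[0], skip_special_tokens=False))
```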