This repository contains the full code used by WVOQ for all three subtasks of SemEval-2021 Task 6.
The article "WVOQ at SemEval-2021 Task 6: BART for Span Detection and Classification" is part of the Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021).
The poster presented at ACL-IJCNLP 2021 / SemEval-2021 is at doc/bart_for_span_detection_poster.pdf.
Most likely you are interested only in the code for subtask 2: simultaneously detecting a span and classifying it.
That code has been extracted from this codebase, cleaned up, given installation instructions, and is available at https://github.com/ceesroele/span_model.
Improvements:
- Simpler to understand
- Re-usable for simultaneous span detection and classification tasks
- More likely to run on your machine too, not just on mine ...
This is the WVOQ team code for participation in SemEval 2021 Task 6: "Detection of Persuasion Techniques in Texts and Images".
WVOQ participated in all three subtasks. The most interesting contribution is to subtask 2, where a span is simultaneously detected and classified by a sequence-to-sequence model.
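To make the approach concrete, here is a minimal sketch (not the repository's actual code) of fine-tuning BART as such a sequence-to-sequence model with Simple Transformers; the example sentence, tag names, and hyperparameters are illustrative assumptions:

```python
# Minimal sketch, not the WVOQ code itself: fine-tune BART to reproduce
# the input sentence with the persuasion-technique span wrapped in tags.
import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

# Hypothetical training pair: the target repeats the input, with the
# span marked by technique-specific begin/end tags.
train_df = pd.DataFrame([{
    "input_text": "They want to destroy everything we stand for!",
    "target_text": "They want to <Loaded_Language> destroy everything "
                   "we stand for! </Loaded_Language>",
}])

model_args = Seq2SeqArgs()
model_args.num_train_epochs = 3

model = Seq2SeqModel("bart", "facebook/bart-base", args=model_args)
model.train_model(train_df)

# At prediction time the generated tags give both the span (their
# position) and its classification (their name).
print(model.predict(["Only a traitor would disagree with us."]))
```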
The system is built with Simple Transformers, which is a task-oriented framework on top of Hugging Face Transformers.
Configuration is in dev.yaml. It contains the configuration for the three tasks, plus options to run specific functionality, e.g. creating predictions from a trained model.
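For orientation, a configuration file of this kind could look roughly as follows; apart from cur_scenario, which is mentioned below, all key names here are guesses rather than the actual dev.yaml schema:

```yaml
# Illustrative shape only; all keys except cur_scenario are assumptions.
cur_scenario: task2_train

task2_train:
  task: 2
  steps: [load_data, train, eval]
  model_type: bart
  model_name: facebook/bart-base
```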
There are five major modules:
- load_data.py: load the data and convert it to a format using the Fragment dataclass (sketched below)
- train.py: train any type of model
- eval.py: evaluate any type of model
- postprocess.py: predict
- pipeline.py: wrap the different steps together
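As a rough idea of the data format, the Fragment dataclass could look like this; the field names are assumptions, not the repository's actual definition:

```python
# Hypothetical sketch of the Fragment dataclass used by load_data.py;
# the field names are assumptions, not the actual definition.
from dataclasses import dataclass

@dataclass
class Fragment:
    text: str       # article or sentence text
    start: int      # character offset where the span begins
    end: int        # character offset where the span ends
    technique: str  # persuasion-technique label for the span
```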
The system is started with: python pipeline.py, which executes the scenario selected by cur_scenario in dev.yaml.
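A minimal sketch of how pipeline.py might select the scenario, assuming the hypothetical config layout shown above (the actual dispatch logic may differ):

```python
# Sketch of scenario selection in pipeline.py; the real dispatch
# logic in the repository may differ.
import yaml

with open("dev.yaml") as f:
    config = yaml.safe_load(f)

scenario_name = config["cur_scenario"]
scenario = config[scenario_name]
print(f"Running scenario: {scenario_name} -> {scenario}")
```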
Still to do: deal with the two main causes of systematic errors:
- Begin and end tags do not match
- Words or characters appear in the generated sentence that were not in the input
Ideas are:
- Train with half-masked sentences consisting only of begin and end tags (pre-training for tags)
- Add functionality to the generation code in Transformers to prevent output tokens that are neither tags nor present in the input (see the constrained-decoding sketch below).
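The second idea amounts to constrained decoding. Hugging Face Transformers already exposes a prefix_allowed_tokens_fn hook in generate(), which could restrict generation to input tokens plus the tag tokens. A minimal sketch, assuming the tags have been added to the tokenizer as special tokens:

```python
# Sketch of constrained decoding with Hugging Face Transformers:
# restrict generated tokens to those in the input plus special tokens
# (which would include the begin/end tags, added as special tokens).
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

text = "They want to destroy everything we stand for!"
inputs = tokenizer(text, return_tensors="pt")
allowed = set(inputs["input_ids"][0].tolist())
allowed.update(tokenizer.all_special_ids)  # BOS/EOS/pad and tag tokens

def prefix_allowed_tokens_fn(batch_id, input_ids):
    # Called at every decoding step; returning a list of token IDs
    # masks out all other tokens before sampling.
    return list(allowed)

output = model.generate(
    inputs["input_ids"],
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
    max_length=64,
)
print(tokenizer.decode(output[0], skip_special_tokens=False))
```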