This repository contains the data for the MeOffendEs@IberLEF task. All zip files are password-protected. Access to the data will be granted to participants in the MeOffendEs workshop at IberLEF 2021.
The folder scorers contains the programs used by the Codalab competition to evaluate each dataset.
- Development data (trial data): File offendes-devdata.zip (100 samples)
  - devdata1.tsv --> comment_id, comment
  - devdata2.tsv --> comment_id, comment, label, influencer, influencer_gender, media
  - devgold.tsv --> comment_id, label, confidence_list
    confidence_list is a list of four probability values (they sum to 1.0), one per possible label, in this order: [NO_prob, NOM_prob, OFP_prob, OFG_prob]
- Train data: File offendes-traindata.zip (16,710 samples)
  - traindata1.tsv --> comment_id, comment, label
  - traindata2.tsv --> comment_id, comment, label, influencer, influencer_gender, media
- Test data: File offendes-testdata.zip (13,606 samples)
  - testdata1.tsv --> comment_id, comment
  - testdata2.tsv --> comment_id, comment, influencer, influencer_gender, media
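As an illustration of the devgold.tsv layout described above (comment_id, label, confidence_list), the following is a minimal sketch of how the confidence values might be parsed and checked. It assumes that the file is tab-separated with no header row and that confidence_list is serialized as a bracketed, comma-separated string; neither detail is confirmed here. The names `parse_confidence_list` and `read_gold` and the two sample rows are hypothetical.

```python
import csv
import io

# Label order as documented for confidence_list in devgold.tsv.
LABELS = ["NO", "NOM", "OFP", "OFG"]


def parse_confidence_list(raw):
    """Parse a string such as "[0.7, 0.1, 0.1, 0.1]" into a label -> probability dict.

    The serialization is an assumption; adjust the stripping and splitting
    if the real file uses another layout.
    """
    values = [float(v) for v in raw.strip().strip("[]").split(",")]
    if len(values) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} values, got {len(values)}")
    if abs(sum(values) - 1.0) > 1e-6:
        raise ValueError(f"probabilities sum to {sum(values)}, not 1.0")
    return dict(zip(LABELS, values))


def read_gold(handle):
    """Yield (comment_id, label, confidences) from a devgold-style TSV stream."""
    for comment_id, label, raw in csv.reader(handle, delimiter="\t"):
        yield comment_id, label, parse_confidence_list(raw)


# Hypothetical two-row sample, not real data from the task.
sample = "c1\tNO\t[0.7, 0.1, 0.1, 0.1]\nc2\tOFP\t[0.2, 0.1, 0.6, 0.1]\n"
rows = list(read_gold(io.StringIO(sample)))
```

To read the real file, replace the `io.StringIO` sample with `open("devgold.tsv", newline="", encoding="utf-8")` after extracting the zip.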
- Development data (trial data): File mx-devdata.zip (76 samples)
  - mx-trial-data.csv --> Each row in this file corresponds to one sample in the dataset; the first row (headers) should be discarded.
  - mx-trial-names-csv --> Description of the information (columns) in mx-trial-data.csv
  - mx-trial-vulgarity-label.csv --> Vulgarity label for each sample in mx-trial-data.csv. This information will be available only for trial and training data.
  - mx-trial-outputs.sol --> Ground-truth labels for the samples in mx-trial-data.csv
- Training data: File mx-traindata.zip (76 samples)
  - mx-train-data-non-contextual.csv --> Tweets without contextual information. Each row in this file corresponds to one sample in the dataset; the first row (headers) should be discarded.
  - mx-train-data-contextual.csv --> Tweets including contextual information. Each row in this file corresponds to one sample in the dataset; the first row (headers) should be discarded.
  - mx-train-names-csv --> Description of the information (columns) in mx-train-data-non-contextual.csv
  - mx-train-vulgarity-label.csv --> Vulgarity label for each sample in the data files mx-train-data-contextual.csv and mx-train-data-non-contextual.csv. This information will be available only for trial and training data.
  - mx-train-outputs.sol --> Ground-truth labels for the samples in mx-train-data-contextual.csv and mx-train-data-non-contextual.csv
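The mx-* data files above require the header row to be discarded, and the .sol files hold the ground-truth labels. A minimal sketch of pairing the two follows. It assumes that the data files are comma-separated and that the .sol file has one label per line in the same order as the data rows; these are assumptions, not documented facts. The function name `load_samples`, the sample rows, and the label values are hypothetical.

```python
import csv
import io


def load_samples(data_handle, labels_handle):
    """Pair each data row with its ground-truth label by position."""
    reader = csv.reader(data_handle)
    next(reader, None)  # discard the header row, as the file description requires
    rows = list(reader)
    # Assumed .sol layout: one label per non-empty line.
    labels = [line.strip() for line in labels_handle if line.strip()]
    if len(rows) != len(labels):
        raise ValueError(f"{len(rows)} data rows but {len(labels)} labels")
    return list(zip(rows, labels))


# Hypothetical in-memory stand-ins for a data file and its .sol file.
data = io.StringIO('text\n"first tweet, with comma"\nsecond tweet\n')
sol = io.StringIO("1\n0\n")
pairs = load_samples(data, sol)
```

The length check is there so that a wrong guess about the header row or the .sol layout is caught early rather than silently misaligning samples and labels.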