
TextFlint

TextFlint is a unified multilingual robustness evaluation toolkit for Natural Language Processing. TextFlint offers 20 general transformations, 60 task-specific transformations, and thousands of their combinations, and provides over 67,000 evaluation results generated by these transformations on 24 classic datasets from 12 tasks.


This repository contains all code and data used to build the TextFlint website, hosted here: https://www.textflint.io

Adding or modifying results

TextFlint allows you to upload evaluation results on new models or transformations. You can edit files directly in GitHub to create pull requests. Please follow the instructions below. We will check the correctness of the results; if the data is correct, it will be synchronized to the website within a few days.

Contributing to TextFlint

Thanks for contributing to TextFlint! Here are some guidelines to get your pull request accepted. You can edit files directly in GitHub to create pull requests. All data lives in the ./Tasks folder. Each task requires at least three JSON files (dataset_description.json, task_description.json, and paper_list.json), four folders (human_evaluation, models, results, and transformations), and a task logo at ./media/logos/task_logo/{task_name}.jpg; the image should be smaller than 20 KB.
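
For orientation, a single task folder could look like this (a sketch assuming a task named ABSA; each file is described in the sections below):

./Tasks/ABSA/
    dataset_description.json
    task_description.json
    paper_list.json
    human_evaluation/
    models/
    results/
    transformations/
./media/logos/task_logo/ABSA.jpg    (smaller than 20 KB)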

How to add dataset description

The dataset description lives in ./Tasks/{your_task}/dataset_description.json. Open the JSON file; the following fields will be shown.

[
    {
        "name": "SemEval2014-Restaurant",
        "description": "The standard SemEval2014-Restaurant dataset consists of 3,452 training, 150 validation, and 1,120 test English sentences from restaurant reviews. Our task-specific transformations are based on SemEval2014-Restaurant-TOWE, which provides opinion words and their positions. The test set of SemEval2014-Restaurant-TOWE contains 492 distinct sentences (847 aspect terms).",
        "available_transformation_type": ["domain", "ut", "domain_domain", "domain_ut", "ut_ut"],
        "dataset_size": 1120,
        "models":
        [
            {
                "model_name": "LCF-BERT",
                "paper_link": "https://www.researchgate.net/publication/335238076_LCF_A_Local_Context_Focus_Mechanism_for_Aspect-Based_Sentiment_Classification",
                "github_link": "https://github.com/yangheng95/LC-ABSA",
                "dockerhub_link": "https://registry.hub.docker.com/layers/fudannlp/reimplement/ABSA-AppliedScience-zeng2019lcf/images/sha256-e8437267241ba4570a31a3af2974f83faaf66fb8e8ba76ef59b2bed2aed75d2d?context=explore",
                "paper_name": "LCF: A Local Context Focus Mechanism for Aspect-Based Sentiment Classification",
                "metric":
                {
                    "Accuracy": 84.82,
                    "Macro-F1": 76.99
                }
            }
        ]
    }
]

This file stores all datasets used for the current task and their information. Fill in the following fields for each dataset (a validation sketch follows the list):

  • name: Name of the dataset
  • description: Description of the dataset
  • available_transformation_type: The types of transformation that can be performed on this dataset. You can only choose from domain, ut, domain_domain, domain_ut, ut_ut
  • dataset_size: The number of samples in the dataset
  • models: Information about the models that are evaluated on the dataset
    • model_name: Short name of the model
    • paper_name: The corresponding paper of the model
    • paper_link: An available link to the paper
    • github_link: The code link of the paper
    • dockerhub_link: The Docker Hub link of the model
    • metric: Evaluation metrics used on the dataset; metric names must be consistent across all files
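
Before opening a pull request, you can sanity-check the file locally. Below is a minimal validation sketch in Python (not part of the repository; the path and key sets simply mirror the field list above):

import json

REQUIRED_DATASET_KEYS = {"name", "description", "available_transformation_type",
                         "dataset_size", "models"}
REQUIRED_MODEL_KEYS = {"model_name", "paper_name", "paper_link", "github_link",
                       "dockerhub_link", "metric"}
ALLOWED_TYPES = {"domain", "ut", "domain_domain", "domain_ut", "ut_ut"}

def check_dataset_description(path):
    with open(path) as f:
        datasets = json.load(f)  # the file is a JSON array, one object per dataset
    for ds in datasets:
        missing = REQUIRED_DATASET_KEYS - ds.keys()
        assert not missing, f"{ds.get('name')}: missing fields {missing}"
        unknown = set(ds["available_transformation_type"]) - ALLOWED_TYPES
        assert not unknown, f"{ds['name']}: unknown transformation types {unknown}"
        for model in ds["models"]:
            assert not (REQUIRED_MODEL_KEYS - model.keys()), \
                f"{model.get('model_name')}: incomplete model entry"

check_dataset_description("./Tasks/ABSA/dataset_description.json")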

How to add task description

The task description lives in ./Tasks/{your_task}/task_description.json. Open the JSON file; the following fields will be shown.

{
    "name": "SA",
    "full_name": "Sentiment Analysis",
    "description": "Sentiment analysis is the task of classifying the polarity of a given text.",
    "available_domain": ["SwapSpecialEnt-Movie", "SwapSpecialEnt-Person", "AddSum-Movie", "AddSum-Person", "DoubleDenial"],
    "available_ut": ["Typos", "Ocr", "Keyboard", "AddPunc", "SwapSyn-WordNet", "SwapSyn-WordEmbedding", "SpellingError", "Contraction", "Tense", "SwapNamedEnt", "SwapNum", "InsertAdv", "MLMSuggestion", "AppendIrr", "WordCase-upper", "WordCase-lower", "WordCase-title", "TwitterType"]
}
  • name: Short name of the task
  • full_name: Full name of the task
  • description: A brief description of the task
  • available_domain: Task-specific transformations that can be performed on the task; transformation names must be consistent across all files (see the consistency sketch below).
  • available_ut: Universal transformations of the task; transformation names must be consistent across all files.
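
Because these names are referenced by the website, a quick cross-check helps. The sketch below (a hypothetical helper, assuming the folder layout described earlier) verifies that every transformation named in task_description.json has a description file under ./Tasks/{your_task}/transformations:

import json, os

task_dir = "./Tasks/SA"  # hypothetical task folder
with open(os.path.join(task_dir, "task_description.json")) as f:
    task = json.load(f)

# every name in available_domain/available_ut should have a matching
# {name}.json under the transformations folder
for name in task["available_domain"] + task["available_ut"]:
    path = os.path.join(task_dir, "transformations", name + ".json")
    if not os.path.exists(path):
        print("missing transformation description:", path)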

How to add paper list

The paper list lives in ./Tasks/{your_task}/paper_list.json. This file contains all the papers we have referred to for this task. Open the JSON file; the following fields will be shown.

{
    "header": [
        "paper",
        "code",
        "imdb",
        "yelp"
    ],
    "content": [
        {
            "paper": "[ACL 2019] Sentiment Classification Using Document Embeddings Trained with Cosine Similarity",
            "code": "https://github.com/tanthongtan/dv-cosine",
            "imdb": "Yes",
            "yelp": "No"
        }
    ]
}
  • header: The first two items must be "paper" and "code"; the remaining items are the names of the datasets.
  • content: Information about the papers (a consistency sketch follows this list)
    • paper: Follow the format [Conference Year] Paper Name
    • code: Link to the code
    • dataset1: If the model is evaluated on dataset1, enter Yes; otherwise enter No
    • dataset2: Same as dataset1
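
A small consistency check, sketched below under the same assumptions as the example above (hypothetical path, not a shipped tool), catches the most common mistakes: a content entry missing a header column, or a dataset column containing something other than Yes/No:

import json

with open("./Tasks/SA/paper_list.json") as f:  # hypothetical path
    paper_list = json.load(f)

header = paper_list["header"]
assert header[:2] == ["paper", "code"], "header must start with 'paper' and 'code'"
for entry in paper_list["content"]:
    for column in header:
        assert column in entry, f"{entry.get('paper')}: missing column {column!r}"
    for dataset in header[2:]:
        assert entry[dataset] in ("Yes", "No"), f"{entry['paper']}: bad value for {dataset}"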

How to add evaluation result

Before adding an evaluation result, make sure the result is reliable and confirm the following information about the result:

  • Task
  • Dataset
  • Model
  • Metric
  • Download link of dataset before and after transformation

The result files are in the ./Tasks/{your_task}/results/{your_data} folder. According to the metric and transformation type, open the corresponding JSON file; the following fields will be shown (take ABSA as an example). For instance, with transformation type domain and metric Accuracy, the file is named domain_Accuracy.json.

{transformation_type}_{metric}.json

[
    {
        "model": "LCF-BERT",
        "attack_results":
        {
            "RevTgt": {
                "ori": 81.97,
                "trans": 48.93,
                "sample_num": ,
                "ori_download_link": "https://www.textflint.com/static/Tasks/ABSA/trans_dataset/SemEval2014-Laptop/ori_RevTgt.json",
                "trans_download_link": "https://www.textflint.com/static/Tasks/ABSA/trans_dataset/SemEval2014-Laptop/trans_RevTgt.json",
                "contributor": {"rui zheng": {"github": "https://github.com/ruizheng20"}}
            }
        }
    }
]

Find the model to which the evaluation result belongs (if it is a new model, fill in all the fields above and insert it after the other models), then add the result to the attack_results field, after the other transformations. The meaning of the attack_results field is as follows (a sketch of the insertion follows the list). Make sure the added files are strictly valid JSON.

  • attack_results: Results you should add
    • key: Name of the transformation
    • value:
      • ori: Evaluation result before transformation
      • trans: Evaluation result after transformation
      • sample_num: Number of samples that can be transformed
      • ori_download_link: Download link of the dataset before transformation; please provide an available download link, or contact us
      • trans_download_link: Download link of the dataset after transformation
      • contributor:
        • key: Name of the contributor
        • value:
          • github: GitHub homepage link of the contributor
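
If you prefer to edit the file programmatically rather than by hand, the sketch below shows one way to insert a new transformation result; the file name, scores, sample count, and links are placeholders, not real results:

import json

path = "./Tasks/ABSA/results/SemEval2014-Laptop/domain_Accuracy.json"  # assumed file name
with open(path) as f:
    results = json.load(f)

new_result = {
    "ori": 81.97,       # placeholder score before transformation
    "trans": 48.93,     # placeholder score after transformation
    "sample_num": 466,  # placeholder number of transformed samples
    "ori_download_link": "https://example.com/ori_AddDiff.json",      # placeholder
    "trans_download_link": "https://example.com/trans_AddDiff.json",  # placeholder
    "contributor": {"your name": {"github": "https://github.com/your-account"}},
}

# find the model the result belongs to and attach the new transformation
for model in results:
    if model["model"] == "LCF-BERT":
        model["attack_results"]["AddDiff"] = new_result

with open(path, "w") as f:
    json.dump(results, f, indent=4)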

If your model or transformation is not in our list, you also need to upload a description of the model or transformation.

How to add model description

The model description files are in the ./Tasks/{your_task}/models folder. Create a JSON file named after your model, and fill in the JSON file by imitating the fields below.

your_model_name.json

{
    "desc": "The key idea of these proposals is to learn aspect embeddings and let aspects participate in computing attention weights.",
    "contributor": [{"name": "rui zheng", "github": "https://github.com/ruizheng20"}]
}
  • desc: Description of the model
  • contributor
    • name: Name of the contributor
    • github: GitHub homepage link of the contributor

How to add transformation description

The transformation description files are in the ./Tasks/{your_task}/transformations folder. Create a JSON file named after your transformation, and fill in the JSON file by imitating the fields below.

your_transformation_name.json

{
    "desc":"AddDiff: Add aspects with the opposite sentiment from the target aspect.",
    "examples":[
        {
            "ori":"BEST spicy tuna roll, great asian salad.(Target: spicy tuna roll)",           
             "trans":"TBEST spicy tuna roll, great asian salad, but this small place is packed, on a cold day, the seating by the entrance way can be pretty drafty and bad service."
        },
        {
            "ori":"The food was extremely tasty, creatively presented and the wine excellent. (Target: wine)",
            "trans":"The food was extremely tasty, creatively presented and the wine excellent, but yeah, sometimes the service can be slow, a lentil dish was salty beyond edibility and the red curry is weak and tasteless."
        }
    ],
    "contributor":[
        {
            "name":"rui zheng",
            "github":"https://github.com/ruizheng20"
        }
    ]
}
  • desc: Description of the transformation
  • examples: Please provide at least two examples
    • ori: Text before transformation
    • trans: Text after transformation
  • contributor
    • name: Name of the contributor
    • github: GitHub homepage link of the contributor

How to add human evaluation

Please follow the same standard as https://www.textflint.com/human_evaluation for the human evaluation. The human evaluation files are in the ./Tasks/{your_task}/human_evaluations folder. Create a JSON file named after your transformation, and fill in the JSON file by imitating the fields below.

your_transformation_name.json

{
    "ori_list": [3.9, 4.0, 4.030100334, 4.0], 
    "trans_list": [3.84, 4.0, 3.806666667, 4.0]
}
  • ori_list: Human evaluation results of the data before transformation. The elements of the array are, in order: the mean of the grammar score, the median of the grammar score, the mean of the plausibility score, and the median of the plausibility score.
  • trans_list: Human evaluation results of the data after transformation, with the elements in the same order as ori_list. A sketch of how these arrays can be computed follows.
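
As a concrete illustration of how the four-element arrays could be produced from raw annotator scores (the scores below are made up for illustration):

from statistics import mean, median

grammar_scores = [4, 4, 3, 5, 4]        # placeholder annotator grammar scores
plausibility_scores = [4, 5, 4, 3, 4]   # placeholder annotator plausibility scores

# [mean grammar, median grammar, mean plausibility, median plausibility]
ori_list = [mean(grammar_scores), median(grammar_scores),
            mean(plausibility_scores), median(plausibility_scores)]
print(ori_list)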