NLPCC-2020 Shared Task on AutoIE

Leaderboard

Rank	TeamName	Organization	F1-score
1	Sophie	搜狗杭州研究院知识图谱组	85.00
2	Hair Loss Knight	美团点评NLP中心	84.87
3	Hermers	武汉汉王大数据	84.75
4	augmented_autoner	PATech	82.88
5	一只小绵羊	北京语言大学	82.78
6	STAM	**科学院信息工程研究所	82.50
7	BUTAUTOJ	北京工业大学信息学部	80.91
8	Circle	北京林业大学	80.80
9	yunke_ws	加拿大皇后大学	80.34
10	AI surfing	Nanjing University of Posts and Telecommunications	80.28
11	Yulong	武汉大学	79.52
12	小牛队	东北大学自然语言处理实验室	78.64
13	Auto-IE	北京航空航天大学计算机系实体抽取组	75.20
14	AutoIE_ISCAS	Institute of Software, Chinese Academy of Sciences	74.59
15	FIGHTING	大连民族大学	69.75
16	ENCH	DCD Lab Zhejiang University	67.70
17	BaselineSystem	NLPCC	63.98

Leaderboard without Valid Data

Rank	TeamName	Organization	F1-score
1	Hair Loss Knight	美团点评NLP中心	77.32
2	yunke_ws	加拿大皇后大学	71.96
3	Hermers	武汉汉王大数据	71.86
4	augmented_autoner	PATech	71.70
5	ENCH	DCD Lab Zhejiang University	70.71
6	Circle	北京林业大学	68.59
7	小牛队	东北大学自然语言处理实验室	65.91
8	BaselineSystem	NLPCC	63.98

Background

Entity extraction is the fundamental problem in language technology, and usually utilized as inputs for many downstream tasks, especially dialogue system, question answering etc. Most previous work focus on the scenario that labelled data is provided for interesting entities, however, the categories of entities are hierarchical and cannot be exhausted, the general solution cannot depend on the hypothesis that enough data with label is given. Therefore, how to build IE system for new entity type under low resource is becoming the common problem for both academic and industry.

Task

The task is to build IE system with Noise and Incomplete annotations. Given a list of entities of specific type and a unlabelled corpus containing these entity types, the task aims to build an IE system which may recognize and extract the interesting entities of given types.

Note:

entity is a general concept of named entity in this task. Some words without a specific name are also very important for downstream applications, therefore, they are included in this information extraction task
No human annotation and correction are allowed for train and test dataset.
Dev dataset with full label may be used in the training step in any way.

Data

The corpus are from caption text of YouKu video. Three categories of information are considered in this task, which are TV, person and series. All data are split into 3 datasets for training, developing and testing.

Train dataset

Unlabelled corpus containing 10000 samples, the entities are labelled by string matching with the given entity lists.
Entity lists with specific category, which may cover around 30% of entities appearing in the unlabelled corpus

Dev dataset

1000 samples with full label

Test dataset

2000 samples with full label

Baseline

The evaluation provides a baseline system for participants. The solution is based on the paper "Better Modeling of Incomplete Annotations for Named Entity Recognition", please check the readme file in the baseline folder for more detail

Submission & Evaluation

For submission, please write the prediction result into a single file and email it to Xuefeng Yang (杨雪峰) email：ryan@wezhuiyi.com

The submission file format should be the same as given YourTeamName.json file under Submission folder. To be specific, each line is a json string containing the prediction result of one sample.

For evaluation. all the system will be evaluated against 2000 test samples with full annotation. Ranking of submissions are based on the f1 score of these test samples. The test dataset includes 2000 real test samples and 8000 mixed samples, the score is only based on the prediction of the real 2000 samples.

The test dataset will be provided in 2020/05/15, and each team has three oppotunities to submit their results in the week 05/15--05/20. The results are public available in this github page and ranked by the f1 score.

Organizers:

Xuefeng Yang (ZhuiYi Technology) email: ryan@wezhuiyi.com

Benhong Wu (ZhuiYi Technology) email: wubenhong@wezhuiyi.com

Zhanming Jie (Singapore University of Technology and Design) email: zhanming_jie@mymail.sutd.edu.sg

ZhuiyiTechnology/AutoIE