Source code and dataset for the ACL 2022 Findings paper "LEVEN: A Large-Scale Chinese Legal Event Detection Dataset"



Dataset and source code for the ACL 2022 Findings paper "LEVEN: A Large-Scale Chinese Legal Event Detection Dataset".


The dataset can be obtained from Tsinghua Cloud or Google Drive. The annotation guidelines are provided in Annotation Guidelines.

Large Scale

LEVEN is the largest Legal Event Detection dataset and the largest Chinese Event Detection dataset. Here is a comparison between the scale of LEVEN and other datasets.


Datasets marked with * are not publicly available, and – means the value is not available.

High Coverage

LEVEN contains 108 event types in total, including 64 charge-oriented events and 44 general events. Their distribution is shown below.


The LEVEN event schema has a sophisticated hierarchical structure, which is shown here.

Leaderboard

To get the test results, you can submit your predictions to our CodaLab competition (link is coming soon).


The source code for the experiments is included in the Baselines and Downstreams folders.

The Baselines folder includes DMCNN, BiLSTM, BiLSTM+CRF, BERT, BERT+CRF and DMBERT.

The Downstreams folder includes Legal Judgment Prediction and Similar Case Retrieval.


We implement six competitive baselines. Their performance is as follows.


Downstream Tasks

We also explore the use of LEVEN on two downstream tasks. We simply use the detected events as side information to improve the performance of Legal Judgment Prediction and Similar Case Retrieval.
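As a rough illustration of what "events as side information" can mean, the sketch below appends an event-type tag after each trigger token before the text is passed to a downstream model. This is not necessarily the approach taken in the Downstreams folder. The function name `add_event_markers`, the `(trigger_index, event_type)` representation, and the event type names are all hypothetical and are not taken from the LEVEN schema or data format.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (index of the trigger token, event type name).
Event = Tuple[int, str]


def add_event_markers(tokens: List[str], events: List[Event]) -> List[str]:
    """Return a copy of `tokens` with an event-type tag after each trigger."""
    # Group event types by the index of their trigger token.
    tags: Dict[int, List[str]] = {}
    for idx, event_type in events:
        tags.setdefault(idx, []).append(event_type)

    augmented: List[str] = []
    for i, token in enumerate(tokens):
        augmented.append(token)
        for event_type in tags.get(i, []):
            augmented.append(f"[{event_type}]")
    return augmented


# Toy example; the event type names are made up for illustration.
tokens = ["被告人", "殴打", "被害人", "后", "逃跑"]
events = [(1, "assault"), (4, "escape")]
augmented = add_event_markers(tokens, events)
print(" ".join(augmented))
```

The augmented token sequence can then be fed to any text encoder in place of the raw tokens. Other designs, such as adding an event-type embedding to the token embedding, are equally plausible.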

The experiment results for Legal Judgment Prediction are shown below.


The experiment results for Similar Case Retrieval are shown below.



The Chinese event schema is shown below. Please check our paper for the English version.

The detailed explanation and annotation guidelines are provided in Annotation Guidelines.



If the data and code help you, please cite this paper.