We provide a new Chinese military relation extraction dataset.
The compilation of the military dataset involved the utilization of diverse sources, including the Global Military Network, Wikidata military equipment data, Baidu Encyclopedia, and corpus data from third-party websites. These sources provided unstructured or semi-structured data, which were collected through the employment of web crawler technology. Following the preprocessing of the acquired data, which entailed tasks such as long text segmentation and logical error correction, the construction of the military relation extraction dataset was accomplished through manual annotation. During the annotation process, the text was classified into five relation types with reference to this research: instance, operation, country of occurrence, participation in combat, and equipment. Subsequently, the relation type, head entity, and tail entity were sequentially labeled. Finally, the resulting dataset consisted of 12,607 labeled instances, which were subsequently partitioned into training, testing, and validation sets in an 8:1:1 ratio.
Some samples of the Chinese military relation extraction dataset are shown below:
In the military folder under Dataset, more than 3000 pieces of data are provided.