HPG IncidentAI Dataset
Official code and data for our paper "Towards Safer Operations: An Expert-involved Dataset of High-Pressure Gas Incidents for Preventing Future Failures" - EMNLP 2023 (Industry Track)
Overview
This project provides a new Japanese IncidentAI dataset for incident prevention in the high-pressure gas plant domain. The dataset covers three NLP tasks: Named Entity Recognition (NER), Cause-Effect Extraction (CE), and Information Retrieval (IR). The source texts were collected from publicly available reports of high-pressure gas incidents published in 2022 by the High-Pressure Gas Safety Institute of Japan.
The dataset is annotated by domain experts with at least six years of practical experience as high-pressure gas conservation managers. These experts are qualified as high-pressure gas production safety managers, a national certification that attests to the knowledge and experience needed to ensure the safety of high-pressure gas manufacturing facilities.
Detailed annotation definitions for NER, CE, and IR are given in our annotation guideline, HPG_Annotaion_Guideline.pdf.
Named Entity Recognition
Definitions
The NER dataset includes the following six types of entities.
Entity | Descriptions | Examples |
---|---|---|
Products | Various gases. Gaseous state at normal temperature and pressure. Nouns. ※Do not tag items that are not general (things that do not appear even if you search the Web). | Mixed gas, Flammable gas, Refrigerant gas, Inert gas, Liquefied petroleum gas, Carbon dioxide gas, Sulphur dioxide gas, Freon, Hydrogen, Carbon monoxide, Acetylene, Methane, Ethylene |
Chemicals | Chemical substances, reactants, and materials (other than gases) used in gas generation and process management. Items not included in the above Products. Nouns. | Water, water droplets, rainwater, wash water, hot water, pure water, H2O, Benzene, Austenitic stainless steel, Lubricating oil, C4-C6 Hydrocarbons |
Storages | General equipment where above Products and Chemicals come into contact. ※Include equipment such as supports and insulators. ※Include expressions that indicate the entire plant or facility. ※Do not include expressions indicating parts such as entrances and exits if they are placed at the end of a word. | Tank, Maturation furnace, Refining tower, Dehumidification tower, Separation tower, Heat exchanger, Piping, Valve, Gasket, Flange, BTX manufacturing equipment, Butadiene plant |
Incidents | Incidents that resulted in or caused an accident, regardless of severity. Include only incidents that actually occurred, and do not include situations that did not lead to an incident. | Explosion, Seepage, Leakage, Fire, Serious injury, Death, Degradation, Concentration, Issuing of an alert, detection, awareness, (alarm) activation |
Process | Handling of gas, and unit operations related to gas. Abnormal processes are included in Incidents. | Filling, Distillation, Extraction, Reaction, Recovery, Mixing, Sealing, Nitrogen purge |
Tests | Inspection devices and inspection actions outside the production process line. Do not include inspection items such as XX concentration. | Inspection, visual inspection, three-month inspection, Detailed inspection, leakage inspection, Freon checker, Leak test, Analysis, Patrol |
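The six entity types can be turned into a label set for a sequence labeling model. The sketch below assumes a BIO tagging scheme, which is a common choice but is not confirmed as the format of the released NER files; the helper name `build_bio_labels` is hypothetical.

```python
from typing import Dict, List

# Entity types taken from the definition table above.
ENTITY_TYPES: List[str] = [
    "Products", "Chemicals", "Storages", "Incidents", "Process", "Tests",
]


def build_bio_labels(entity_types: List[str]) -> List[str]:
    """Build a BIO label list: "O" plus a B-/I- pair per entity type.

    Assumption: BIO tagging; the released data may use another scheme.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
LABEL2ID: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}
ID2LABEL: Dict[int, str] = {i: label for label, i in LABEL2ID.items()}
```

The two mappings are the form that most token classification toolkits expect when configuring the output layer.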
Data Format
The NER data files are provided in the ./NER/ directory.
Example Data
Japanese (Original)
English (Translated)
Cause-Effect Extraction
Definitions
The CE dataset defines spans of the following five types.
Entity | Descriptions | Examples |
---|---|---|
Event_Leak | Various gases. Gaseous state at normal temperature and pressure. Nouns. ※Do not tag items that are not general (things that do not appear even if you search the Web). | Hydrogen and aniline leakage |
Event_others | Chemical substances, reactants, and materials (other than gases) used in gas generation and process management. Items not included in the above Products. Nouns. | It is estimated that hydrogen, which has a low ignition energy, was ignited by static electricity. |
Damage_Property | General equipment where above Products and Chemicals come into contact. ※Include equipment such as supports and insulators. ※Include expressions that indicate the entire plant or facility. ※Do not include expressions indicating parts such as entrances and exits if they are placed at the end of a word. | Container ruptures. |
Damage_Human | Incidents that resulted in or caused an accident, regardless of severity. Include only incidents that actually occurred, and do not include situations that did not lead to an incident. | One employee injured left thigh and left ear. |
Cause | Tag sentences that confirm the event causing Event_Leak and Event_others. Target not only direct causes but also indirect causes (e.g., the cause of a Cause). In case of ignition or explosion, the three elements of combustion (combustibles, oxygen, and heat) are to be tagged as Cause. | As a result of reduced tightening torque in some of the flange sections cooled by hydrogen |
Data Format
To make the Cause-Effect Extraction data more accessible, we provide train/test splits in JSON format. Each item contains three fields:
- "text": original text of the data item
- "tags": list of tags for annotated spans
- "spans": list of annotated spans
From these fields, you can convert train/test data to standard NER (sequence labeling) or extractive QA format (SQuAD format).
{
"text": "2006-217 ポリブデン製造設備の水添反応器において、触媒再生作業中、内部を冷却するため水素ガスを送っていたところ爆発音がしたため、作業員が現場に急行したところ反応器の下部配管フランジ部より発火していた。 このため消火器で火を消し、その後公設消防が放水して当該部位付近を冷却した。この火災により、リアクター下部配管の保温材が焼損した。 原因は、本作業中に当該下部配管を取り替えたが、接続する際に本来は直径111mmのパッキンを取付けるところ、誤って95mmのパッキンを取付けてしまったことである。 このため、水素が漏えいし静電気により着火し火災となったとみられる。今後は、作業マニュアルを見直し、作業員の教育を徹底することとした。",
"tags": [
"Event_others",
"Damage_Property",
"Cause",
"Event_Leak",
"Cause",
"Event_others"
],
"spans": [
"ポリブデン製造設備の水添反応器において、触媒再生作業中、内部を冷却するため水素ガスを送っていたところ爆発音がしたため、作業員が現場に急行したところ反応器の下部配管フランジ部より発火していた",
"この火災により、リアクター下部配管の保温材が焼損した",
"本作業中に当該下部配管を取り替えたが、接続する際に本来は直径111mmのパッキンを取付けるところ、誤って95mmのパッキンを取付けてしまったことである",
"水素が漏えいし",
"静電気により",
"着火し火災となったとみられる"
]
}
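The conversion mentioned above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes every entry in "spans" is a verbatim substring of "text" and takes its first occurrence, labels at the character level, and uses the tag name as the question in the SQuAD-style record. The function names and the `item_id` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def locate_spans(item: Dict) -> List[Tuple[str, int, int]]:
    """Return (tag, start, end) character offsets for each annotated span."""
    located = []
    for tag, span in zip(item["tags"], item["spans"]):
        # Assumption: the span appears verbatim in the text; if the same
        # string occurs more than once, the first occurrence is used.
        start = item["text"].find(span)
        if start == -1:
            raise ValueError(f"span not found in text: {span!r}")
        located.append((tag, start, start + len(span)))
    return located


def to_char_bio(item: Dict) -> List[str]:
    """Character-level BIO labels, one label per character of the text."""
    labels = ["O"] * len(item["text"])
    for tag, start, end in locate_spans(item):
        if start == end:  # skip empty spans
            continue
        labels[start] = f"B-{tag}"
        for i in range(start + 1, end):
            labels[i] = f"I-{tag}"
    return labels


def to_squad_examples(item: Dict, item_id: str = "item-0") -> List[Dict]:
    """One SQuAD-style record per span, with the tag name as the question."""
    examples = []
    for i, (tag, start, end) in enumerate(locate_spans(item)):
        examples.append({
            "id": f"{item_id}-{i}",
            "context": item["text"],
            "question": tag,  # assumption: a real setup may use a full question
            "answers": {
                "text": [item["text"][start:end]],
                "answer_start": [start],
            },
        })
    return examples
```

Character-level labels avoid committing to a particular Japanese tokenizer; when a subword tokenizer is used, the offsets from `locate_spans` can be aligned to token boundaries instead.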
Example Data
Japanese (Original)
English (Translated)
Information Retrieval
Definitions
The IR dataset defines Attributes and their Labels for a given accident description, as shown in the following table.
Attribute | Label | Description |
---|---|---|
Types of high-pressure gas | a. Flammable (or flame retardant) gas; b. Toxic gas; c. Satisfies a and b; d. Not applicable | The high-pressure gas that caused the reported accident was classified from the perspective of danger in the event of an accident. Cases where the gas could not be identified were included under "d. Not applicable". The definitions of flammable gas and toxic gas conform to the High Pressure Gas Safety Act in Japan. |
Cause of accident | a. Equipment factor; b. Human factor; c. External factor; d. Other factor | The events that caused or triggered the accident were classified. Equipment factors refer to those caused by initial defects in parts built into the equipment. Human factors refer to errors made in operation or judgment by people on site. External factors indicate those caused by events from outside the equipment, such as falling objects. |
Accident Results | a. Leakage; b. Fires and explosions; c. a and property damage; d. a and human casualties; e. b and property damage; f. b and human casualties; g. Property damage and human casualties | The events that occurred as a result of the accident were classified. Property damage and human casualties were considered only if they occurred as secondary effects of events such as gas leaks or fires. Property damage: accidents resulting in damage to equipment or facilities due to fire or explosion. ※Do not include damage to equipment or other items that caused the accident. Human casualties: accidents resulting in health hazards to humans due to leakage, fire, or explosion. |
Time span from cause to effect | a. Sudden; b. Long-term; c. Unknown | The classification was made based on the time from when the cause or trigger of the accident occurred until the accident event took place. Sudden: accidents in which the result generally follows within a few minutes to several tens of minutes of the cause occurring. |
Operational status of equipment at the time of cause occurrence | a. During steady-state operation; b. During non-steady state operation; c. During maintenance; d. Other situations | The classification was made based on the operational status of the equipment at the time of the accident. Non-steady state operation refers to operating conditions that differ from normal operation, such as immediately after the equipment starts running or during test operation. |
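The attribute and label scheme in the table can be written down as a lookup for validating or decoding annotations. The sketch below is only an illustration: the attribute keys (`gas_type`, `cause`, and so on) and both function names are hypothetical, and the layout of the released IR files may differ. The label letters and names follow the table.

```python
from typing import Dict

# Hypothetical attribute keys; label letters and names follow the table above.
IR_LABELS: Dict[str, Dict[str, str]] = {
    "gas_type": {
        "a": "Flammable (or flame retardant) gas",
        "b": "Toxic gas",
        "c": "Satisfies a and b",
        "d": "Not applicable",
    },
    "cause": {
        "a": "Equipment factor",
        "b": "Human factor",
        "c": "External factor",
        "d": "Other factor",
    },
    "result": {
        "a": "Leakage",
        "b": "Fires and explosions",
        "c": "Leakage and property damage",
        "d": "Leakage and human casualties",
        "e": "Fires and explosions and property damage",
        "f": "Fires and explosions and human casualties",
        "g": "Property damage and human casualties",
    },
    "time_span": {"a": "Sudden", "b": "Long-term", "c": "Unknown"},
    "operational_status": {
        "a": "During steady-state operation",
        "b": "During non-steady state operation",
        "c": "During maintenance",
        "d": "Other situations",
    },
}


def validate_annotation(annotation: Dict[str, str]) -> None:
    """Raise ValueError if an attribute is missing or a label letter is unknown."""
    for attribute, labels in IR_LABELS.items():
        if attribute not in annotation:
            raise ValueError(f"missing attribute: {attribute}")
        if annotation[attribute] not in labels:
            raise ValueError(
                f"unknown label {annotation[attribute]!r} for {attribute}"
            )


def describe_annotation(annotation: Dict[str, str]) -> Dict[str, str]:
    """Map each label letter to its readable label name."""
    validate_annotation(annotation)
    return {attr: IR_LABELS[attr][annotation[attr]] for attr in IR_LABELS}
```

Keeping the scheme in one dictionary makes it easy to derive per-attribute class counts when each attribute is treated as a separate classification target.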
Example Data
Citation
If you find our work helpful, please cite us:
@misc{inoue2023safer,
title={Towards Safer Operations: An Expert-involved Dataset of High-Pressure Gas Incidents for Preventing Future Failures},
author={Shumpei Inoue and Minh-Tien Nguyen and Hiroki Mizokuchi and Tuan-Anh D. Nguyen and Huu-Hiep Nguyen and Dung Tien Le},
year={2023},
eprint={2310.12074},
archivePrefix={arXiv},
primaryClass={cs.CL}
}