OpenRegs - Knowledge Graph of ADGM Regulations

Regulatory Knowledge Graph built in collaboration with Abu Dhabi Global Market. O Original paper: A case study for Compliance as Code with graphs and language models: public release of the Regulatory Knowledge Graph .

Introduction

In order to automate compliance and RegTech process we've designed an approach aimed to convert human-readable regulations into Regulation-As-A-Code: executable rules parsed from texts with the help of transformer based LM trained on NER task and relation extraction task. This allows us to support the Open World Assumption and have an extendable ontology. Rules are extracted as a subgraph with GNN thus supplement inference with causality reasoning as follows:

If an ‘ENT’ with this ‘PERM’ was doing this ‘ACT’ or ‘FS’ with ‘PROD’ using ‘TECH’ then to avoid this ‘RISK’ they should ‘MIT’.

The project status:

Design the approach and initial taxonomy for labeling ✅
Train models to extract entities for graph nodes. ✅
Train models to extract relations for graph edges.
Train GNN for rules extraction

The project got inspiration from the DARPA's research around surveillance of the media field, where video and text feeds were parsed by ML to extract events, event's time, event's place, entities involved and other supplementary information and store that in the Knowledge DB for a downstream analysis. Find more details on how it is applicable to the RegTech and compliance and more insights for higher level vision of the Knowledge Graph project here in athe article at Linkedin

Getting started

Download the data and checkout the instructions in the ./data/ folder. You may need lfs for that.
Open the Neo4j Browser and test one of the cypher queries.

You may check all the shortest path in the Regulatory Knowledge Graph between insurance and obligation:

MATCH (ent:TagOccur),(mit:TagOccur),
p = shortestPath((ent)-[*..3]-(mit))
WHERE ent.text contains 'insur' AND mit.text contains 'obl'
RETURN p LIMIT 250

The result is interactive and should look like this:
Check more examples at ./reports/ folder

Models and technologies used to extract current version of the graph

NeuralCoref is used for coreference resolution.

Spacy To train NER models for nodes creation and custom lemmatizer:

Bert-based NER	Precision	Recall	F1
ACT	91.03	91.67	91.35
FS	94.97	79.41	86.50
PROD	90.32	93.33	91.80
TECH	94	93.73	93.87
RISK	86.24	80.50	83.27
MIT	81.54	55.58	66.11
PERM	91.01	96.43	93.64
ENT	96.57	97.40	96.98

POS-Regexp based models	Size	Source
DEF	370	COBS labeled

There is a stub for relation extraction which is based on the combination of Spacy and NetworkX

Neo4J is used for graph storage with APOC extension, cypher execution and visualisations

What has been released in this repository?

Knowledge graph for ADGM regulations in the ./data folder, which consist of:

"Nodes" "value.count"

"Document" 26

"Paragraph" 22027

"Tag" 35498

"TagOccur" 173853

"Relations" "value.count"

"NEXT" 37381

"INSIDE" 22027

"OCCUR" 173853

"SOURCE" 186693

"REL" 789253
Reports to showcase various topics coverage by regulatory documents in the ./reports folder:
Example cypher queries to try in the Reports Readme

"Nodes"	"value.count"
"Document"	26
"Paragraph"	22027
"Tag"	35498
"TagOccur"	173853

"Relations"	"value.count"
"NEXT"	37381
"INSIDE"	22027
"OCCUR"	173853
"SOURCE"	186693
"REL"	789253

FAQ

TBD

Vladimir-Ershov/adgm-kg1

OpenRegs - Knowledge Graph of ADGM Regulations

Introduction

Getting started

Models and technologies used to extract current version of the graph

What has been released in this repository?

FAQ

Disclaimer

Contact information