Eilon Tsadok
Zvi Mints
Managed by Or Anidjar and Amit Dvir
In this project, we will explore how to leverage the knowledge that can be extracted
from a Knowledge Graph in order to detect a community of pedophiles with certain
characteristics.
To that end, we will represent in this Knowledge Graph an IM dataset containing users
who served as pedophile impersonators and the textual content they shared over the
Internet. For the task of *"understanding"* the content, we will take advantage of
BERT (Bidirectional Encoder Representations from Transformers), a language model
recently developed by Google and pre-trained on large text corpora.
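As a rough illustration of the graph representation described above, the following minimal sketch links users to the conversations they took part in and collects each user's text for later embedding. The record layout `(conversation_id, author_id, text)`, the helper names `build_graph` and `co_participants`, and the sample messages are all assumptions made for this example. They are not the actual schema of the dataset or the project's implementation.

```python
from collections import defaultdict

# Hypothetical records: (conversation_id, author_id, text).
# The real dataset schema may differ; these fields are assumptions.
messages = [
    ("c1", "u1", "hi there"),
    ("c1", "u2", "hello"),
    ("c2", "u1", "how old are you"),
    ("c2", "u3", "why do you ask"),
]


def build_graph(records):
    """Return (adjacency map, per-user texts) for a user/conversation graph."""
    graph = defaultdict(set)   # node -> set of neighbouring nodes
    texts = defaultdict(list)  # user node -> messages written by that user
    for conv_id, author, text in records:
        user_node = ("user", author)
        conv_node = ("conversation", conv_id)
        # Undirected edge: the user participated in the conversation.
        graph[user_node].add(conv_node)
        graph[conv_node].add(user_node)
        texts[user_node].append(text)
    return graph, texts


def co_participants(graph, author):
    """Users who share at least one conversation with `author`."""
    me = ("user", author)
    others = set()
    for conv in graph[me]:
        others |= {n for n in graph[conv] if n[0] == "user" and n != me}
    return sorted(name for _, name in others)


graph, texts = build_graph(messages)
print(co_participants(graph, "u1"))
```

The per-user text lists in `texts` are the kind of input one might pass to a language model such as BERT to obtain a vector for each user vertex.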
After representing the dataset in the embedded knowledge graph, we will apply a
spectral clustering algorithm to detect communities of users who share sufficiently
similar behavioral patterns.
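The clustering step can be sketched as follows, assuming each user has already been mapped to a numeric embedding vector. This is a simplified two-way spectral split written with NumPy, not necessarily the variant the project uses. The Gaussian similarity kernel, the `sigma` parameter, the function name `spectral_bipartition`, and the toy array `X` (standing in for real user embeddings) are all assumptions for illustration.

```python
import numpy as np


def spectral_bipartition(X, sigma=1.0):
    """Split the rows of X into two clusters via the normalized Laplacian."""
    # Pairwise squared Euclidean distances between embeddings.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian similarity graph; zero the diagonal to avoid self-loops.
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    degrees = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(L)
    # Second-smallest eigenvector, rescaled by D^{-1/2}; its sign pattern
    # separates the two most weakly connected groups.
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)


# Toy stand-in for user embeddings: two well-separated groups of three.
X = np.array([
    [0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
    [3.0, 3.1], [3.1, 3.0], [3.0, 3.0],
])
labels = spectral_bipartition(X)
print(labels)
```

Which group receives label 0 and which receives label 1 is arbitrary, since an eigenvector is only defined up to sign. For more than two communities, one would typically keep the first k eigenvectors and run k-means on their rows.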
- Dataset Source
- https://pan.webis.de/clef12/pan12-web/sexual-predator-identification.html
- .json format training dataset: https://f2h.io/8b3xzuy6rnq4.json
- .json format test dataset: https://f2h.io/5n7mcc7cx3b6
- Knowledge Graph Vertices Embedding
- PCA (Principal Component Analysis)
- Spectral Clustering
- Topic Extraction
- Google BERT Model Utilization in Python
- Start Date: 24/03/2020
- End Date: Start Date + 12 weeks