/fake-job-posting-detection

Detect fake job posting with deep learning

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

Fake Job Posting Detection

Detecting fake job postings with deep learning

Introduction

I have used deep learning to solve a binary classification problem: "Is this job description real? Isn't it a fake one?"

The used dataset can be found here.

Method

After reading data from the CSV file they should be vectorized, so I used tf-idf algorithm for the strings. Then, I implemented a fully-connected neural network in PyTorch framework for processing those vectors:

class Network(nn.Module):
	def __init__(self, input_size=NETWORK_INPUT_SIZE, output_size=NETWORK_OUTPUT_SIZE):
		super(Network, self).__init__()
		self.fc1 = nn.Linear(input_size, 256)
		self.fc2 = nn.Linear(256, 128)
		self.fc3 = nn.Linear(128, 64)
		self.fc4 = nn.Linear(64, 32)
		self.fc5 = nn.Linear(32, 16)
		self.fc6 = nn.Linear(16, 8)
		self.fc7 = nn.Linear(8, 4)
		self.fc8 = nn.Linear(4, output_size)

	def forward(self, x):
		x = self.fc1(x)
		x = F.relu(x)
		x = self.fc2(x)
		x = F.relu(x)
		x = self.fc3(x)
		x = F.relu(x)
		x = self.fc4(x)
		x = F.relu(x)
		x = self.fc5(x)
		x = F.relu(x)
		x = self.fc6(x)
		x = F.relu(x)
		x = self.fc7(x)
		x = F.relu(x)
		x = self.fc8(x)
		return x

We have an imbalanced dataset for this binary classification problem. Because of that, I have used torch.nn.BCEWithLogitsLoss as my loss function. And for the cross-validation part, skorch library has been used in my code.

Result

After running the code, a confusion matrix and some related statistics will be shown to you:

Predict     real        fake           
Actual
real        16864       150         
fake        384         482         


Overall Statistics: 

95% CI                                                            (0.96764,0.97263)
Kappa                                                             0.62834
NIR                                                               0.95157
Overall ACC                                                       0.97013

Class Statistics:

Classes                                                           real          fake             
ACC(Accuracy)                                                     0.97013       0.97013 
ERR(Error rate)                                                   0.02987       0.02987 
F0.5(F0.5 score)                                                  0.9804        0.71008 
F1(F1 score - harmonic mean of precision and sensitivity)         0.98441       0.64352 
F2(F2 score)                                                      0.98846       0.58838 
FN(False negative/miss/type 2 error)                              150           384     
FNR(Miss rate or false negative rate)                             0.00882       0.44342 
FP(False positive/type 1 error/false alarm)                       384           150     
FPR(Fall-out or false positive rate)                              0.44342       0.00882 
PPV(Precision or positive predictive value)                       0.97774       0.76266 
TN(True negative/correct rejection)                               482           16864   
TNR(Specificity or true negative rate)                            0.55658       0.99118 
TP(True positive/hit)                                             16864         482     
TPR(Sensitivity, recall, hit rate, or true positive rate)         0.99118       0.55658 

Run

First of all, install the dependencies:

pip3 install -r requirements.txt

Then, run the project using Python version 3:

python3 main.py