- Note: to see the formatted style in real time, open this README on https://dillinger.io/, edit it there, then copy and paste it back.
-
Data has repeated entries
-
Data has birthdates that make no sense
-
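The two data problems above can be handled in one cleaning pass. The following is only a minimal sketch: the record layout `(person_id, birthdate, arrest_date)`, the sample rows, and the age bounds `MIN_AGE` and `MAX_AGE` are all assumptions made for illustration, since this README does not state the real column names or the rule used to reject birthdates.

```python
from datetime import date

# Hypothetical record layout: (person_id, birthdate, arrest_date).
# The real dataset columns are not described in this README.
records = [
    ("p1", date(1980, 5, 17), date(2004, 3, 2)),
    ("p1", date(1980, 5, 17), date(2004, 3, 2)),   # exact repeat
    ("p2", date(2009, 1, 1), date(2005, 6, 9)),    # born after the arrest
    ("p3", date(1850, 1, 1), date(2006, 1, 20)),   # implausibly old
    ("p4", date(1975, 9, 30), date(2007, 11, 11)),
]

MIN_AGE, MAX_AGE = 12, 100  # assumed plausibility bounds, not from the source


def age_at(birthdate, on_day):
    """Age in whole years on a given day."""
    had_birthday = (on_day.month, on_day.day) >= (birthdate.month, birthdate.day)
    return on_day.year - birthdate.year - (0 if had_birthday else 1)


def clean(rows):
    """Drop exact duplicates (keeping the first) and rows with implausible ages."""
    seen = set()
    kept = []
    for row in rows:
        if row in seen:
            continue
        seen.add(row)
        _, birthdate, arrest_date = row
        if MIN_AGE <= age_at(birthdate, arrest_date) <= MAX_AGE:
            kept.append(row)
    return kept


cleaned = clean(records)
print([person_id for person_id, _, _ in cleaned])
```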
Filtered data to contain first and second arrest only
-
Filtered data to contain second and third arrest only
-
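As a rough illustration of the two filters above, the sketch below keeps a two-arrest window per person. The event layout `(person_id, arrest_year)`, the helper `arrest_window`, and the label definition (1 if a later arrest exists inside the window) are assumptions for this example, not the project's actual code.

```python
from collections import defaultdict

# Hypothetical arrest events: (person_id, arrest_year).
arrests = [
    ("p1", 2003), ("p1", 2006), ("p1", 2009),
    ("p2", 2005),
    ("p3", 2004), ("p3", 2008),
]


def arrest_window(events, start, length=2):
    """Per person, keep arrests number start .. start+length-1 in time order."""
    by_person = defaultdict(list)
    for person_id, year in events:
        by_person[person_id].append(year)
    window = {}
    for person_id, years in by_person.items():
        years.sort()
        if len(years) >= start:  # the person has reached arrest number `start`
            window[person_id] = years[start - 1:start - 1 + length]
    return window


first_second = arrest_window(arrests, start=1)   # first and second arrest only
second_third = arrest_window(arrests, start=2)   # second and third arrest only

# Assumed label: 1 if a later arrest exists inside the window, else 0.
recidivated = {person: int(len(years) == 2) for person, years in first_second.items()}
print(recidivated)
```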
[For recidivism prediction] Only selected users who are in the network (i.e. they were arrested with other people in the first arrest)
-
age_cat - decile of the person's age in the dataset (categorical, 1 to 10)
-
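One way to derive a 1 to 10 age category is to cut the ages at the nine decile boundaries. This is a sketch under assumptions: the `ages` sample is made up, and the choice of `statistics.quantiles` with the inclusive method is illustrative rather than the method actually used for `age_cat`.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical ages; the real values come from the arrest dataset.
ages = [18, 19, 21, 22, 24, 25, 27, 29, 30, 32,
        34, 35, 38, 41, 44, 47, 51, 55, 60, 68]

# Nine cut points that split the sample into ten roughly equal groups.
cuts = quantiles(ages, n=10, method="inclusive")


def age_cat(age):
    """Map an age to its decile category, from 1 (youngest) to 10 (oldest)."""
    return bisect_right(cuts, age) + 1


categories = [age_cat(a) for a in ages]
print(categories)
```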
eigen - eigenvector centrality in network
-
degree - degree centrality in network
-
closeness - closeness centrality in network
-
clus - local clustering coefficient in network
-
NETWORKS ARE BUILT BASED ON THE TIME (YEAR) OF ARREST, SO WE DON'T HAVE FUTURE INFORMATION
-
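The idea of building the network only from arrests up to a given year can be sketched as follows. The `events` layout and the function names are hypothetical, and only `degree` and `clus` are computed here with plain Python; `eigen` and `closeness` are left out for brevity and would normally come from a graph library.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical co-arrest events: (year, people arrested together).
events = [
    (2003, ["a", "b", "c"]),
    (2004, ["c", "d"]),
    (2006, ["a", "d"]),
]


def build_network(events, up_to_year):
    """Undirected co-arrest graph using only events dated <= up_to_year."""
    adj = defaultdict(set)
    for year, people in events:
        if year > up_to_year:
            continue  # never look at arrests from the future
        for u, v in combinations(people, 2):
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree_centrality(adj, node):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(adj)
    return len(adj[node]) / (n - 1) if n > 1 else 0.0


def local_clustering(adj, node):
    """Fraction of pairs of neighbours that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


net_2004 = build_network(events, up_to_year=2004)
net_2006 = build_network(events, up_to_year=2006)
print(degree_centrality(net_2004, "a"), degree_centrality(net_2006, "a"))
```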
BASELINE : ['first_arrest_SEXE','first_arrest_NCD1', 'first_arrest_MUN', 'first_arrest_ED1','age_cat']
-
SELF : BASELINE + ['eigen','degree','clus','closeness']
-
NEIGHBOUR : BASELINE + sum of the neighbours' ['eigen','degree','clus','closeness']
-
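The NEIGHBOUR features can be read as a per-person sum over that person's neighbours in the network. A minimal sketch, assuming the per-person values are already computed; the toy graph, the feature values, and the `nbr_` column prefix are invented for the example.

```python
FEATURES = ["eigen", "degree", "clus", "closeness"]

# Hypothetical adjacency lists and per-person network features.
neighbours = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
self_features = {
    "a": {"eigen": 0.5, "degree": 1.0, "clus": 0.0, "closeness": 1.0},
    "b": {"eigen": 0.25, "degree": 0.5, "clus": 0.0, "closeness": 0.5},
    "c": {"eigen": 0.25, "degree": 0.5, "clus": 0.0, "closeness": 0.75},
}


def neighbour_sums(person):
    """Sum each network feature over the person's direct neighbours."""
    return {
        f"nbr_{name}": sum(self_features[n][name] for n in neighbours[person])
        for name in FEATURES
    }


print(neighbour_sums("a"))
```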
SELF2010 : BASELINE + ['eigen','degree','clus','closeness'] from the 2010 network (i.e. we use all the information to build the network, not just the information available at the time of arrest; this is cheating, done only to see how well we can perform)
-
Fold=3
raw recidivism rate: 0.341884
1.0 percent of the people are arrested with someone and has networkinfo in this year[2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010]
data size is 88416
raw recidivism rate: 0.346736
wild guess accuracy is 0.653264115092
========baseline Data========
Accuracy score of 0.879919923996 during training
ROC score of 0.95510035397 during training
Accuracy score of 0.881853963084 during training
ROC score of 0.955521046406 during training
Accuracy score of 0.881921824104 during training
ROC score of 0.955457901738 during training
The average validation score is 0.673430148389
The average validation AUC is 0.671329394353
========self Data========
Accuracy score of 0.948798859935 during training
ROC score of 0.990768563388 during training
Accuracy score of 0.949799809989 during training
ROC score of 0.991362979168 during training
Accuracy score of 0.949087269273 during training
ROC score of 0.991049924299 during training
The average validation score is 0.711568041983
The average validation AUC is 0.733301674281
========self 2010 Data========
Accuracy score of 0.95365092291 during training
ROC score of 0.989927188308 during training
Accuracy score of 0.953277687296 during training
ROC score of 0.990152837516 during training
Accuracy score of 0.952327633008 during training
ROC score of 0.989701611582 during training
The average validation score is 0.762746561708
The average validation AUC is 0.781303359362
========Neighbour Data========
Accuracy score of 0.949104234528 during training
ROC score of 0.991029495065 during training
Accuracy score of 0.950071254072 during training
ROC score of 0.991137961165 during training
Accuracy score of 0.9493417481 during training
ROC score of 0.991170712162 during training
The average validation score is 0.698108939558
The average validation AUC is 0.71891503471
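
To compare the four feature sets at a glance, the validation numbers from the log can be put side by side. This is a small sketch that only re-uses figures printed above; reading the "wild guess" as always predicting no re-arrest is an assumption, although 1 minus the recidivism rate does reproduce the logged value.

```python
# Numbers copied from the log above.
recidivism_rate = 0.346736
validation = {
    "baseline":  {"accuracy": 0.673430148389, "auc": 0.671329394353},
    "self":      {"accuracy": 0.711568041983, "auc": 0.733301674281},
    "self2010":  {"accuracy": 0.762746561708, "auc": 0.781303359362},
    "neighbour": {"accuracy": 0.698108939558, "auc": 0.71891503471},
}

# Assumed meaning of "wild guess": always predict "no re-arrest",
# which is correct for everyone who does not recidivate.
wild_guess_accuracy = 1 - recidivism_rate

for name, scores in validation.items():
    lift = scores["accuracy"] - wild_guess_accuracy
    auc_gain = scores["auc"] - validation["baseline"]["auc"]
    print(f"{name:10s} accuracy lift over wild guess: {lift:+.4f}, "
          f"AUC gain over baseline: {auc_gain:+.4f}")
```

The SELF2010 row uses future information, so it is best read as an upper bound rather than an achievable score.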