Model-agnostic-feature-selection

Development of a feature selection scheme that is robust across all the datasets and regardless of the ML model used for classification


CBSD-2022-UNIPD

This study focuses on the replicability of finding relevant predictors for lie detection in various psychometric tests, spanning medicine, behavioral science and data science, each completed twice: once honestly and once dishonestly. More precisely, the goal is to develop a feature-selection framework that yields good and similar results across the different models used to discriminate honest from dishonest test responses. Accuracy, Top-5 stability and Accuracy Standard Deviation are the metrics used to evaluate the results.

*Figure: Overall comparison.*

Approaches used

The approaches developed in this project to select the features are the following:

  1. PCA: a number of principal components equal to 20% of the total number of features is retained via principal component analysis.
  2. Permutation importance: computed on a fitted random forest, with features selected based on a t-test (see the sketch after this list).
  3. Mutual Information: the features selected by the Joint Mutual Information Maximization (JMIM) algorithm with an importance score of at least 0.8 out of 1 are kept.
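
As an illustration of the second approach, a minimal sketch built on scikit-learn is shown below; the helper name `select_by_permutation` and the significance level `alpha` are assumptions for illustration, not the repository's exact code.

```python
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

def select_by_permutation(X_train, y_train, alpha=0.05, n_repeats=10, seed=0):
    """Fit a random forest, compute permutation importances and keep the
    features whose importance is significantly greater than zero
    (one-sided one-sample t-test over the permutation repeats)."""
    rf = RandomForestClassifier(random_state=seed).fit(X_train, y_train)
    result = permutation_importance(rf, X_train, y_train,
                                    n_repeats=n_repeats, random_state=seed)
    selected = []
    for j in range(X_train.shape[1]):
        # result.importances[j] holds the n_repeats importance draws of feature j
        _, p = stats.ttest_1samp(result.importances[j], 0.0, alternative="greater")
        if p < alpha:
            selected.append(j)
    return selected
```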

Before applying the methods, each dataset is split into training and test sets (70%-30%) and, for every feature, the mean and the standard deviation are computed in order to standardize it: $Z=\frac{X-\mu}{\sigma}$. The three methods are independent of each other, and each of them is described in depth later on.
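
A minimal sketch of this preprocessing step, assuming (as is standard practice, though not spelled out above) that $\mu$ and $\sigma$ are estimated on the training set only:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 70%-30% train/test split (the random seed is an arbitrary choice here)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Z = (X - mu) / sigma, one mean/std pair per feature
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```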

Models used

As mentioned before, each of the approaches considered in this project selects a number of features from the corresponding original dataset; these selected features are then used to train different models, whose performance is observed. The models trained in this project are:

  1. Logistic regression model on all the features (Full LR)
  2. Logistic regression model on selected features (LR)
  3. Support vector machine (SVM)
  4. Random forest (RF)
  5. Multi-layer perceptron classifier (MLP)

For each of these models, the corresponding accuracy is also computed, firstly to see how well that model performs with the selected features and secondly to compare the models with each other, in order to figure out whether the selected features give similar performance across all the models. A logistic regression on all the features is trained at the beginning; in this way, the results obtained with the selected features can be compared against a full-feature baseline.
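
A sketch of this benchmarking loop, assuming scikit-learn models with default hyper-parameters (the repository may tune them differently) and a `selected` array of column indices produced by one of the procedures above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Baseline: logistic regression on all the features (Full LR)
full_lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Full LR:", accuracy_score(y_test, full_lr.predict(X_test)))

# The four benchmark models, trained only on the selected features
models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}
accuracies = {}
for name, model in models.items():
    model.fit(X_train[:, selected], y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test[:, selected]))
    print(name, accuracies[name])
```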

Metrics

  • Accuracy: ratio of correct predictions over the number of instances. This has been chosen because all the datasets show a fairly balanced number of examples per class (all are binary classification tasks). The accuracy is computed on the full model (Full LR) as well as on the other four models used for benchmarking, which are trained only on the subset of features selected by each of the procedures in scope.
  • Accuracy Standard Deviation: standard deviation of the accuracies of the four models (i.e. LR, SVM, RF, MLP) fitted on the subset of selected features. It is a measure of the consistency of the classification performance across different models, so the lower the better.
  • Top-5 stability: a more specific metric for assessing consistency across models (i.e. LR, SVM, RF, MLP). It takes into account the five most important features used by each of the models; the formula developed is (a sketch implementing both consistency metrics follows this list):
$$\text{Top-5 Stability}=1-\frac{1}{(\text{num.models}-1)\cdot\min(5,|\Omega|)}\sum_{i=1}^{\min(5,|\Omega|)}\left(|\beta_{i}|-1\right)$$

where $\Omega$ is the set of features selected by a procedure, i.e. $\Omega=\{\beta_1,\dots,\beta_n\}$; $\beta_i$ is a vector containing, for each model, the feature ranked $i$-th in importance (notice that in our case $\text{num.models}=4$); and $|\beta_i|$ is the number of unique values in $\beta_i$.
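
The two consistency metrics translate directly into a few lines of Python. In this sketch, `accuracies` (one accuracy per model, as in the benchmarking loop above) and `rankings` (one importance-sorted feature list per model) are hypothetical inputs:

```python
import numpy as np

def accuracy_std(accuracies):
    """Standard deviation of the accuracies of the four benchmark models."""
    return np.std(list(accuracies.values()))

def top5_stability(rankings, num_models=4):
    """rankings[m] lists the selected features of model m, sorted by importance;
    the set built for rank i plays the role of beta_i in the formula above."""
    k = min(5, len(rankings[0]))  # min(5, |Omega|)
    penalty = sum(len({rankings[m][i] for m in range(num_models)}) - 1
                  for i in range(k))
    return 1 - penalty / ((num_models - 1) * k)
```

With four models the metric equals 1 when all of them agree on each of the top-five ranks, and 0 when they disagree completely at every rank.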

Datasets

| Name | Topic | Faking good/faking bad | Number of samples | Number of features |
|------|-------|------------------------|-------------------|--------------------|
| DT_df_CC | Short Dark Triad 3 for child custody | Faking good | 482 | 27 |
| DT_df_JI | Short Dark Triad 3 for a job interview | Faking good | 864 | 27 |
| PRMQ_df | Identify memory difficulties | Faking bad | 1404 | 16 |
| PCL5_df | Identify victims of PTSD | Faking bad | 402 | 20 |
| NAQ_R_df | Identify possible victims of mobbing | Faking bad | 712 | 22 |
| PHQ9_GAD7_df | Identify possible victims of anxious-depressive syndrome | Faking bad | 1118 | 16 |
| PID5_df | Identify mental disorders | Faking bad | 824 | 220 |
| sPID5_df | Identify mental disorders | Faking bad | 1038 | 25 |
| PRFQ_df | Caregivers' ability to mentalize with their children | Faking good | 678 | 18 |
| IESR_df | Identify possible victims of PTSD | Faking bad | 358 | 22 |
| R_NEO_PI_df | Personality questionnaire (Big 5) | Faking good | 77687 | 30 |
| RAW_DDDT_df | Identify Dark Triad personality | Faking bad | 986 | 12 |
| IADQ_df | Identify adjustment disorder (stress response syndrome) | Faking bad | 450 | 9 |
| BF_df_CTU | Job interview for a salesperson position | Faking good | 442 | 10 |
| BF_df_OU | Job interview in a humanitarian organization | Faking good | 460 | 10 |
| BF_df_V | Obtain child custody | Faking good | 486 | 10 |