
Access-Control-using-Machine-Learning-Implementation

Introduction

An employee may need to apply for different resources over the course of his or her career at a company. At large companies such as Google and Amazon, where employee roles and resource permissions are highly complex, this review process is generally handled by human administrators.

Our Goal

We aim to build an employee access control system that automatically approves or rejects an employee's resource application.

About the Dataset

The dataset was taken from: https://www.kaggle.com/c/amazon-employee-access-challenge/data?select=train.csv

The dataset consists of two files: train.csv, which contains 32,769 data points, and test.csv, which contains 58,921 data points.

Algorithms Used:

  1. Logistic Regression: Logistic regression is a machine learning classification algorithm used to predict the probability of a categorical dependent variable. The model predicts P(Y=1) as a function of X, using the sigmoid function to map inputs to probability values.

  2. Random Forest (Ensemble Bagging): A random forest is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building each individual tree, aiming to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.

  3. K-Nearest Neighbour: The k-nearest neighbours (KNN) algorithm is a supervised ML algorithm that can be used for both classification and regression problems. KNN is non-parametric, meaning it makes no underlying assumptions about the distribution of the data. For regression, the only difference is that the prediction is the average of the nearest neighbours' values rather than a vote among them.

  4. SVM: Support vector machines are especially effective at classification, numeric prediction, and pattern recognition tasks. An SVM finds a line (or, in more than two dimensions, a hyperplane) between classes of data such that the distance from that boundary to the closest data points on either side is maximized. In other words, it computes a maximum-margin boundary separating the classes, which is why an SVM is called a maximum-margin classifier.

  5. Naïve Bayes Classifier: Naïve Bayes classifiers are a family of machine learning algorithms that apply Bayes' theorem to classify data points. They are called "naïve" because they assume the features of a data point are completely independent of one another. They use the probabilities of certain events being true, given that other events are true, to make predictions about new data points.
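The five algorithms above can be compared side by side with scikit-learn. The sketch below is illustrative only: it uses a synthetic, imbalanced binary dataset in place of the Kaggle files, and the model settings (tree count, k, etc.) are assumptions rather than the notebook's actual choices. To reproduce the project, the encoded train.csv features and the ACTION label would replace the synthetic data.

```python
# Minimal sketch: fit the five classifiers discussed above and compare
# them by ROC AUC. Synthetic data stands in for the Kaggle files.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

# Imbalanced classes, mimicking access data where most requests are approved.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.94, 0.06], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "K-Nearest Neighbour": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(probability=True, random_state=0),
    "Naive Bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Score the predicted probability of the positive class.
    proba = model.predict_proba(X_te)[:, 1]
    scores[name] = roc_auc_score(y_te, proba)
    print(f"{name}: AUC = {scores[name]:.3f}")
```

ROC AUC is used here because accuracy is misleading on imbalanced data: a model that approves every request would already score about 94% accuracy on a 94/6 split.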