The Immigration and Nationality Act (INA) of the US permits foreign workers to come to the United States to work on either a temporary or permanent basis. The act also protects US workers against adverse impacts in the workplace and sets requirements that employers must maintain when they hire foreign workers to fill workforce shortages. The immigration programs are administered by the Office of Foreign Labor Certification (OFLC).
- OFLC processes job certification applications for employers seeking to bring foreign workers into the United States and grants certifications.
- Because the number of applicants last year was very large, OFLC needs machine learning models to shortlist visa applicants based on historical data.
In this project we use the given data to build a classification model:
- The model predicts whether a visa will be approved or not, based on the given dataset.
- It can be used to recommend whether an applicant's visa should be certified or denied, based on the criteria that influence the decision.
For detailed EDA and feature engineering, check out the notebook directory.
Several classification models were trained and their performances compared to determine which works best with our dataset. The best ones are used to predict whether a visa will be approved from user input in the Flask application.
The data contains different attributes of the employee and the employer. The detailed data dictionary is given below.
- case_id: ID of each visa application
- continent: Continent of the employee
- education_of_employee: Education of the employee
- has_job_experience: Does the employee have any job experience? Y = Yes; N = No
- requires_job_training: Does the employee require any job training? Y = Yes; N = No
- no_of_employees: Number of employees in the employer's company
- yr_of_estab: Year in which the employer's company was established
- region_of_employment: Foreign worker's intended region of employment in the US
- prevailing_wage: Average wage paid to similarly employed workers in a specific occupation in the area of intended employment. The purpose of the prevailing wage is to ensure that the foreign worker is not underpaid compared to other workers offering the same or similar service in the same area of employment.
- unit_of_wage: Unit of prevailing wage. Values include Hourly, Weekly, Monthly, and Yearly.
- full_time_position: Is the position full-time? Y = Full Time Position; N = Part Time Position
- case_status: Flag indicating whether the visa was certified or denied
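The data dictionary above can be turned into a lightweight input check before a record reaches the model. The following is a minimal sketch, not the project's actual validation code: the function name `validate_application` is hypothetical, and only the value sets the dictionary states explicitly (Y/N flags and wage units) are enforced. Columns whose allowed values are not listed (such as continent or region) are left unchecked.

```python
from typing import Any, Dict, List

# Allowed values copied from the data dictionary.
YES_NO = {"Y", "N"}
WAGE_UNITS = {"Hourly", "Weekly", "Monthly", "Yearly"}

YES_NO_COLUMNS = ("has_job_experience", "requires_job_training", "full_time_position")
NUMERIC_COLUMNS = ("no_of_employees", "yr_of_estab", "prevailing_wage")


def validate_application(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (hypothetical helper).

    An empty list means the record is consistent with the data dictionary.
    """
    problems: List[str] = []

    # Flags documented as Y/N must hold exactly one of those two values.
    for column in YES_NO_COLUMNS:
        if record.get(column) not in YES_NO:
            problems.append(f"{column} must be 'Y' or 'N'")

    # The wage unit must be one of the four documented units.
    if record.get("unit_of_wage") not in WAGE_UNITS:
        problems.append("unit_of_wage must be Hourly, Weekly, Monthly, or Yearly")

    # Counts, years, and wages must be numbers (bool is excluded on purpose).
    for column in NUMERIC_COLUMNS:
        value = record.get(column)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{column} must be numeric")

    return problems


# Illustrative record; the values are made up for the example.
sample = {
    "has_job_experience": "Y",
    "requires_job_training": "N",
    "full_time_position": "Y",
    "unit_of_wage": "Yearly",
    "no_of_employees": 250,
    "yr_of_estab": 1998,
    "prevailing_wage": 85000.0,
}
problems_found = validate_application(sample)
```

Returning a list of messages rather than raising on the first problem lets a web form show every issue to the user at once.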
💿 Installing
- Environment setup.
conda create --prefix venv python==3.9 -y
conda activate venv/
- Install requirements and set up
pip install -r requirements.txt
- Run Application
python app.py
- Flask
- Python 3.9
- Machine learning
- Scikit learn
- Logistic Regression
- KNeighbors Classifier
- XGB Classifier
- CatBoost Classifier
- SVC
- AdaBoost Classifier
- RandomForest Classifier
From the models above, after hyperparameter optimization we selected the top two, XGB Classifier and RandomForest Classifier, and used them in the pipeline.
- GridSearchCV is used for hyperparameter optimization in the pipeline.
- Any modification has to be made in Config.yaml, which can be done through the /update_model_config route.
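The hyperparameter search mentioned above can be sketched with scikit-learn's `GridSearchCV`. This is an illustration under stated assumptions, not the project's pipeline: it uses synthetic data from `make_classification` in place of the visa dataset, a `RandomForestClassifier` because the task is classification, and a small made-up parameter grid. The F1 scoring choice and the grid values are assumptions, not settings taken from Config.yaml.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data standing in for the visa dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Illustrative grid; the real search space would live in the project config.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

# Try every combination with 3-fold cross-validation, scored by F1.
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

# By default GridSearchCV refits the best combination on the full training split.
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

The same pattern applies to an XGBoost classifier by swapping the estimator and grid, since its scikit-learn wrapper follows the same estimator interface.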
Artifact : Stores all artifacts created from running the application
Components : Contains all components of Machine Learning Project
- DataIngestion
- DataValidation
- DataTransformations
- ModelTrainer
- ModelEvaluation
- ModelPusher
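One way to picture how the components listed above fit together is a runner that calls each stage in order and passes a shared artifact dictionary along. This is only a sketch: the stage names come from the list, but the execution order, the `run_pipeline` and `make_placeholder` helpers, and the dictionary-based artifact are all assumptions made for illustration, and each stage body is a placeholder rather than real logic.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Artifact = Dict[str, Any]
Stage = Callable[[Artifact], Artifact]

# Stage names from the component list, assumed to run in this order.
STAGE_NAMES = [
    "DataIngestion",
    "DataValidation",
    "DataTransformations",
    "ModelTrainer",
    "ModelEvaluation",
    "ModelPusher",
]


def make_placeholder(name: str) -> Stage:
    """Build a stand-in stage that only records that it ran (hypothetical)."""

    def stage(artifact: Artifact) -> Artifact:
        completed = artifact.get("completed", []) + [name]
        return {**artifact, "completed": completed}

    return stage


def run_pipeline(
    stages: List[Tuple[str, Stage]], artifact: Optional[Artifact] = None
) -> Artifact:
    """Run each stage in order, feeding one stage's output to the next."""
    current: Artifact = dict(artifact or {})
    for _name, stage in stages:
        current = stage(current)
    return current


result = run_pipeline([(name, make_placeholder(name)) for name in STAGE_NAMES])
```

In a real project each placeholder would be replaced by the corresponding component, with its outputs written under the Artifact directory.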
A custom logger and custom exceptions are used in the project to make debugging easier.
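A common way to build such debugging helpers is an exception class that records where the original error occurred, paired with a configured logger. The sketch below uses only the standard library. The names `VisaException`, `visa_app`, and `safe_divide` are hypothetical and are not taken from the project's code.

```python
import logging

# Timestamped log format; the logger name is a made-up example.
logging.basicConfig(
    format="[%(asctime)s] %(name)s - %(levelname)s - %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger("visa_app")


class VisaException(Exception):
    """Wrap an error together with the file and line where it was raised."""

    def __init__(self, error: Exception):
        # Walk to the innermost traceback frame, where the error originated.
        tb = error.__traceback__
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next
        self.file_name = tb.tb_frame.f_code.co_filename if tb else "<unknown>"
        self.line_number = tb.tb_lineno if tb else -1
        super().__init__(
            f"{type(error).__name__} in {self.file_name} "
            f"at line {self.line_number}: {error}"
        )


def safe_divide(a: float, b: float) -> float:
    """Example caller: log the failure, then re-raise it with location details."""
    try:
        return a / b
    except Exception as error:
        logger.error("division failed: %s", error)
        raise VisaException(error) from error
```

Using `raise ... from error` keeps the original traceback chained to the custom exception, so neither the location nor the root cause is lost.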
- This project can be used in real life by US visa applicants to improve their resumes and better meet the criteria of the approval process.
- It can be integrated into a visa application website for users.