Prathamesh-Verlekar/US-Permanent-Visas-Analysis

In this project, we cleaned and processed data extracted from the USCIS website, covering the details of US visa applications from 2011 to 2016. We then created a data model based on the dataset and used it to build a database in Neo4j, a graph database well suited to analyzing relationships in the data. The data was loaded into Neo4j using Cypher queries. Next, we built a data pipeline connecting Neo4j to Python and an interactive dashboard for better insights. Finally, we implemented acceptance and integration testing for data and system validation.

Jupyter Notebook

Big-Data-Architecture-and-Governance

Project Objective

Clean and validate the data extracted from the USCIS website
Create a data model based on the dataset
Create a database in Neo4j and load the data using Cypher queries
Create a data pipeline for connecting Neo4j to Python
Build an interactive dashboard for better insights
Extract metadata from the Neo4j database and load it into a SQL Server database
Perform integration and acceptance testing for data validation
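
The loading step can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the labels `Employer` and `Application`, the `FILED` relationship, the property names, and the helpers `batched` and `load_rows` are all assumptions made for the example. It batches rows and sends each batch to Neo4j through a parameterized `UNWIND ... MERGE` Cypher query.

```python
from itertools import islice

# Hypothetical labels, relationship type, and property names; the project's
# actual graph model is not reproduced here.
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (e:Employer {name: row.employer_name})
MERGE (a:Application {case_no: row.case_no})
SET a.status = row.case_status
MERGE (e)-[:FILED]->(a)
"""


def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_rows(session, rows, batch_size=1000):
    """Send rows to Neo4j in batches and return the number of batches sent.

    `session` is any object with a run(query, **params) method, such as a
    session opened with the official neo4j Python driver:

        from neo4j import GraphDatabase
        driver = GraphDatabase.driver(uri, auth=(user, password))
        with driver.session() as session:
            load_rows(session, rows)
    """
    sent = 0
    for batch in batched(rows, batch_size):
        # Passing rows as a query parameter avoids building Cypher by
        # string concatenation.
        session.run(LOAD_QUERY, rows=batch)
        sent += 1
    return sent
```

Using `MERGE` rather than `CREATE` keeps the load idempotent, so re-running it on the same rows should not duplicate nodes or relationships.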

Data Overview using Pandas Profiling

This dataset gives detailed information on around 374K visa applications and their decisions.
The data covers 2011-2016 and includes information on employer, position, wage offered, job posting history, employee education and past visa history, and the final decision.
The dataset has 374,362 observations, of which 373,025 are unique. It has 154 variables, of which only 21 have more than 330,000 non-missing observations.
The 154 variables comprise 116 categorical, 2 date-time, 10 numerical, and 26 Boolean variables.
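
The headline figures above (total rows, unique rows, and columns with many non-missing values) are the kind of counts a profiling report produces. As a rough illustration only, the sketch below computes the same three quantities with the Python standard library on a tiny inline sample. The function name `summarize`, the column names, and the sample values are made up for the example and are not taken from the dataset.

```python
import csv
import io


def summarize(csv_text, min_non_missing):
    """Return the row count, distinct-row count, and well-populated columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames
    total = 0
    distinct = set()
    non_missing = {name: 0 for name in names}
    for row in reader:
        total += 1
        distinct.add(tuple(row[name] for name in names))
        for name in names:
            # Empty strings and absent trailing fields count as missing.
            if row[name] not in ("", None):
                non_missing[name] += 1
    dense = [n for n, c in non_missing.items() if c > min_non_missing]
    return {"rows": total, "unique_rows": len(distinct), "dense_columns": dense}


# Made-up sample: one duplicated row and one missing wage value.
sample = (
    "case_no,case_status,wage\n"
    "A1,Certified,100\n"
    "A1,Certified,100\n"
    "A2,Denied,\n"
)
summary = summarize(sample, min_non_missing=2)
print(summary)
```

On the real data, the threshold would be 330,000 rather than 2.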

Technical Vision Diagram

Graph Data Model

Database Schema in Neo4j

Interactive Dashboard

Target Audience

US Citizenship and Immigration Services
Corporates of different sectors
Immigrants applying for US Visa

Dashboard Insights

We found that H-1B is the visa type most often applied for through companies and has the most approved visas.
Amazon is among the top 5 companies filing the highest number of visa applications.
Computer Engineering is the job category for which companies file the most visa applications, and it has the highest approval rate.
India is the country with the most visa applications filed and the most approved cases.
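
Rankings like these come down to counting applications per group and counting how many of them were approved. The sketch below shows one way to do that in plain Python. It is an assumption-laden example, not the dashboard's code: the helper `top_by_applications`, the field names, the `"Certified"` status value, and the sample records are all hypothetical.

```python
from collections import Counter


def top_by_applications(records, key, n=5):
    """Rank values of `key` by number of applications filed.

    Returns (value, filed_count, certified_count) tuples for the top `n`.
    Assumes each record has a "case_status" field whose approved value is
    "Certified" (a hypothetical label for this example).
    """
    filed = Counter(r[key] for r in records)
    certified = Counter(
        r[key] for r in records if r["case_status"] == "Certified"
    )
    return [(value, count, certified[value]) for value, count in filed.most_common(n)]


# Made-up records for illustration only.
records = [
    {"country": "A", "case_status": "Certified"},
    {"country": "A", "case_status": "Denied"},
    {"country": "B", "case_status": "Certified"},
]
ranking = top_by_applications(records, "country", n=1)
```

The same function works for any grouping field, such as employer or visa class, by changing `key`.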