- Clean and validate the data extracted from the USCIS website
- Create a data model based on the dataset
- Create a database in Neo4j and load the data using Cypher queries
- Create a data pipeline for connecting Neo4j to Python
- Build an interactive dashboard for better insights
- Extract metadata from the Neo4j database and load it into a SQL Server database
- Perform integration and acceptance testing for data validation
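
The Neo4j loading step listed above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the `Employer` and `Application` labels, the `FILED` relationship, the property names, and the batch size are all hypothetical. Only the standard library is used, so the batching logic runs without a database; with the official `neo4j` Python driver, each batch would typically be sent with `session.run(LOAD_QUERY, rows=batch)`.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical Cypher statement: the labels, relationship type, and property
# names are illustrative assumptions, not the project's real data model.
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (e:Employer {name: row.employer_name})
CREATE (a:Application {case_number: row.case_number, status: row.case_status})
CREATE (e)-[:FILED]->(a)
"""


def batch_rows(rows: Iterable[Dict[str, str]],
               batch_size: int = 1000) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most batch_size rows, one list per Cypher call."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Dict[str, str]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

Sending rows in batches through `UNWIND` keeps the number of round trips to the database small compared with issuing one query per application.
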
- This dataset gives detailed information on around 374K visa applications and their decisions.
- The data covers 2011-2016 and includes information on employer, position, wage offered, job posting history, employee education and past visa history, and final decision.
- The dataset has 374,362 observations, of which 373,025 are unique. It has 154 variables, of which only 21 have more than 330,000 non-missing observations.
- By type, the 154 variables break down as follows:
  - 116 categorical variables
  - 2 date-time variables
  - 10 numerical variables
  - 26 Boolean variables
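
The summary counts above (observations, unique observations, and well-populated variables) can be computed with a small profiling pass. The following is a minimal sketch, assuming the dataset has been read into a list of dictionaries (for example with `csv.DictReader`) and that `None` and empty strings mark missing values; the function name `profile_rows` and its `min_non_missing` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def profile_rows(rows: List[Dict[str, Optional[str]]],
                 min_non_missing: int) -> Dict[str, object]:
    """Summarize row counts and per-column completeness."""
    # A row is a duplicate if every column/value pair matches another row.
    unique_rows = {tuple(sorted(row.items())) for row in rows}

    non_missing: Counter = Counter()
    columns = set()
    for row in rows:
        for column, value in row.items():
            columns.add(column)
            # Assumption: None and "" are the only missing-value markers.
            if value is not None and value != "":
                non_missing[column] += 1

    well_populated = sorted(
        column for column in columns if non_missing[column] > min_non_missing
    )
    return {
        "observations": len(rows),
        "unique_observations": len(unique_rows),
        "variables": len(columns),
        "well_populated": well_populated,
    }
```

On the full dataset, a call such as `profile_rows(rows, min_non_missing=330000)` would correspond to the threshold quoted above.
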
- US Citizenship and Immigration Services
- Corporations across different sectors
- Immigrants applying for US visas
- H-1B is the visa type that companies apply for most often, and it has the most approved visas.
- Amazon is among the top 5 companies by number of visa applications filed.
- Computer Engineering is the job for which companies file the most visa applications, and it has the highest approval rate.
- India accounts for the most visa applications of any country and has the most approved cases.