mehroosali
MS CS graduate at the University of Texas at Dallas | Data Engineer | Software Developer
Richardson, Texas
Pinned Repositories
ABCStoresPipeline
Batch ETL data pipeline built on HDP 3.0 that processes daily sales and business data to produce Power BI reports. The pipelines are automated using Airflow.
assembly_file_statistics
Assembly project to compute file statistics using MIPS for class CS5330 (Computer Architecture) at the University of Texas at Dallas.
batch_number_conversion
Batch number conversion project using MIPS for class CS5330 (Computer Architecture) at the University of Texas at Dallas.
bigquery-sparksql-batch-etl
Batch ETL pipeline project on GCP to load and transform daily flight data using Spark to update tables in BigQuery. The pipeline is automated using Airflow.
databricks-F1-Project
A data pipeline project built on Databricks and Azure to demonstrate the lifecycle of a cloud data project.
Ebay-DB-design
eBay database design project for the class CS6360 (Database Design) at the University of Texas at Dallas.
Information-Retrieval-Search-Engine
Search engine project for an Information Retrieval class.
Realtime-Customer-Viewership-Analysis
A data pipeline using the lambda architecture that unifies and consolidates real-time customer web events, weblogs, and profile data into a Hive warehouse for ad hoc analysis.
s3-redshift-batch-etl-pipeline
Built a functional Python ETL script whose functions initialize Spark clusters using the PySpark library to extract song data stored in an S3 bucket. Partitioned the songs data by year and artist_id and wrote it as compressed Parquet files to improve load performance. Used Spark's overwrite mode so that each new run of the script replaces its previous output in the data lake, avoiding duplicates. Orchestrated an ELT data pipeline that extracts from S3, loads into Redshift for transformation, and writes the output back to S3. Used Airflow hooks to make connection credentials configurable, keeping access rights separate from the code base for security. Used Airflow operators in a DAG to execute the loading and transformation scripts for Redshift.
Twitter-Sentiment-Analysis
Personal project that pulls live Twitter data using the NiFi GetTwitter processor and pushes it to a Kafka topic. A Spark Streaming application consumes the topic and performs basic sentiment analysis, and the final result is stored in Elasticsearch for visualization using Kibana.
mehroosali's Repositories
mehroosali/databricks-F1-Project
A data pipeline project built on Databricks and Azure to demonstrate the lifecycle of a cloud data project.
mehroosali/s3-redshift-batch-etl-pipeline
Built a functional Python ETL script whose functions initialize Spark clusters using the PySpark library to extract song data stored in an S3 bucket. Partitioned the songs data by year and artist_id and wrote it as compressed Parquet files to improve load performance. Used Spark's overwrite mode so that each new run of the script replaces its previous output in the data lake, avoiding duplicates. Orchestrated an ELT data pipeline that extracts from S3, loads into Redshift for transformation, and writes the output back to S3. Used Airflow hooks to make connection credentials configurable, keeping access rights separate from the code base for security. Used Airflow operators in a DAG to execute the loading and transformation scripts for Redshift.
mehroosali/bigquery-sparksql-batch-etl
Batch ETL pipeline project on GCP to load and transform daily flight data using Spark to update tables in BigQuery. The pipeline is automated using Airflow.
mehroosali/Ebay-DB-design
eBay database design project for the class CS6360 (Database Design) at the University of Texas at Dallas.
mehroosali/Information-Retrieval-Search-Engine
Search engine project for an Information Retrieval class.
mehroosali/Twitter-Sentiment-Analysis
Personal project that pulls live Twitter data using the NiFi GetTwitter processor and pushes it to a Kafka topic. A Spark Streaming application consumes the topic and performs basic sentiment analysis, and the final result is stored in Elasticsearch for visualization using Kibana.
mehroosali/ABCStoresPipeline
Batch ETL data pipeline built on HDP 3.0 that processes daily sales and business data to produce Power BI reports. The pipelines are automated using Airflow.
mehroosali/assembly_file_statistics
Assembly project to compute file statistics using MIPS for class CS5330 (Computer Architecture) at the University of Texas at Dallas.
mehroosali/batch_number_conversion
Batch number conversion project using MIPS for class CS5330 (Computer Architecture) at the University of Texas at Dallas.
mehroosali/kruskals-algorithm
Kruskal's algorithm project using Java for class CS5343 (Data Structures and Algorithms) at the University of Texas at Dallas.
mehroosali/Realtime-Customer-Viewership-Analysis
A data pipeline using the lambda architecture that unifies and consolidates real-time customer web events, weblogs, and profile data into a Hive warehouse for ad hoc analysis.
mehroosali/maze-solver
Maze Solver project using Java for class CS5343 (Data Structures and Algorithms) at the University of Texas at Dallas.
mehroosali/mehroosali
GitHub profile homepage.
mehroosali/mehroosali.github.io
Personal portfolio website.
mehroosali/Muy-Feliz
Android application project built in the React ecosystem for the class CS 6326 (Human Computer Interaction).
mehroosali/word-puzzle
Word Puzzle project using Java for class CS5343 (Data Structures and Algorithms) at the University of Texas at Dallas.