ShuoTian172
Interests: Data Science, Data Engineering, Machine Learning, Data Analysis, Deep Learning, Software Development
University of Waterloo
Toronto, Ontario, CA
Pinned Repositories
AmazonComments-Sentiment
Arduino-IoT-SmartCan
Bank-Marketing-Data-Analysis
CannabiSpect
Customer Segmentation Based on Cannabis Consumer Reviews
Chatbot-Seq2Seq-Attention-Transformer
Chatbot-Raw Reddit Comments-Data Clean-Seq2Seq-Tensorflow-Attention-Bidirectional GRU
CNN_PCamImages
Data_Lake_Spark
Building an ETL pipeline that extracts data from S3 buckets, processes it with Spark, and transforms it into a star schema stored back in S3 as efficiently partitioned Parquet files.
DNA-Sequences-Classification
Linear feature extraction using PCA and LDA. Nonlinear dimensionality reduction using LLE and Isomap. Classification with naive Bayes, kNN, and SVM.
ECE608-Quantitative-Methods
Data Wrangling-statistics-ANOVA-parametric assumption-Regression- Multiple Regression- Logistic Regression- Poisson Regression-Validity-non-parametric
ECE650-Traffic-Management-System
Vertex Cover problem-Optimization-Multi-thread-Multi-process-MiniSAT
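A minimal sketch of the vertex cover problem named in the ECE650 description above. It does not reproduce the repository's MiniSAT-based approach; it swaps in an exhaustive search and the standard greedy 2-approximation for illustration. The function names and the edge-list input format are assumptions.

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """Return True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def min_vertex_cover(num_vertices, edges):
    """Exact minimum cover by exhaustive search (hypothetical helper).

    Tries candidate sets in order of increasing size, so the first hit is
    a minimum cover. Exponential time: only suitable for tiny graphs.
    """
    for size in range(num_vertices + 1):
        for candidate in combinations(range(num_vertices), size):
            if is_vertex_cover(edges, set(candidate)):
                return list(candidate)
    return list(range(num_vertices))


def approx_vertex_cover(edges):
    """Greedy 2-approximation: take both endpoints of each uncovered edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return sorted(cover)
```

The greedy variant runs in linear time in the number of edges and returns a cover at most twice the minimum size, which makes it a useful baseline against an exact solver.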
ShuoTian172's Repositories
ShuoTian172/Data_Lake_Spark
Building an ETL pipeline that extracts data from S3 buckets, processes it with Spark, and transforms it into a star schema stored back in S3 as efficiently partitioned Parquet files.
ShuoTian172/Chatbot-Seq2Seq-Attention-Transformer
Chatbot-Raw Reddit Comments-Data Clean-Seq2Seq-Tensorflow-Attention-Bidirectional GRU
ShuoTian172/DNA-Sequences-Classification
Linear feature extraction using PCA and LDA. Nonlinear dimensionality reduction using LLE and Isomap. Classification with naive Bayes, kNN, and SVM.
ShuoTian172/ECE608-Quantitative-Methods
Data Wrangling-statistics-ANOVA-parametric assumption-Regression- Multiple Regression- Logistic Regression- Poisson Regression-Validity-non-parametric
ShuoTian172/AmazonComments-Sentiment
ShuoTian172/Arduino-IoT-SmartCan
ShuoTian172/Bank-Marketing-Data-Analysis
ShuoTian172/CannabiSpect
Customer Segmentation Based on Cannabis Consumer Reviews
ShuoTian172/CNN_PCamImages
ShuoTian172/ECE650-Traffic-Management-System
Vertex Cover problem-Optimization-Multi-thread-Multi-process-MiniSAT
ShuoTian172/bigdata-2019w
CS 451/651, CS 431/631: Data-Intensive Distributed Computing (Winter 2019) at the University of Waterloo https://aroegies.github.io/bigdata-2019w/
ShuoTian172/CapstoneProject
ShuoTian172/Data-Modeling-with-PostgreSQL
Data modeling with PostgreSQL and building an ETL pipeline using Python. Define fact and dimension tables for a star schema for a particular analytic focus, and write an ETL pipeline that transfers data from files in two local directories into these tables in PostgreSQL using Python and SQL.
ShuoTian172/Data_Pipelines_Airflow
Automating ETL pipelines with Airflow, Python, and Amazon Redshift. Transforming data from various sources into a star schema optimized for the analytics team's use cases. Writing custom operators to perform tasks such as staging data, filling the data warehouse, and validation through data quality checks.
ShuoTian172/Data_Warehouse_Redshift
Building an ETL pipeline using the AWS SDK, Redshift, Python, and PostgreSQL. Developing a pipeline that connects to a Redshift cluster and COPYs data from S3 buckets into Redshift staging tables. Creating a database with tables designed to optimize queries for song play analysis.
ShuoTian172/deeplearning-tutorials
Code for deep learning tutorials that I have posted on my blog: https://hareeshbahuleyan.github.io/blog/
ShuoTian172/handwritten-digit-dataset-analysis
Handwritten digit dataset with 5 classes: digits 0, 1, 2, 3, and 4. Dimensionality reduction approaches (PCA, LDA, LLE, and Isomap) were applied.
ShuoTian172/java-design-patterns
Design patterns implemented in Java
ShuoTian172/Movie-Recommendation-Website-using-Apache-Spark-and-Flask
ShuoTian172/NoSQL-Data-Modeling-with-Apache-Cassandra
Building an ETL pipeline in Python. Creating an Apache Cassandra database with denormalized tables designed to optimize queries on event data. Defining robust partition keys, clustering columns, and composite primary keys.
ShuoTian172/Optimization-Controllers-Numerical-simulation
Optimization-Controllers-Numerical simulation-Matlab
ShuoTian172/SYDE631-Time-Series-Modlling-Houston-House-Price-Prediction
ShuoTian172/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
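Several repositories listed above (Data-Modeling-with-PostgreSQL, Data_Warehouse_Redshift, Data_Pipelines_Airflow) describe loading records into a star schema of fact and dimension tables. The sketch below illustrates that idea only. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL or Redshift, and every table name, column name, and record field is an assumption, not taken from the repositories.

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables and one fact table (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS dim_user (
            user_id INTEGER PRIMARY KEY,
            name    TEXT
        );
        CREATE TABLE IF NOT EXISTS dim_song (
            song_id TEXT PRIMARY KEY,
            title   TEXT
        );
        CREATE TABLE IF NOT EXISTS fact_songplay (
            songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id     INTEGER REFERENCES dim_user(user_id),
            song_id     TEXT REFERENCES dim_song(song_id),
            start_time  TEXT
        );
        """
    )


def load_events(conn, events):
    """Insert each event: dimensions are deduplicated, facts are appended."""
    for event in events:
        conn.execute(
            "INSERT OR IGNORE INTO dim_user (user_id, name) VALUES (?, ?)",
            (event["user_id"], event["user_name"]),
        )
        conn.execute(
            "INSERT OR IGNORE INTO dim_song (song_id, title) VALUES (?, ?)",
            (event["song_id"], event["title"]),
        )
        conn.execute(
            "INSERT INTO fact_songplay (user_id, song_id, start_time) "
            "VALUES (?, ?, ?)",
            (event["user_id"], event["song_id"], event["start_time"]),
        )
    conn.commit()
```

Keeping descriptive attributes in the dimension tables and only keys plus the event timestamp in the fact table is what lets analytic queries join and aggregate without rescanning the raw files.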