2110531_DataScience_2022s1

Data Science Tools course at the Department of Computer Engineering, Chulalongkorn University, 2022

Primary language: Jupyter Notebook. License: Apache License 2.0.

2110531 Data Science and Data Engineering Tools @Chula 2022



Welcome to the Data Science and Data Engineering Tools Course (2110531) at Chulalongkorn University! This repository contains the code, exercises, and resources to guide you through the fascinating world of data science, deep learning, and more. Each week, you'll dive into new concepts, reinforced with hands-on labs, to help you build a strong foundation in data science.

📚 Weekly Labs and Exercises

Week 1: Intro to NumPy and Pandas

  1. NumPy: Introduction to numerical computing with NumPy. Open In Colab

  2. Pandas: Basic data manipulation with Pandas. Open In Colab

  3. Pandas with YouTube stats data: Analyzing YouTube video statistics with Pandas. Open In Colab

  4. (Advanced) Pandas with YouTube stats data: Advanced analysis of YouTube video statistics. Open In Colab

  5. Assignment (Pandas with YouTube stats data): Hands-on assignment analyzing YouTube video statistics. Open In Colab
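Analyzing video statistics usually comes down to grouping rows by a column and aggregating a metric, which pandas writes as `df.groupby("category")["views"].sum()`. The sketch below shows the same idea with only the standard library so it runs anywhere; the column names `category` and `views` and the sample rows are hypothetical, not taken from the lab's dataset.

```python
import csv
import io
from collections import defaultdict
from typing import Dict

# Hypothetical rows shaped like a YouTube stats CSV (not the lab's actual data).
SAMPLE_CSV = """title,category,views
Video A,Music,1200
Video B,Gaming,800
Video C,Music,300
Video D,Education,450
"""


def total_views_by_category(csv_text: str) -> Dict[str, int]:
    """Sum the `views` column per `category`, like pandas groupby(...).sum()."""
    totals: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["category"]] += int(row["views"])
    return dict(totals)


if __name__ == "__main__":
    # Print categories in alphabetical order with their summed views.
    for category, views in sorted(total_views_by_category(SAMPLE_CSV).items()):
        print(category, views)
```

In pandas the grouping, summing, and index alignment happen in one vectorized call, which is why the labs lean on it for larger files.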

Week 2: Data Preparation

  1. EDA: Exploratory Data Analysis on loan data. Open In Colab

  2. Impute Missing Values: Handling missing data in loan datasets. Open In Colab

  3. Split Train/Test: Splitting datasets into training and testing sets. Open In Colab

  4. Outliers with Log: Identifying and handling outliers using logarithmic transformation. Open In Colab

  5. Outliers with Log (Titanic Dataset): Advanced outlier analysis using the Titanic dataset. Open In Colab

  6. Assignment: Titanic dataset analysis. Open In Colab
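The preparation steps listed above (imputation, log transformation, and a train/test split) can be sketched without third-party packages. This is a minimal illustration under stated assumptions: the `loan_amounts` values are made up, missing entries are represented as `None`, the helper names are hypothetical rather than functions from the notebooks, and median imputation is only one common strategy. In practice, pandas `fillna`, NumPy `log1p`, and scikit-learn's `train_test_split(..., test_size=0.2, random_state=42)` cover the same ground.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical loan amounts (not from the lab data); None marks a missing value.
loan_amounts: List[Optional[float]] = [120.0, None, 95.0, 3000.0, 110.0, None, 130.0]


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Replace each None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def log_transform(values: Sequence[float]) -> List[float]:
    """Apply log(1 + x), which shrinks a long right tail; valid for x > -1."""
    return [math.log1p(v) for v in values]


def split_train_test(
    rows: Sequence[T], test_ratio: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy with a fixed seed, then hold out the last test_ratio share.

    At least one row is always held out for testing.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[:-n_test], shuffled[-n_test:]


if __name__ == "__main__":
    filled = impute_median(loan_amounts)
    print(filled)
    print([round(x, 2) for x in log_transform(filled)])
    train, test = split_train_test(filled)
    print(len(train), len(test))
```

The median of the observed amounts is 120.0, while their mean would be 691.0 because of the single 3000.0 value, which is why a robust statistic tends to pair well with an outlier-compressing log step. Fixing the seed makes the split reproducible across runs.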

Week 3-4: Traditional ML

  1. Decision Trees: Open In Colab

  2. Linear Regression: Open In Colab

  3. Logistic Regression: Open In Colab

  4. Neural Network: Open In Colab

  5. K-Nearest Neighbors: Open In Colab

  6. SVM: Open In Colab

  7. Save and Load Model: Open In Colab

  8. K-Means: Open In Colab

  9. Market-Basket Analysis: Open In Colab
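Market-basket analysis scores association rules such as {bread} → {milk} with three quantities: support (how often the items appear together), confidence (how often the rule holds when its left-hand side appears), and lift (confidence relative to the right-hand side's baseline rate). The sketch below computes them directly over a few hypothetical baskets; it is not the notebook's code, and libraries such as mlxtend (`apriori`, `association_rules`) automate the frequent-itemset search that this example skips.

```python
from typing import FrozenSet, List

Basket = FrozenSet[str]

# Hypothetical transactions: each basket is the set of items bought together.
TRANSACTIONS: List[Basket] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]


def support(itemset: Basket, transactions: List[Basket]) -> float:
    """Fraction of baskets containing every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent: Basket, consequent: Basket, transactions: List[Basket]) -> float:
    """Estimate of P(consequent | antecedent) = support(A union B) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def lift(antecedent: Basket, consequent: Basket, transactions: List[Basket]) -> float:
    """Confidence divided by the consequent's own support; above 1 suggests positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


if __name__ == "__main__":
    a, b = frozenset({"bread"}), frozenset({"milk"})
    print(f"support={support(a | b, TRANSACTIONS):.2f}")
    print(f"confidence={confidence(a, b, TRANSACTIONS):.2f}")
    print(f"lift={lift(a, b, TRANSACTIONS):.2f}")
```

For these baskets the rule {bread} → {milk} has support 0.6, confidence 0.75, and lift 0.9375: the confidence looks high, but milk already appears in 80% of baskets, so a lift just below 1 means bread buyers are not more likely than average to buy milk.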

Assignment for Week 3 (Safe to eat or deadly poison?): Open In GitHub


Week 5-6: Intro to Deep Learning

  1. Image classification (basic): flower classification Open In Colab

  2. Image classification (advanced): flower classification Open In Colab

  3. Semantic Segmentation (U-Net): The Oxford-IIIT Pet dataset Open In Colab

  4. LSTM: Stock price prediction Open In Colab

  5. SARIMAX: PM2.5 forecasting Open In Colab
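Sequence models such as the LSTM in the stock-price lab are commonly trained on fixed-length windows of past values, each paired with the value that follows. A minimal framing helper is sketched below; `make_windows` and its `window`/`horizon` parameters are hypothetical names, and the prices are made up. A Keras `LSTM` layer expects 3-D input shaped `(samples, timesteps, features)`, so the resulting windows would typically be scaled and reshaped (for example with NumPy) before training.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Turn a 1-D series into (input window, target) pairs.

    Each input holds `window` consecutive values; the target is the value
    `horizon` steps after the window ends.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be >= 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # Stop early enough that the target index stays inside the series.
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


if __name__ == "__main__":
    prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical closing prices
    X, y = make_windows(prices, window=3)
    for features, target in zip(X, y):
        print(features, "->", target)
```

When splitting such data into train and test sets, keeping the time order (rather than shuffling) avoids leaking future prices into training.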

Assignment (Fashion MNIST): Open In Colab

Week 8: Data Storage with Redis

Redis Example using local data

Assignment (Connect to a Redis server)

Week 9: Data Extraction (Web Scraping and APIs)

  1. Basic Webpage Scraping Open In Colab

  2. Wikipedia Data Extraction Open In Colab

  3. Settrade REST API Open In Colab

  4. Twitter Data Extraction Open In Colab

  5. Selenium Open In Colab
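Basic scraping boils down to fetching HTML and pulling structured pieces out of it. The labs may rely on third-party packages (the list above names Selenium, for example); the sketch below sticks to the standard library's `html.parser` and parses an inline HTML string so it runs offline. The `LinkExtractor` class, `extract_links` helper, and sample markup are hypothetical. To fetch a live page you could read it with `urllib.request.urlopen`, after checking the site's terms of use and `robots.txt`.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical markup resembling a list of wiki links.
SAMPLE_HTML = """
<ul>
  <li><a href="/wiki/Bangkok">Bangkok</a></li>
  <li><a href="/wiki/Chiang_Mai">Chiang <b>Mai</b></a></li>
  <li><a name="anchor-only">No href</a></li>
</ul>
"""


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; missing href gives None.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only gather text while inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, "|", text)
```

Text inside nested tags such as `<b>` is still collected because `handle_data` fires for every text chunk while a link is open; nested `<a>` tags are not handled in this sketch.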

Assignment (Counting Buddhist holy days, วันพระ) Open In Colab

Week 10: Data Ingestion with Kafka

  1. Several simple examples, including both a producer and a consumer, in the simple folder

  2. A more complex example in the complex folder

  3. Avro producer (Open In Colab) and consumer (Open In Colab)

  4. A group example in the group folder

Assignment (Transaction Verifier) Open In Colab

Note: Do not forget to upload the following schema files to your Colab

Week 11: Big Data Processing with Spark

  1. Basic Spark Open In Colab

Note: Do not forget to upload the following data file to your Colab

  2. Spark SQL Open In Colab

Note: Do not forget to upload the following data file to your Colab

  3. Spark ML Open In Colab

Note: Do not forget to upload the following data file to your Colab

Assignment (Analyze IMDB)Open In Colab

Note: Do not forget to upload the following data file to your Colab

Week 12: Ops Stars

  1. Several Airflow examples in the [airflow folder](https://github.com/kaopanboonyuen/2110531_DataScience_2022s1/tree/main/code/week12_orchestration/airflow)

  2. Several FastAPI examples in the [fastapi folder](https://github.com/kaopanboonyuen/2110531_DataScience_2022s1/tree/main/code/week12_orchestration/fastapi)

🛠 Environment Setup

The code in this repository is designed to run in Google Colab or a local Python environment. To get started locally, ensure you have Python 3.8+ installed and use the following steps to set up your environment:

git clone https://github.com/kaopanboonyuen/2110531_DataScience_2022s1.git
cd 2110531_DataScience_2022s1
pip install -r requirements.txt


🎓 License

This project is licensed under the Apache License 2.0. See the LICENSE file for more information.

🛡️ Disclaimer

This repository is for educational purposes only. All code and resources are provided as-is, without any guarantees or warranties.

For any questions or feedback, please contact Kao Panboonyuen.