Welcome to the Data Science and Data Engineering Tools course (2110531) at Chulalongkorn University! This repository contains the code, exercises, and resources for the course, covering data science, deep learning, and more. Each week introduces new concepts, reinforced with hands-on labs, to help you build a strong foundation in data science.
- Pandas with YouTube stat data: Analyzing YouTube video statistics with Pandas.
- (Advanced) Pandas with YouTube stat data: Advanced analysis of YouTube video statistics.
- Assignment (Pandas with YouTube stat data): Hands-on assignment analyzing YouTube video statistics.
- Impute Missing Value: Handling missing data in loan datasets.
- Split Train/Test: Splitting datasets into training and testing sets.
- Outliers with Log: Identifying and handling outliers using logarithmic transformation.
- Outliers with Log (Titanic Dataset): Advanced outlier analysis using the Titanic dataset.
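As a rough illustration of the data-preparation topics listed above (imputing missing values, splitting into train and test sets, and taming outliers with a log transform), here is a minimal sketch using only the Python standard library. It is not taken from the course notebooks, which likely use Pandas; the function names, the sample values, and the 1.5 × IQR outlier rule are assumptions chosen for illustration.

```python
import math
import random
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def split_train_test(rows, test_ratio=0.25, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]


# Hypothetical, heavily right-skewed amounts with one missing entry.
amounts = [1, 2, 4, None, 16, 32, 64, 128, 1024]

filled = impute_median(amounts)            # None becomes the median, 24.0
train, test = split_train_test(filled)     # 7 training rows, 2 test rows
logged = [math.log(v) for v in filled]     # compress the long right tail

print("outliers before log:", iqr_outliers(filled))
print("outliers after log: ", iqr_outliers(logged))
```

On this sample the raw scale flags 1024 as an outlier, while the log scale flags nothing, which is the effect the log transform is meant to have on skewed data. With Pandas the same steps are usually written with `fillna`, a library split helper, and `numpy.log`.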
Assignment for Week3 (Safe to eat or deadly poison?):
Redis Example using local data
Assignment (connect to Redis server)
Assignment (Counting วันพระ)
- Several simple examples, including both producer and consumer, in the simple folder
- Complex example in the complex folder
- Group example in the group folder
Assignment (Transaction Verifier)
Note: Do not forget to upload the following schema files to your Colab
Note: Do not forget to upload the following data file to your Colab
- [star-wars.txt](https://github.com/kaopanboonyuen/2110531_DataScience_2022s1/blob/main/code/week11_spark/star-wars.txt)
Note: Do not forget to upload the following data file to your Colab
- [bank-additional-full.csv](https://github.com/kaopanboonyuen/2110531_DataScience_2022s1/blob/main/code/week11_spark/bank-additional-full.csv)
Note: Do not forget to upload the following data file to your Colab
- [bank-additional-full.csv](https://github.com/kaopanboonyuen/2110531_DataScience_2022s1/blob/main/code/week11_spark/bank-additional-full.csv)
Note: Do not forget to upload the following data file to your Colab
- [netflix-rotten-tomatoes-metacritic-imdb.csv](https://github.com/kaopanboonyuen/2110531_DataScience_2022s1/blob/main/code/week11_spark/assignment/netflix-rotten-tomatoes-metacritic-imdb.csv)
- Several Airflow examples in the [airflow folder](https://github.com/kaopanboonyuen/2110531_DataScience_2022s1/tree/main/code/week12_orchestration/airflow)
- Several FastAPI examples in the [fastapi folder](https://github.com/kaopanboonyuen/2110531_DataScience_2022s1/tree/main/code/week12_orchestration/fastapi)
The code in this repository is designed to run in Google Colab or a local Python environment. To get started locally, ensure you have Python 3.8+ installed and use the following steps to set up your environment:
    git clone https://github.com/kaopanboonyuen/2110446_DataScience_2021s2.git
    cd 2110446_DataScience_2021s2
    pip install -r requirements.txt
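Before installing the requirements, it can help to confirm that the interpreter meets the Python 3.8+ requirement stated above. The following is a small sketch of such a check; the helper name `meets_minimum` and the constant `MINIMUM_PYTHON` are hypothetical and not part of this repository.

```python
import sys

MINIMUM_PYTHON = (3, 8)  # from the "Python 3.8+" requirement in the setup notes


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


current = sys.version.split()[0]
if meets_minimum():
    print("Python version OK:", current)
else:
    print("Python 3.8 or newer is required, found:", current)
```

Comparing version tuples rather than version strings avoids mistakes such as treating "3.10" as older than "3.8".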
- https://www.kaggle.com/code
- https://www.tensorflow.org/tutorials
- https://github.com/topics/machine-learning
- https://archive.ics.uci.edu/ml/datasets.php
- https://colab.research.google.com/notebooks/
This project is licensed under the MIT License. See the LICENSE file for more information.
This repository is for educational purposes only. All code and resources are provided as-is, without any guarantees or warranties.
For any questions or feedback, please contact Kao Panboonyuen.