Programming practices - ML, DL, Kaggle with Python, R, Matlab

Primary language: Jupyter Notebook. License: MIT.

Data Analysis Club

  • This repo maintains the work of the data analysis club.
  • Programming activities from the weekly seminar
  • Nov. 5, 2019 ~ Mar. 11, 2021

1. Python Tutorials

Python for Data Analysis

[1] Wes McKinney. (2012). Python for Data Analysis. O'Reilly Media, Inc.
  1. Built-in Data Structures, Functions, and Files
  2. Arrays and Vectorized Computation
  3. Getting Started with pandas
  4. Data Loading, Storage, and File Formats
  5. Data Cleaning and Preparation
  6. Data Wrangling: Join, Combine, and Reshape
  7. Plotting and Visualization
  8. Data Aggregation and Group Operations
  9. Time Series
  10. Advanced pandas

2. R Tutorials

  • Practice Plot, Treemap, Bubble Chart and Mosaic Plot
  • Practice Simple, Multiple Regression and Logistic Regression | Presentation

3. Python Crawling

  • Web Crawler for downloading images

4. R Text Mining

  • English Text mining with 'tm' | Presentation
  • Visualization with 'wordcloud'

5. Python Machine Learning

Supervised Learning

  • KNN (K-Nearest Neighbors)
  • Linear regression
  • Ridge, Lasso
  • Elastic net
  • Logistic regression
  • Naive Bayes Classifier
  • Decision tree
  • Random Forest
  • Gradient Boosting
  • AdaBoost Algorithm
  • SVM (Support Vector Machine) | Presentation
  • MLP (Multi-layer Perceptron) | Presentation
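
As a small illustration of the first item in the list above, KNN can be sketched in plain Python: classify a query point by majority vote among its k closest training points. This is a minimal sketch, not the club's notebook code; the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among the k nearest points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labeled "a" and "b".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

pred_near_a = knn_predict(points, labels, (1.1, 1.0))
pred_near_b = knn_predict(points, labels, (5.0, 5.2))
```

In practice a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used; the sketch only shows the voting idea.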

Unsupervised Learning

  • PCA
  • K-Means
  • Agglomerative Clustering
  • DBSCAN

6. MATLAB Machine Learning

  • Practice MATLAB basic
  • Gradient Descent with 1 weight and 2 weights (adjusting the learning rate)
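
The effect of the learning rate in the one-weight case can be sketched as follows. The club's exercise was done in MATLAB; this sketch uses Python for consistency with the rest of the repo. It assumes a mean-squared-error loss for the model y ≈ w·x, and the function name `gradient_descent` and the toy data are hypothetical.

```python
def gradient_descent(xs, ys, lr, steps=100):
    """Fit y ≈ w * x with a single weight by minimizing mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of (1/n) * sum((w*x - y)^2) with respect to w.
        grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad
    return w


# Toy data generated by y = 2x, so the ideal weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w_small = gradient_descent(xs, ys, lr=0.01)  # small step: converges toward 2
w_large = gradient_descent(xs, ys, lr=0.2)   # step too large: the updates diverge
```

With this data the error shrinks by a constant factor each step when the learning rate is small, while a rate that is too large makes every update overshoot the minimum by more than the previous error, so the weight moves further away each step.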

7. R Machine Learning

  • Simple Linear Regression with the women height and weight dataset
  • Multiple Linear Regression
  • Optimal Variable Selection with AIC and K-fold Cross Validation
  • LASSO Regression
  • Decision Tree
  • KNN Algorithm
  • Correlation analysis
  • Clustering Analysis with iris
  • Hierarchical clustering and PCA

8. Kaggle Tutorials

  • Titanic Tutorial | Presentation
    Exploratory data analysis, visualization, machine learning
    Cross Validation, Confusion Matrix, Hyperparameter Tuning, Ensembling (Voting, Bagging, Boosting), Feature Importance

  • Household Poverty Level Prediction | Presentation
    Feature Engineering, Machine Learning, Model Selection, Feature Selection, Gradient Boosting

  • TensorFlow Speech Recognition Challenge | Presentation
    Speech representation and data exploration, Light-Weight CNN, 1D Inception approach
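
Two of the evaluation ideas named under the Titanic tutorial, cross validation and the confusion matrix, can be sketched in plain Python. This is a minimal sketch assuming binary 0/1 labels and contiguous, unshuffled folds; the helper names `k_fold_indices` and `confusion_matrix` are hypothetical and are not taken from the tutorial notebooks.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def confusion_matrix(y_true, y_pred):
    """Return counts (tn, fp, fn, tp) for binary labels 0 and 1."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp


folds = list(k_fold_indices(10, 3))
cm = confusion_matrix([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

Each sample lands in exactly one test fold, so every observation is used for validation once. Library versions, for example scikit-learn's `KFold` and `confusion_matrix`, add shuffling, stratification and multi-class support.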

9. PyTorch Tutorials

[1] Introduction to Deep Learning with PyTorch (PyTorch로 시작하는 딥 러닝 입문), https://wikidocs.net/book/2788

10. OpenCV Tutorials

[1] Computer Vision and Machine Learning with OpenCV 4 (OpenCV 4로 배우는 컴퓨터 비전과 머신 러닝), https://thebook.io/006939/
[2] sunkyoo/opencv4cvml, https://github.com/sunkyoo/opencv4cvml/tree/main/python

11. Natural Language Processing

[1] Natural Language Processing Using Deep Learning (딥러닝을 이용한 자연어 처리), https://wikidocs.net/book/2155
  • Pandas Profiling
  • Tokenization, Cleaning and Normalization, Stemming and Lemmatization, Stopwords, Regular Expressions
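
Several of the preprocessing steps listed above (normalization, regular-expression tokenization, stopword removal) can be sketched with the standard library alone. This is a minimal sketch for English text; the `STOPWORDS` set and the `preprocess` function are hypothetical, and stemming and lemmatization are left out because they usually rely on a library such as NLTK.

```python
import re

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}


def preprocess(text):
    """Lowercase the text, tokenize it with a regex, and drop stopwords."""
    # Normalization: fold everything to lowercase.
    text = text.lower()
    # Tokenization: runs of letters, optionally with one internal apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    # Cleaning: remove stopwords and single-character tokens.
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


tokens = preprocess("The cats are sitting in the garden, and it's sunny!")
# tokens == ["cats", "sitting", "garden", "it's", "sunny"]
```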