/personal-development

Summary for the book Lean Analytics

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

This is the beginning of my trajectory to become an awesome data scientist.

I have no degree in computer science and I'm not a "formal" software engineer. I do, however, have a bachelor's degree in Physics, so my math background is pretty solid.

I already know Python (thanks to an MIT course on edX), but I have A LOT to improve on, and I plan to learn Go too.

This is what I have figured out so far for my development:

deeplearning.ai's Deep Learning Specialization

  • Course 1 - Neural networks and deep learning
  • Course 2 - Improving deep neural networks
  • Course 3 - Structuring machine learning projects
  • Course 4 - Convolutional neural networks
  • Course 5 - Sequence models

Book "Feature Engineering for Machine Learning"

Summary here

  • Chapter 1 - The Machine Learning Pipeline
  • Chapter 2 - Fancy Tricks with Simple Numbers
  • Chapter 3 - Text Data: Flattening, Filtering, and Chunking
  • Chapter 4 - The Effects of Feature Scaling: From Bag-of-Words to Tf-Idf
  • Chapter 5 - Categorical Variables: Counting Eggs in the Age of Robotic Chickens
  • Chapter 6 - Dimensionality Reduction: Squashing the Data Pancake with PCA
  • Chapter 7 - Nonlinear Featurization via K-Means Model Stacking
  • Chapter 8 - Automating the Featurizer: Image Feature Extraction and Deep Learning
  • Chapter 9 - Back to the Feature: Building an Academic Paper Recommender

PGA Study

There's a repo for that

  • Some descriptive analytics for PGA index file
  • Descriptive analytics for siva files on PGA according to some criteria
    • Download siva files
    • Examine siva files
    • Use gitbase to query siva files

Data Science from Scratch

Following O'Reilly book

  • Chapter 01 - Introduction
  • Chapter 02 - A Crash Course in Python
  • Chapter 03 - Visualizing Data
  • Chapter 04 - Linear Algebra
  • Chapter 05 - Statistics
  • Chapter 06 - Probability
  • Chapter 07 - Hypothesis and Inference
  • Chapter 08 - Gradient Descent
  • Chapter 09 - Getting Data
  • Chapter 10 - Working with Data
  • Chapter 11 - Machine Learning
  • Chapter 12 - k-Nearest Neighbors
  • Chapter 13 - Naive Bayes
  • Chapter 14 - Simple Linear Regression
  • Chapter 15 - Multiple Regression
  • Chapter 16 - Logistic Regression
  • Chapter 17 - Decision Trees
  • Chapter 18 - Neural Networks
  • Chapter 19 - Clustering
  • Chapter 20 - Natural Language Processing
  • Chapter 21 - Network Analysis
  • Chapter 22 - Recommender Systems
  • Chapter 23 - Databases and SQL
  • Chapter 24 - MapReduce
  • Chapter 25 - Go Forth and Do Data Science

Using source{d} stack

Following "Introduction to Code As Data & Machine Learning On Code"

  • Getting started with Babelfish
  • Analyzing Git Repositories
  • Getting started with gitbase & gitbase web
  • MLonCode Pre-trained Models
  • Training MLonCode Models

Playing with Kaggle's Titanic dataset

There's a repo for that

  • Acquire data
  • Analyze by describing data
  • Analyze by pivoting features
  • Analyze by visualizing data
  • Wrangle data
  • Model, predict and solve
    • Logistic Regression
    • KNN or k-Nearest Neighbors
    • Support Vector Machines
    • Naive Bayes classifier
    • Decision Tree
    • Random Forest
    • Perceptron
    • Artificial neural network
    • RVM or Relevance Vector Machine

Lean Analytics Book

Following the book

  • To be developed

Natural Language Processing with Python

Following O'Reilly book

  • To be developed

Introducing Go

Following the book by Caleb Doxsey

  • Chapter 01 - Getting Started
  • Chapter 02 - Types
  • Chapter 03 - Variables
  • Chapter 04 - Control Structures
  • Chapter 05 - Arrays, Slices and Maps
  • Chapter 06 - Functions
  • Chapter 07 - Structs and Interfaces
  • Chapter 08 - Packages
  • Chapter 09 - Testing
  • Chapter 10 - Concurrency
  • Chapter 11 - Next Steps

Misc

Concepts that I have no clue about and have to study/practice

  • reflog
  • git bisect
  • binaries
  • packfile
  • namespace
  • xpath
  • testing
  • SDK
  • debug
  • protobuf
  • rpc