30DaysOfCode

Continuous improvement is key! Not sure where I first read about this, but I just found this post by lifehack.org that highlighted the philosophy of Kaizen, the practice of continuous improvement. This resonated with me a lot, so I decided to set a goal to code every day of October!

After going through almost the entire month (Oct 2021), I realized I really liked the visual aspect of this, so I initially decided to keep this going indefinitely. A bit down the line, I realized that as much as I loved this, I have other things I want to dedicate my time to, so I have decided to stop adding new work here... Maybe I will come back in the future :)

Overview

Topics Explored: Scraping, Multiprocessing, Gradient Boosted Trees (GBT), Content Creation, Dimensionality Reduction, Data Cleaning, Data Visualization, Data Exploration/Exploratory Data Analysis (EDA), Object Oriented Programming (OOP), Data Wrangling, Databases, Statistics, Automation, Data Versioning, Documentation

Tools I used so far:

(Python) concurrent.futures, bs4, requests, multiprocessing, threading, numpy, matplotlib, plotly, seaborn, sqlite3, ebooklib, collections, sklearn, pandas; (SQL); (C++); (git); (Medium)

Daily Breakdown

January 2022

Re-evaluated priorities to start up my learning again!
Watched Risk at Scale - Running a large investment risk system and how risk analysis techniques can help you - fascinating watch about working with risk at large scale and the software choices behind it.
Watched:
- Highly-Scalable NLP to Answer Questions on South Africa’s COVID-19 WhatsApp Hotline - Impressive use of NLP to support COVID-19 Q&A
- Computations as Assets - a New Approach to Reproducibility and Transparency - Introduction to ExAx and some of the visualizations it makes possible. I really liked the COVID-19 visualization they did with taxi cars. Added ExAx/Accelerator to my list of things to learn.
- Darts for Time Series Forecasting - Introduction to the Darts library. Seems like a very versatile tool for forecasting, so I added it to my list of things to check out.
Looked for resources to learn some more theoretical topics and found Complexity Explorer
(Docker) Watched What is Docker in 5 Minutes
(Data Pipeline) Watched How to quickly build Data Pipelines for Data Scientists - Some nice tips for data pipelining and a tutorial on delta using Python
(Random Walks) Watched What is a Random Walk? | Infinite Series - An introduction to random walks, as a refresher on what they are all about
None (Weekend)
None (Weekend)
(Random Walks) Began Complexity Explorer Random Walk tutorial (1/9)
(Random Walks) Continued Complexity Explorer Random Walk tutorial (4/9)
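
As a small illustration of the random walk entries above, here is a minimal sketch of a 1D simple symmetric random walk using only the standard library. The function names and parameters are my own placeholders and are not taken from the Complexity Explorer tutorial. It also estimates the mean squared displacement, which for this kind of walk should come out close to the number of steps.

```python
import random


def simple_random_walk(n_steps, seed=None):
    """Return the positions of a 1D symmetric random walk that starts at 0."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(n_steps):
        # Step left or right with equal probability.
        position += rng.choice((-1, 1))
        path.append(position)
    return path


def mean_squared_displacement(n_steps, n_walks, seed=0):
    """Average the squared end position over many independent walks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_walks):
        end = sum(rng.choice((-1, 1)) for _ in range(n_steps))
        total += end * end
    return total / n_walks


if __name__ == "__main__":
    print(simple_random_walk(10, seed=1))
    # For a simple symmetric walk this should be roughly equal to n_steps.
    print(mean_squared_displacement(n_steps=100, n_walks=2000))
```

Running the script prints one short path and an estimate that should land near 100 for 100-step walks.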

Topics I am interested in looking into:

High priority

Lower Priority

Progress for past months:

October 2021

Oct 1: (requests, bs4, re, concurrent.futures, nltk, and pandas) Scraped readlightnovel.me to create a light-novels dataset
Oct 2: (concurrent.futures, Threading, Multiprocessing) A comparison of single-core and multi-core processing for matrix multiplication in Python
Oct 3: (xgboost) Implemented xgboost from scratch! (xgboost part 1)
Oct 4: (xgboost, boosting) Implemented boosting and added to previously created xgboost trees (xgboost part 2)
Oct 5: (xgboost, boosting, plotly) Finished xgboost project! Added multi-dim input feature and approximate splitting (xgboost part 3)
Oct 6: (git, PyTest, CircleCI) Set up git on my PC! (I ran into problems with this before, so I opted to use desktop app/web interface locally and git for remote server work). I also studied unit testing using PyTest and CircleCI.
Oct 7: (SQL, sqlite) Tested out sqlite3 for running SQLite
Oct 8: (Seaborn) Added visualization in seaborn to my multiprocessing project
Oct 9: (PCA, DevOps, Blogging) Watched a couple of videos on PCA (which I found similar to SVD, a procedure I love), started going through a DevOps course on YouTube, and began writing a Medium post on SPPPACY (I have been meaning to do this last one for a long time and finally got to it!)
Oct 10: (SQL, sqlite, ebooklib, bs4, re, collections) I made a dataset for ingredient pairings
Oct 11: (SQL, streamlit, flask) Took some time to dig deeper into SQL and web development using Python so I can make the ingredient pairings project into an app
Oct 12: (PCA, NumPy, Sklearn) Coded up PCA in Numpy and compared results with sklearn
Oct 13: (Spark, PySpark) Watched and read tutorials on PySpark and Spark
Oct 14: (Medium) Went back and edited the Medium post I wrote on Oct 9... hopefully I get it out soon
Oct 15: (C++) I coded Othello in C++ from scratch!
Oct 16: (hugo, portfolio) Watched some tutorials on making a portfolio website
Oct 17: (hugo, portfolio) Put some more work into the portfolio
Oct 18: (A/B testing) Read about A/B testing
Oct 19: (Statistics) Started 365 Data Science statistics course
Oct 20: (Data cleaning) Went and cleaned the data I generated from the light novels site
Oct 21: (Statistics, PySpark) Continued statistics course and read more about PySpark (on tutorialspoint)
Oct 22: (Seaborn, Pandas) Basic data exploration on the scraped novel data
Oct 23: (Seaborn, Pandas) Continued the data exploration and visualization for the light novel dataset
Oct 24: (PySpark) Figured out how to run PySpark on Google Colab
Oct 25: (Rasterio, concurrent.futures) Created a tool to match tif files between 2 directories
Oct 26: (Rasterio, concurrent.futures) More work on the tif matching tool
Oct 27: (Rasterio, concurrent.futures) Finished the tif matching tool
Oct 28: (Data Versioning: DVC, DagsHub, FastDS; Documentation: Sphinx, Read the Docs; Exploratory Analysis: Missingno, Sidetable, Pandas; GPU Programming: Numba, CuPy, CuDF, CuML; Databases: Snowflake, Tecton) Joined PyData Global 2021 and went to:
- 🦉DVC Showcase – Who Moved My Data?
- Document your scientific project with Markdown, Sphinx, and Read the Docs
- Know Your Data First: An Introduction to Exploratory Data Analysis
- GPU development with Python 101
- Snowflake and Tecton: How to build production-ready machine learning pipelines
Oct 29: (Bayesian Ordered Logistic Regression: jax, numpyro; Graphs: neo4j, optuna, sklearn, pandas) More webinars:
- Let's Implement Bayesian Ordered Logistic Regression!
- Working with Data in a Connected World: the Power of Graph Data Science
Oct 30: (Open Source: Contributed to NumPy) Last day of PyData Global:
- Participated in the NumPy + SciPy Sprint and made my first open source contribution!
Oct 31: (Compressive sensing; Data Pipelines: Apache Kafka; Causal Inference: Simpson's Paradox) Catching up on PyData webinars I missed:
- Compressive Sensing
- Start Asking Your Data “Why?” - A Gentle Introduction To Causal Inference
- Get to know Apache Kafka with Jupyter Notebooks
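
For the Oct 12 entry (PCA in NumPy), here is a rough sketch of what such an implementation might look like. It is not the code from that day. The function names are placeholders, and instead of comparing against sklearn it cross-checks two NumPy routes to the same answer, SVD of the centered data and an eigendecomposition of the covariance matrix, to keep the example dependency-light. The results should match `sklearn.decomposition.PCA` up to the sign of each component, but that comparison is not run here.

```python
import numpy as np


def pca_svd(X, n_components):
    """PCA via SVD of the centered data (rows are samples, columns are features)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    projected = X_centered @ components.T
    return projected, components, explained_variance


def pca_eig(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    cov = np.cov(X, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reverse them.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order].T, eigvals[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy correlated data, made up purely for illustration.
    X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
    _, comps_svd, var_svd = pca_svd(X, 2)
    comps_eig, var_eig = pca_eig(X, 2)
    print(np.allclose(var_svd, var_eig))
    # Components agree up to sign, so compare absolute dot products with 1.
    print(np.allclose(np.abs(np.sum(comps_svd * comps_eig, axis=1)), 1.0))
```

The sign ambiguity is why the check uses absolute dot products: a principal axis and its negation describe the same direction.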
