Python Deliberate Practice

First of all, don't be afraid: read Plateau of Productivity. More importantly, be patient. A good read on this is Peter Norvig's essay, Teach Yourself Programming in Ten Years.

Motivation

The language war between Python and R is one of the most frequently discussed topics among data scientists, and there doesn't seem to be a consensus on which one is better. Personally, I have used both R and Python, but for very different purposes. I mainly use tidyverse packages (dplyr + ggplot2) for data analysis and data visualization, and Python for web scraping, task automation, and building basic web applications in Flask.

By now, I have a pretty good working knowledge of the R language. There are obviously many more things that I can learn - in particular building and maintaining R packages as well as more advanced R materials. Yet, the appeal of Python has always been there for me for a few reasons:

  • It's a general-purpose programming language, so presumably it is a lot easier to learn good software engineering principles with it. (What are they though?)
  • Many data stacks are built with tools from the Python ecosystem (ETL with Airflow, front ends with Flask and RESTful API support, machine learning with scikit-learn). Being able to use the same language for different parts of the data stack brings prototypes closer to production.

To me, the appeal of Python is not necessarily the data analysis part; R already does a great job there. Rather, the appeal of using Python for data work is that you have a better chance to see how data plays a role within the whole integrated technology stack. Knowing Python is likely to make me a better end-to-end data scientist and a better software engineer.

Here is a great Reddit answer that beautifully explains where the two languages overlap and where they differ.

Deliberate Practice

I am a huge believer in learning by doing, and there are a lot of opportunities on the job where I can hone my Python skills through Deliberate Practice:

  • Identify the Top Performers: There are quite a few people at work (e.g. Dan F.) who can be role models for me to follow. Understand what they went through to get to where they are today. What mental representation of Python do they have that I lack?

  • Build Practice Plans: Ideally, based on the rough understanding of that mental representation:

    • Define clear goals and select learning materials
    • Create deadlines and milestones for the project
    • Estimate the time required and come up with weekly schedules

    Augment these insights with your current level of mental representation of Python to improve your understanding.

  • Targeted Practice: If I force myself to switch over to Python for data analysis, data visualization, and modeling, or to contribute to our internal Python data analysis packages, I can maximize the time I spend practicing this skill, which is high leverage.

  • Immediate Feedback: We have a culture of code reviews, both for IC work and for internal package work. The former is harder because most DS on our team are in the R camp. There are also the weekly Python office hours, which should be very useful. Constantly look for opportunities to get as much feedback as you can.

Performance Goals

  • [Immediate] Learn to write pythonic code
  • [Shorter term, easiest to practice] Write re-usable, modular, tested code for my data work and knowledge posts
  • [Medium term, harder to practice] Achieve efficiency and feature parity on Data Analysis using Python compared to R
  • [Longer term, hardest to practice] Write tools. Be able to work on projects that span the entire data stack using Python, and apply good software engineering principles to these projects
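To make the first two goals concrete, here is a minimal sketch of what I mean by pythonic, reusable, tested code. The function and test names (top_words, test_top_words) are hypothetical and made up purely for illustration; they do not come from any real package.

```python
from collections import Counter


def top_words(lines, n=3):
    """Return the n most common lowercase words across an iterable of strings.

    Hypothetical helper, used here only to illustrate the goals above.
    """
    # A generator expression feeds Counter directly, instead of an
    # index-based loop that fills a dict by hand.
    counts = Counter(word.lower() for line in lines for word in line.split())
    return counts.most_common(n)


def test_top_words():
    # A plain assert-style test that a runner such as pytest can pick up.
    lines = ["Python and R", "python for scraping", "python for analysis"]
    assert top_words(lines, n=2) == [("python", 3), ("for", 2)]
    assert top_words([], n=2) == []


if __name__ == "__main__":
    test_top_words()
```

The function takes any iterable of strings rather than a file path, so the same code can be reused from a notebook, a script, or a test.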

Project Goals

  • Outcome: I want to move my data stack to Python completely. This means doing my day-to-day data analysis work in Python instead of R and making my code as pythonic as possible. Become a contributor to Airpy / tools, and take on one bigger Python project (ML, data viz, etc.).

  • Curriculum: I want to do everything I can to get through all the basic materials on the Pandas/Matplotlib combo. Expose myself to functional programming, OOP, testing in Python, and even building command-line tools. Get feedback from Airpy team members.

  • Timeframe: Efficiency parity by the end of October. One contribution to Airpy by mid-November. One ongoing big project touching different stacks in Python by the end of 2016.

Project Milestones

Next Steps / Level In 2017

Once I have mastered all of the above, the next natural step is to create public work that other people can use, so that useful tools are available to everyone. A great introduction to getting started is Tim Hopper's talk, Sharing Your Side Projects.

Reference