
Collection of useful data science topics along with articles and videos.
How to Download the Code in This Repository to Your Local Machine
To download the code in this repository, clone it with git:
git clone https://github.com/khuyentran1401/Data-science
- MLOps
- Testing
- Productive Tools
- Python Helper Tools
- Tools for Deployment
- Speed-up Tools
- Math Tools
- Machine Learning
- Natural Language Processing
- Computer Vision
- Time Series
- Feature Engineering
- Visualization
- Mathematical Programming
- Scraping
- Python
- Terminal
- Linear Algebra
- Data Structure
- Statistics
- Web Applications
- Share Insights
- Cool Tools
- Learning Tips
- Productive Tips
- VSCode
- Book Review
- Data Science Portfolio

MLOps

| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| Introduction to DVC: Data Version Control Tool for Machine Learning Projects | 🔗 | 🔗 | 🔗 |
| Introduction to Hydra.cc: A Powerful Framework to Configure your Data Science Projects | 🔗 | 🔗 | 🔗 |
| Introduction to Weight & Biases: Track and Visualize your Machine Learning Experiments in 3 Lines of Code | 🔗 | 🔗 | |
| Kedro - A Python Framework for Reproducible Data Science Project | 🔗 | 🔗 | |
| Orchestrate a Data Science Project in Python With Prefect | 🔗 | 🔗 | |
| Orchestrate Your Data Science Project with Prefect 2.0 | 🔗 | 🔗 | 🔗 |
| DagsHub: a GitHub Supplement for Data Scientists and ML Engineers | 🔗 | 🔗 | |
| 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | 🔗 | 🔗 | 🔗 |
| BentoML: Create an ML Powered Prediction Service in Minutes | 🔗 | 🔗 | 🔗 |
| How to Structure a Data Science Project for Readability and Transparency | 🔗 | 🔗 | |
| How to Structure an ML Project for Reproducibility and Maintainability | 🔗 | 🔗 | |
| GitHub Actions in MLOps: Automatically Check and Deploy Your ML Model | 🔗 | 🔗 | |
| Create Robust Data Pipelines with Prefect, Docker, and GitHub | 🔗 | 🔗 | |
| Create a Maintainable Data Pipeline with Prefect and DVC | 🔗 | 🔗 | |
| Build a Full-Stack ML Application With Pydantic And Prefect | 🔗 | 🔗 | 🔗 |
| DVC + GitHub Actions: Automatically Rerun Modified Components of a Pipeline | 🔗 | 🔗 | 🔗 |
| Create Observable and Reproducible Notebooks with Hex | 🔗 | 🔗 | 🔗 |

Testing

| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| Pytest for Data Scientists | 🔗 | 🔗 | 🔗 |
| 4 Lessor-Known Yet Awesome Tips for Pytest | 🔗 | 🔗 | |
| Great Expectations: Always Know What to Expect From Your Data | 🔗 | 🔗 | |
| Validate Your pandas DataFrame with Pandera | 🔗 | 🔗 | 🔗 |
| Introduction to Schema: A Python Libary to Validate your Data | 🔗 | 🔗 | |
| DeepDiff - Recursively Find and Ignore Trivial Differences Using Python | 🔗 | 🔗 | |
| Checklist - Behavioral Testing of NLP Models | 🔗 | 🔗 | |
| How to Create Fake Data with Faker | 🔗 | 🔗 | |
| Detect Defects in a Data Pipeline Early with Validation and Notifications | 🔗 | 🔗 | 🔗 |
| Hypothesis and Pandera: Generate Synthesis Pandas DataFrame for Testing | 🔗 | 🔗 | 🔗 |

Productive Tools

| Title | Article | Repository |
| --- | --- | --- |
| 3 Tools to Track and Visualize the Execution of your Python Code | 🔗 | 🔗 |
| 2 Tools to Automatically Reload when Python Files Change | 🔗 | 🔗 |
| 3 Ways to Get Notified with Python | 🔗 | 🔗 |
| How to Create Reusable Command-Line | 🔗 | |
| How to Strip Outputs and Execute Interactive Code in a Python Script | 🔗 | 🔗 |
| Sending Slack Notifications in Python with Prefect | 🔗 | 🔗 |

Python Helper Tools

| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| Pydash: A Kitchen Sink of Missing Python Utilities | 🔗 | 🔗 | |
| Write Clean Python Code Using Pipes | 🔗 | 🔗 | 🔗 |
| Introducing FugueSQL - SQL for Pandas, Spark, and Dask DataFrames | 🔗 | 🔗 | |
| Fugue and DuckDB: Fast SQL Code in Python | 🔗 | 🔗 | |

Tools for Deployment

| Title | Article | Repository |
| --- | --- | --- |
| How to Effortlessly Publish your Python Package to PyPI Using Poetry | 🔗 | 🔗 |
| Typer: Build Powerful CLIs in One Line of Code using Python | 🔗 | 🔗 |

Speed-up Tools

| Title | Article | Repository |
| --- | --- | --- |
| Cython-A Speed-Up Tool for your Python Function | 🔗 | 🔗 |
| Train your Machine Learning Model 150x Faster with cuML | 🔗 | 🔗 |

Math Tools

| Title | Article | Repository |
| --- | --- | --- |
| SymPy: Symbolic Computation in Python | 🔗 | 🔗 |

Machine Learning

| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| How to Monitor And Log your Machine Learning Experiment Remotely with HyperDash | 🔗 | 🔗 | |
| How to Efficiently Fine-Tune your Machine Learning Models | 🔗 | 🔗 | |
| How to Learn Non-linear Dataset with Support Vector Machines | 🔗 | 🔗 | |
| Introduction to IBM Federated Learning: A Collaborative Approach to Train ML Models on Private Data | 🔗 | 🔗 | |
| 3 Steps to Improve your Efficiency when Hypertuning ML Models | 🔗 | | |
| human-learn: Create a Human Learning Model by Drawing | 🔗 | 🔗 | |
| Patsy: Build Powerful Features with Arbitrary Python Code | 🔗 | 🔗 | |
| SHAP: Explain Any Machine Learning Model in Python | 🔗 | 🔗 | |
| Predict Movie Ratings with User-Based Collaborative Filtering | 🔗 | 🔗 | |
| River: Online Machine Learning in Python | 🔗 | 🔗 | 🔗 |
| Human-Learn: Rule-Based Learning as an Alternative to Machine Learning | 🔗 | 🔗 | 🔗 |

Natural Language Processing

| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| Sentiment Analysis of LinkedIn Messages | 🔗 | 🔗 | |
| Find Common Words in Article with Python Module Newspaper and NLTK | 🔗 | 🔗 | |
| How to Tokenize Tweets with Python | 🔗 | 🔗 | |
| How to Solve Analogies with Word2Vec | 🔗 | 🔗 | |
| What is PyTorch | 🔗 | 🔗 | |
| Convolutional Neural Network in Natural Language Processing | 🔗 | 🔗 | |
| Supercharge your Python String with TextBlob | 🔗 | 🔗 | 🔗 |
| pyLDAvis: Topic Modelling Exploration Tool That Every NLP Data Scientist Should Know | 🔗 | 🔗 | |
| Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge | 🔗 | 🔗 | |
| Build a Robust Conversational Assistant with Rasa | 🔗 | 🔗 | |
| I Analyzed 2k Data Scientist and Data Engineer Jobs and This is What I Found | 🔗 | 🔗 | |
| Checklist - Behavioral Testing of NLP Models | 🔗 | 🔗 | |
| PRegEx: Write Human-Readable Regular Expressions in Python | 🔗 | 🔗 | |
| Texthero: Text Preprocessing, Representation, and Visualization for a pandas DataFrame | 🔗 | 🔗 | |

Computer Vision

| Title | Article | Repository |
| --- | --- | --- |
| How to Create an App to Classify Dogs Using fastai and Streamlit | 🔗 | 🔗 |

Time Series

| Title | Article | Repository |
| --- | --- | --- |
| Kats: a Generalizable Framework to Analyze Time Series Data in Python | 🔗 | 🔗 |
| How to Detect Seasonality, Outliers, and Changepoints in Your Time Series | 🔗 | 🔗 |
| 4 Tools to Automatically Extract Data from Datetime in Python | 🔗 | 🔗 |

Feature Engineering

| Title | Article | Repository |
| --- | --- | --- |
| 3 Ways to Extract Features from Dates with Python | 🔗 | 🔗 |
| Similarity Encoding for Dirty Categories Using dirty_cat | 🔗 | 🔗 |
| Snorkel - A Human-In-The-Loop Platform to Build Training Data | 🔗 | 🔗 |

Visualization

| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| How to Embed Interactive Charts on your Articles and Personal Website | 🔗 | 🔗 | |
| What I Learned from Scraping 15k Data Science Articles on Medium | 🔗 | 🔗 | |
| How to Create Interactive Plots with Altair | 🔗 | 🔗 | |
| How to Create a Drop-Down Menu and a Slide Bar for your Favorite Visualization Tool | 🔗 | 🔗 | |
| I Scraped more than 1k Top Machine Learning Github Profiles and this is what I Found | 🔗 | 🔗 | |
| Top 6 Python Libraries for Visualization: Which one to Use? | 🔗 | 🔗 | |
| Introduction to Yellowbrick: A Python Library to Visualize the Prediction of your Machine Learning Model | 🔗 | 🔗 | |
| Visualize Gender-Specific Tweets with Scattertext | 🔗 | 🔗 | |
| Visualize Your Team's Projects Using Python Gantt Chart | 🔗 | 🔗 | |
| How to Create Bindings and Conditions Between Multiple Plots Using Altair | 🔗 | 🔗 | |
| How to Sketch your Data Science Ideas With Excalidraw | 🔗 | | |
| Pyvis: Visualize Interactive Network Graphs in Python | 🔗 | 🔗 | 🔗 |
| Build and Analyze Knowledge Graphs with Diffbot | 🔗 | | |
| Observe The Friend Paradox in Facebook Data Using Python | 🔗 | 🔗 | |
| What skills and backgrounds do data scientists have in common? | 🔗 | 🔗 | |
| Visualize Similarities Between Companies With Graph Database | 🔗 | 🔗 | |
| Visualize GitHub Social Network with PyGraphistry | 🔗 | 🔗 | |
| Find the Top Bootcamps for Data Professionals From Over 5k Profiles | 🔗 | 🔗 | |
| floWeaver - Turn Flow Data Into a Sankey Diagram In Python | 🔗 | 🔗 | |
| atoti - Build a BI Platform in Python | 🔗 | 🔗 | |
| Analyze and Visualize URLs with Network Graph | 🔗 | 🔗 | |
| statsannotations: Add Statistical Significance Annotations on Seaborn Plots | 🔗 | 🔗 | 🔗 |

Mathematical Programming

| Title | Article | Repository |
| --- | --- | --- |
| How to choose stocks to invest in with Python | 🔗 | 🔗 |
| Maximize your Productivity with Python | 🔗 | 🔗 |
| How to Find a Good Match with Python | 🔗 | 🔗 |
| How to Solve a Staff Scheduling Problem with Python | 🔗 | 🔗 |
| How to Find Best Locations for your Restaurants with Python | 🔗 | 🔗 |
| How to Schedule Flights in Python | 🔗 | 🔗 |
| How to Solve a Production Planning and Inventory Problem in Python | 🔗 | 🔗 |

Scraping

| Title | Article | Repository |
| --- | --- | --- |
| Web Scrape Movie Database with Beautiful Soup | 🔗 | 🔗 |
| top-github-scraper: Scrape Top Github Users and Repositories Based On a Keyword in One Line of Code | 🔗 | 🔗 |

Python

| Title | Article | Repository |
| --- | --- | --- |
| Numpy Tricks for your Data Science Projects | 🔗 | 🔗 |
| Timing for Efficient Python Code | 🔗 | 🔗 |
| How to Use Lambda for Efficient Python Code | 🔗 | 🔗 |
| Python Tricks for Keeping Track of Your Data | 🔗 | 🔗 |
| Boost Your Efficiency With Specialized Dictionary Implementations in Python | 🔗 | 🔗 |
| Dictionary as an Alternative to If-Else | 🔗 | 🔗 |
| How to Use Zip to Manipulate a List of Tuples | 🔗 | 🔗 |
| Get the Most out of Your Array With These Four Numpy Methods | 🔗 | 🔗 |
| 3 Python Tricks to Read, Create, and Run Multiple Files Automatically | 🔗 | 🔗 |
| How to Exclude the Outliers in Pandas DataFrame | 🔗 | 🔗 |
| Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | 🔗 | 🔗 |
| 3 Techniques to Effortlessly Import and Execute Python Modules | 🔗 | 🔗 |
| Simplify Your Functions with Functools' Partial and Singledispatch | 🔗 | 🔗 |

Terminal

| Title | Article | Repository |
| --- | --- | --- |
| How to Create and View Interactive Cheatsheets on the Command-line | 🔗 | |
| Understand CSV Files from your Terminal with XSV | 🔗 | |
| Prettify your Terminal Text With Termcolor and Pyfiglet | 🔗 | 🔗 |
| Stop Using Print to Debug in Python. Use Icecream Instead | 🔗 | |
| Rich: Generate Rich and Beautiful Text in the Terminal with Python | 🔗 | 🔗 |
| Create a Beautiful Dashboard in your Terminal with Wtfutil | 🔗 | 🔗 |
| 3 Tools to Monitor and Optimize your Linux System | 🔗 | |
| Ptpython: A Better Python REPL | 🔗 | 🔗 |
| fd: a Simple but Powerful Tool to Find and Execute Files on the Command Line | 🔗 | |
| Speed Up your Command-Line Navigation with These 3 Tools | 🔗 | |
| Python and Data Science Snippets on the Command Line | 🔗 | 🔗 |

Statistics

| Title | Article | Repository |
| --- | --- | --- |
| Can Datasets of a Dinosaur and a Circle have Identical Statistics? | 🔗 | 🔗 |
| Introduction to One-Way ANOVA: A Test to Compare the Means between More than Two Groups | 🔗 | 🔗 |
| Bayes' Theorem, Clearly Explained with Visualization | 🔗 | 🔗 |
| Detect Change Points with Bayesian Inference and PyMC3 | 🔗 | 🔗 |
| Bayesian Linear Regression with Bambi | 🔗 | 🔗 |
| Earn More Salary as a Coder - Higher Degree or More Years of Experience? | 🔗 | 🔗 |

Linear Algebra

| Title | Article | Repository |
| --- | --- | --- |
| How to Build a Matrix Module from Scratch | 🔗 | 🔗 |
| Linear Algebra for Machine Learning: Solve a System of Linear Equations | 🔗 | 🔗 |

Data Structure

| Title | Article | Repository |
| --- | --- | --- |
| Convex Hull: An Innovative Approach to Gift-Wrap your Data | 🔗 | 🔗 |
| How to Visualize Social Network With Graph Theory | 🔗 | 🔗 |
| How to Search Data with KDTree | 🔗 | 🔗 |
| How to Find the Nearest Hospital with a Voronoi Diagram | 🔗 | 🔗 |

Web Applications

| Title | Article | Repository |
| --- | --- | --- |
| How to Create an Interactive Startup Growth Calculator with Python | 🔗 | 🔗 |
| Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge | 🔗 | 🔗 |
| PyWebIO: Write Interactive Web App in Script Way Using Python | 🔗 | 🔗 |
| PyWebIO 1.3.0: Add Tabs, Pin Input, and Update an Input Based on Another Input | 🔗 | 🔗 |
| Create an App to Deal with Boredom Using PyWebIO | 🔗 | 🔗 |
| Build a Robust Workflow to Visualize Trending GitHub Repositories in Python | 🔗 | 🔗 |

Share Insights

| Title | Article | Repository |
| --- | --- | --- |
| Introduction to Datapane: A Python Library to Build Interactive Reports | 🔗 | |
| Datapane's New Features: Create a Beautiful Dashboard in Python in a Few Lines of Code | 🔗 | 🔗 |
| Introduction to Datasette: Explore and Publish Your Data in One Line of Code | 🔗 | |
| How to Share your Python Objects Across Different Environments in One Line of Code | 🔗 | 🔗 |
| How to Share your Jupyter Notebook in 3 Lines of Code with Ngrok | 🔗 | |
| Introduction to Deepnote: Real-time Collaboration on Jupyter Notebook | 🔗 | |

Cool Tools

| Title | Article | Repository |
| --- | --- | --- |
| Simulate Real-life Events in Python Using SimPy | 🔗 | 🔗 |
| How to Create Mathematical Animations like 3Blue1Brown Using Python | 🔗 | 🔗 |

Learning Tips

| Title | Article | Repository |
| --- | --- | --- |
| How to Learn Data Science when Life does not Give You a Break | 🔗 | |
| How to Accelerate your Data Science Career by Putting yourself in the Right Environment | 🔗 | |
| To become a Better Data Scientist, you need to Think like a Programmer | 🔗 | |
| How not to be Overwhelmed with Data Science | 🔗 | |

Productive Tips

| Title | Article | Repository |
| --- | --- | --- |
| How to Organize your Data Science Articles with Github | 🔗 | 🔗 |
| 5 Reasons why you should Switch from Jupyter Notebook to Scripts | 🔗 | |
| 7 Reasons Why you Should Start Documenting your Code | 🔗 | |

VSCode

| Title | Article | Repository |
| --- | --- | --- |
| How to Leverage Visual Studio Code for your Data Science Projects | 🔗 | |
| Top 4 Code Viewers for Data Scientist in VSCode | 🔗 | |
| Incorporate the Best Practices for Python with These Top 4 VSCode Extensions | 🔗 | |
| Boost Your Efficiency with Customized Code Snippets on VSCode | 🔗 | |
| Top 9 Keyboard Shortcuts in VSCode for Data Scientists | 🔗 | |

Book Review

| Title | Article | Repository |
| --- | --- | --- |
| Python Machine Learning: A Comprehensive Handbook for Machine Learning | 🔗 | |

Data Science Portfolio

| Title | Article | Repository |
| --- | --- | --- |
| How to Create an Elegant Website for your Data Science Portfolio in 10 minutes | 🔗 | |
| Build an Impressive Github Profile in 3 Steps | 🔗 | |

Special thanks to everyone who supports this project!
