Posts

A list of (some of) my posts and personal projects.

The objective of this repository is to gather my main posts and projects on a single page. I prioritize posts written in English (and the ones I'm proud of 😁).

I mainly write about Machine Learning and Data Science on Medium. You can visit my Medium profile to view all my posts.

The list

| Title | Link | Tags | Code |
|---|---|---|---|
| Creating a Text Preprocessing Microservice with FastAPI | 🔗 | | 🔗 |
| Brazilian Laws analysis with TF-IDF and K-Means | 🔗 | | 🔗 |
| Understanding Topic Coherence Measures | 🔗 | | - |
| How to ensemble Clustering Algorithms | 🔗 | | 🔗 |
| Improve Your Data Preprocessing with ColumnTransformer and Pipelines | 🔗 | | - |
| Creating a Simple ETL Pipeline With Apache Spark | 🔗 | | 🔗 |
| Machine Learning Streaming with Kafka, Debezium, and BentoML | 🔗 | | 🔗 |
| Stream Processing and Data Analysis with ksqlDB | 🔗 | | 🔗 |
| A Fast Look at Spark Structured Streaming + Kafka | 🔗 | | 🔗 |
| First Steps in Machine Learning with Apache Spark | 🔗 | | 🔗 |
| Temporal and Geo-referenced Traffic Management with Python+Streamlit | 🔗 | | 🔗 |
| Hands-On Introduction to Delta Lake with (py)Spark | 🔗 | | 🔗 |
| Creating a Data Pipeline with Spark, Google Cloud Storage and Big Query | 🔗 | | 🔗 |
| Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue) | 🔗 | | 🔗 |
| Automatically Managing Data Pipeline Infrastructures With Terraform | 🔗 | | 🔗 |
| Automatically Detecting Label Errors in Datasets with CleanLab | 🔗 | | 🔗 |
| My First Billion (of Rows) in DuckDB | 🔗 | | 🔗 |
| Anatomy of Windows Functions | 🔗 | | 🔗 |

\* Used in almost every project.