A list of (some of) my posts and personal projects.
This repository gathers my main posts and projects on a single page. I prioritize posts written in English (and ones that I'm proud of 😁).
I mainly write about Machine Learning and Data Science on Medium. You can visit my Medium profile to view all my posts.
Title | Link | Tags | Code |
---|---|---|---|
Creating a Text Preprocessing Microservice with FastAPI | 🔗 | | 🔗 |
Brazilian Laws analysis with TF-IDF and K-Means | 🔗 | | 🔗 |
Understanding Topic Coherence Measures | 🔗 | | - |
How to ensemble Clustering Algorithms | 🔗 | | 🔗 |
Improve Your Data Preprocessing with ColumnTransformer and Pipelines | 🔗 | | - |
Creating a Simple ETL Pipeline With Apache Spark | 🔗 | | 🔗 |
Machine Learning Streaming with Kafka, Debezium, and BentoML. | 🔗 | | 🔗 |
Stream Processing and Data Analysis with ksqlDB | 🔗 | | 🔗 |
A Fast Look at Spark Structured Streaming + Kafka | 🔗 | | 🔗 |
First Steps in Machine Learning with Apache Spark | 🔗 | | 🔗 |
Temporal and Geo-referenced Traffic Management with Python+Streamlit | 🔗 | | 🔗 |
Hands-On Introduction to Delta Lake with (py)Spark | 🔗 | | 🔗 |
Creating a Data Pipeline with Spark, Google Cloud Storage and Big Query | 🔗 | | 🔗 |
Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue) | 🔗 | | 🔗 |
Automatically Managing Data Pipeline Infrastructures With Terraform | 🔗 | | 🔗 |
Automatically Detecting Label Errors in Datasets with CleanLab | 🔗 | | 🔗 |