Pinned Repositories
big-data-science-notes
My notes on each module of Big Data Science, an online course offered by Semantix Brasil
maven-unicorn-challenge
A Python web app consisting of a dashboard, submitted to the "Maven Unicorn Challenge", a visualization challenge by Maven Analytics
Optimizing-a-Pipeline-in-Azure
The main goal of this project was to build and optimize an Azure ML pipeline using the Python SDK and a provided Scikit-learn Logistic Regression model to solve a classification problem. HyperDrive was used to tune the model's hyperparameters. The result was then compared to an Azure AutoML run to see which approach returns the better-tuned model.
spark-kubernetes
This repository contains files used to build images to deploy Spark clusters on Kubernetes
Spark-StudyClub
#DataEngineeringLATAM
kauvinlucas's Repositories
kauvinlucas/maven-unicorn-challenge
A Python web app consisting of a dashboard, submitted to the "Maven Unicorn Challenge", a visualization challenge by Maven Analytics
kauvinlucas/spark-kubernetes
This repository contains files used to build images to deploy Spark clusters on Kubernetes
kauvinlucas/big-data-science-notes
My notes on each module of Big Data Science, an online course offered by Semantix Brasil
kauvinlucas/Optimizing-a-Pipeline-in-Azure
The main goal of this project was to build and optimize an Azure ML pipeline using the Python SDK and a provided Scikit-learn Logistic Regression model to solve a classification problem. HyperDrive was used to tune the model's hyperparameters. The result was then compared to an Azure AutoML run to see which approach returns the better-tuned model.
kauvinlucas/Spark-StudyClub
#DataEngineeringLATAM
kauvinlucas/bert-sentiment
kauvinlucas/DataCamp-Projects
Notebooks from DataCamp projects
kauvinlucas/dio-analise-de-dados-com-pandas
This repository presents the exploratory data analysis and data visualization notebooks I built in Python with the help of the Pandas and Matplotlib libraries. It is my response to a challenge from the Digital Innovation One platform.
kauvinlucas/dio-google-cloud-dataproc
This repository contains the word-count files generated on Google Cloud by a Python script running in Google Dataproc, a cloud-managed Big Data ecosystem. It is my response to a challenge from the Digital Innovation One platform.
kauvinlucas/docker-bigdata
Big Data Ecosystem Docker
kauvinlucas/fifa18-all-player-statistics
A complete catalog of all the players in FIFA 18 and their full statistics.
kauvinlucas/imersao
kauvinlucas/jupyter-spark-enem-2019
In this project, I analyzed the scores of ENEM 2019, a standardized test used for admission to Brazilian colleges, in the context of existing socioeconomic disparities between participants. PySpark was used for data ingestion and transformation. Pandas, Statsmodels, Matplotlib/Seaborn/Folium, and Scikit-learn were used for descriptive analysis and data visualization.
kauvinlucas/kauvinlucas
kauvinlucas/Predicting_car_accident_severity
Final project submission for the IBM Data Science Professional Certificate specialization
kauvinlucas/pyspark-stateful-processing-with-twitter-kafka
This is a simple project consisting of a stream-processing pipeline built with Apache Kafka, PySpark, and the Twitter Streaming API. It is meant to help understand the concepts behind stateful processing and event-time processing with Spark Streaming.
kauvinlucas/utility-bill-fraud-detection
This project aimed to use computer vision to detect forged utility bills submitted as proof of address.