Wittline

Learning and working on data engineering projects

Mexico City, Mexico

Pinned Repositories

apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
Language: VBA 43 5 327
csv-schema-inference
A tool to automatically infer column data types in .csv files
Language: Jupyter Notebook 35 3 54
data-engineer-challenge
Data Engineer Challenge
Language: Python 25 2 08
data-engineering-challenge-th
Dockerizing a Python script for web scraping and consuming the scraped data using FastAPI (www.metroscubicos.com)
Language: Python 14 3 02
Dropout-Students-Prediction
The goal of this project is to identify students at risk of dropping out of school
Language: HTML 22 3 119
livyc
Apache Spark as a Service with Apache Livy Client
Language: Python 3 3 01
pyDag
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Language: Python 24 3 13
pyspark-on-aws-emr
The goal of this project is to offer a ready-to-use AWS EMR template using Spot Fleet and On-Demand Instances, so that you can focus on writing PySpark code.
Language: Python 26 3 413
uber-expenses-tracking
The goal of this project is to track the expenses of Uber Rides and Uber Eats through data engineering processes using technologies such as Apache Airflow, AWS Redshift and Power BI.
Language: Jupyter Notebook 119 6 336
wbz
A parallel implementation of the bzip2 data compressor in Python. The compression pipeline uses the Burrows–Wheeler transform (BWT) and move-to-front (MTF) encoding to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.
Language: Python 13 2 03
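
The wbz description above names two transforms, BWT and MTF, that run before Huffman coding. As a rough illustration of what those two stages do, here is a minimal sketch. It is not taken from the wbz repository: the function names are hypothetical, the BWT is the naive sort-all-rotations version (quadratic memory, fine only for small inputs), and parallelism and the Huffman stage are left out.

```python
from __future__ import annotations


def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform.

    Sorts every rotation of the input and returns the last column plus the
    row index of the original string (needed to invert the transform).
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # rotations[k] is the start offset of the k-th smallest rotation.
    rotations = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last_column = bytes(data[(i - 1) % n] for i in rotations)
    return last_column, rotations.index(0)


def inverse_bwt(last_column: bytes, primary: int) -> bytes:
    """Invert bwt() using a stable sort of the last column."""
    n = len(last_column)
    if n == 0:
        return b""
    # order[j] is the row whose last byte is the first byte of sorted row j.
    order = sorted(range(n), key=lambda i: last_column[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last_column[row])
    return bytes(out)


def mtf_encode(data: bytes) -> list[int]:
    """Move-to-front: emit each byte's current position, then move it to the front."""
    alphabet = list(range(256))
    codes = []
    for byte in data:
        index = alphabet.index(byte)
        codes.append(index)
        alphabet.insert(0, alphabet.pop(index))
    return codes


def mtf_decode(codes: list[int]) -> bytes:
    """Invert mtf_encode() by replaying the same front-moves."""
    alphabet = list(range(256))
    out = bytearray()
    for index in codes:
        byte = alphabet.pop(index)
        out.append(byte)
        alphabet.insert(0, byte)
    return bytes(out)


if __name__ == "__main__":
    sample = b"id,city\n1,Mexico City\n2,Mexico City\n"  # made-up CSV snippet
    transformed, primary = bwt(sample)
    codes = mtf_encode(transformed)
    # Round trip back to the original bytes.
    assert inverse_bwt(mtf_decode(codes), primary) == sample
    print(codes)
```

BWT tends to group identical bytes together, so the MTF output after it is usually dominated by small numbers, which is the skewed distribution an entropy coder such as Huffman benefits from.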

Wittline's Repositories

Wittline/uber-expenses-tracking
The goal of this project is to track the expenses of Uber Rides and Uber Eats through data engineering processes using technologies such as Apache Airflow, AWS Redshift and Power BI.
Language: Jupyter Notebook 119 6 336
Wittline/csv-schema-inference
A tool to automatically infer column data types in .csv files
Language: Jupyter Notebook 35 3 54
Wittline/pyDag
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Language: Python 24 3 13
Wittline/D3JS-Dashboard
Building a responsive dashboard with D3.js and ASP.NET MVC from scratch (SQL Server, SSIS, REST API)
Language: C# 13 3 03
Wittline/wbz
A parallel implementation of the bzip2 data compressor in Python. The compression pipeline uses the Burrows–Wheeler transform (BWT) and move-to-front (MTF) encoding to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.
Language: Python 13 2 03
Wittline/docker-livy
Dockerizing and consuming an Apache Livy environment
Language: HTML 11 4 09
Wittline/csv-estimate-rows
Language: Python 4 2 0
Wittline/csv-shuffler
A tool to automatically shuffle lines in .csv files
Language: Python 4 2 20
Wittline/livyc
Apache Spark as a Service with Apache Livy Client
Language: Python 3 3 01
Wittline/RESTful-APIs-Nodejs
Building fast, scalable and secure RESTful services with Node, Express and MongoDB
Language: HTML 3 3 03
Wittline/apache-spark-course
Apache Spark with Python
Language: Jupyter Notebook 2 3 0
Wittline/csv-columnar
Language: Python 2 2 0
Wittline/Wittline
Take a look at my repository
2 2 02
Wittline/code_challenges
Scripts for different purposes
Language: Python 1 2 01
Wittline/csv-generator
Language: Python 1 2 01
Wittline/csv-splitter
csv-splitter
Language: Python 1 2 0
Wittline/model-catalog-grpc
A gRPC service to consume any machine learning model stored in a model catalog through a single endpoint.
1 2 0
Wittline/awesome-twitter-data
A list of Twitter datasets and related resources.
1 0
Wittline/bulk_json_sqlite
Efficiently Bulk Import a Large JSON File into SQLite
2 0
Wittline/data-eng-frubana
Language: Python 2 0
Wittline/Data-Quality
Data Quality
Language: Python 2 0
Wittline/dictionary-substitute
Dictionary substitute Python coding task
Language: Python 2 1
Wittline/fastapi-jwt
JWT with FastAPI
Language: Python 2 0
Wittline/fastapi-template
A fully scalable FastAPI-based template for machine learning, deep learning, and any other software project that uses FastAPI as its API framework.
Language: Python 1 0
Wittline/github-readme-stats
:zap: Dynamically generated stats for your github readmes
Language: JavaScript 1 0
Wittline/learning-golang
Learning golang
2 0
Wittline/nlp-recipes
Natural Language Processing Best Practices & Examples
Language: Python 1 0
Wittline/ray-sql
Distributed SQL Query Engine in Python using Ray
Language: Rust 1 0
Wittline/similarity-search-duckdb
Language: Jupyter Notebook 2 0
Wittline/tuboleta
tuboleta.mx
Language: JavaScript 1 0