Pinned Repositories
dataset_deduplication_sparkml
Dataset deduplication using the Spark ML library and Scala
owl-data-sanitizer
A PySpark library to validate data quality
ronald-smith-angel's Repositories
ronald-smith-angel/owl-data-sanitizer
A PySpark library to validate data quality
ronald-smith-angel/dataset_deduplication_sparkml
Dataset deduplication using the Spark ML library and Scala
ronald-smith-angel/airflow
Apache Airflow
ronald-smith-angel/charts
Curated applications for Kubernetes
ronald-smith-angel/cube-summation-approach
Solution to the Cube Summation problem (https://www.hackerrank.com/challenges/cube-summation) using a Python dictionary approach.
ronald-smith-angel/datahub
The Metadata Platform for the Modern Data Stack
ronald-smith-angel/files-partitioner-HDFS-like
File partitioner for random data that simulates HDFS (version 1) data node behaviour but stores the data locally.
ronald-smith-angel/poetry
Python dependency management and packaging made easy.
ronald-smith-angel/predictor-flask-hd
Service to classify handwritten digits using TensorFlow + Keras + Flask
ronald-smith-angel/research-papers
Research papers - master's degree in computer science (Distributed Systems + AI)