jpacerqueira-zz
Big Data Architect / Consultant 👍 Working @ 6point6.co.uk, London. Previously @ GFT Group and Perform/DAZN Group, London
FuelBigData.com, London
Pinned Repositories
airflow-executions
Apache Airflow for K8s clusters with Docker Compose orchestration. Includes examples used in workflows for jobs such as webhooks and web scrapers.
Akamai-log-Analysis-SparkML-H2o
Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using SparkML and H2O ML
DeepLearning-MalwareDetection
jpac-sparklyr
H2O and sparklyr setup in RStudio with demos/trials for Hadoop Spark
Jupyter_Spark_H2O_Kafka_Client_Setup
This is the core of project lost_saturn. The lost_saturn project is a modern approach to data science, focused on enabling data science in containerised environments everywhere. It was built first as a local setup and then transformed into a container solution. Its tools are centralized in Jupyter, with Spark and H2O.ai AutoML. Ideal for running Jupyter notebooks in WSL (Windows Subsystem for Linux) or in Docker containers with Ubuntu 18.04 LTS.
project_lost_saturn
This is the core of project lost_saturn. The lost_saturn project is a modern approach to data science, focused on enabling data science in containerised environments everywhere. It was built first as a local setup and then transformed into a container solution. Its tools are centralized in Jupyter, with Spark and H2O.ai AutoML. Ideal for running Jupyter notebooks in WSL (Windows Subsystem for Linux) or in Docker containers with Ubuntu 18.04 LTS.
spark-on-kubernetes
A deployment and setup of Apache Spark for multi-tenant usage in Kubernetes clusters. It deploys one executor per K8s pod and scales linearly.
SparkElasticSearchPublisher
Elasticsearch publisher using Hadoop as the source and Spark 1.6 as the ETL engine. Running package for a Cloudera CDH 5.9.0 cluster.
technical-test-Jupyter-Spark-Delta-Pandas
Technical test GitHub repo for the test container
Terraform_start6Nodes_cdh5.xCluster
AWS CLI and Terraform for a 6-node Cloudera CDH cluster with Hadoop, Spark, and Hive
jpacerqueira-zz's Repositories
jpacerqueira-zz/Akamai-log-Analysis-SparkML-H2o
Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using SparkML and H2O ML
jpacerqueira-zz/Jupyter_Spark_H2O_Kafka_Client_Setup
This is the core of project lost_saturn. The lost_saturn project is a modern approach to data science, focused on enabling data science in containerised environments everywhere. It was built first as a local setup and then transformed into a container solution. Its tools are centralized in Jupyter, with Spark and H2O.ai AutoML. Ideal for running Jupyter notebooks in WSL (Windows Subsystem for Linux) or in Docker containers with Ubuntu 18.04 LTS.
jpacerqueira-zz/project_lost_saturn
This is the core of project lost_saturn. The lost_saturn project is a modern approach to data science, focused on enabling data science in containerised environments everywhere. It was built first as a local setup and then transformed into a container solution. Its tools are centralized in Jupyter, with Spark and H2O.ai AutoML. Ideal for running Jupyter notebooks in WSL (Windows Subsystem for Linux) or in Docker containers with Ubuntu 18.04 LTS.
jpacerqueira-zz/technical-test-Jupyter-Spark-Delta-Pandas
Technical test GitHub repo for the test container
jpacerqueira-zz/Terraform_start6Nodes_cdh5.xCluster
AWS CLI and Terraform for a 6-node Cloudera CDH cluster with Hadoop, Spark, and Hive
jpacerqueira-zz/airflow-executions
Apache Airflow for K8s clusters with Docker Compose orchestration. Includes examples used in workflows for jobs such as webhooks and web scrapers.
jpacerqueira-zz/DeepLearning-MalwareDetection
jpacerqueira-zz/jpac-sparklyr
H2O and sparklyr setup in RStudio with demos/trials for Hadoop Spark
jpacerqueira-zz/spark-on-kubernetes
A deployment and setup of Apache Spark for multi-tenant usage in Kubernetes clusters. It deploys one executor per K8s pod and scales linearly.
jpacerqueira-zz/SparkElasticSearchPublisher
Elasticsearch publisher using Hadoop as the source and Spark 1.6 as the ETL engine. Running package for a Cloudera CDH 5.9.0 cluster.
jpacerqueira-zz/als-benchmark-scripts
Scripts to benchmark distributed Alternating Least Squares (ALS)
jpacerqueira-zz/cluster-management-python-pyspark-ngrams-samples
cluster-management-python-pyspark-ngrams-samples
jpacerqueira-zz/confluent-kafka-xperiments
Experiments with Confluent Kafka tools and client solutions
jpacerqueira-zz/container_cryptominer
jpacerqueira-zz/Databricks-cloud-formation
jpacerqueira-zz/Docker-Container-Jupyter
Docker container for Jupyter notebooks, using another repo as a baseline hook
jpacerqueira-zz/FiveCoolTest
Technical assignment
jpacerqueira-zz/Hadoop
Hadoop Cloudera investigations
jpacerqueira-zz/jpac-flume-logs
My adaptation of the flume-logs ingestion process
jpacerqueira-zz/MyDockerSetupNordVPN
jpacerqueira-zz/TensorFlowJava
TensorFlow in Java. If Google can do it, I can do it!