dataproc-cluster
There are 23 repositories under the dataproc-cluster topic.
Wittline/pyDag
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
naranjja/gcp-jupyter-sql
Run Jupyter Notebooks (and store data) on Google Cloud Platform.
anjijava16/GCP_Data_Enginner_Utils
GCP_Data_Enginner
MarieeCzy/METAR-Data-Engineering-and-Machine-Learning-Project
An educational project to build an end-to-end pipeline for near real-time and batch processing of data, which is then used for visualisation and a machine learning model.
jaiswalanshul/gcp_dataproc_spark_airflow
Data Workflows with GCP Dataproc, Apache Airflow and Apache Spark
bilalsp/yelp_etl
Yelp ETL Pipeline in Apache Spark on Google Cloud Dataproc
Keval-Gandevia/BigDataETLAndSentimentAnalysis
A Java-based project that extracts news articles from large .sgm files, processes them, and loads them into a MongoDB database. It includes an Apache Spark job for word frequency analysis directly from .sgm files, and a sentiment analysis implementation using a Bag-of-Words model in Java.
mr-ubik/google-nembo
Collection of personal resources on Google Cloud
pietrocarbo/scala-ble
A Scala and Spark based project to experiment with map-reduce algorithms on graph-shaped big data
akaliutau/gcp-prod-spark-cluster
Deploying a production-ready environment for a Spark cluster
bche3/Big-Data-Project-Voter-Turnout-Prediction
Data Science Project: Predicting voter turnout in swing states in the United States based on 2020 General Election data through big data analytics
brauseo/desafio-dataproc
Creating a fully managed Hadoop ecosystem with Google Cloud Platform: the challenge consists of processing data with GCP's Dataproc product. The job counts the words in a book and reports how many times each word appears in it.
Cyang18/MusicProducer
This is a distributed system that utilizes Apache Spark through Dataproc. We use the Spotify API to send song data to Apache Spark, which then forwards the information to Google Cloud Services. The system processes this data to recommend songs based on the extracted information.
jjtoharia/Kaggle_Outbrain
Kaggle - Outbrain Click Prediction (Oct-2016 - Jan-2017)
natmurad/cloudbigdata
Content about how to create big data ecosystems on the Cloud
InspiredcL/data-science-on-gcp
Source code: flight analysis based on the work of Valliappa Lakshmanan.
jonathanAmancioSales/Hadoop_Dataproc_Google_Cloud_Platform_DIO
Project for the course "Creating a Fully Managed Hadoop Ecosystem with Google Cloud Dataproc" from the Data Engineer Bootcamp at Digital Innovation One
mihir-robotics/pyspark-gcp-project
A PySpark job that runs on a Dataproc cluster and loads data from Cloud Storage into a BigQuery table.
tirthmehta/Google-Cloud-Platform-based-Hadoop-Map-Reduce
Determines which words occur in a dataset of textbooks, along with each word's occurrence count, using a Dataproc cluster created on Google Cloud Platform.
vasisthasinghal/Yelp-Review-Classification
Training a classification model as a Dataproc job and using a Kafka/PubSub connector for real-time prediction with pre-trained models
vishnudxb/gcloud-dataproc-creation
Create a gcloud Dataproc cluster with this GitHub Action