spark-cluster

There are 26 repositories under the spark-cluster topic.

minhky2185/healthcare_data_pipeline
An end-to-end data pipeline for building a data lake and supporting reports using Apache Spark.
Language:Python16 1 06
mgarralda/hadoop-spark-cluster
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
Language:Jupyter Notebook9 1 13
AIxHunter/Spark-k8s-pod-template
Steps to deploy a Spark app to a Kubernetes cluster using spark-submit or a pod template
Language:Shell6 1 01
aimanamri/raspberry-pi4-hadoop-spark-cluster
This is a self-documentation of learning distributed data storage, parallel processing, and Linux OS using Apache Hadoop, Apache Spark and Raspbian OS. In this project, a 3-node cluster is set up using Raspberry Pi 4 boards, with HDFS installed and Spark processing jobs run via YARN.
Language:Shell5 1 00
SinghHarshita/Clustering-Algorithms-Spark
KMeans, CURE and Canopy algorithms are demonstrated using PySpark.
Language:Jupyter Notebook5 1 00
shuaicj/spark-cluster
A Spark cluster based on docker-compose.
Language:Shell3 2 04
vaibhavmagon/Spark-Python-MovieReviews
Script to find similarities between movies in a MovieLens dataset using Python and Spark clustering.
Language:Python3 1 01
ayseirmak/DistributedFraudDetection
In this study, we propose to use a distributed storage and computation system in order to track money transfers instantly. In particular, we keep our transaction history in a distributed file system as a graph data structure. We try to detect illegal activities by using Graph Neural Networks (GNN) in a distributed manner.
Language:Python2 2 01
karamolegkos/Diastema
This is my contribution in the project Diastema
Language:Python2 2 0
longNguyen010203/Spark-Processing-AWS
👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to set up the infrastructure and Airflow integration to automate workflows 🥊
Language:Python2 1 00
dazayzeh/Installing-spark-standalone-to-a-cluster-manually
I'll walk you through launching a cluster manually using Spark standalone deploy mode, connecting an app to the cluster, launching the app, and finding where to view monitoring and logging.
1 1 00
matthieuvion/spark-cluster
Steps to deploy a local Spark cluster with Docker. Bonus: a ready-to-use notebook for model prediction in PySpark using spark.ml Pipeline() on a well-known dataset
Language:Jupyter Notebook1 1 00
shuaicj/spark-cluster-zk
A Spark cluster containing multiple Spark masters, based on docker-compose.
Language:Shell1 1 01
aiden-dai/ai-cluster
Start clusters in VirtualBox VMs
0 1 00
DanMolenhouse/Distributed-Systems-Project5-Hadoop-and-Spark
In this project, we used both Hadoop / MapReduce and Spark to do distributed computing. The first task was to perform a series of operations using a Mapper and a Reducer Java file implemented on a Hadoop server. The second task was to perform similar operations, but on Spark instead.
Language:Java0 1 00
harshkavdikar1/GeoSpatial-DataAnalysis-With-Spark
A distributed application to identify the top 50 taxi pickup locations in New York by analyzing over 1 billion records using Apache Spark, the Hadoop file system, and Scala.
Language:Scala0 1 00
pientaa/opening-black-box
Deep dive into Spark UDFs' characteristics.
Language:Jupyter Notebook0 1 230
RammySekham/spark-kb
Spark standalone architecture, local architecture, and reading Hadoop file formats, i.e., Avro, Parquet and ORC
Language:Jupyter Notebook0 1 01
silencebingo/hadoop-spark-cluster
A Hadoop and Spark Cluster on Docker
Language:Shell0 0 00
Turnipdo/Spark-Standalone-Cluster-Setup
To facilitate the initial setup of Apache Spark, this repository provides a beginner-friendly, step-by-step guide on setting up a master node and two worker nodes.
Language:Python0 1 00
ansjin/docker-spark
docker spark standalone
Language:Dockerfile2 01
euiyounghwang/spark_job_interface_service
spark_job_interface_service
Language:Python1 0
flaviostutz/spark-submit-scala
Spark submit extension from bde2020/spark-submit for Scala with SBT
Language:Scala2 0
itsayushthada/SVD-on-Spark
Language:Jupyter Notebook1 0
kumarvna/terraform-azurerm-hdinsight
Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP) and Apache Kafka clusters.
Language:HCL2 15
minsusun/deploy-spark-cluster
Configs for deploying Spark clusters on Docker and k8s.
Language:Shell
