apache-spark-cluster
There are 10 repositories under the apache-spark-cluster topic.
nchammas/flintrock
A command-line tool for launching Apache Spark clusters.
PiercingDan/spark-Jupyter-AWS
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
aamargajbhiye/big-data-projects
This project contains customizations, such as custom data sources and plugins, written for distributed systems like Apache Spark and Apache Ignite.
josemarialuna/ExternalValidity
This package contains the code for calculating external clustering validity indices in Spark. The package includes Chi Index among others.
akaltsikis/Markov_Cluster_Algorithm
Implementations of Markov Cluster Algorithm (MCL) and Regularized Markov Cluster Algorithm (R-MCL) in Apache Spark
arturobp3/Steam_Analysis_For_Gamers
Analysis performed on data from the Steam platform using Apache Spark and Cloud services such as Amazon Web Services.
ashsProjects/Distributed_Analytics_of_US_Residential_Zoning
This project performs distributed analytics on a spatial dataset using clusters. Our goal was to analyze the impact of single-family residential zoning in the US and correlate it with quality-of-life measures, in an effort to discourage the segregation of zoning types and promote inclusivity.
savvydatainsights/spark
Apache Spark cluster lab.
erjan/data_engineering_japan_visas_pyspark
Data engineering project: visualize the number of visas issued by Japan, by country and time of issue
SayamAlt/Bank-Customer-Churn-Prediction-using-PySpark
A machine learning model built with PySpark that classifies whether a bank customer will churn, with an accuracy of more than 86% on the test set.