aws-emr-clusters

There are 40 repositories under the aws-emr-clusters topic.

RubensZimbres/Repo-2019
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Language:Jupyter Notebook139 12 173
terraform-aws-modules/terraform-aws-emr
Terraform module to create AWS EMR resources 🇺🇦
Language:HCL26 2 1725
AWS-Big-Data-Projects/Run-a-Spark-job-within-Amazon-EMR
Run a Spark job within Amazon EMR
Language:Java12 1 21
suvayu/emr-scripts
Shell scripts for AWS EMR clusters
Language:Shell7 2 02
felipeazucares/Airflow-EMR-Redshift
EMR + Hadoop to Redshift ELT workflow using the Spark steps API, orchestrated by Apache Airflow. It ingests disparate datasets centered on 7 GB of I94 arrivals information to produce a simple star schema in Redshift.
Language:Python5 2 02
khushal2405/Daily-Incremental-load-ETL-pipeline-for-Ecommerce-company-using-AWS-Lambda-and-Apache-airflow
Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed using Apache Airflow in a Docker container.
Language:Python5 2 02
fermat01/ETL-Data-Pipeline-using-AWS-EMR-Spark-Glue-Athena
ETL data pipeline using AWS services
Language:Python4 1 03
abhibalani/emr_lambda
Lambda to start EMR and run a MapReduce job
Language:Python3 1 01
anuragkr29/TightCommunityDetection
Detect tight communities in a social network
Language:Scala2 3 02
dvu4/udacity-data-engineering
Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development
Language:Jupyter Notebook2 2 01
nikhilsu/Product-review-analysis-Spark-MongoDB
Performing various product review analysis on Amazon dataset using Apache Spark and MongoDB
Language:Java2 1 01
rigganni/AWS-Spark-Million-Song-ETL
Load data from the Million Song Dataset into a final dimensional model stored in S3.
Language:Python2 1 00
rshinde03/Default-Credit-Data-Analysis-and-Prediction-Using-Big-Data
Credit defaults cause large profit losses for banks and other credit lenders, and the success of the banking industry rests on the ability to understand risk. This project uses big data technologies such as MapReduce and HDFS, along with PySpark and AWS, to analyze credit history and predict defaults.
Language:Jupyter Notebook2 2 01
silviomori/covid19-datalake
Language:Python2 2 00
Adith-Rai/Reddit-Stock-Sentiment-Analyzer
A cloud-based Reddit stock sentiment analyzer that computes the overall sentiment for each stock from a configurable selection of stock subreddits. The architecture uses AWS MSK (Kafka), AWS EMR (PySpark), and AWS Lambda (Python 3) for scalability, and the OpenAI API for sentiment analysis through prompt engineering.
Language:Python1 2 00
johnnyiller/cluster_funk
An opinionated framework for running big data jobs
Language:Python1 0 00
kacperstyslo/most-wanted-programming-skills-finder
With this app, you can see what programming skills are most in-demand in the current job market.
Language:Python1 1 00
m1theus/aws-emr-terraform
Example for provisioning AWS EMR service with Terraform
Language:HCL1 1 01
nihil21/DocxAnonymizer-spark
Stand-alone Scala & Java tool to anonymize OOXML Documents (DOCX)
Language:Java1 0 0
sagardua297/udacity-data-engineering-nd
Data Pipeline Analytics Platform is an end-to-end generic big data pipeline. It involves the following tech stack: AWS S3, AWS Redshift, AWS EMR Cluster, Apache Spark, Apache Airflow.
Language:Python1 1 00
SRVivek1/pyspark-rdd-dataframe-examples
PySpark RDD and DataFrame Examples
Language:Python1 1 00
UCloudM/Steam_Analysis_For_Gamers
Analysis performed on data from the Steam platform using Apache Spark and Cloud services such as Amazon Web Services.
Language:Python1 1 03
xianchen2/Analyzing_10GB_of_Yelp_Reviews_Data
AWS EMR backed Spark cluster for analyzing Yelp Data
Language:Jupyter Notebook1 2 01
AhmedDouaya/Deploiement_modele_cloud
Language:Jupyter Notebook0 1 00
AleGuarnieri/Data-Lake-ETL
Udacity project: implementing an ETL pipeline to process data with Apache Spark and store it in AWS S3
Language:Python0 1 00
arjunsawhney1/scalable-ML
In this repo, I build a LogisticRegression prediction model with Dask and PySpark and initialize an AWS EMR cluster to run the entire pipeline.
Language:Python0 0 00
Chan2k20/Wine-Prediction-Prediction-Model-On-AWS-EMR
Implements a random forest machine learning algorithm using PySpark on AWS EMR to classify wines. The model is then deployed in a Docker container.
Language:Python0 1 01
EricPaul075/OCP8-Big-data-project-deployed-in-AWS-cloud
Define a big data architecture and perform distributed machine learning calculations on an EMR cluster using AWS
Language:Jupyter Notebook0 1 00
im612/P8_big_data
A scalable prototype of an image recognition engine deployed on AWS.
Language:Jupyter Notebook0 1 00
justinapnguyen/Big_Data_Wrangling_with_Google_Books_Ngrams
In this project, the skills learned in the Big Data Fundamentals unit will be utilized to load, filter, and visualize a large real-world dataset within a cloud-based distributed computing environment using Hadoop, Spark, Hive, and the S3 filesystem.
Language:Jupyter Notebook0 1 00
marcus-repo/etl-spark
ETL pipeline that extracts JSON files from an AWS S3 bucket, transforms them using an AWS EMR Spark cluster, and stores the data in an AWS S3 bucket in Parquet format.
Language:Python0 1 00
matbragan/emr-airflow
Developing a Flow with EMR and Airflow
Language:Python0 1 00
mochan42/Deploy-a-CNN-in-AWS-image-features-extraction-and-ACP
A CNN is deployed in AWS to extract image features in the context of distributed computing.
Language:Jupyter Notebook0 1 00
SagarFall2022/BigData
Realtime data pipeline
Language:Jupyter Notebook0 1 00
tugberkcapraz/capstone_sparkify
Predicting customer churn for the music app, Sparkify, using PySpark on AWS EMR clusters
Language:Jupyter Notebook0 1 00
polarbeargo/Udacity-nd027-Data-Lake
Language:Python2 01
