aws-emr
There are 129 repositories under the aws-emr topic.
adornes/spark_python_ml_examples
Spark 2.0 Python Machine Learning examples
adornes/spark_scala_ml_examples
Spark 2.0 Scala Machine Learning examples
jwplayer/sparksteps
:star: CLI tool to launch Spark jobs on AWS EMR
dacort/demo-code
Bits of code I use during live demos
abdullahkhawer/aws-auto-terminate-idle-emr
An AWS-based solution that uses AWS CloudWatch and Python-based AWS Lambda to automatically terminate AWS EMR clusters that have been idle for a specified period of time.
Wittline/pyspark-on-aws-emr
The goal of this project is to offer an AWS EMR template, using Spot Fleet and On-Demand Instances, that you can put to use quickly so you can focus on writing PySpark code.
terraform-aws-modules/terraform-aws-emr
Terraform module to create AWS EMR resources 🇺🇦
ismaildawoodjee/aws-data-pipeline
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance (powered down).
memosstilvi/emr-cost-calculator
EMR Cost Calculator
amzn/rheoceros
Cloud-based AI / ML workflow and data application development framework
xonai-computing/xonai-dashboard
A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver
AWS-Big-Data-Projects/Analysing-Census-Data-using-aws
Use aws-emr and aws-redshift to analyse the adult census dataset of the USA
AWS-Big-Data-Projects/AWS-EMR
Analyzing Big Data with Amazon EMR
AWS-Big-Data-Projects/Run-a-Spark-job-within-Amazon-EMR
Run a Spark job within Amazon EMR
ychantit/airflow_aws_utils
A collection of sample Airflow workflows for data processing on AWS
mauropelucchi/aws-emr-docker-integration
AWS EMR Docker integration
linghaol/CommunityDetection-Spark-AWS
A Spark application, written in Python, that finds strongly connected components with the Bi-directional Label Propagation algorithm. The project processed a 1.3 GB Twitter network dataset on an AWS EMR cluster.
jkoth/Data-Lake-with-Spark-and-AWS-S3
Create a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster
daniel-cortez-stevenson/cookiecutter-pyspark-cloud
A cookiecutter template for working with PySpark on AWS EMR
Nerdward/batch_gh_archive
Data Engineering Project with Terraform, Spark, AWS, Docker, Airflow and other tools
sjmiller8182/Warehousing-Stock-Tweet-Data
A large-scale data framework that will enable us to store and analyze financial market data and drive future predictions for investment.
adornes/spark_r_ml_examples
Spark 2.0 R/SparkR Machine Learning examples
wingkwong/aws-playground
My AWS Playground
felipeazucares/Airflow-EMR-Redshift
An EMR + Hadoop to Redshift ELT workflow using the Spark steps API and orchestrated by Apache Airflow. It ingests disparate datasets centered on 7 GB of I94 arrivals information to produce a simple star schema in Redshift
pratikbarjatya/spark-walmart-data-analysis-exercise
Data Analysis Exercise over Walmart Stock
khushal2405/Daily-Incremental-load-ETL-pipeline-for-Ecommerce-company-using-AWS-Lambda-and-Apache-airflow
Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed using Apache Airflow in a Docker container.
khushal2405/ETL-pipeline-using-Airflow-and-AWS-EMR
We build an ETL pipeline using Airflow that downloads data from an AWS S3 bucket, runs a Spark/Spark SQL job on the downloaded data to produce a cleaned-up dataset of orders that missed their delivery deadline, and then uploads the cleaned-up dataset back to the same S3 bucket in a folder primed for higher-level analytics
abhibalani/emr_lambda
Lambda function to start EMR and run a MapReduce job
dhruv007patel/Impact-of-Covid-19-on-Aviation-Industry
This project analyzes the correlation between COVID-19 and the US aviation industry. By studying data on passenger/freight traffic and delays alongside COVID-19 trends, it provides insights into airline and passenger responses. The findings help airlines adapt to the pandemic's impact.
HarshadRanganathan/aws-emr-launcher
Generic Python library for provisioning EMR clusters with YAML config files (Configuration as Code)
JainTanisha/MapReduce-Analysis-on-Amazon-Food-Review-Data
MapReduce Analysis on Amazon Food Review Dataset (Big-Data)
jomavera/dataPipelineEMR
ETL pipeline with PySpark on EMR orchestrated with Airflow
shinde-chandrakant/BigData-Ops-on-TLC-Yellow-Taxi
Analysed New York City's Yellow taxi data set with Big Data tools such as Hadoop, HBase, Sqoop, MapReduce and AWS Cloud Infrastructure.
ninjeanne/datastorm
Data Science and Engineering project - Programming for Big Data @ Simon Fraser University (SFU)
RahilBalar98/Covid-19-And-Aviation-Analysis
CMPT 732 Project - Joined 3 large-scale databases to analyse the economic impact of Covid-19 on the airline industry. Fetched data via an API and stored it in AWS S3, from which an AWS EMR cluster retrieves it for computation. Queried the results in AWS Athena and visualized them in Tableau using static and dynamic dashboards.
samchenghowing/COMP4442
Analysis and monitoring system using AWS... Also the comp4442 project