Big Data Journal Projects
These projects were carried out under the Cloud Tech and BigdataJournal Community Group.
Vadodara
Pinned Repositories
Amazon-Redshift-cluster-to-analyze-USA-Domestic-flight-data
Worked with an Amazon Redshift cluster to analyze US domestic flight data. Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data with your existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and, according to AWS, costs less than $1,000 per terabyte per year, about a tenth the cost of most traditional data warehousing solutions.
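A minimal sketch of querying the cluster from Python through the Redshift Data API (boto3's `redshift-data` client). The `flights` table, its `origin` and `dest` columns, and the cluster, database, and user names are hypothetical; the client is passed in so the polling logic can be exercised without AWS credentials.

```python
import time

# Hypothetical schema: one row per flight with origin/dest airport codes.
TOP_ROUTES_SQL = """
SELECT origin, dest, COUNT(*) AS num_flights
FROM flights
GROUP BY origin, dest
ORDER BY num_flights DESC
LIMIT 10
"""


def _field_value(field):
    """Unwrap a Data API field such as {"stringValue": "JFK"} or {"longValue": 42}."""
    if field.get("isNull"):
        return None
    return next(iter(field.values()))


def run_query(client, sql, cluster_id, database, db_user, poll_seconds=1.0):
    """Run SQL through the Redshift Data API and return the rows as Python lists.

    Only the first page of results is read; large result sets would need to
    follow the NextToken returned by get_statement_result.
    """
    statement_id = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )["Id"]

    # The Data API is asynchronous, so poll until the statement finishes.
    while True:
        desc = client.describe_statement(Id=statement_id)
        if desc["Status"] == "FINISHED":
            break
        if desc["Status"] in ("FAILED", "ABORTED"):
            raise RuntimeError(desc.get("Error", desc["Status"]))
        time.sleep(poll_seconds)

    result = client.get_statement_result(Id=statement_id)
    return [[_field_value(f) for f in record] for record in result["Records"]]
```

With real credentials the client would be `boto3.client("redshift-data")`, for example `run_query(boto3.client("redshift-data"), TOP_ROUTES_SQL, "my-cluster", "dev", "awsuser")` with placeholder names.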
Analysing-Census-Data-using-aws
Uses Amazon EMR and Amazon Redshift to analyze the US Adult Census dataset.
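One way to run such an analysis on an existing EMR cluster is to submit a Spark step with boto3's `add_job_flow_steps`, using `command-runner.jar` to invoke `spark-submit`. This is a sketch under assumptions: the script location, S3 paths, and step name are hypothetical placeholders, and loading the results into Redshift is not shown.

```python
def spark_step(name, script_uri, script_args=(), action_on_failure="CONTINUE"):
    """Build an EMR step definition that runs a PySpark script with spark-submit."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run an arbitrary command on the cluster.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *script_args],
        },
    }


def submit_census_job(emr_client, cluster_id, script_uri, input_uri, output_uri):
    """Submit a (hypothetical) census analysis script as one step; return the step ID."""
    step = spark_step("census-analysis", script_uri, [input_uri, output_uri])
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```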
Analyzing-Twitter-in-real-time-with-Kinesis-Lambda-Comprehend-and-ElasticSearch
Analyzing Twitter in real time with Amazon Kinesis, AWS Lambda, Amazon Comprehend, and Elasticsearch
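The core of such a pipeline is typically a Lambda function subscribed to the Kinesis stream that decodes each tweet and scores it with Amazon Comprehend. A minimal sketch, assuming each record holds a JSON tweet with hypothetical `id` and `text` fields; the Comprehend client is passed in, and the Elasticsearch indexing step is left out.

```python
import base64
import json


def enrich_tweets(event, comprehend):
    """Decode Kinesis records from a Lambda event and attach Comprehend sentiment.

    Returns documents ready to be indexed (for example into Elasticsearch).
    """
    docs = []
    for record in event["Records"]:
        # Kinesis delivers record payloads to Lambda base64-encoded.
        tweet = json.loads(base64.b64decode(record["kinesis"]["data"]))
        result = comprehend.detect_sentiment(Text=tweet["text"], LanguageCode="en")
        docs.append({
            "id": tweet["id"],
            "text": tweet["text"],
            "sentiment": result["Sentiment"],
            "scores": result["SentimentScore"],
        })
    return docs
```

A Lambda handler would then call `enrich_tweets(event, boto3.client("comprehend"))` and send the returned documents to the search cluster.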
AWS-Data-Lake
AWS Lake Formation makes it easy to set up, secure, and manage data lakes. This project also covers data discovery with Lake Formation's metadata search in the console, where search results are restricted by column-level permissions.
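Column-level restrictions like these are granted through the Lake Formation `GrantPermissions` API. A minimal boto3 sketch, assuming a hypothetical database, table, column list, and IAM role ARN; the `lakeformation` client is passed in.

```python
def column_select_grant(principal_arn, database, table, columns):
    """Build GrantPermissions arguments giving SELECT on specific columns only."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def grant_column_access(lakeformation, principal_arn, database, table, columns):
    """Grant column-restricted SELECT; the principal should then see only these columns."""
    lakeformation.grant_permissions(
        **column_select_grant(principal_arn, database, table, columns)
    )
```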
aws-forest-fire-predictive-analytics
Big Data Engineering & Analytics Project
aws-serverless-data-lake-workshop
This workshop gives customers hands-on experience building a cloud-native, future-proof serverless data lake architecture, with hands-on time using AWS big data and analytics services, including Amazon Kinesis services for streaming data ingestion.
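Streaming ingestion into an S3-based data lake is often done by writing records to an Amazon Kinesis Data Firehose delivery stream. A hedged sketch using boto3's `put_record_batch`, which accepts at most 500 records per call; the stream name and event shape are hypothetical, and partial failures (reported in `FailedPutCount`) are only counted here, not retried.

```python
import json

MAX_BATCH = 500  # PutRecordBatch limit on records per call


def send_events(firehose, stream_name, events):
    """Send dict events to Firehose as newline-delimited JSON; return the failure count."""
    failed = 0
    for start in range(0, len(events), MAX_BATCH):
        chunk = events[start:start + MAX_BATCH]
        # A trailing newline keeps records separable once Firehose concatenates them in S3.
        records = [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in chunk]
        response = firehose.put_record_batch(DeliveryStreamName=stream_name, Records=records)
        failed += response["FailedPutCount"]
    return failed
```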
big-data-ecosystem
Project developed during the Cognizant Cloud Data Engineer Bootcamp on the Digital Innovation One platform. A Python script extracts and counts the words in a plain-text book and displays the most frequent word.
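The counting step can be sketched with the standard library's `collections.Counter`. This is an illustrative version, not the bootcamp's original script; it assumes a UTF-8 text file and treats any run of letters (including accented ones) as a word.

```python
import re
from collections import Counter


def most_frequent_word(text):
    """Return (word, count) for the most frequent word, or None for empty text."""
    # [^\W\d_] matches Unicode letters only, so punctuation and digits split words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    top = Counter(words).most_common(1)
    return top[0] if top else None


def most_frequent_word_in_file(path, encoding="utf-8"):
    """Read a plain-text book and return its most frequent word."""
    with open(path, encoding=encoding) as f:
        return most_frequent_word(f.read())
```

For example, `most_frequent_word("The cat and the hat. THE end!")` returns `("the", 3)`.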
big-data-solutions
This repository provides code examples in Python and Spark (Scala), primarily using boto3 SDK API methods, plus AWS CLI examples for most AWS big data services. There are also well-written wiki articles on the most common issues and challenges in the big data world.
Iot-and-Big-Data-Application-using-aws-and-apache-kafka
IoT and big data analytics using Apache Kafka, Apache Spark, and AWS services
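A typical Spark job in such a pipeline aggregates sensor readings consumed from Kafka into fixed (tumbling) time windows per device. As a dependency-free illustration of that aggregation logic only, here is a plain-Python sketch; the `(device_id, epoch_seconds, value)` reading format and the 60-second window are assumptions. In Spark Structured Streaming the same idea is usually expressed with `groupBy(window(...), ...)`.

```python
from collections import defaultdict


def tumbling_averages(readings, window_seconds=60):
    """Average readings per (device, window start) using fixed, non-overlapping windows.

    `readings` is an iterable of (device_id, epoch_seconds, value) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for device_id, ts, value in readings:
        # Align each timestamp to the start of its window.
        window_start = int(ts) - int(ts) % window_seconds
        key = (device_id, window_start)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}
```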
IoT-Data-with-Amazon-Kinesis
Build a Visualization and Monitoring Dashboard for IoT Data with Amazon Kinesis Analytics and Amazon QuickSight
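Kinesis Data Analytics reads from a Kinesis data stream, so the first piece is usually a producer that sends device readings. A minimal boto3 sketch using `put_record`, with the device ID as partition key so each device's readings stay ordered within one shard; the stream name and the payload fields are hypothetical.

```python
import json
import random
import time


def make_reading(device_id, rng=random):
    """Create one simulated sensor reading (hypothetical fields)."""
    return {
        "device_id": device_id,
        "timestamp": int(time.time()),
        "temperature": round(rng.uniform(15.0, 35.0), 2),
    }


def publish_readings(kinesis, stream_name, device_ids, rng=random):
    """Send one reading per device; return the shard each record landed on."""
    shards = {}
    for device_id in device_ids:
        reading = make_reading(device_id, rng)
        response = kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(reading).encode("utf-8"),
            # Records with the same partition key go to the same shard, in order.
            PartitionKey=device_id,
        )
        shards[device_id] = response["ShardId"]
    return shards
```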
Big Data Journal Projects' Repositories
AWS-Big-Data-Projects/big-data-solutions
AWS-Big-Data-Projects/Amazon-Redshift-cluster-to-analyze-USA-Domestic-flight-data
AWS-Big-Data-Projects/Big-Data-Beverage-Recommender-System
AWS-Big-Data-Projects/big-data-ecosystem
AWS-Big-Data-Projects/Analysis-Of-NYC-Yellow-Taxi
The core objective of this project is to analyze the factors driving taxi demand: the locations with the most pickups and drop-offs, the times of heaviest traffic, and how to better meet passengers' needs.
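The pickup analysis boils down to counting trips per location and per hour. A plain-Python sketch of that aggregation; the record fields `pickup_zone` and `pickup_datetime` are hypothetical and would map onto the columns of the actual trip data, which at full scale would normally be processed with Spark or a SQL engine instead.

```python
from collections import Counter
from datetime import datetime


def busiest(trips, top_n=3):
    """Return the top pickup zones and the top pickup hours of the day."""
    zones = Counter()
    hours = Counter()
    for trip in trips:
        zones[trip["pickup_zone"]] += 1
        # Timestamps are assumed to be ISO-8601 strings such as "2019-01-01 08:15:00".
        hours[datetime.fromisoformat(trip["pickup_datetime"]).hour] += 1
    return zones.most_common(top_n), hours.most_common(top_n)
```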
AWS-Big-Data-Projects/Data-Analytics-For-Mobile-Games
PlayerUnknown's Battlegrounds (PUBG) is a battle royale shooter in which the goal is to be the last player standing. Players start on a large map whose playable area shrinks as the game goes on, and must find weapons, armor, and other supplies to eliminate other players or teams and survive.
AWS-Big-Data-Projects/Image-Caption-Generator
This project develops a framework that uses artificial neural networks to caption an image based on its significant features.
AWS-Big-Data-Projects/big-data-challenge
Your first goal for this assignment will be to perform the ETL process completely in the cloud and upload a DataFrame to an RDS instance. The second goal will be to use PySpark or SQL to perform a statistical analysis of selected data.
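The extract, transform, load, and analyze flow can be illustrated end to end with the standard library, using SQLite as a local stand-in for the RDS instance. This is a sketch only: the review-style columns are hypothetical, and against RDS you would connect with a PostgreSQL or MySQL driver (or write a Spark DataFrame over JDBC) instead.

```python
import csv
import io
import sqlite3


def etl(csv_text, conn):
    """Extract rows from CSV, clean them, and load them into a `reviews` table."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Transform: drop rows without a rating and normalise types.
    cleaned = [
        (r["review_id"], int(r["star_rating"]), r["verified"].strip().upper() == "Y")
        for r in rows
        if r["star_rating"].strip()
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews "
        "(review_id TEXT PRIMARY KEY, star_rating INTEGER, verified INTEGER)"
    )
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


def average_rating_by_group(conn):
    """Simple statistical comparison: mean rating and count per `verified` group."""
    query = (
        "SELECT verified, AVG(star_rating), COUNT(*) FROM reviews "
        "GROUP BY verified ORDER BY verified"
    )
    return conn.execute(query).fetchall()
```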
AWS-Big-Data-Projects/amazon-emr-with-delta-lake
An Amazon EMR notebook showing how to read from and write to Delta tables on Amazon EMR.
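On EMR releases that bundle Delta Lake, Spark has to be pointed at Delta's SQL extension and catalog before a notebook can use Delta tables. A hedged sketch of the cluster configuration as a boto3-style `Configurations` list: the two Spark properties are Delta Lake's documented settings, while the `delta-defaults` classification follows the EMR release documentation as I understand it and should be checked against your EMR version.

```python
def delta_configurations():
    """EMR configuration classifications intended to enable Delta Lake for Spark."""
    return [
        # Assumed EMR classification that puts the bundled Delta Lake jars on the classpath.
        {"Classification": "delta-defaults", "Properties": {"delta.enabled": "true"}},
        {
            "Classification": "spark-defaults",
            "Properties": {
                "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
                "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
            },
        },
    ]
```

In the notebook, a DataFrame is then written with `df.write.format("delta").mode("overwrite").save(path)` and read back with `spark.read.format("delta").load(path)`.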
AWS-Big-Data-Projects/trn-cc-bg-aws
A crash course on big data on AWS
AWS-Big-Data-Projects/aws-analytics-reference-architecture
AWS-Big-Data-Projects/aws-auto-terminate-idle-emr
The AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses Amazon CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate Amazon EMR clusters that have been idle for a specified period of time.
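The idle check can be sketched as follows. This is an illustrative variant, not the framework's actual code: instead of CloudWatch's `IsIdle` metric, it treats a cluster in the `WAITING` state as idle since its most recent step finished (or since it became ready), and it defaults to a dry run. Only the first page of `list_clusters` and `list_steps` results is read.

```python
from datetime import datetime, timedelta, timezone


def last_activity(emr, cluster):
    """Latest of the cluster's ready time and the end times of its steps."""
    times = [cluster["Status"]["Timeline"]["ReadyDateTime"]]
    for step in emr.list_steps(ClusterId=cluster["Id"])["Steps"]:
        end = step["Status"]["Timeline"].get("EndDateTime")
        if end is not None:
            times.append(end)
    return max(times)


def terminate_idle_clusters(emr, max_idle=timedelta(hours=1), now=None, dry_run=True):
    """Find WAITING clusters idle longer than `max_idle`; terminate them unless dry_run."""
    now = now or datetime.now(timezone.utc)
    # A WAITING cluster is up but has no step running.
    waiting = emr.list_clusters(ClusterStates=["WAITING"])["Clusters"]
    idle_ids = [c["Id"] for c in waiting if now - last_activity(emr, c) > max_idle]
    if idle_ids and not dry_run:
        emr.terminate_job_flows(JobFlowIds=idle_ids)
    return idle_ids
```

Run from a Lambda function on a scheduled CloudWatch Events (EventBridge) rule, the handler would call `terminate_idle_clusters(boto3.client("emr"), dry_run=False)`.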
AWS-Big-Data-Projects/aws-cdk-for-emr-on-eks
AWS-Big-Data-Projects/aws-data-lake-solution
A deployable reference implementation that addresses common pain points in designing data lake architectures. It automatically configures the core AWS services needed to tag, search, share, and govern specific subsets of data within a business or with external businesses.
AWS-Big-Data-Projects/aws-data-wrangler
Pandas on AWS: easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server, and S3 (Parquet, CSV, JSON, and Excel).
AWS-Big-Data-Projects/aws-ddk
An open source development framework to help you build data workflows and modern data architecture on AWS.
AWS-Big-Data-Projects/aws-etl-orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
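Step Functions workflows are defined in the Amazon States Language (JSON). A minimal sketch that chains Lambda-backed ETL jobs into a linear state machine with retries; the function ARNs and state names are hypothetical, and the orchestrator described above supports far more complex, branching workflows than this.

```python
import json


def linear_etl_definition(task_arns, max_attempts=3):
    """Build an Amazon States Language definition running the given Lambda ARNs in order."""
    if not task_arns:
        raise ValueError("need at least one task")
    names = [f"Step{i + 1}" for i in range(len(task_arns))]
    states = {}
    for i, (name, arn) in enumerate(zip(names, task_arns)):
        state = {
            "Type": "Task",
            "Resource": arn,
            # Retry any error with exponential backoff before failing the execution.
            "Retry": [{
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 5,
                "MaxAttempts": max_attempts,
                "BackoffRate": 2.0,
            }],
        }
        if i + 1 < len(names):
            state["Next"] = names[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return json.dumps({"StartAt": names[0], "States": states}, indent=2)
```

The resulting string can be passed as `definition` to `boto3.client("stepfunctions").create_state_machine(name=..., definition=..., roleArn=...)`.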
AWS-Big-Data-Projects/Data-Pipeline-with-CDK
AWS-Big-Data-Projects/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
AWS-Big-Data-Projects/emr-serverless-samples
Example code for running Spark and Hive jobs on EMR Serverless.
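Jobs are submitted to an EMR Serverless application with the `start_job_run` call of boto3's `emr-serverless` client, using a `sparkSubmit` job driver for Spark or a `hive` job driver for Hive. A minimal sketch; the application ID, execution role ARN, and S3 locations are hypothetical.

```python
def spark_job_driver(entry_point, args=(), submit_params=None):
    """Job driver for a Spark script stored in S3."""
    driver = {"entryPoint": entry_point, "entryPointArguments": list(args)}
    if submit_params:
        driver["sparkSubmitParameters"] = submit_params
    return {"sparkSubmit": driver}


def hive_job_driver(query_uri, parameters=None):
    """Job driver for a Hive query file stored in S3."""
    driver = {"query": query_uri}
    if parameters:
        driver["parameters"] = parameters
    return {"hive": driver}


def start_job(client, application_id, role_arn, job_driver, name="sample-job"):
    """Start a job run on an existing EMR Serverless application and return its ID."""
    response = client.start_job_run(
        applicationId=application_id,
        executionRoleArn=role_arn,
        jobDriver=job_driver,
        name=name,
    )
    return response["jobRunId"]
```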
AWS-Big-Data-Projects/hive-emr-on-eks
AWS-Big-Data-Projects/hopsworks
Hopsworks - Data-Intensive AI platform with a Feature Store
AWS-Big-Data-Projects/retail-demo-store
AWS Retail Demo Store is a sample retail web application and workshop platform demonstrating how AWS infrastructure and services can be used to build compelling customer experiences for eCommerce, retail, and digital marketing use cases.
AWS-Big-Data-Projects/s3uploader-ui
AWS-Big-Data-Projects/serverless-data-pipeline
Python CDK serverless data pipeline with CI/CD process and Slack notifications.
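Slack notifications from a pipeline are commonly sent through a Slack incoming webhook, which accepts an HTTP POST with a JSON body containing a `text` field. A standard-library sketch; the webhook URL and message format are hypothetical, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import json
import urllib.request


def notify_slack(webhook_url, pipeline, status, detail="", opener=urllib.request.urlopen):
    """Post a short pipeline status message to a Slack incoming webhook."""
    text = f"[{pipeline}] {status}"
    if detail:
        text += f": {detail}"
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=10) as response:
        return response.status
```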
AWS-Big-Data-Projects/Simplify-Big-Data-Analytics-with-Amazon-EMR-
Simplify Big Data Analytics with Amazon EMR, published by Packt
AWS-Big-Data-Projects/spark-examples
Spark Examples
AWS-Big-Data-Projects/spark-scala-examples
This project provides Apache Spark SQL, RDD, DataFrame, and Dataset examples in Scala.
AWS-Big-Data-Projects/SparkLearning
A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.
AWS-Big-Data-Projects/streaming-data-solution-for-amazon-kinesis-and-amazon-msk
A solution that automatically configures the AWS services needed to capture, store, process, and deliver streaming data. It helps address real-time streaming use cases such as capturing high-volume application logs, analyzing clickstream data, continuously delivering data to a data lake, and more.
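On the consumption side, a simple polling reader for one shard of a Kinesis data stream can be built from `get_shard_iterator` and `get_records`. A minimal sketch with hypothetical stream and shard names; production consumers more often rely on the Kinesis Client Library or a Lambda event source mapping, which handle checkpointing and resharding.

```python
import time


def read_shard(kinesis, stream_name, shard_id, max_records=100, poll_seconds=1.0):
    """Read up to `max_records` record payloads from the start of one shard."""
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # oldest record still retained
    )["ShardIterator"]
    collected = []
    while iterator and len(collected) < max_records:
        response = kinesis.get_records(ShardIterator=iterator, Limit=max_records - len(collected))
        collected.extend(r["Data"] for r in response["Records"])
        # An empty batch with no lag means the reader has caught up with the stream.
        if not response["Records"] and response.get("MillisBehindLatest", 0) == 0:
            break
        # NextShardIterator is absent or null once a closed shard is exhausted.
        iterator = response.get("NextShardIterator")
        time.sleep(poll_seconds)
    return collected
```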