aws-glue-crawler
There are 36 repositories under the aws-glue-crawler topic.
aws-samples/aws-glue-crawler-utilities
This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS CDK applications.
aws-samples/amazon-rds-export-to-s3-automation
This repository contains source code for the AWS Database Blog post "Reduce data archiving costs for compliance by automating RDS snapshot exports to Amazon S3".
fermat01/ETL-Data-Pipeline-using-AWS-EMR-Spark-Glue-Athena
ETL data pipeline using AWS services
aws-samples/automated-datastore-discovery-with-aws-glue
Automation framework to catalog AWS data sources using Glue
GabrielDan92/AWS_Terraform_PySpark-ETL_Job
Terraform configuration that creates several AWS services, uploads data to S3, and starts the Glue Crawler and Glue Job.
masood2iq/AWS-Athena-Glue-S3-Bucket-Deployment-Through-AWSConsole
AWS Athena, Glue Database, Glue Crawler and S3 buckets deployment through AWS GUI console.
ShubhamMohanty680/Spotify_end_to_end_data_engineering
A project built around an ETL (Extract, Transform, Load) pipeline that uses the Spotify API on AWS.
Akanksha-tetwar/YouTube-Trending-video-analysis-ETL-using-AWS-Services
In this project I have used the Trending YouTube Video Statistics data from Kaggle to analyze and prepare it for usage.
dhvani-k/YouTrend_Insights_Analyzing_YouTube_Video_Landscape
An end-to-end solution for managing and analyzing YouTube video data from Kaggle, leveraging AWS services and visualized through QuickSight and Tableau
DivineSamOfficial/SmartCityProject
Smart City Realtime Data Engineering Project
masood2iq/AWS-Athena-Glue-S3-CloudFormation-Deployment-AWSConsole
AWS Athena, Glue Database, Glue Crawler and S3 buckets deployment through CloudFormation stack on AWS console.
SadafAsad/LinkedIn-Jobs-Analysis
Unveiling job market trends with Scrapy and AWS
sarah-zhan/data_pipeline_amazon_products
An end-to-end data pipeline built with AWS S3, Glue, Crawler, Athena, and Tableau visualization
Saurabhkhandebharad/BigData-SK
Analyzed a multicategory e-commerce store using big data techniques on a Kaggle dataset with the help of AWS EC2, AWS S3, PySpark, AWS Glue ETL, AWS Athena, AWS CloudFormation, AWS Lambda and Power BI!
subhamay-cloudworks/0052-agapanthus-cft
Working with Glue Data Catalog and Running the Glue Crawler On Demand
subhamay-cloudworks/0090-deutzia-cft
Creating an audit table for a DynamoDB table using CloudTrail, Kinesis Data Stream, Lambda, S3, Glue, Athena, and CloudFormation
VvEK-Hiremath/Airlines-Data-Pipeline-Project-AWS
Implementing data pipeline using AWS services for airlines data
desininja/Quality-Movie-Data-Pipeline
ETL pipeline using AWS services
h-fuzzy-logic/data-analytics-spring
Open data and cloud computing to answer the question: Are we losing our spring days?
imverma/DataEngineering-YouTube-Analysis-Project
An end-to-end solution for managing and analyzing YouTube video data from Kaggle, leveraging AWS services and visualized through QuickSight and Tableau
Kartik-Banga/Automated-ETL-Pipeline-for-Playstore-Data
Implemented ETL pipeline on AWS for Playstore data using Lambda, Glue Crawlers, and Glue ETL Jobs. Orchestrated workflow with Step Functions and achieved seamless integration, optimal data merging, and enhanced data quality/accessibility.
KRISHNASAIRAJ/AWS-Driven-Sales-Performance-Outlook
The project aims to establish a robust data pipeline for tracking and analyzing sales performance using various AWS services. The process involves creating a DynamoDB database, implementing Change Data Capture (CDC), using Kinesis streams, and finally storing and querying the data in Amazon Athena.
masood2iq/AWS-Athena-Glue-CloudFormation-Deployment-on-Existing-S3-Bucket-AWSConsole
AWS Athena, Glue Database, Glue Crawler deployment through CloudFormation stack on already existing S3 buckets on AWS console.
masood2iq/Serverless-Framework-Athena-Glue-Deployment-on-Existing-S3-Bucket
AWS Athena, Glue Database, Glue Crawler deployment on existing S3 bucket through Serverless (sls) Framework.
masood2iq/Serverless-Framework-Athena-Glue-S3-Buckets-Deployment
AWS Athena, Glue Database, Glue Crawler and S3 buckets deployment through Serverless (sls) Framework
shahidmalik4/aws-glue-stepfunctions-etl
This project automates an ETL pipeline using AWS Glue, S3, Athena, and Step Functions to transform raw Airbnb data. It cleanses, enriches, and organizes the data into separate raw and transformed databases, enabling efficient querying and analysis via Athena, with automated notifications through SNS.
Shilpaar90/AWS-Capturing-Schema-Changes-In-S3
A pipeline within AWS to capture schema changes in S3 files and to update them in a DB.
ShreyasLengade/serverless_etl_pipeline
Developed an ETL pipeline for real-time ingestion of stock market data from the stock-market-data-manage.onrender.com API. Engineered the system to store data in Parquet format for optimized query processing and incorporated data quality checks to ensure accuracy prior to visualization.
Tyriek-cloud/NYC-Mobility-Survey-Analysis
An end-to-end data engineering project in which five NYC DOT datasets were modified in an ETL process and analyzed for insights.
AirtonLira/aws-bigdata-glue-athena
This project aims to collect, catalog, govern, process, and visualize data.
jibbs1703/Tickit-Data-Lake
This repository demonstrates the creation of a robust, 3-tier data lake using AWS resources.
mihirkudale/Stock-Market-Real-Time-Data-Engineering-Project
In this project, you will execute an End-To-End Data Engineering Project on Real-Time Stock Market Data using Kafka. We are going to use different technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.
productiveAnalytics/aws-cdk-constructs-sandbox
Cloud Development Kit (AWS CDK) using TypeScript, Python and Java
subhamay-cloudworks/0053-bluebonnets-cft
Working with Glue Data Catalog, running the Glue Crawler using S3 Event Notification, and creating the entire stack using AWS CloudFormation
sumanthmalipeddi/spotify_trending_telugu
Collecting song, album, and artist details from the Spotify music application at specific intervals using the spotipy API and performing ETL operations using Amazon cloud services
TravelXML/KAFKA-PYTHON-AWS-CRAWLER-AMAZON-ATHENA
Comprehensive tutorials, steps, and scripts for setting up Apache Kafka on an Amazon EC2 instance, streaming logs to S3, and querying data with AWS Glue and Amazon Athena. Includes Zookeeper configuration, producer and consumer setup, and automated data catalog creation.