Pinned Repositories
airflow_kafka_cassandra_mongodb
Produces Kafka messages, consumes them, and uploads the data into Cassandra and MongoDB.
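The consume-and-store step of a pipeline like this could look roughly as follows; the broker address, topic, database, and collection names are illustrative assumptions, not taken from the repo.

```python
# Minimal consume-and-store sketch: read JSON messages from Kafka
# and insert each one into MongoDB. All names below are placeholders.
import json

from kafka import KafkaConsumer   # kafka-python
from pymongo import MongoClient

consumer = KafkaConsumer(
    "events",                                   # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
collection = MongoClient("mongodb://localhost:27017")["demo"]["events"]

for message in consumer:
    collection.insert_one(message.value)        # one document per Kafka message
```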
aws_end_to_end_streaming_pipeline
An end-to-end AWS data engineering project (Glue, Lambda, Kinesis, Redshift, QuickSight, Athena, EC2, S3).
crypto_api_kafka_airflow_streaming
Gets crypto data from an API and streams it to Kafka with Airflow, then writes the data to MySQL and visualizes it with Metabase.
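A sketch of the scheduled fetch-and-produce step as an Airflow DAG; the API URL, topic name, and schedule are placeholders, not the repo's actual values.

```python
# Hypothetical Airflow DAG: fetch data from an API on a schedule
# and produce it to a Kafka topic.
import json
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from kafka import KafkaProducer


def fetch_and_produce():
    data = requests.get("https://api.example.com/crypto", timeout=10).json()
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("crypto_prices", data)        # hypothetical topic
    producer.flush()


with DAG(
    dag_id="crypto_to_kafka",
    start_date=datetime(2023, 1, 1),
    schedule_interval="*/5 * * * *",            # every five minutes
    catchup=False,
) as dag:
    PythonOperator(task_id="fetch_and_produce", python_callable=fetch_and_produce)
```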
csv_extract_airflow_docker
Writes a CSV file to Postgres, reads the table back and modifies it, then writes further tables to Postgres with Airflow.
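One way the CSV-to-Postgres flow could look with pandas and SQLAlchemy; the connection string, file path, and table names are assumed.

```python
# Sketch of the load-read-modify-load cycle against Postgres.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://airflow:airflow@localhost:5432/airflow")

df = pd.read_csv("data.csv")                                        # extract
df.to_sql("raw_table", engine, if_exists="replace", index=False)    # load

# read the table back, modify it, and write a derived table
modified = pd.read_sql_table("raw_table", engine)
modified.columns = [c.lower().strip() for c in modified.columns]
modified.to_sql("clean_table", engine, if_exists="replace", index=False)
```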
docker-airflow
Dockerized Apache Airflow.
glue_etl_job_data_catalog_s3
A Glue ETL job (or EMR Spark job) that reads from the Glue Data Catalog, modifies the data, and uploads it to S3 and the Data Catalog.
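A standard Glue job skeleton matching this description; the database, table, and bucket names are illustrative only.

```python
# Glue job sketch: catalog in, transform, S3 out.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# read from the Glue Data Catalog (placeholder names)
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database", table_name="my_table"
)

# stand-in for the real transformations
transformed = dyf.drop_fields(["unused_column"])

# write the result back to S3 as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=transformed,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="parquet",
)
job.commit()
```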
kafka_spark_structured_streaming
Gets data from an API with a scheduled Airflow script, sends it to Kafka, consumes it with Spark Structured Streaming, then writes it to Cassandra.
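A sketch of the Kafka-to-Cassandra leg with Spark Structured Streaming; it assumes the spark-cassandra-connector package, and all names are placeholders. Real code would first parse the Kafka value into columns matching the Cassandra table schema.

```python
# Read a Kafka topic as a stream and write each micro-batch to Cassandra.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka_to_cassandra").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")               # hypothetical topic
    .load()
    .select(col("value").cast("string"))         # parse into real columns here
)


def write_to_cassandra(batch_df, batch_id):
    (batch_df.write.format("org.apache.spark.sql.cassandra")
     .options(keyspace="demo", table="events")   # placeholder keyspace/table
     .mode("append")
     .save())


stream.writeStream.foreachBatch(write_to_cassandra).start().awaitTermination()
```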
parquet_gcs_bucket_to_bigquery_table
Regularly fetches Parquet files from a public GCS bucket and writes them to a BigQuery table.
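The load step could be a single BigQuery load job; the bucket path and table ID below are placeholders.

```python
# Load Parquet files from GCS into a BigQuery table in one job.
from google.cloud import bigquery

client = bigquery.Client()

job = client.load_table_from_uri(
    "gs://public-bucket/files/*.parquet",        # hypothetical source bucket
    "my-project.my_dataset.my_table",            # hypothetical destination
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    ),
)
job.result()   # block until the load job finishes
```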
send_data_to_aws_services
Automates sending remote data to AWS services such as Kinesis and S3.
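A minimal boto3 sketch of sending data to S3 and Kinesis; the bucket, stream, and key names are illustrative.

```python
# Push a local file to S3 and a record to a Kinesis data stream.
import json

import boto3

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")

# upload a file to S3 (placeholder bucket and key)
s3.upload_file("data.csv", "my-bucket", "raw/data.csv")

# put a single record onto a Kinesis data stream
kinesis.put_record(
    StreamName="my-stream",
    Data=json.dumps({"id": 1, "value": 42}),
    PartitionKey="1",
)
```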
streaming_data_processing
Generates streaming data, transfers it to Kafka, modifies it with PySpark, and loads it into Elasticsearch and MinIO.
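A sketch of the Elasticsearch and MinIO sinks; it assumes the elasticsearch-hadoop Spark connector and an S3A-configured MinIO endpoint, with placeholder credentials throughout.

```python
# Write the same DataFrame to Elasticsearch and to MinIO via S3A.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("streaming_sinks")
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000")  # MinIO
    .config("spark.hadoop.fs.s3a.access.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.secret.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

df = spark.read.json("input.json")   # stand-in for the Kafka source

# write to Elasticsearch via the elasticsearch-spark connector
(df.write.format("org.elasticsearch.spark.sql")
 .option("es.nodes", "localhost").option("es.port", "9200")
 .mode("append").save("demo-index"))

# write the same data to MinIO through the S3A filesystem
df.write.mode("append").parquet("s3a://demo-bucket/output/")
```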
dogukannulu's Repositories
dogukannulu/kaggle_projects
ML models built for various Kaggle competitions.
dogukannulu/s3_trigger_lambda_to_rds
Automatically sends a DataFrame to S3, triggers a Lambda that modifies it, and uploads the result to RDS.
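The Lambda handler for such an S3 trigger could look roughly like this; the host, credentials, and table name are placeholders.

```python
# S3-triggered Lambda sketch: read the uploaded object, modify it
# with pandas, and load it into RDS MySQL.
import boto3
import pandas as pd
from sqlalchemy import create_engine

s3 = boto3.client("s3")
engine = create_engine("mysql+pymysql://user:password@rds-host:3306/demo")


def lambda_handler(event, context):
    # the S3 event carries the bucket and key of the uploaded object
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    obj = s3.get_object(Bucket=bucket, Key=key)
    df = pd.read_csv(obj["Body"])

    df = df.dropna()                 # stand-in for the real modification
    df.to_sql("uploaded_data", engine, if_exists="append", index=False)
    return {"rows_loaded": len(df)}
```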
dogukannulu/csv_to_kinesis_streams
Writes a CSV file to Amazon Kinesis Data Streams.
dogukannulu/dogukannulu
My personal repo
dogukannulu/amazon_msk_kafka_streaming
Creates a Kafka topic on Amazon MSK, streams data through a producer, and consumes it on the console.
dogukannulu/twitter_etl_s3
Gets data via the Twitter API, orchestrates the pipeline with Airflow, and stores the results in an S3 bucket.
dogukannulu/data-generator
Generates data from an existing dataset into a file, or produces dataset rows as Kafka messages in a streaming manner.
dogukannulu/datasets
Datasets used in training sessions.
dogukannulu/IBM-Data-Science-Capstone-Project
Capstone project for the IBM Data Science Professional Certificate.
dogukannulu/read_from_s3_upload_to_rds
Uploads remote data to Amazon S3, then reads it and loads it into Amazon RDS MySQL.
dogukannulu/docker-hadoop
dogukannulu/super_lig_streamlit
dogukannulu/prefect-example-flows
Creates sample Prefect flows, deploys them as Docker containers, and stores them in GitHub.
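A minimal Prefect 2 flow in the same spirit; the task bodies are placeholders.

```python
# Tiny Prefect flow: two tasks chained inside a flow, run locally.
from prefect import flow, task


@task
def extract() -> list[int]:
    return [1, 2, 3]


@task
def transform(values: list[int]) -> list[int]:
    return [v * 10 for v in values]


@flow
def example_flow():
    print(transform(extract()))


if __name__ == "__main__":
    example_flow()
```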
dogukannulu/snowpipe-aws-stream-processing
Gets streaming data from an S3 bucket via an SQS queue, loads it into Snowflake with Snowpipe, and modifies the data with a Snowflake task.