structured-streaming
There are 93 repositories under the structured-streaming topic.
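All of the repositories below build on Spark's Structured Streaming API. As a quick orientation, a minimal streaming query might look like the following sketch (using the built-in rate source so no external system is needed; it is not taken from any repository on this page):

```scala
import org.apache.spark.sql.SparkSession

object RateToConsole {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration only.
    val spark = SparkSession.builder()
      .appName("rate-to-console")
      .master("local[*]")
      .getOrCreate()

    // The built-in "rate" source emits (timestamp, value) rows continuously,
    // which makes it handy for trying out streaming queries without Kafka.
    val stream = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    // Print each micro-batch to stdout; checkpointing is omitted for brevity.
    stream.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```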
lw-lin/CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
databricks/LearningSparkV2
The GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
japila-books/spark-structured-streaming-internals
The Internals of Spark Structured Streaming
Azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
polomarcus/Spark-Structured-Streaming-Examples
Spark Structured Streaming / Kafka / Cassandra / Elastic
qubole/kinesis-sql
Kinesis Connector for Structured Streaming
streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
chermenin/spark-states
Custom state store providers for Apache Spark
radoslawkrolikowski/financial-market-data-analysis
Real-Time Financial Market Data Processing and Prediction application
IBM/kafka-streaming-click-analysis
Use Kafka and Apache Spark streaming to perform click stream analytics
astrolabsoftware/fink-broker
Astronomy Broker based on Apache Spark
zaleslaw/Spark-Tutorial
How do you build your first Spark application with MLlib, Structured Streaming, GraphFrames, Datasets, and more? The answer is here!
Klarrio/open-stream-processing-benchmark
This repository contains the code base for the Open Stream Processing Benchmark.
HeartSaVioR/spark-sql-kafka-offset-committer
Kafka offset committer for structured streaming query
sankamuk/PysparkCheatsheet
PySpark Cheatsheet
HeartSaVioR/spark-state-tools
Spark Structured Streaming State Tools
AndrewKuzmin/spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.5.1
aamend/spark-gdelt
Binding the GDELT universe in a Spark environment
mozilla/telemetry-streaming
Spark Streaming ETL jobs for Mozilla Telemetry
sev7e0/wow-spark
:high_brightness: A self-study Spark handbook covering Spark Core, Spark SQL, Spark Streaming, Spark-Kafka, and Delta Lake, plus basic Scala exercises, along with source code analyses of topics such as master and shuffle, with summaries and translations.
qubole/s3-sqs-connector
A library for reading data from Amazon S3 with optimized listing via Amazon SQS, using Spark SQL Streaming (Structured Streaming).
qubole/streaminglens
Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
qubole/spark-state-store
RocksDB state storage implementation for Structured Streaming.
zekeriyyaa/PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra
Structured Streaming applied to robot data from a ROS-Gazebo simulation environment: data is collected in Kafka, analyzed with Apache Spark, and stored in Cassandra.
Neuw84/structured-streaming-avro-demo
Spark 3.0.0 Structured Streaming Kafka Avro Demo
xiaogp/recsys_structured_streaming
Kafka + Structured Streaming + Phoenix + Elasticsearch: popularity-based and user-preference recommendations built on behavior logs, with a recall-fusion strategy.
aws-samples/iceberg-streaming-examples
This repo contains examples of high-throughput ingestion using Apache Spark and Apache Iceberg. The examples cover IoT and CDC scenarios using best practices. The code can be deployed to any Spark-compatible engine, such as Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
NashTech-Labs/structured-streaming-application
A reference application showing how to integrate Apache Spark Structured Streaming, Apache Cassandra, and Apache Kafka for fast streaming computations on data.
yjshen/spark-connector-test
A tutorial on how to use pulsar-spark-connector
awslabs/aws-cloudwatch-metrics-custom-spark-listener
Example Spark Streaming code with custom listeners that push streaming metrics to Amazon CloudWatch.
epishova/Structured-Streaming-Cassandra-Sink
An example of how to create and use a Cassandra sink in a Spark Structured Streaming application
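The common pattern repos like this implement is to reuse the batch Cassandra writer inside `foreachBatch`. A minimal sketch (not this repo's actual code; it assumes the DataStax spark-cassandra-connector is on the classpath, and the keyspace and table names are placeholders):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CassandraSinkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cassandra-sink").getOrCreate()

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events") // placeholder topic
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // foreachBatch hands each micro-batch to ordinary batch code, so the
    // connector's batch writer can serve as a streaming sink.
    // Binding the function to a typed val avoids Scala overload ambiguity.
    val writeToCassandra: (DataFrame, Long) => Unit = (batch, _) =>
      batch.write
        .format("org.apache.spark.sql.cassandra")
        .option("keyspace", "demo")  // placeholder keyspace
        .option("table", "events")   // placeholder table
        .mode("append")
        .save()

    events.writeStream
      .foreachBatch(writeToCassandra)
      .option("checkpointLocation", "/tmp/checkpoints/cassandra-sink")
      .start()
      .awaitTermination()
  }
}
```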
Rishav273/kafkaPysparkAnalytics
Real-time ETL pipeline for financial data (Kafka, PySpark).
chenyyyang/spark-sql-custom-mq-dataSource
Sample code for an MQ data source implemented with the Spark 3.1.x Data Source API
cynthia1wang/jdbcsink
A test program that uses Spark Structured Streaming to receive JSON messages from Kafka and count DNS queries per user over one-minute windows (treating each source IP address as one user). The results are inserted into a MySQL database.
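A query of that shape can be sketched roughly as follows (a hedged sketch, not the repo's code: the Kafka topic, JSON field names, and JDBC credentials are all placeholder assumptions):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object DnsCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dns-counts").getOrCreate()
    import spark.implicits._

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "dns-logs") // placeholder topic
      .load()

    // Parse the JSON payload; the field names are assumptions.
    val parsed = raw
      .selectExpr("CAST(value AS STRING) AS json")
      .select(
        get_json_object($"json", "$.src_ip").as("src_ip"),
        get_json_object($"json", "$.ts").cast("timestamp").as("ts"))

    // One-minute tumbling window per source IP (one IP ~ one user);
    // the watermark bounds state for late data.
    val counts = parsed
      .withWatermark("ts", "2 minutes")
      .groupBy(window($"ts", "1 minute"), $"src_ip")
      .count()

    // Write each micro-batch to MySQL over JDBC (connection details
    // are placeholders).
    val writeToMysql: (DataFrame, Long) => Unit = (batch, _) =>
      batch.write
        .format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/dns")
        .option("dbtable", "dns_counts")
        .option("user", "spark")
        .option("password", "secret")
        .mode("append")
        .save()

    counts.writeStream
      .outputMode("update")
      .foreachBatch(writeToMysql)
      .option("checkpointLocation", "/tmp/checkpoints/dns-counts")
      .start()
      .awaitTermination()
  }
}
```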
TrainingByPackt/Big-Data-Processing-with-Apache-Spark-eLearning
Efficiently tackle large datasets and perform big data analysis with Spark and Python
thestyleofme/spark-explore
Learning the Spark ecosystem