Hungsiro506's Stars
NashTech-Labs/spark-graphx-twitter
An example of Spark and GraphX using Twitter as a sample
mvogiatzis/probabilistic-counting
Distributed Probabilistic Counting on ClickStream data
jadianes/kdd-cup-99-spark
PySpark solution to the KDDCup99
kiwenlau/hadoop-cluster-docker
Run Hadoop Cluster within Docker Containers
brscrt/Vert.xKafkaConsumer
keiraqz/anomaly-detection
Anomaly detection model that uses Spark for training and Spark Streaming for testing
mvogiatzis/spark-anomaly-detection
Detecting outliers in a dataset using Spark
NashTech-Labs/activator-kafka-spark-streaming.g8
This is an activator project for showcasing integration of Kafka 0.10 with Spark Streaming.
krasserm/akka-analytics
Large-scale event processing with Akka Persistence and Apache Spark
jaceklaskowski/spark-activator
Spark Streaming with Scala and Akka Activator template
eligosource/eventsourced
A library for building reliable, scalable and distributed event-sourced applications in Scala
scalanlp/breeze
Breeze is/was a numerical processing library for Scala.
NashTech-Labs/scala-design-patterns
Scala Design Patterns
joandre/MCL_spark
An implementation of the Markov Clustering algorithm for Spark in Scala
killrweather/killrweather
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.
JerryLead/SparkInternals
Notes talking about the design and implementation of Apache Spark
linkedin/kafka-monitor
Xinfra Monitor monitors the availability of Kafka clusters by producing synthetic workloads using end-to-end pipelines to obtain derived vital statistics - E2E latency, service produce/consume availability, offsets commit availability & latency, message loss rate and more.
yahoo/CMAK
CMAK is a tool for managing Apache Kafka clusters
twitter/algebird
Abstract Algebra for Scala
microsoft/PowerBI-Node
Node SDK and client library for Power BI REST APIs.
twitter/scalding
A Scala API for Cascading
datastax/spark-cassandra-connector
DataStax Connector for Apache Spark to Apache Cassandra
skrusche63/spark-elastic
This project combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch.
donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
nielsutrecht/kafka-serializer-example
Example of how to create your own custom serializers for Kafka queues including JSON, Smile and Kryo
alexbudniy/lambda
Spark Scala Kafka Course
Hurence/logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
tresata/spark-kafka
Low level integration of Spark and Kafka
NashTech-Labs/real-time-stream-processing-engine
This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.
Stratio/sparta
Real Time Analytics and Data Pipelines based on Spark Streaming