sutugin
Software engineer with more than 10 years of programming experience. Interested in data science and high-load solutions. Always trying to learn something new.
Pinned Repositories
aerodrop
REST and Memcache proxy for aerospike
connectors
Connectors for Delta Lake
data-generator
User web sessions data generator written in Python, for Kafka, Kinesis or local file system sinks
shc
The Apache Spark - Apache HBase Connector is a library that lets Spark access HBase tables as an external data source or sink.
spark-streaming-jdbc-source
sutugin's Repositories
sutugin/spark-streaming-jdbc-source
sutugin/connectors
Connectors for Delta Lake
sutugin/data-generator
User web sessions data generator written in Python, for Kafka, Kinesis or local file system sinks
sutugin/data-model-generator
Data model generator based on Scala case classes
sutugin/deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
sutugin/delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
sutugin/deordie-meetups
DE or DIE meetup made by data engineers for data engineers. Currently in Russian.
sutugin/metorikku
A simplified, lightweight ETL Framework based on Apache Spark
sutugin/odsc-west-streaming-trends
All Data, Relevant Information, Scripts, and Applications for the Open Data Science Conference (2018)
sutugin/ru-neophyte-guide-to-scala
Russian translation of Daniel Westheide's article series "The Neophyte's Guide to Scala"
sutugin/sbt-common-settings
A collection of common plugins and settings for sbt
sutugin/scala-best-practices
A collection of Scala best practices
sutugin/scala-exercises
The easy way to learn Scala.
sutugin/scalacaster
Purely Functional Algorithms and Data Structures in Scala
sutugin/smartdata-fp-spark
sutugin/sope
Apache Spark ETL Utilities
sutugin/spark-clickhouse-plugin
The most intuitive Spark plugin for interacting with ClickHouse
sutugin/spark-docker
Official Dockerfile for Apache Spark
sutugin/spark-http-streaming
Running Apache Spark Structured Streaming job on the local machine with an HTTP web server as a streaming source.
sutugin/spark-partition-sizing
Sizing partitions in Spark
sutugin/spark-platform
Basic Spark utilities
sutugin/spark-scala-examples
This project provides Apache Spark SQL, RDD, DataFrame, and Dataset examples in Scala
sutugin/spark-scala-playground
Sample processing code using Spark 2.1+ and Scala
sutugin/spark-schema-registry
Schema Registry integration for Apache Spark
sutugin/spark-sql-kafka-offset-committer
Kafka offset committer for structured streaming query
sutugin/spark-structured-streaming-jdbc-sink
Spark Structured Streaming JDBC Sink
sutugin/spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
sutugin/spark_easy_datalake
sutugin/sparkMeasure
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
sutugin/waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.