Pinned Repositories
datafusion-comet
Apache DataFusion Comet Spark Accelerator
incubator-graphar
An open source, standard data file format for graph data storage and retrieval.
tsumugi-spark
SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine.
Binning
Monotonic binning (WOE) in Python
chispa
PySpark test helper methods with beautiful error messages
feature-generation-benchmark
A database-like benchmark of feature generation from time-series data
flake8-pyspark-with-column
A flake8 plugin that detects usage of withColumn in a loop or inside reduce
PennylaneQuantumFeatureMaps
spark-connect-example
An example of SparkConnect extension.
zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
SemyonSinchenko's Repositories
SemyonSinchenko/tsumugi-spark
SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine.
SemyonSinchenko/flake8-pyspark-with-column
A flake8 plugin that detects usage of withColumn in a loop or inside reduce
SemyonSinchenko/feature-generation-benchmark
A database-like benchmark of feature generation from time-series data
SemyonSinchenko/spark-connect-example
An example of SparkConnect extension.
SemyonSinchenko/PennylaneQuantumFeatureMaps
SemyonSinchenko/chispa
PySpark test helper methods with beautiful error messages
SemyonSinchenko/python-deequ
Python API for Deequ
SemyonSinchenko/codecrafters-http-server-rust
SemyonSinchenko/eren
PySpark Hive helper methods
SemyonSinchenko/farsante
Fake Pandas / PySpark DataFrame creator
SemyonSinchenko/GraphAr
An open source, standard data file format for graph data storage and retrieval
SemyonSinchenko/LeetCode
SemyonSinchenko/spark
Apache Spark - A unified analytics engine for large-scale data processing
SemyonSinchenko/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
SemyonSinchenko/datafusion-comet
Apache DataFusion Comet Spark Accelerator
SemyonSinchenko/datahobbit
A Rust based data/CSV/Parquet file generator
SemyonSinchenko/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
SemyonSinchenko/falsa
SemyonSinchenko/gex
Git Explorer: cross-platform git workflow improvement tool inspired by Magit
SemyonSinchenko/incubator-graphar-website
Apache GraphAr Website
SemyonSinchenko/incubator-hugegraph
A graph database that supports 100+ billion data, with high performance and scalability (includes OLTP Engine & REST-API & Backends)
SemyonSinchenko/mack
Delta Lake helper methods in PySpark
SemyonSinchenko/pyspark-ai
English SDK for Apache Spark
SemyonSinchenko/qmlcourse
Quantum Machine Learning Community Course
SemyonSinchenko/RandomRepoName
SemyonSinchenko/SemyonSinchenko
SemyonSinchenko/spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
SemyonSinchenko/ssinchenko
Personal Blog. Powered by Hugo.
SemyonSinchenko/tinkerpop
Apache TinkerPop - a graph computing framework
SemyonSinchenko/unitycatalog
Open, Multi-modal Catalog for Data & AI