bartosz25
Freelance Data Engineer // Apache Spark enthusiast, cloud user, Scala & Python. Share everything on www.waitingorcode.com blog.
Freelance Data Engineer, remote
Pinned Repositories
acid-file-formats
Code for Apache Hudi, Apache Iceberg and Delta Lake analysis
beam-learning
data-ai-summit-2024
Visits sessionization pipeline used for the talk
data-engineering-design-patterns-book
Code snippets for Data Engineering Design Patterns book
data-generator
User web sessions data generator written in Python, for Kafka, Kinesis or local file system sinks
data-generator-blogging-platform
sessionization-demo
spark-docker
Repository containing Docker images for Spark master and slave
spark-playground
Code snippets used in demos recorded for the blog.
spark-scala-playground
Sample processing code using Spark 2.1+ and Scala
bartosz25's Repositories
bartosz25/spark-scala-playground
Sample processing code using Spark 2.1+ and Scala
bartosz25/data-engineering-design-patterns-book
Code snippets for Data Engineering Design Patterns book
bartosz25/spark-playground
Code snippets used in demos recorded for the blog.
bartosz25/beam-learning
bartosz25/spark-docker
Repository containing Docker images for Spark master and slave
bartosz25/data-ai-summit-2024
Visits sessionization pipeline used for the talk
bartosz25/acid-file-formats
Code for Apache Hudi, Apache Iceberg and Delta Lake analysis
bartosz25/data-generator
User web sessions data generator written in Python, for Kafka, Kinesis or local file system sinks
bartosz25/sessionization-demo
bartosz25/scala-learn
Learning tests showing Scala features such as implicits, guards, pattern matching, and many others.
bartosz25/bigdata-sandbox
Tests of some tools that can be used in Big Data projects: ZooKeeper, Kafka, Cassandra, Spark and so on.
bartosz25/kafka-playground
bartosz25/data-generator-blogging-platform
bartosz25/data-ai-summit-2020
Demo code for my Data+AI 2020 talk about customizing the Apache Spark state store.
bartosz25/graphx-cytoscape-jetty-websockets-data-visualization
Real-time data visualization with Apache Spark GraphX, Cytoscape.js, Jetty and websockets
bartosz25/bartosz25
bartosz25/delta-lake-playground
bartosz25/stream-the-word-workshop-ndc-porto-2024
Code snippets for the NDC Porto 2024 2-hour workshop on stream processing with Apache Spark Structured Streaming and Apache Flink
bartosz25/airflow-playground
bartosz25/case-study
bartosz25/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
bartosz25/elasticsearch-playground
bartosz25/flink-playground
bartosz25/idempotency-ndc-porto-2024
Demo for idempotency examples presented as part of my talk at NDC Porto 2024: https://ndcporto.com/agenda/embrace-the-failure-stay-idempotent-0v69/06fl3582ypx
bartosz25/paris.py-cerberus-pyspark-talk
Demo code for my talk about Cerberus integration with PySpark
bartosz25/pulsar-playground
bartosz25/python-playground
bartosz25/recipes-fork
The Immerok Apache Flink Cookbook is a collection of examples of Apache Flink applications in the format of "recipes". Each recipe explains how you can solve a specific problem by leveraging one or more of the APIs of Apache Flink. The recipes can be extended or provide a basis for solving your requirements with Apache Flink.
bartosz25/structured-streaming-for-batch-workshop
bartosz25/wfc-playground