IndIgo71's Stars
microsoft/generative-ai-for-beginners
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
ValdikSS/GoodbyeDPI
GoodbyeDPI — Deep Packet Inspection circumvention utility (for Windows)
DataExpert-io/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
Yorko/mlcourse.ai
Open Machine Learning Course
bol-van/zapret
Multi-platform DPI bypass
igorbarinov/awesome-data-engineering
A curated list of data engineering tools for software developers
CodedotAl/gpt-code-clippy
Full description can be found here: https://discuss.huggingface.co/t/pretrain-gpt-neo-for-open-source-github-copilot-model/7678?u=ncoop57
dlt-hub/dlt
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
sodadata/soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
danielbeach/data-engineering-practice
Data Engineering Practice Problems
AlexIoannides/pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
HariSekhon/Dockerfiles
50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak
datacontract/datacontract-cli
CLI to manage your datacontract.yaml files
bitol-io/open-data-contract-standard
Home of the Open Data Contract Standard (ODCS).
stefmolin/pandas-workshop
An introductory workshop on pandas with notebooks and exercises for following along. Slides contain all solutions.
dataflint/spark
Performance Observability for Apache Spark
dmitryburov/algorithm-practice
Algorithms and data structures. I collect problems from Yandex, Tinkoff, CodeRun, LeetCode, Codewars, and others.
crescentpartha/CheatSheets-for-Developers
A collection of programming cheat sheets for developers, meant to boost productivity and serve as a quick review while working.
QuantumFluxx/karpov_courses
🐳 Project work. Lectures, practical assignments, and projects from karpov_courses are stored here. Link: https://karpov.courses/
dqops/dqo
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, let DQOps run the data quality checks daily to detect data quality issues.
osalvador/tePLSQL
PL/SQL Template engine
josephmachado/docker_for_data_engineers
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
LexxaRRioo/rzv_data_engineering_series_s01e01
Open episode of the data engineering practice course
ongxuanhong/de02-pyspark-optimization
carteakey/data-pipeline-compose
Docker Compose for big data processing using Hadoop, Hive, PySpark, Spark, Jupyter, and Airflow.
ayyoubmaul/dag-factory
SA01/spark-read-jdbc-tutorial
This repository contains the code and examples for my article on Medium, which explains how to parallelize reading data from JDBC sources in Apache Spark.
SA01/docker-spark-cluster
A simple Spark standalone cluster for your testing environment
k0rsakov/debezium_cdc_example
An example of setting up CDC with Debezium
sensiarion/async_arch_course
Educational project for the "Asynchronous Architecture" course