kkfaisal/around-dataengineering

A Data Engineering & Machine Learning Knowledge Hub

A long, never-ending learning journey around Data Engineering & Machine Learning

Weekly Digest

The Data Engineering

Level 0

Level 1

Gyaan

Infrastructure

Machine Learning

MLOps

Project

Insightful

Paper

Distributed System

Crazy

The Snowflake Paper - Core idea is to build an enterprise-ready #datawarehouse solution for the #cloud 🎉📰📕
The most important points about a distributed #dataengineering platform
Fundamentals of #distributedsystems Scaling - Avoiding Coordination 🎊♨️🔆
Technical Debt in #dataengineering #softwareengineering 🔕💡🔕
Paper on Wander Join: Online Aggregation via Random Walks (the join problem) 📃💭📑
The Delta Lake Paper - High-Performance ACID Table Storage 📋💡📋
Dynamo - Amazon's Highly Available Key-value Store #distributedsystem 💬💡🎉
An Efficient and Syntactically Idiomatic Approach to Management of Streams and Tables, A Single SQL for all 💡📩📩
Secure & Robust Machine Learning in #healthcare 💊🧪🥳
Progress in Medical Science using #deeplearning 💊💡💉
The Amazon Redshift Paper - A fast, fully managed, petabyte-scale data warehouse solution that makes it simple and cost-effective to efficiently analyze large volumes of data using existing #businessintelligence tools 📂📰💭
Advancing #drugdiscovery via Artificial Intelligence 💊🏥🏥
Apache Calcite is a dynamic data management framework 🎉📚🎉
Lakehouse - A Paper on a New Generation of #datawarehouse Technology 💡🔎💡
Calvin: Fast Distributed Transactions for Partitioned Database Systems 📝📝
Presto or Trino - #SQL on Everything (The Design, Motivation & Performance) #presto 💭🎊💡
Design - Exactly Once Delivery & Transactional Messaging in Apache Kafka
Apache Kafka Paper: A Distributed Messaging System for Log Processing
Paper: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size
Paper: Ground is an open-source data context service, a system to manage all the information that informs the use of data
Azure Data Lake Store (ADLS) is a fully managed, elastic, scalable, and secure file system that supports #hadoop Distributed File System (HDFS) and Cosmos semantics
An LFU (Least Frequently Used) cache eviction algorithm with O(1) runtime complexity

NA

Cloud