datasherlock's Stars
public-apis/public-apis
A collective list of free APIs
DataExpert-io/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
jtleek/datasharing
The Leek group guide to data sharing
hackclub/putting-the-you-in-cpu
A technical explainer by @kognise of how your computer runs programs, from start to finish.
swirldev/swirl_courses
🎓 A collection of interactive courses for the swirl R package.
databricks/Spark-The-Definitive-Guide
Spark: The Definitive Guide's Code Repository
AlexIoannides/pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
ProgrammingHero1/romantic-alexa
jcw024/lichess_database_ETL
Pipeline for migrating Lichess data into PostgreSQL
aws-samples/sql-extractor-from-ssis-packages
aws-samples/cloudformation-handle-dynamic-reference-in-awsglue-streaming
This blog explains a solution architecture for handling fast-changing reference data stored in DynamoDB through an AWS Glue Streaming job.
datasherlock/spark-config-calculator
The Spark Configuration Tool is a Streamlit-based application that helps users optimize Apache Spark configurations. Users enter cluster, node, and executor parameters, and the tool recommends Spark configurations based on those inputs.