kylepierce's Stars
Velir/dbt-ga4
dbt package for modeling raw data exported by Google Analytics 4. BigQuery support only.
googleapis/google-cloud-python
Google Cloud Client Library for Python
pathwaycom/llm-app
LLM App templates for Dynamic RAG. Ready to run with Docker, ⚡ in sync with your data sources.
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
LewisCharlesBaker/droughty
Droughty helps keep your workflow dry
holistics/dbml
Database Markup Language (DBML), designed to define and document database structures
fraibacas/prefect-orion
trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
httpie/cli
🥧 HTTPie CLI: modern, user-friendly command-line HTTP client for the API era. JSON support, colors, sessions, downloads, plugins & more.
mermaid-js/mermaid
Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown
dgilland/pydash
The kitchen sink of Python utility libraries for doing "stuff" in a functional way. Based on the Lo-Dash Javascript library.
capitalone/DataProfiler
What's in your data? Extract schema, statistics and entities from datasets
macbre/sql-metadata
Uses tokenized query returned by python-sqlparse and generates query metadata
airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
alex/nyt-2020-election-scraper
localstack/localstack
💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline
datastacktv/data-engineer-roadmap
Roadmap to becoming a data engineer in 2021
nteract/papermill
📚 Parameterize, execute, and analyze notebooks
amundsen-io/amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
simdjson/simdjson
Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
spotify/luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
jbesomi/texthero
Text preprocessing, representation and visualization from zero to hero.
cube-js/cube
📊 Cube: The Semantic Layer for Building Data Applications
philipperemy/name-dataset
The Python library for names.
ricklamers/gridstudio
Grid studio is a web-based application for data science with full integration of open source data science frameworks and languages.
pdpipe/pdpipe
Easy pipelines for pandas DataFrames.
MassMove/AttackVectors
A repository to monitor attack vectors from state-backed information operations
python-streamz/streamz
Real-time stream processing for python
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.