cvd
Passionate about 2 things: Modern Data Stack and Pizza. Come work @BlueOrangeDigital on both! We are hiring!
CTO @ Blue Orange Digital
Washington, DC
cvd's Stars
Snowflake-Labs/terraform-provider-snowflake
Terraform provider for managing Snowflake accounts
treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
alirezamika/autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
atlanhq/camelot
Camelot: PDF Table Extraction for Humans
PrefectHQ/legacy-ui
The home of the Prefect 1 UI
PrefectHQ/server
The Prefect API and backend
HPI-Information-Systems/Metanome
The source repository of the Metanome tool
JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
super-linter/super-linter
Combination of multiple linters to run as a GitHub Action or standalone
mingrammer/diagrams
🎨 Diagram as Code for prototyping cloud system architectures
xuebinqin/U-2-Net
The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."
apache/arrow
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
great-expectations/great_expectations
Always know what to expect from your data.
J535D165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
dapr/dapr
Dapr is a portable, event-driven runtime for building distributed applications across cloud and edge.
online-ml/river
🌊 Online machine learning in Python
hi-primus/optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
explosion/spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
Urigo/SOFA
The best way to create REST APIs - Generate RESTful APIs from your GraphQL Server
bmabey/pyLDAvis
Python library for interactive topic model visualization. Port of the R LDAvis package.
andkret/Cookbook
The Data Engineering Cookbook
dask/dask-cloudprovider
Cloud provider cluster managers for Dask. Supports AWS, Google Cloud, Azure and more...
codesandbox/codesandbox-client
An online IDE for rapid web development
donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
krallin/tini
A tiny but valid `init` for containers
dask/dask-ml
Scalable Machine Learning with Dask
dask/dask
Parallel computing with task scheduling
rapidsai/notebooks
RAPIDS Sample Notebooks
rapidsai/dask-cudf
[ARCHIVED] Dask support for distributed GDF object --> Moved to cudf