eddyzhow's Stars
tesseract-ocr/tesseract
Tesseract Open Source OCR Engine (main repository)
apache/spark
Apache Spark - A unified analytics engine for large-scale data processing
mingrammer/diagrams
🎨 Diagram as Code for prototyping cloud system architectures
getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
EthicalML/awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
twintproject/twint
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
aio-libs/aiohttp
Asynchronous HTTP client/server framework for asyncio and Python
cayleygraph/cayley
An open-source graph database
codelucas/newspaper
newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:
quickwit-oss/tantivy
Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
postgresml/postgresml
Postgres with GPUs for ML/AI apps.
vespa-engine/vespa
AI + Data, online. https://vespa.ai
Azure/azure-sdk-for-python
This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
ckan/ckan
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
amundsen-io/amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
pyinfra-dev/pyinfra
pyinfra turns Python code into shell commands and runs them on your servers. Execute ad-hoc commands and write declarative operations. Target SSH servers, local machine and Docker containers. Fast and scales from one server to thousands.
adilkhash/Data-Engineering-HowTo
A list of useful resources to learn Data Engineering from scratch
indradb/indradb
A graph database written in Rust
gunnarmorling/awesome-opensource-data-engineering
An Awesome List of Open-Source Data Engineering Projects
strapdata/elassandra
Elassandra = Elasticsearch + Apache Cassandra
jodal/pykka
🌀 Pykka makes it easier to build concurrent Python applications.
mateusz-brainhub/awesome-cto-resources
💡 A community-curated list of awesome resources to help you grow as a CTO
Machine-Learning-Tokyo/papers-with-annotations
Research papers with annotations, illustrations and explanations
Hydrospheredata/mist
Serverless proxy for Spark cluster
thespianpy/Thespian
Python Actor concurrency library
Fitblip/wsstat
Websocket stress testing made beautiful
MLBazaar/BTB
A simple, extensible library for developing AutoML systems
o19s/hello-ltr
Set of Jupyter notebooks demonstrating Learning to Rank integrated with Solr and Elasticsearch
MLBazaar/MLBlocks
A library for composing end-to-end tunable machine learning pipelines.
arsenvlad/docker-presto-adls-wasb
Example of a single node Presto with Azure Data Lake Store (ADLS) and Azure Storage Blob (WASB) access via Hive metastore