odedfos's Stars
ekzhu/datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
unitycatalog/unitycatalog
Open, Multi-modal Catalog for Data & AI
raimon49/pip-licenses
Dump the license list of packages installed with pip.
spulec/freezegun
Let your Python tests travel through time
streamlit/streamlit
Streamlit — A faster way to build and share data apps.
Kanaries/pygwalker
PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis
glic3rinu/passlib
Automatically exported from code.google.com/p/passlib
minio/minio
MinIO is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license.
Sinaptik-AI/pandas-ai
Chat with your database (SQL, CSV, pandas, polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
databrickslabs/tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc.), AS OF joins, downsampling, and interpolation
realpython/materials
Bonus materials, exercises, and example projects for our Python tutorials
YotpoLtd/metorikku
A simplified, lightweight ETL Framework based on Apache Spark
awesome-spark/awesome-spark
A curated list of awesome Apache Spark packages and resources.
cluster-apps-on-docker/spark-standalone-cluster-on-docker
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.
grantjenks/python-sortedcontainers
Python Sorted Container Types: Sorted List, Sorted Dict, and Sorted Set
awslabs/deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
debezium/debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
minrk/findspark
Studio3T/robomongo
Native cross-platform MongoDB management tool
svenkreiss/pysparkling
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
databricks/LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
apache/superset
Apache Superset is a Data Visualization and Data Exploration Platform
getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
apache/spark
Apache Spark - A unified analytics engine for large-scale data processing
grpc-ecosystem/awesome-grpc
A curated list of useful resources for gRPC
ashwin711/georaptor
Python Geohash Compression Tool
python-pendulum/pendulum
Python datetimes made easy
kuchin/awesome-cto
A curated and opinionated list of resources for Chief Technology Officers, with the emphasis on startups
microsoft/vscode
Visual Studio Code
MacPass/MacPass
A native macOS KeePass client