odedfos's Stars

ekzhu/datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Language: Python · 2.6k stars · 296 forks
unitycatalog/unitycatalog
Open, Multi-modal Catalog for Data & AI
Language: Java · 2.6k stars · 423 forks
raimon49/pip-licenses
Dump the license list of packages installed with pip.
Language: Python · 330 stars · 49 forks
spulec/freezegun
Let your Python tests travel through time
Language: Python · 4.2k stars · 272 forks
streamlit/streamlit
Streamlit — A faster way to build and share data apps.
Language: Python · 36.7k stars · 3.2k forks
Kanaries/pygwalker
PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis
Language: Python · 13.7k stars · 718 forks
glic3rinu/passlib
Automatically exported from code.google.com/p/passlib
Language:Python153
minio/minio
MinIO is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license.
Language: Go · 49.5k stars · 5.6k forks
Sinaptik-AI/pandas-ai
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
Language: Python · 14k stars · 1.4k forks
databrickslabs/tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation
Language:Jupyter Notebook31753
realpython/materials
Bonus materials, exercises, and example projects for our Python tutorials
Language: HTML · 4.8k stars · 5.3k forks
YotpoLtd/metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Language: Scala · 585 stars · 155 forks
awesome-spark/awesome-spark
A curated list of awesome Apache Spark packages and resources.
Language: Shell · 1.7k stars · 333 forks
cluster-apps-on-docker/spark-standalone-cluster-on-docker
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. :zap:
Language: Jupyter Notebook · 473 stars · 191 forks
grantjenks/python-sortedcontainers
Python Sorted Container Types: Sorted List, Sorted Dict, and Sorted Set
Language: Python · 3.6k stars · 205 forks
awslabs/deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Language: Scala · 3.3k stars · 544 forks
debezium/debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
Language: Java · 11k stars · 2.6k forks
minrk/findspark
Language:Python51472
Studio3T/robomongo
Native cross-platform MongoDB management tool
Language: C++ · 9.3k stars · 801 forks
svenkreiss/pysparkling
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Language:Python26845
databricks/LearningSparkV2
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Language: Scala · 1.2k stars · 744 forks
apache/superset
Apache Superset is a Data Visualization and Data Exploration Platform
Language: TypeScript · 63.8k stars · 14.3k forks
getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Language: Python · 26.8k stars · 4.4k forks
apache/spark
Apache Spark - A unified analytics engine for large-scale data processing
Language: Scala · 40.3k stars · 28.4k forks
grpc-ecosystem/awesome-grpc
A curated list of useful resources for gRPC
7.7k stars · 579 forks
ashwin711/georaptor
Python Geohash Compression Tool
Language:Python18816
python-pendulum/pendulum
Python datetimes made easy
Language: Python · 6.3k stars · 388 forks
kuchin/awesome-cto
A curated and opinionated list of resources for Chief Technology Officers, with the emphasis on startups
26.2k stars · 1.6k forks
microsoft/vscode
Visual Studio Code
Language: TypeScript · 166k stars · 30.1k forks
MacPass/MacPass
A native macOS KeePass client
Language: Objective-C · 6.8k stars · 462 forks