Pinned Repositories
arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
iceberg-python
Apache PyIceberg
data-diff
Compare tables within or across databases
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Dask, Flink and DataFlow
dbt-duckdb
dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
duckdb
DuckDB is an analytical in-process SQL database management system
mongo-arrow
MongoDB integrations for Apache Arrow. Export MongoDB documents to NumPy arrays, Parquet files, and pandas DataFrames in one line of code.
polars
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
astrospark
An MSU astrophysics data mining and engineering project
sergun's Repositories
sergun/astrospark
An MSU astrophysics data mining and engineering project