pyarrow
There are 55 repositories under the pyarrow topic.
vaexio/vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
ibis-project/ibis
the portable Python dataframe library
uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
narwhals-dev/narwhals
Lightweight and extensible compatibility layer between dataframe libraries!
wheretrue/biobear
Work with bioinformatic files using Arrow, Polars, and/or DuckDB
dacort/faker-cli
Command-line interface to quickly generate fake CSV and JSON data
RandomFractals/chicago-crimes
Exploring Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy and new Panel/PyScript data and dashboard tools.
icaropires/pdf2dataset
Converts a whole subdirectory containing a large (or small) volume of PDF documents into a dataset (pandas DataFrame), with error tracking and a choice of features
kraina-ai/overturemaestro
An open-source tool for reading OvertureMaps data with multiprocessing and additional Quality-of-Life features
ismailhammounou/db2ixf
db2ixf is a Python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.
milesgranger/flaco
(PoC) A very memory-efficient way to read data from PostgreSQL
zen-xu/pyarrow-stubs
Type annotations for pyarrow
gizmodata/gizmosql
A Flight SQL Server implementation with DuckDB and SQLite back ends.
vipinc007/ParquetViewer
A web application for viewing Apache Parquet files. This is a Python + Flask application.
legout/pydala
Poor man's simple Python API for creating a local or remote data lake based on several (pyarrow) datasets using DuckDB
SaelKimberly/rxls
Reading both XLSX and XLSB files, fast and memory-safe, with Python, into PyArrow
DanielAvdar/pandas-pyarrow
Seamlessly switch Pandas DataFrame backend to PyArrow.
asierra01/pyarrow_to_db2
ibm_db extension to load a pyarrow table into Db2
jaysnm/dremio-arrow
Dremio Arrow Flight Client
lykmapipo/Python-Spark-Log-Analysis
Python scripts to process and analyze log files using PySpark.
mercator-labs/oakstore
High-speed time-series pandas DataFrame database
xbrianh/xdlake
A loose implementation of the deltalake protocol, written in Python on top of pyarrow, focused on extensibility, customizability, and distributed data.
dr-saad-la/Pyarrow-Tuts
PyArrow tutorials
kiwi0fruit/featherhelper
Concise interface to cache numpy arrays and pandas dataframes
legout/pydala2
Poor man's data lake: a simple API to efficiently query your Parquet datasets using DuckDB or Polars
lykmapipo/NYC-TLC-Trip-Data
Python scripts to download, process, and analyze the New York City Taxi and Limousine Commission (TLC) Trip Record Data dataset
psmyth94/biosets
A bioinformatics extension of 🤗 Datasets library, built for ML applications on biological and omics data, offering easy integration of metadata and low-code data management tools.
thread53/pqviewer
View Apache Parquet Files In Your Terminal
adavis444/pyarrow-alpine-wheel
Dockerfile and Python 3.9 wheel for PyArrow 3.0.0 built on Alpine 3.14 (does not include Plasma or Parquet)
BenyaminZojaji/mongodb_tutorial
MongoDB tutorial repository
HuangRicky/manylinux2014builds
manylinux2014 Python package builds
k3ssdev/ParquetScriptTools
A collection of Python scripts using PyArrow and Pandas for efficient handling of Parquet files. Includes tools to view schemas, convert to CSV, check for duplicates, and merge Parquet files.
miraisolutions/apache-arrow-flight-python-example
Code examples / snippets for website news post
namansnghl/SQLify
Text to SQL Semantic Parser with LLMs
stefur/swemaps
Maps of Sweden in GeoParquet
svjack/PyArrowExpressionCastToolkit
A small cast toolkit class derived from _ParquetDatasetV2 to support casts in the filters argument