parquet

There are 487 repositories under the parquet topic.

multiprocessio/dsq
Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
Language: Go 3.7k 27 58151
roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Language: Rust 3.2k 43 156179
apache/parquet-java
Apache Parquet Java
Language: Java 2.6k 93 1.6k 1.4k
jqnatividad/qsv
CSVs sliced, diced & analyzed.
Language: Rust 2.4k 16 49070
apache/drill
Apache Drill is a distributed MPP query layer for self describing data
Language: Java 1.9k 153 135979
uber/petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Language: Python 1.8k 40 308285
gchq/Gaffer
A large-scale entity and relation database supporting aggregation of properties
Language: Java 1.8k 138 1.6k 351
apache/parquet-format
Apache Parquet Format
Language: Thrift 1.8k 66 180427
rilldata/rill
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
Language: Go 1.6k 10 1.7k 111
quiltdata/quilt
Quilt is a data mesh for connecting people with actionable data
Language: Jupyter Notebook 1.3k 19 12191
paradigmxyz/cryo
cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes
Language: Rust 1.1k 10 58102
bigdatagenomics/adam
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Language: Scala 996 100 1.2k 309
Cinchoo/ChoETL
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Language: C# 796 50 283134
HariSekhon/DevOps-Python-tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Language: Python 763 41 6340
mukunku/ParquetViewer
Simple Windows desktop application for viewing & querying Apache Parquet files
Language: C# 750 12 6590
tonbo-io/tonbo
A portable embedded database using Arrow.
Language: Rust 619 13 4241
DerwenAI/kglab
Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
Language: Jupyter Notebook 574 20 8565
developmentseed/lonboard
A Python library for fast, interactive geospatial vector data visualization in Jupyter.
Language: Python 574 11 17828
ranaroussi/pystore
Fast data store for Pandas time-series data
Language: Python 556 37 6099
RandomFractals/vscode-data-preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Language: TypeScript 550 13 31859
kylebarron/parquet-wasm
Rust-based WebAssembly bindings to read and write Apache Parquet data
Language: Rust 510 7 11219
Netflix/iceberg
Iceberg is a table format for large, slow-moving tabular data
Language: Java 476 349 6859
apache/parquet-cpp
Apache Parquet
442 49 0193
skale-me/skale
High performance distributed data processing engine
Language: JavaScript 399 22 3753
moshe/elasticsearch_loader
A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
Language: Python 398 22 5483
jorgecarleitao/parquet2
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
Language: Rust 352 13 5659
ironSource/parquetjs
fully asynchronous, pure JavaScript implementation of the Parquet file format
Language: JavaScript 347 15 103175
segmentio/parquet-go
Go library to read/write Parquet files
Language: Go 341 11 164104
spotify/ratatool
A tool for data sampling, data generation, and data diffing
Language: Scala 341 27 9555
julien040/anyquery
Query anything (JSON, CSV, GitHub, Notion, Airtable, GSheets, emails, etc.) with SQL
Language: Go 338 2 1010
sksamuel/centurion
Kotlin Bigdata Toolkit
Language: Kotlin 327 22 446
grai-io/grai-core
Language: Python 289 2 2320
Eugene-Mark/bigdata-file-viewer
A cross-platform (Windows, Mac, Linux) desktop application to view common big data binary formats like Parquet, ORC, AVRO, etc. Supports local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Language: Java 287 3 3054
fraugster/parquet-go
Go package to read and write parquet files. parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Language: Go 287 11 5253
manojkarthick/pqrs
Command line tool for inspecting Parquet files
Language: Rust 283 6 2427
mjakubowski84/parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Language: Scala 283 7 18066
