parquet
There are 444 repositories under the parquet topic.
multiprocessio/dsq
Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
apache/parquet-java
Apache Parquet
jqnatividad/qsv
CSVs sliced, diced & analyzed.
apache/drill
Apache Drill is a distributed MPP query layer for self-describing data
uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
gchq/Gaffer
A large-scale entity and relation database supporting aggregation of properties
apache/parquet-format
Apache Parquet
rilldata/rill
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
quiltdata/quilt
Quilt is a data mesh for connecting people with actionable data
paradigmxyz/cryo
cryo is the easiest way to extract blockchain data to Parquet, CSV, JSON, or Python dataframes
bigdatagenomics/adam
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Cinchoo/ChoETL
ETL framework for .NET (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
HariSekhon/DevOps-Python-tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
mukunku/ParquetViewer
Simple Windows desktop application for viewing & querying Apache Parquet files
DerwenAI/kglab
Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
ranaroussi/pystore
Fast data store for Pandas time-series data
RandomFractals/vscode-data-preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
kylebarron/parquet-wasm
Rust-based WebAssembly bindings to read and write Apache Parquet data
developmentseed/lonboard
A Python library for fast, interactive geospatial vector data visualization in Jupyter.
Netflix/iceberg
Iceberg is a table format for large, slow-moving tabular data
apache/parquet-cpp
Apache Parquet
skale-me/skale
High performance distributed data processing engine
moshe/elasticsearch_loader
A tool for batch loading data files (JSON, Parquet, CSV, TSV) into Elasticsearch
jorgecarleitao/parquet2
Fastest and safest Rust implementation of Parquet. `unsafe` free. Integration-tested against pyarrow
ironSource/parquetjs
Fully asynchronous, pure JavaScript implementation of the Parquet file format
segmentio/parquet-go
Go library to read/write Parquet files
spotify/ratatool
A tool for data sampling, data generation, and data diffing
sksamuel/centurion
Kotlin Bigdata Toolkit
Eugene-Mark/bigdata-file-viewer
A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
fraugster/parquet-go
Go package to read and write Parquet files. Parquet is a file format that stores nested data structures in a flat columnar format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
mjakubowski84/parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
cldellow/sqlite-parquet-vtable
A SQLite vtable extension to read Parquet files
manojkarthick/pqrs
Command line tool for inspecting Parquet files
awslabs/amazon-s3-find-and-forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)