parquet

There are 444 repositories under the parquet topic.

  • multiprocessio/dsq

    Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

    Language: Go, 3.7k stars
  • roapi/roapi

    Create full-fledged APIs for slowly moving datasets without writing a single line of code.

    Language: Rust, 3.1k stars
  • apache/parquet-java

    Apache Parquet

    Language: Java, 2.5k stars
  • jqnatividad/qsv

    CSVs sliced, diced & analyzed.

    Language: Rust, 2.3k stars
  • apache/drill

    Apache Drill is a distributed MPP query layer for self-describing data

    Language: Java, 1.9k stars
  • uber/petastorm

    The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

    Language: Python, 1.8k stars
  • gchq/Gaffer

    A large-scale entity and relation database supporting aggregation of properties

    Language: Java, 1.7k stars
  • apache/parquet-format

    Apache Parquet

    Language: Thrift, 1.7k stars
  • rilldata/rill

    Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.

    Language: Go, 1.4k stars
  • quiltdata/quilt

    Quilt is a data mesh for connecting people with actionable data

    Language: Jupyter Notebook, 1.3k stars
  • paradigmxyz/cryo

    cryo is the easiest way to extract blockchain data to Parquet, CSV, JSON, or Python dataframes

    Language: Rust, 996 stars
  • bigdatagenomics/adam

    ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

    Language: Scala, 967 stars
  • Cinchoo/ChoETL

    ETL framework for .NET (parser/writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro formatted files)

    Language: C#, 747 stars
  • HariSekhon/DevOps-Python-tools

    80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

    Language: Python, 736 stars
  • mukunku/ParquetViewer

    Simple Windows desktop application for viewing & querying Apache Parquet files

    Language: C#, 671 stars
  • DerwenAI/kglab

    Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.

    Language: Jupyter Notebook, 562 stars
  • ranaroussi/pystore

    Fast data store for Pandas time-series data

    Language: Python, 540 stars
  • RandomFractals/vscode-data-preview

    Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

    Language: TypeScript, 531 stars
  • kylebarron/parquet-wasm

    Rust-based WebAssembly bindings to read and write Apache Parquet data

    Language: Rust, 478 stars
  • developmentseed/lonboard

    A Python library for fast, interactive geospatial vector data visualization in Jupyter.

    Language: Python, 467 stars
  • Netflix/iceberg

    Iceberg is a table format for large, slow-moving tabular data

    Language: Java, 467 stars
  • apache/parquet-cpp

    Apache Parquet

  • skale-me/skale

    High performance distributed data processing engine

    Language: JavaScript, 399 stars
  • moshe/elasticsearch_loader

    A tool for batch loading data files (JSON, Parquet, CSV, TSV) into Elasticsearch

    Language: Python, 396 stars
  • jorgecarleitao/parquet2

    Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

    Language: Rust, 349 stars
  • ironSource/parquetjs

    Fully asynchronous, pure JavaScript implementation of the Parquet file format

    Language: JavaScript, 346 stars
  • segmentio/parquet-go

    Go library to read/write Parquet files

    Language: Go, 337 stars
  • spotify/ratatool

    A tool for data sampling, data generation, and data diffing

    Language: Scala, 337 stars
  • sksamuel/centurion

    Kotlin Bigdata Toolkit

    Language: Kotlin, 326 stars
  • Eugene-Mark/bigdata-file-viewer

    A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

    Language: Java, 282 stars
  • fraugster/parquet-go

    Go package to read and write Parquet files. Parquet is a file format for storing nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.

    Language: Go, 280 stars
  • mjakubowski84/parquet4s

    Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

    Language: Scala, 278 stars
  • grai-io/grai-core

    Language: Python, 275 stars
  • cldellow/sqlite-parquet-vtable

    A SQLite vtable extension to read Parquet files

    Language: C++, 265 stars
  • manojkarthick/pqrs

    Command line tool for inspecting Parquet files

    Language: Rust, 260 stars
  • awslabs/amazon-s3-find-and-forget

    Amazon S3 Find and Forget is a solution for handling data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

    Language: Python, 234 stars