A browser-based client-only reader of parquet files

Primary language: JavaScript. License: MIT.

parquet-reader

Goal: write a client-side, browser-based reader of Parquet files, leveraging WebAssembly and recent developments in arrow2 and parquet2.

How:

  • Build a WebAssembly (wasm) package based on parquet2 that can read a Parquet file from bytes.
  • Create a React app that uses that library and offers a UI to load a file from the computer and navigate its structure (header, footer, groups, pages).
  • Do not interact with any server (other than the one serving the original JS/HTML): use only client-side code, so that users can rely on the tool without concerns about data exfiltration, etc.
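As an illustration of what "reading a Parquet file from bytes" involves on the client, here is a minimal JavaScript sketch that checks the `PAR1` magic bytes and locates the footer metadata. It is not this project's code: the names `hasMagicAt` and `locateFooter` are hypothetical, and it assumes a plain Parquet file (no encrypted footer). The actual parsing is meant to happen in the parquet2-based wasm package.

```javascript
// Parquet files begin and end with the 4-byte ASCII magic "PAR1".
const PARQUET_MAGIC = [0x50, 0x41, 0x52, 0x31];

// True if the magic bytes appear at `offset` in `bytes` (a Uint8Array).
function hasMagicAt(bytes, offset) {
  return PARQUET_MAGIC.every((b, i) => bytes[offset + i] === b);
}

// Hypothetical helper: returns where the footer metadata sits in the buffer,
// or throws if the buffer does not look like a Parquet file.
function locateFooter(bytes) {
  // Smallest possible layout: leading magic + footer length + trailing magic.
  if (bytes.length < 12) {
    throw new Error("too short to be a Parquet file");
  }
  if (!hasMagicAt(bytes, 0) || !hasMagicAt(bytes, bytes.length - 4)) {
    throw new Error("missing PAR1 magic bytes");
  }
  // The 4 bytes before the trailing magic hold the footer length (little-endian).
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const footerLength = view.getUint32(bytes.length - 8, true);
  const footerStart = bytes.length - 8 - footerLength;
  if (footerStart < 4) {
    throw new Error("footer length exceeds file size");
  }
  return { footerStart, footerLength };
}
```

In the browser, the bytes could come from a file input without any network traffic, for example `locateFooter(new Uint8Array(await file.arrayBuffer()))`, where `file` is the `File` object the user selected.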

Down the road:

  • Expose a method to read the file from S3 using some authentication model.
  • Expose a method to read the file from Azure using some authentication model.
  • Offer a UI to explore the data in the file (e.g. using WebAssembly and arrow2).
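One possible shape for the remote-read items above, sketched in JavaScript under several assumptions: the object is reachable through a URL that already carries authorization (for example a pre-signed URL), its total size is known (for example from a HEAD request), and the server honors HTTP `Range` requests. The names `tailRange`, `fetchRange`, and `fetchFooter` are hypothetical, and authentication itself is left out. The idea is to download only the footer rather than the whole file.

```javascript
// Explicit HTTP Range header value for the last `count` bytes of an object
// whose total size is `fileSize`.
function tailRange(fileSize, count) {
  const start = Math.max(0, fileSize - count);
  return `bytes=${start}-${fileSize - 1}`;
}

// Issue one ranged GET and return the body as bytes.
// `fetchFn` is injectable so the sketch can run without a network.
async function fetchRange(url, range, fetchFn) {
  const res = await fetchFn(url, { headers: { Range: range } });
  // 206 Partial Content means the server honored the Range header.
  if (res.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${res.status}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}

// Hypothetical helper: fetch only the Parquet footer metadata of a remote file.
async function fetchFooter(url, fileSize, fetchFn = fetch) {
  // Last 8 bytes: 4-byte little-endian footer length, then the "PAR1" magic.
  const tail = await fetchRange(url, tailRange(fileSize, 8), fetchFn);
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  const footerLength = view.getUint32(0, true);
  // Second request: footer metadata plus the 8 trailing bytes.
  const end = await fetchRange(url, tailRange(fileSize, footerLength + 8), fetchFn);
  return end.subarray(0, footerLength);
}
```

This costs two small requests instead of one large download, which matters for large files; the returned bytes could then be handed to the same wasm reader used for local files.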

Probable tech stack:

  • Rust for Parquet reading
  • wasm-pack to compile the Rust code to .wasm and generate the JavaScript bindings
  • React for the UI and the usual frontend work
  • Material UI for visualization
  • AWS CloudFront for deployment

How to build

cd parquet-reader
wasm-pack build --out-dir ../wasm-build
cd ..
cd client
npm start

Then select a Parquet file from your computer (e.g. download this one) and see the file's metadata in the console as a JS map.