Pinned Repositories
arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
arrow-rs
Official Rust implementation of Apache Arrow
data-geonames
delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
delta-rs
A native Rust library for Delta Lake
dragnet
Just the facts -- web page content extraction
kplay
node-url-poller
spark-boilerplate
A boilerplate for Spark projects, with Docker support for local development and scripts for EMR support.
xianwill's Repositories
xianwill/spark-boilerplate
A boilerplate for Spark projects, with Docker support for local development and scripts for EMR support.
xianwill/node-url-poller
xianwill/kplay
xianwill/arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
xianwill/arrow-rs
Official Rust implementation of Apache Arrow
xianwill/data-geonames
xianwill/delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
xianwill/delta-rs
A native Rust library for Delta Lake
xianwill/dragnet
Just the facts -- web page content extraction
xianwill/hotdog
Hotdog is a syslog-to-Kafka forwarder which aims to get log entries into Apache Kafka as quickly as possible.
xianwill/json-schema
JSON Schema validator for Java, based on the org.json API
xianwill/JustJson
JSON helper library for Android
xianwill/kafka-delta-ingest
A highly efficient daemon for streaming data from Kafka into Delta Lake
xianwill/kafkajs
A modern Apache Kafka client for node.js
xianwill/python-readability
Fast Python port of arc90's readability tool, updated to match the latest readability.js.
xianwill/readability
A standalone version of the readability lib
xianwill/rust-dataframe
A Rust DataFrame implementation, built on Apache Arrow
xianwill/scribd.github.io
The Scribd technology site, where we share the challenges in building the world's largest library
xianwill/slipstream
A tool for doing on-the-fly message validation for Kafka.