Apache Arrow
Build Status | Code Coverage |
Powering In-Memory Analytics
Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.
Major components of the project include:
- The Arrow Columnar In-Memory Format
- C++ libraries
- C bindings using GLib
- Go libraries
- Java libraries
- JavaScript libraries
- Plasma Object Store: a shared-memory blob store, part of the C++ codebase
- Python libraries
- Ruby libraries
- Rust libraries
Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.
What's in the Arrow libraries?
The reference Arrow libraries contain a number of distinct software components:
- Columnar vector and table-like containers (similar to data frames) supporting flat or nested types
- Fast, language agnostic metadata messaging layer (using Google's Flatbuffers library)
- Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
- Low-overhead IO interfaces to files on disk, HDFS (C++ only)
- Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
- Integration tests for verifying binary compatibility between the implementations (e.g. sending data from Java to C++)
- Conversions to and from other in-memory data structures
Getting involved
Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved:
- Join the mailing list: send an email to dev-subscribe@arrow.apache.org. Share your ideas and use cases for the project.
- Follow our activity on JIRA
- Learn the format
- Contribute code to one of the reference implementations
How to Contribute
We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the github.com/apache/arrow repository.
If you are looking for some ideas on what to contribute, check out the JIRA issues for the Apache Arrow project. Comment on the issue and/or contact dev@arrow.apache.org with your questions and ideas.
If you’d like to report a bug but don’t have time to fix it, you can still post it on JIRA, or email the mailing list dev@arrow.apache.org
To contribute a patch:
- Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
- Create a JIRA for your patch on the Arrow Project JIRA.
- Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the JIRA name (ex: apache#240).
- Make sure that your code passes the unit tests. You can find instructions how to run the unit tests for each Arrow component in its respective README file.
- Add new unit tests for your code.
Thank you in advance for your contributions!