/zed

A cloud-native, searchable data lake for semi-structured and structured data

Primary language: Go. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Zed is a system for search, analytics, and data transformation designed to work at any scale, from a simple command-line tool called zq on your desktop (kind of like jq) to a large-scale, distributed cluster running on your servers or in the cloud (like a search cluster or a data warehouse).

Zed is based on a new type of data called super-structured data, which can represent both relational tables and JSON. Super-structured data provides a new and easier approach to data introspection, shaping, pipeline management, data quality, testing, and debugging, while combining the flexibility of the JSON document model with the efficiency of warehouse analytics.

For a non-technical user, Zed is as easy to use as web search. For a technical user, Zed exposes its technical underpinnings on a gradual slope, providing as much detail as desired, packaged up in the easy-to-understand ZSON data format and the Zed Lake API.

Why?

We think that you shouldn't have to set up one system for search and another completely different system for historical analytics. And the same search/analytics system that works at cloud scale should run easily as a lightweight command-line tool on your laptop.

And rather than having to set up complex ETL pipelines with brittle transformation logic, managing your data lake should be as easy as git.

And while schemas are a great way to model and organize your data, they often get in the way when you are just trying to store or transmit your semi-structured data.

Finally, we believe a lightweight data store that provides easy search and analytics would be a great place to store data sets for data science and data engineering experiments running in Python, with easy integration with your favorite Python libraries.

How?

Zed solves all these problems with a new format called ZSON, which is a superset of the JSON and relational models. ZSON is syntax-compatible with JSON, but it has a comprehensive type system that you can use as little or as much as you like. Zed types can be used as schemas.
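To make the idea of types acting as schemas a little more concrete, here is a rough analogy in plain Python. It is not how Zed works internally and it does not use ZSON; the helper names `shape_of` and `group_by_shape` are hypothetical. The sketch derives a structural "shape" from each JSON-style record and groups records that share one, so a mixed stream of records sorts itself into table-like groups without any schema being declared up front.

```python
from collections import defaultdict


def shape_of(value):
    """Return a hashable description of a JSON-style value's structure."""
    if isinstance(value, dict):
        # Field order is ignored in this sketch: keys are sorted first.
        return tuple((key, shape_of(value[key])) for key in sorted(value))
    if isinstance(value, list):
        # An array's shape is the set of element shapes it contains.
        element_shapes = {shape_of(item) for item in value}
        return ("array", tuple(sorted(element_shapes, key=repr)))
    # Primitive values are described by their Python type name.
    return type(value).__name__


def group_by_shape(records):
    """Group records by shape, so each group resembles one table."""
    groups = defaultdict(list)
    for record in records:
        groups[shape_of(record)].append(record)
    return dict(groups)
```

Given the records `{"a": 1, "b": "x"}`, `{"b": "y", "a": 2}`, and `{"a": "1"}`, the first two land in one group and the third in another, because its `a` field is a string rather than an integer.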

Zed also has a cloud-based object design that was modeled after the git design pattern. Commits to the lake are transactional and consistent. Search index updates are also transactionally consistent with any ingested data, and searches can run with or without indexes.

Quick Start

Detailed documentation is available.

The quickest way to get running on macOS, Linux, or Windows is to download a pre-built release binary. You can find these binaries on the GitHub releases page.

If you have Go installed, you can install zed and zq by running:

go install github.com/brimdata/zed/cmd/{zed,zq}@latest

On macOS and Linux, you can use brew to install zq:

brew install brimdata/tap/zq

To install zed, use:

brew install brimdata/tap/zed

Once installed, you can run the query engine from the command line using zq:

echo '"hello, world"' | zq -

Or you can run a Zed lake service, load it with data using zed load, and hit the API. In one shell, run the server:

mkdir scratch
zed serve -lake scratch

And in another shell, run the client:

zed create Demo
zed use Demo@main
echo '{s:"hello, world"}' | zed load -
zed query "from Demo"

You can also use zed from Python. After you install the Zed Python package:

pip3 install "git+https://github.com/brimdata/zed#subdirectory=python/zed"

You can hit the Zed service from a Python program:

import zed

# Connect to the default lake at http://localhost:9867.  To use a
# different lake, supply its URL via the ZED_LAKE environment variable
# or as an argument here.
client = zed.Client()

# Begin executing a Zed query for all records in the pool named "Demo".
# This returns an iterator, not a container.
records = client.query('from Demo')

# Stream records from the server.
for record in records:
    print(record)

See the python/zed directory for more details.
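Because the query call returns an iterator rather than a container, the results can only be walked once. If you need to count the records or pass over them more than once, one option is to drain the iterator into a list first. The helper below is a minimal sketch: `fetch_all` is a hypothetical name, not part of the Zed Python package, and it assumes only that the client has the `query` method used in the example above.

```python
def fetch_all(client, query):
    """Run a Zed query and collect every streamed record into a list.

    `client` is any object with a `query(text)` method that returns an
    iterator of records, such as the zed.Client() shown above.
    This holds the whole result set in memory, so it only suits
    results that are known to be small.
    """
    return list(client.query(query))
```

With the client from the example above, `fetch_all(client, 'from Demo')` gives you a list that you can measure with `len()` or iterate over repeatedly. For large pools, keep streaming with the plain `for` loop instead.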

Brim

You can use the Brim app to explore, query, and shape the data in your Zed lake.

We originally developed Brim for security-oriented use cases (having tight integration with Zeek, Suricata, and Wireshark), but we are actively extending Brim with UX for handling generic data sets to support data science, data engineering, and ETL use cases.

Building from Source

It's also easy to build zed from source:

git clone https://github.com/brimdata/zed
cd zed
make install

This installs binaries in your $GOPATH/bin.

If you don't have Go installed, download and install it from the Go install page. Go version 1.17 or later is required.

Contributing

See the contributing guide to learn how you can help improve Zed!

Join the Community

Join our public Slack workspace for announcements, Q&A, and to trade tips!

Acknowledgment

We modeled this README after Philip O'Toole's brilliantly succinct description of rqlite.