Aerospike data browser is a stand-alone desktop application that lets you discover data stored in the Aerospike database and run ANSI SQL queries against it.
- View namespace/set/schema of data stored in Aerospike, without the need to know the schema apriori.
- Run ANSI SQL queries for data exploration, including aggregate functions for descriptive analytics.
- Visualize data.
Note: The data browser is not an analytics tool and should not be used for complex analysis of large datasets.
It is a stack combining Quix, Presto, and the Aerospike Connector for Presto, and is dockerized. The Quix UI provides a DB Explorer and a SQL editor, in addition to a notebook manager for managing your notebooks. Presto exposes a JDBC interface to Quix and uses the Aerospike Connector to translate SQL queries into API calls to the DB.
- Aerospike server is not a part of this package, so make sure you have a server instance running and populated with data that you would like to explore. Refer to the blog on launching an Aerospike server in a docker environment if you need to and use one of the Aerospike clients to populate it with data.
- Docker Desktop is installed.
Run the docker command:
docker build . -t aerospike-data-browser
docker run -d -p 3000:3000 -p 8081:8081 --name data-browser aerospike-data-browser
OR if you choose not to build, but rather use the image from docker hub:
docker run -d -p 3000:3000 -p 8081:8081 --name data-browser reugn/aerospike-data-browser
Quix runs on port 3000 by default, hence launch the Aerospike server on any other port e.g. 3300 if you are planning to run both on the localhost. Below is the list of environment variables you can specify to configure the data browser using the -e option.
Variable | Description | Default Value |
---|---|---|
AS_HOSTLIST | Aerospike host list, a comma separated list of potential hosts to seed the cluster. | |
TABLE_DESC_DIR | Path of the directory containing table description files. | /usr/lib/presto/etc/aerospike |
SPLIT_NUMBER | Number of Presto splits. See Parallelism section for more information. | 4 |
CACHE_TTL_MS | Number of milliseconds to keep the inferred schema cached. Data browser uses heuristics to rapidly infer schema of the data stored in Aerospike. | 1800000 |
DEFAULT_SET_NAME | Table name for the default set. This is used when your namespace has a null set or no sets. | __default |
RECORD_KEY_NAME | Column name for the record's primary key. Use this in the WHERE clause for queries involving primary key (PK) comparisons. | __key |
RECORD_KEY_HIDDEN | If set to false, the primary key column will be available in the result set. | true |
INSERT_REQUIRE_KEY | Require the primary key on INSERT queries. Although we recommend that you provide a primary key, you can choose not to by setting this property to false, in which case a UUID is generated for the PK. You can view it by setting aerospike.record-key-hidden to false for future queries. | true |
Make sure that the data browser container is running by using the docker ps -a
command.
Access the UI at http://localhost:3000/ (assuming that you haven’t changed the default port).
covid_stats
and write_set
are two sets that were created in the namespace called test
in an Aerospike cluster called aerospike
.
You can now view the namespace and the sets in the DB explorer on the side bar.
You can ignore the information schema and system tables in the DB explorer because those are not required.
Additionally, you can also view the schema of the sets. Notice that there’s no need to specify the schema. The data browser detects (infers) the schema for you.
The UI has a built-in SQL editor that you can use to run ANSI SQL queries. Here are the supported SQL statements:
- SHOW CATALOGS
- SHOW SCHEMAS
- USE
- SHOW TABLES
- DESCRIBE
- SELECT
- INSERT INTO
- EXPLAIN
Note that a catalog corresponds to an Aerospike cluster name, schema to a namespace, and table to an Aerospike set name. See SQL syntax page for examples and more information.
If you click on the triple dots next to the set name, a sample SQL query is already generated for you that you can edit as you see fit. Here’s an example:
SELECT *
FROM aerospike.test.covid_stats
LIMIT 1000
Note: If you choose not to define sets (which is optional in Aerospike), but rather directly store data into a namespace in your data model,
you would have to use the null set or the __default
set (see __default
set in the DB Explorer). The default set contains all the records that were stored in the namespace.
It’s name can be changed by configuring the parameter DEFAULT_SET_NAME
.
SELECT *
FROM aerospike.test.__default
LIMIT 1000
To use the primary key in a WHERE clause, use __key
in the WHERE clause. You can change the name of the key column, which defaults to __key
, by configuring the environment variable RECORD_KEY_NAME
in the docker run command. For example,
SELECT *
FROM aerospike.test.covid_stats
WHERE __key = “New York”
You can also visualize the data using the charting capabilities built into the QUIX UI.
Finally, you can run aggregate functions for descriptive analytics. For example,