/mariadb-columnstore-quickstart

This repo provides steps to get you up and running with MariaDB ColumnStore using a Docker container.

Primary LanguageJupyter NotebookMIT LicenseMIT

MariaDB ColumnStore Quick Start

MariaDB ColumnStore is a columnar storage engine that utilizes a massively parallel distributed data architecture. ColumnStore is designed for big data scaling to process petabytes of data, linear scalability and exceptional performance with real-time response to analytical queries.

This README provides steps to get you up and running with MariaDB ColumnStore using a Docker container

Prerequisites

Before getting started with this walkthrough you will need:

Overview

This walkthrough will step you through the process of installing, accessing and configuring single-node MariaDB ColumnStore instance, made available through MariaDB Community Server within a Docker container.

Note: To use this download, import and use this data on an existing, non-container MariaDB Server instance (including MariaDB SkySQL) jump to step #4.

1. Using a Docker image to create a Docker container

Pull down the MariaDB Community Server (with ColumnStore) image and create a new container by executing the following command in a terminal window:

$ docker run -d -p 3306:3306 --name mcs_container mariadb/columnstore

2. Enter the newly created container

$ docker exec -it mcs_container bash

Note: The next several steps involve work within the Docker container, but that is not a hard requirement. The scripts within this repository will also work outside of the container as well.

3. Install git using yum

$ yum install git

4. Clone this repository

$ git clone https://github.com/mariadb-developers/mariadb-columnstore-quickstart.git

5. Download flight data

The sample data used in this example comes from the United States Department of Transportation, which provides millions of records of flight data spanning many years.

Use the following command to execute a script that will download US domestic flight data between a start and end year.

$ ./get_flight_data.sh

Note: Keep in mind that there are millions of flight records that can take up gigabytes of storage space.

6. Create schemas and load data

This repository includes the following schema:

  • travel (database)
    • airlines (ColumnStore table) - airlines providing flights within the United States
    • airports (ColumnStore table) - airports within the United States
    • flights (ColumnStore table) - US domestic flight records

In this sample, the create_and_load.sh script will be used to create the schemas (via schema.sql) and load the following tables:

Execute the following command to execute a script to create the schema and load data.

$ ./create_and_load.sh

Additional uses

The create_and_load.sh script can also be used by specifying by database details like host, port, user, and password.

./create_and_load.sh [host] [port_number] [user] [password]

$ ./create_and_load.sh 127.0.0.1 3306 app_user Password123!

The script can also be used with a MariaDB SkySQL database (by including a path to the Certificate authority chain file). For example:

./create_and_load.sh [host] [port_number] [user] [password] [ca_file_path]

$ ./create_and_load.sh analytics-demo.mdb0001390.db.skysql.net 5001 DB00003799 Password123 skysql_chain.pem

Support and Contribution

Please feel free to submit PR's, issues or requests to this project project directly.

If you have any other questions, comments, or looking for more information on MariaDB please check out:

Or reach out to us diretly via:

License

License