elasticsearch-sql

CLI app and a Clojure library that uses the Elasticsearch SQL API

esql - Elasticsearch SQL API CLI Client

A CLI tool and a Clojure library to query Elasticsearch with SQL.

Use cases

  1. Filter with a SQL query and export millions of docs from Elasticsearch to a CSV file:
ESQL_ELASTICSEARCH_HOSTS=http://localhost:9200 ./esql --query="SELECT * FROM logs LIMIT 1000000" --format=csv
  2. Use the Clojure library, which exposes hits as an IReduceInit, so it is easy to use with transducers:
(into [] (comp) (esql/reducible {:elasticsearch_hosts "http://localhost:9200"
                                 :query               "SELECT * FROM index LIMIT 1"
                                 :format              "csv"}))
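To illustrate why an IReduceInit pairs well with transducers, here is a minimal sketch that runs without Elasticsearch. The `fake-hits` function is a hypothetical stand-in for a reducible source and is not part of this library; the shape of the hit maps (`:id`, `:level`) is also made up for the example.

```clojure
;; Hypothetical stand-in for a reducible source of hits (NOT the library API).
(defn fake-hits
  "Returns an IReduceInit that yields n maps and stops early
  when the reducing function signals completion with `reduced`."
  [n]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (loop [i 0
             acc init]
        (if (or (reduced? acc) (>= i n))
          (unreduced acc)
          (recur (inc i)
                 (f acc {:id i :level (if (even? i) "INFO" "ERROR")})))))))

;; A transducer pipeline: keep errors, extract ids, stop after three.
(def errors-xf
  (comp (filter #(= "ERROR" (:level %)))
        (map :id)
        (take 3)))

;; `take` short-circuits the reduction, so only the first few
;; of the million items are ever produced.
(def first-error-ids
  (into [] errors-xf (fake-hits 1000000)))
```

Because the source is reduced rather than realized as a sequence, the pipeline holds no intermediate collections, and early termination means no more items are requested than needed.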

Install

 brew install dainiusjocas/brew/esql

Why?

The initial idea was to play with the Elasticsearch specification:

  • Convert it to a Malli schema;
  • Generate a CLI API from the Malli schema for the malli-cli;
  • Make a useful CLI tool.

CLI

Available params:

ESQL CLI parameters:
  Short  Long option                    Default                  Description
         --columnar                                              If true, returns the results in a columnar fashion: one row represents all the values of a certain column from the current page of results.
         --delimiter                    ","                      The CSV format accepts a formatting URL query attribute, delimiter, which indicates which character should be used to separate the CSV values.
         --dry-run                                               Prints configuration map with defaults.
         --elasticsearch-hosts          "http://localhost:9200"  Elasticsearch host
         --fetch-size                   1000                     The maximum number of rows (or entries) to return in one response
         --field-multi-value-leniency   true                     Throw an exception when encountering multiple values for a field (default) or be lenient and return the first value from the list (without any guarantees of what that will be - typically the first in natural ascending order).
         --filter                                                Optional Elasticsearch query DSL for additional filtering.
         --format                       "csv"                    Elasticsearch SQL can return the data in several formats
  -h     --help                                                  Display usage summary and exit.
         --index-using-frozen           false                    If true, the search can run on frozen indices. Defaults to false.
         --keep-alive                   "5d"                     Retention period for an async or saved synchronous search.
         --keep-on-completion           false                    If true, Elasticsearch stores synchronous searches if you also specify the wait_for_completion_timeout parameter. If false, Elasticsearch only stores async searches that don’t finish before the wait_for_completion_timeout.
         --page-timeout                 "45s"                    The timeout before a pagination request fails.
         --params                                                Values for parameters in the query.
         --query                                                 SQL query to execute
         --request-timeout              "90s"                    The timeout before the request fails.
         --time-zone                                             Time-zone in ISO 8601 used for executing the query on the server.
         --wait-for-completion-timeout                           Period to wait for complete results. Defaults to no timeout, meaning the request waits for complete search results. If the search doesn’t finish within this period, the search becomes async.

Library

From Clojars:

lt.jocas/elasticsearch-sql {:mvn/version "RELEASE"}

Development

Install:

Run bb tasks for a list of all available tasks.

Roadmap

  • Babashka compatible (with the exception of cbor and smile formats)
  • bb task to generate Malli schema for Elasticsearch SQL params input
  • CLI interface that documents all params based on malli-cli
  • Transducer ready
  • Installable GraalVM native-image
    • brew
    • windows
  • Install with bbin
  • Test in the JVM with testcontainers
  • Generate environment variable defaults for the Malli schema:
    • "UPPERCASE_REPLACE_WITH_UNDERSCORES"
    • with prefix? yes ESQL_*
    • ELASTICSEARCH_HOSTS from kibana docker
  • Split into lib, and app deps.edn aliases. Native-image builds the app alias
  • Deploy lib to Clojars
  • Handle params schema [:vector :any], cli --params 1 --params 2 into a vector
  • Babashka pod
  • Docker image
  • "Patient reverse fetch" (if possible)
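
The environment variable convention from the roadmap (uppercase, dashes replaced with underscores, `ESQL_` prefix) can be sketched as follows. This is an illustration of the naming rule only, not the project's actual implementation; `param->env-var` and `env-defaults` are hypothetical names.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper illustrating the naming rule; not the project's code.
(defn param->env-var
  "Maps a CLI param keyword to its environment variable name,
  e.g. :elasticsearch-hosts -> \"ESQL_ELASTICSEARCH_HOSTS\"."
  [param]
  (str "ESQL_"
       (-> (name param)
           (str/upper-case)
           (str/replace "-" "_"))))

;; Hypothetical helper: picks defaults for the given params out of an
;; environment map (string keys). Params without a variable are skipped.
(defn env-defaults
  [params env]
  (into {}
        (keep (fn [p]
                (when-some [v (get env (param->env-var p))]
                  [p v])))
        params))
```

For example, `(env-defaults [:elasticsearch-hosts :fetch-size] (System/getenv))` would pick up `ESQL_ELASTICSEARCH_HOSTS` and `ESQL_FETCH_SIZE` when they are set. Values stay strings in this sketch; coercing them to the types in the schema is left out.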

License

Copyright © 2023 Dainius Jocas.

Distributed under The Apache License, Version 2.0.