
Transport web data to local/remote storage using Gidari

Primary language: Go. License: Apache License 2.0.

Gidari

Gidari is a "web-to-storage" tool for querying web APIs and persisting the resulting data to local or remote storage. A configuration file defines how this querying and storing should occur. Once you have a configuration file, you can initiate the transport with the command gidari --config <configuration.yml>. See here for a quick demonstration.

Installation

TODO

Usage

Using Gidari is a two-step process:

  1. Create a configuration file to instruct the binary on how to make the RESTful HTTP requests and where to store the data
  2. Run gidari --config your_configuration.yml --verbose

Configuration

The configuration is a YAML file that defines how to make the RESTful HTTP requests and where to store the resulting data. See here for example configurations.

| Key | Required | Type | Description |
|---|---|---|---|
| url | Y | string | The URL for the RESTful API for making requests |
| authentication | N | map | Data required for authenticating the web API requests |
| connectionStrings | Y | list | List of connection strings for communicating with local/remote storage |
| rateLimit | Y | map | Data required for limiting the number of requests per second, avoiding 429 errors |
| rateLimit.burst | Y | int | Number of requests that can be made per second |
| rateLimit.period | Y | int | Period for the rateLimit.burst |
| truncate | N | boolean | Truncate all tables in the database before performing request upserts |
| requests | N | list | List of requests to receive data from the web API for upserting into local/remote storage |
| request.endpoint | Y | string | Endpoint for making the RESTful API request |
| table | N | string | Name of the table in the remote/local storage for upserting data. This field defaults to the last string in the endpoint path |
| timeseries | N | map | Data required for upserting time series data, which can be very resource intensive |
| timeseries.startName | Y | string | Name of the query/path parameter for the "start" date of the time series |
| timeseries.endName | Y | string | Name of the query/path parameter for the "end" date of the time series |
| timeseries.period | Y | int | How often (in seconds) to build a new datetime range to batch over. For example, if your datetime range spans 24 hours and your period is 3600, then the request will be broken up into 24 smaller requests spanning the datetime range |
| timeseries.layout | Y | string | The layout for how to build a datetime to query over. For example, if your time series uses RFC3339, then the layout should be "2006-01-02T15:04:05Z07:00" |
| query | N | map | This is a non-deterministic map that holds the query parameters for a request |

SQL

TODO

NoSQL

The NoSQL use case should require no overhead from the user. Just include the connection string in the connectionStrings list of the configuration file.

Repository

The repository and proto packages are the only public-facing packages in the application with a stable API. Their purpose is to communicate CRUD requests to the storage devices used in the web-to-storage transfers.

Contributing

Follow this guide for information on contributing.

Resources