/data

Data extraction, transformation, processing and visualisation

Data Extraction, Transformation, Processing and Visualisation

This repository contains various data extraction, transformation, processing and visualisation tools written in Go. It currently provides the following interfaces:

  • data.Table ingests, transforms and processes data tables in comma-separated value (CSV) format, and can output them in CSV, ASCII and SQL formats;
  • data.DOM provides a document object model which can read, write and validate XML;
  • data.Canvas provides a drawing canvas on which graphics primitives such as lines, circles, text and rectangles can be placed. Primitives can additionally be transformed, grouped and styled. Canvases can currently be written in SVG format; the intention is to also allow rendering using OpenGL later;
  • data.Set and data.Series are data structures to store ordered sets of labels, real numbers, points and datetime values. They can subsequently be used to generate graphics (charts and maps, for example) or as input vectors to algorithms.

There are also some additional packages which act as a basis for the interfaces:

Documentation

I have published the documentation at data.mutablelogic.com. You can also see the following useful sources of information:

Usage and Examples

There are various examples in the cmd folder. To build the examples, use the following commands:

bash% git clone git@github.com:djthorpe/data.git
bash% cd data
bash% make
bash% cd build/cmd

A temporary build folder is created on build. To run the tests or clean, use make test and make clean respectively. There is more information about the examples in the documentation.

Project Status

This module is currently in development and the status of each package is as follows:

  • pkg/table is mostly feature-complete:
    • Requires code to change the width of the table in ASCII mode;
    • Requires code for stylizing output in ASCII mode (color, bold, underline, italic);
    • A test is failing and needs to be fixed.
  • pkg/dom is mostly feature-complete;
  • pkg/dtd has just been started and needs to be written, in order to validate parsed XML documents against a DTD definition;
  • pkg/canvas is in development. There is work to:
    • Ensure the following primitives & features are supported:
      • Linear Gradients
      • Patterns
      • Line dashes
    • Ensure as many SVG files can be parsed as possible;
    • Integrate with stylesheets (see below).
  • pkg/stylesheet has not been started and needs to be integrated into canvas, so that style can be defined both on elements and at the head of an SVG document, or an external stylesheet can be referenced.
  • pkg/color is in development:
    • Requires some more tests and documentation (in progress);
  • pkg/geom is in development:
    • Requires some tests and documentation (in progress);
  • pkg/viz is in development. There is work to add:
    • Scales from all sets;
    • Legend from labelset;
    • Line plots from point sets;
    • Bar & pie charts from real sets.
  • pkg/set is in development. There is work to add:
    • Documentation;
    • Tests.

Further to these, the following areas need to be implemented:

  • Rendering using SDL (both on-screen and bitmaps), PDF, OpenGL and OpenVG;
  • UI and flex;
  • Statistical and learning algorithms.

Contributing and Filing Issues

  • File an issue or question on GitHub.
  • Feel free to fork this repository. Pull requests are gratefully received. The code is licensed under Apache 2.0; please read that license for the terms covering use, distribution and forking. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

License

This repository is released under the Apache license:

> Copyright 2021 David Thorpe and all other authors of this software.
>
> Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:
>
>   http://www.apache.org/licenses/LICENSE-2.0
>
> Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.