data-projector

Load CSV datasets and map / transform the data for use.

License: MIT

  • Load CSV datasets
  • Guess field types and cast values to those types
  • Calculate stats: global, per-field and pairwise (field-by-field correlations, etc.)
  • Map datasets to other datasets using transform functions
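The pairwise stats listed above include field-by-field correlations. As an illustration, here is one common choice, the Pearson correlation coefficient, computed for every pair of numeric fields. This is a sketch under that assumption: the helper names `pearson` and `pairwiseCorrelations` and the `"a:b"` key format are hypothetical, not part of data-projector's API.

```javascript
// Hypothetical helper: Pearson correlation between two numeric columns.
// Illustration only; data-projector's own pairwise stats may differ.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length === 0) {
    throw new Error("columns must be non-empty and the same length");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  // Undefined for a constant column: return NaN rather than divide by zero.
  if (varX === 0 || varY === 0) return NaN;
  return cov / Math.sqrt(varX * varY);
}

// Hypothetical: correlation for every unordered pair of the given numeric fields,
// keyed as "fieldA:fieldB".
function pairwiseCorrelations(rows, fields) {
  const result = {};
  for (let i = 0; i < fields.length; i++) {
    for (let j = i + 1; j < fields.length; j++) {
      const a = fields[i];
      const b = fields[j];
      result[`${a}:${b}`] = pearson(rows.map(r => r[a]), rows.map(r => r[b]));
    }
  }
  return result;
}
```

Only unordered pairs are computed, because the coefficient is symmetric.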

It is designed to take a specification object (JSON), load a dataset and optionally map values to requested ranges.

The JSON specification objects can be saved in your application for use in presets.

Status: ALPHA

Currently all transformation functions are to be supplied when you

API

project(functions, path, statsParams, mapParams) ⇒ Object

Load a dataset from disk, calculate statistics and apply transformations.

Returns: Object - Dataset

Param        Type            Description
functions    Object          Named function registry
path         String          Absolute path to file
statsParams  Object          The stats object from params
mapParams    Array.<Object>  Mapping specifications (see mapDataset)

readParseDataset(path) ⇒ Promise.<Object>

Load and parse a dataset from path. Stats are not yet calculated, so types are unknown and all field values are strings.

Returns: Promise.<Object> - Promise for a dataset

Param  Type    Description
path   String  Absolute path to file
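A minimal sketch of this "parse only, everything stays a string" step, assuming a header row, comma separators and no quoted fields. Real CSV files need a proper parser. Reading the file is passed in as a `readText` callback (for example one built on Node's `fs.promises.readFile`) so that the parsing logic runs without a file. Both `parseCsvText` and `readParseDatasetSketch` are hypothetical names.

```javascript
// Hypothetical sketch: parse CSV text into {data, fields, path} with string values.
// Assumes a header row, comma separators and no quoted fields or embedded commas.
function parseCsvText(text, path) {
  const lines = text.split(/\r?\n/).filter(line => line.trim() !== "");
  const fields = lines[0].split(",").map(f => f.trim());
  const data = lines.slice(1).map(line => {
    const cells = line.split(",");
    const row = {};
    fields.forEach((field, i) => {
      // No type guessing yet: every value stays a string; missing cells become "".
      row[field] = (cells[i] ?? "").trim();
    });
    return row;
  });
  return { data, fields, path };
}

// Hypothetical async wrapper mirroring the Promise-returning shape of readParseDataset.
// readText(path) must resolve to the file contents as a string.
async function readParseDatasetSketch(path, readText) {
  return parseCsvText(await readText(path), path);
}
```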

loadDataset(path, functions, statsParams) ⇒ Promise.<Object>

Load and parse a dataset, calculate stats and coerce field values to their guessed types.

Returns: Promise.<Object> - Promise for a dataset

Param        Type    Description
path         String  Absolute path to file
functions    Object  Named function registry
statsParams  Object  The stats object from params

createDataset(data, fields, path) ⇒ Object

Create a dataset object from an array of objects.

Returns: Object - dataset - {data, fields, path}

Param   Type            Description
data    Array.<Object>  [{field: value, field2: value}, ...]
fields  Array.<String>  Field names
path    String
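To make the documented `{data, fields, path}` shape concrete, here is a sketch that simply assembles it. The two iris-style rows and the path are illustrative values, not data shipped with the package.

```javascript
// Sketch only: builds the {data, fields, path} object described above.
function createDataset(data, fields, path) {
  return { data, fields, path };
}

// data is an array of row objects keyed by field name.
const irisSample = createDataset(
  [
    { sepalLength: 5.1, species: "setosa" },
    { sepalLength: 7.0, species: "versicolor" },
  ],
  ["sepalLength", "species"],
  "/path/to/iris.csv" // hypothetical path
);
```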

_calculateStats(functions, statsParams, dataset) ⇒ Object

Calculate statistics (minval, maxval, avg etc.) for a dataset using a stats specification.

Returns: Object - stats

Param        Type    Description
functions    Object  Named function registry
statsParams  Object  The stats object from params
dataset      Object  As returned by loadDataset or from a previous transformation.
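A sketch of the per-field part of this calculation (minval, maxval, avg) for numeric fields. It assumes an output shape of `{ fieldName: { minval, maxval, avg } }` and ignores statsParams. Both are simplifications for illustration. `calculateFieldStats` is a hypothetical name, not the library's function.

```javascript
// Hypothetical sketch of per-field statistics (minval, maxval, avg).
// The output shape { fieldName: { minval, maxval, avg } } is an assumption,
// not data-projector's documented stats format, and statsParams is not modelled.
function calculateFieldStats(dataset) {
  const stats = {};
  for (const field of dataset.fields) {
    // Values may still be strings straight from the CSV, so convert them,
    // skipping empty cells and anything that does not parse as a number.
    const nums = dataset.data
      .map(row => row[field])
      .filter(v => v !== "" && v !== null && v !== undefined)
      .map(Number)
      .filter(v => !Number.isNaN(v));
    if (nums.length === 0) continue; // non-numeric field: no numeric stats
    let minval = Infinity;
    let maxval = -Infinity;
    let sum = 0;
    // A single loop avoids Math.min(...nums), which can overflow on huge arrays.
    for (const v of nums) {
      if (v < minval) minval = v;
      if (v > maxval) maxval = v;
      sum += v;
    }
    stats[field] = { minval, maxval, avg: sum / nums.length };
  }
  return stats;
}
```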

calculateStats(functions, statsParams, dataset) ⇒ Object

Calculate statistics and return a new dataset object with .stats set.

Returns: Object - dataset

Param        Type    Description
functions    Object  Named function registry
statsParams  Object  The stats object from params
dataset      Object
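The "return a new dataset with .stats set" behaviour can be sketched with object spread. The `_calculateStats` stand-in here only counts rows. It is a hypothetical placeholder for the real stats calculation, which follows the statsParams specification.

```javascript
// Hypothetical stand-in for _calculateStats: only counts rows.
function _calculateStats(functions, statsParams, dataset) {
  return { count: dataset.data.length };
}

// Return a new dataset object with .stats set. This is a shallow copy:
// the input object is not modified, but its data and fields arrays are shared.
function calculateStats(functions, statsParams, dataset) {
  return { ...dataset, stats: _calculateStats(functions, statsParams, dataset) };
}
```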

castTypes(dataset) ⇒ Object

Having guessed types with calculateStats, cast all fields to the guessed types.

  • Numeric strings to numbers: '1.1' becomes 1.1
  • Enums of strings to their integer indices
  • Date strings to Date objects
  • String fields with high cardinality remain strings

Returns: Object - Dataset object with values cast to guessed types

Param    Type    Description
dataset  Object  Dataset object
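A sketch of the four casting rules listed above, assuming the guessed types are already available as a map of `{ field: { type, values? } }`. That shape and the name `castWithTypes` are hypothetical. The library keeps its guesses from calculateStats in its own format.

```javascript
// Hypothetical sketch: cast string values using already-guessed field types.
// The `types` shape ({ field: { type, values? } }) is an assumption for illustration.
function castWithTypes(dataset, types) {
  const data = dataset.data.map(row => {
    const out = {};
    for (const field of dataset.fields) {
      const raw = row[field];
      const t = types[field] || { type: "string" };
      switch (t.type) {
        case "number":
          out[field] = Number(raw); // '1.1' -> 1.1
          break;
        case "date":
          out[field] = new Date(raw); // date string -> Date object
          break;
        case "enum":
          out[field] = t.values.indexOf(raw); // enum string -> integer index (-1 if unknown)
          break;
        default:
          out[field] = raw; // high-cardinality strings stay strings
      }
    }
    return out;
  });
  return { ...dataset, data };
}
```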

mapDataset(functions, mapParams, dataset)

Map input fields to output fields using mapping functions as specified in mapParams. Each entry in mapParams has this form:

{
   input: 'inFieldName',
   output: 'outFieldName',
   fn: 'linear',  // named function in functions registry
   args: [0, 1]   // parameters for linear mapping function
}

fn may be a String key to a function in the functions registry, or a function(stats, fieldName, [...args], value).

Param      Type            Description
functions  Object          Named function registry
mapParams  Array.<Object>
dataset    Object
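A sketch of how a mapParams list could be applied row by row, with a `linear` function written in the documented `(stats, fieldName, ...args, value)` order. Several points are assumptions rather than documented behaviour: `stats[fieldName]` holding `minval`/`maxval`, the result containing only the output fields, and the input stats being dropped.

```javascript
// Hypothetical mapping function in the documented argument order
// (stats, fieldName, ...args, value). Assumes stats[fieldName] has minval/maxval.
function linear(stats, fieldName, outMin, outMax, value) {
  const { minval, maxval } = stats[fieldName];
  if (maxval === minval) return outMin; // constant field: avoid dividing by zero
  return outMin + ((value - minval) / (maxval - minval)) * (outMax - outMin);
}

// Sketch of mapDataset: one output field per mapParam entry.
function mapDataset(functions, mapParams, dataset) {
  const mappers = mapParams.map(param => {
    // fn may be a registry key or a function, as described above.
    const fn = typeof param.fn === "string" ? functions[param.fn] : param.fn;
    if (typeof fn !== "function") {
      throw new Error(`Unknown mapping function: ${param.fn}`);
    }
    const args = param.args || [];
    return {
      output: param.output,
      input: param.input,
      map: value => fn(dataset.stats, param.input, ...args, value),
    };
  });
  const data = dataset.data.map(row => {
    const out = {};
    for (const m of mappers) out[m.output] = m.map(row[m.input]);
    return out;
  });
  // Assumption: the result holds only the mapped fields and no stats,
  // since the input stats describe the input fields.
  return { data, fields: mappers.map(m => m.output), path: dataset.path };
}
```

With `args: [0, 1]`, as in the spec example, `linear` rescales a field from its own min/max range to 0..1.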

makeMapFunction(functions, stats, mapParam) ⇒ function

Create a mapping function from a mapParam.

mapParam has two properties: .fn and .args.

fn is either a Function or a String key used to look up a Function in functions.

The function should accept: (stats, fieldName, ...args, value)

args is an optional array of parameters that configure your mapping function, e.g. [minval, maxval].

makeMapFunction curries the function by calling it with (stats, fieldName, ...args) and returns the resulting mapping function, which accepts just value and returns the mapped value.

Returns: function - any => any

Param      Type    Description
functions  Object  Named function registry
stats      Object
mapParam   Object
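The currying step can be sketched with a plain closure. The closure bakes in `(stats, fieldName, ...args)` and returns a one-argument function. This sketch assumes the field name comes from `mapParam.input`, as in the mapDataset specification. The `scale` registry entry is hypothetical.

```javascript
// Sketch of makeMapFunction. Assumption: the field name comes from
// mapParam.input, as in the mapDataset specification.
function makeMapFunction(functions, stats, mapParam) {
  // fn may be a Function or a String key into the functions registry.
  const fn = typeof mapParam.fn === "function" ? mapParam.fn : functions[mapParam.fn];
  if (typeof fn !== "function") {
    throw new Error(`No function registered as "${mapParam.fn}"`);
  }
  const args = mapParam.args || [];
  // Bake in (stats, fieldName, ...args); the returned function takes only value.
  return value => fn(stats, mapParam.input, ...args, value);
}

// Hypothetical registry entry: scale a value by a constant factor.
const functions = {
  scale: (stats, fieldName, factor, value) => value * factor,
};
const triple = makeMapFunction(functions, {}, { input: "x", fn: "scale", args: [3] });
```

`triple` now behaves as `any => any`: calling `triple(2)` runs `scale({}, "x", 3, 2)`.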

getRow(dataset, fields) ⇒ Object

Get a single row as an Object.

As this function is curried you can bake in dataset and fields:

getter = getRow(dataset, null);  // returns a function with first two args satisfied
getter(12);  // get row 12

Returns: Object - The object for this row.

Param    Type                   Description
dataset  Object
fields   Array.<string> | null  Optionally select just the fields you need. null selects all fields.
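A sketch of the curried getter, using a small curry helper written for this example. The helper is an assumption, not the library's own. The listed signature is `(dataset, fields)`, but the usage example passes a row index as well, so this sketch takes `index` as a third argument.

```javascript
// Minimal curry helper written for this example: collect arguments until the
// wrapped function's declared parameter count is reached, then call it.
function curry(fn) {
  return function curried(...args) {
    return args.length >= fn.length
      ? fn(...args)
      : (...more) => curried(...args, ...more);
  };
}

// Sketch of getRow with the row index as the third argument.
const getRow = curry((dataset, fields, index) => {
  const row = dataset.data[index];
  if (row === undefined) throw new RangeError(`No row at index ${index}`);
  // null selects all fields; otherwise copy only the requested ones.
  const keys = fields === null ? dataset.fields : fields;
  const out = {};
  for (const key of keys) out[key] = row[key];
  return out;
});
```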

getCell(dataset, field, index) ⇒ mixed

Get a single data value (row, column)

As this function is curried you can bake in dataset and field:

 getter = getCell(dataset, 'sepalLength');
 getter(12);  // get value at row 12, field 'sepalLength'

Returns: mixed - The value for this cell.

Param    Type    Description
dataset  Object
field    String  Key of the field to select
index    Number  Integer index of row

getColumn(dataset, field) ⇒ Array.<mixed>

Get all values for a column

As this function is curried you can bake in dataset:

 getter = getColumn(dataset);
 getter('sepalLength');  // get the array of values for the sepalLength field

Returns: Array.<mixed> - Array of values for this field

Param    Type    Description
dataset  Object
field    String  Key of the field to select
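getCell and getColumn follow the same curried pattern. The sketch below shows both over a `{data, fields}` dataset, again with a curry helper written for this example rather than the library's own. The two-row sample is made up for illustration.

```javascript
// Minimal curry helper written for this example: gathers arguments until
// fn's declared parameter count is reached, then calls fn.
function curry(fn) {
  return function curried(...args) {
    return args.length >= fn.length
      ? fn(...args)
      : (...more) => curried(...args, ...more);
  };
}

// Sketches of getCell and getColumn.
const getCell = curry((dataset, field, index) => dataset.data[index][field]);
const getColumn = curry((dataset, field) => dataset.data.map(row => row[field]));

// Made-up two-row sample for illustration.
const sample = {
  fields: ["sepalLength"],
  data: [{ sepalLength: 5.1 }, { sepalLength: 4.9 }],
};
const cellAt = getCell(sample, "sepalLength"); // dataset and field baked in
const columnOf = getColumn(sample); // dataset baked in
```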