Infer data types from CSV columns.
This package provides a single interface for inferring the data type of each column in a row-column formatted dataset. It supports the following data types:
- DATE
- TIME
- DATETIME
- NUMBER
- INT
- FLOAT
- CURRENCY
- PERCENT
- STRING
- ARRAY
- OBJECT
- ZIPCODE
- BOOLEAN
- GEOMETRY
- GEOMETRY_FROM_STRING
- PAIR_GEOMETRY_FROM_STRING
- NONE
npm install type-analyzer
Parameters
data (Array, required): An array of row objects
rules (Array, optional): An array of custom regex rules
options (Object, optional): Option object
options.ignoreDataTypes (Array, optional): Data types to ignore
var Analyzer = require('type-analyzer').Analyzer;
var data = [
{
"ST_AsText": "MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))",
"name": "san_francisco",
"lat": "37.7749295",
"lng": "-122.4194155",
"launch_date": "2010-06-05",
"added_at": "2010-06-05 12:00"
},
{
"ST_AsText": "MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))",
"name": "paris",
"lat": "48.856666",
"lng": "2.3509871",
"launch_date": "2011-12-04",
"added_at": "2010-06-05 12:00"
},
]
var colMeta = Analyzer.computeColMeta(data);
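To give a feel for what column-level inference involves, here is a minimal, self-contained sketch. It does not use type-analyzer: the `VALIDATORS` table, the `inferColumnType` helper, and the simplified regexes are all hypothetical stand-ins, and returning NONE for an empty column is an assumption rather than documented behavior.

```javascript
// Hypothetical validators: simplified stand-ins, not the package's real regexes.
const VALIDATORS = [
  {type: 'INT', test: (v) => /^[+-]?\d+$/.test(v)},
  {type: 'FLOAT', test: (v) => /^[+-]?\d*\.\d+$/.test(v)},
  {type: 'DATE', test: (v) => /^\d{4}-\d{2}-\d{2}$/.test(v)},
  {type: 'DATETIME', test: (v) => /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}$/.test(v)}
];

// Hypothetical helper: returns the first type whose validator accepts
// every non-empty value in the column, falling back to STRING.
function inferColumnType(rows, key) {
  const values = rows
    .map((row) => row[key])
    .filter((v) => v !== null && v !== undefined && v !== '');
  if (values.length === 0) {
    return 'NONE'; // assumption: an empty column has no type
  }
  const match = VALIDATORS.find((validator) =>
    values.every((v) => validator.test(String(v)))
  );
  return match ? match.type : 'STRING';
}

const sample = [
  {name: 'san_francisco', lat: '37.7749295', launch_date: '2010-06-05', added_at: '2010-06-05 12:00'},
  {name: 'paris', lat: '48.856666', launch_date: '2011-12-04', added_at: '2010-06-05 12:00'}
];

console.log(inferColumnType(sample, 'lat'));
console.log(inferColumnType(sample, 'launch_date'));
```

The order of the validator table matters in this sketch: narrower types are tried before broader ones, so a column only falls through to STRING when nothing more specific fits.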
rules
You can pass in an array of custom rules, for example to ensure that a column full of ids represented as numbers is identified as a column of strings. A rule can match either the exact name of the column or a regex tested against column names. Note: Analyzer prefers rules using name over regex, since they perform better.
var Analyzer = require('type-analyzer').Analyzer;
var colMeta = Analyzer.computeColMeta(data, [{name: 'id', dataType: 'STRING'}]);
// or
var colMeta = Analyzer.computeColMeta(data, [{regex: /id/, dataType: 'STRING'}]);
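One way to picture the name-versus-regex distinction is a lookup that tries exact names first and only then falls back to regex rules. The sketch below is an illustration under that assumption; `findRule` is a hypothetical helper and is not part of type-analyzer's API, and the package may resolve rules differently.

```javascript
// Hypothetical helper: not part of type-analyzer's API.
function findRule(columnName, rules) {
  // Pass 1: exact name match (a cheap string comparison).
  for (const rule of rules) {
    if (rule.name !== undefined && rule.name === columnName) {
      return rule;
    }
  }
  // Pass 2: fall back to regex rules tested against the column name.
  for (const rule of rules) {
    if (rule.regex instanceof RegExp && rule.regex.test(columnName)) {
      return rule;
    }
  }
  return null; // no custom rule applies; normal inference would run
}

const rules = [
  {regex: /id/, dataType: 'STRING'},
  {name: 'paid', dataType: 'BOOLEAN'}
];

console.log(findRule('paid', rules));
console.log(findRule('user_id', rules));
```

Note that /id/ also matches 'paid', so in this sketch the exact-name rule wins only because names are checked first. Anchoring the regex (for example /_id$/) avoids that kind of accidental match.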
options.ignoreDataTypes
You can also pass in ignoreDataTypes to ignore certain types. Skipping those checks improves type-checking performance.
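var Analyzer = require('type-analyzer').Analyzer;
var DATA_TYPES = require('type-analyzer').DATA_TYPES;
// Type of the first column, computed with CURRENCY detection skipped
var firstColType = Analyzer.computeColMeta(data, [], {ignoredDataTypes: [DATA_TYPES.CURRENCY]})[0].type;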
It will shortcut the usual analysis and give you back the column formatted as you'd expect.
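The performance benefit of ignoring types can be pictured as dropping validators before any values are tested. The following sketch is self-contained and hypothetical: `CANDIDATES`, `inferType`, and the simplified regexes are illustrative assumptions, not the package's internals.

```javascript
// Hypothetical validators: simplified stand-ins for illustration only.
const CANDIDATES = [
  {type: 'CURRENCY', test: (v) => /^\$\d+(\.\d{2})?$/.test(v)},
  {type: 'PERCENT', test: (v) => /^\d+(\.\d+)?%$/.test(v)},
  {type: 'FLOAT', test: (v) => /^[+-]?\d*\.\d+$/.test(v)}
];

// Hypothetical helper: ignored types are filtered out up front,
// so their regexes never run against the column values.
function inferType(values, ignoredDataTypes = []) {
  const active = CANDIDATES.filter((c) => !ignoredDataTypes.includes(c.type));
  const match = active.find((c) => values.every((v) => c.test(v)));
  return match ? match.type : 'STRING';
}

const prices = ['$5.00', '$12.50'];

console.log(inferType(prices));
console.log(inferType(prices, ['CURRENCY']));
```

In this sketch, a column that would otherwise be detected as CURRENCY falls through to STRING once CURRENCY is ignored, because no remaining validator accepts its values.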
You can import all available types as the DATA_TYPES constant.
Breaking changes with v1.0.0: Regex has moved into src, but it can be accessed more easily through module.exports at the root. As part of a larger cleanup, many extraneous util files were removed.