Performant parser for textual data
- Parse 100K rows in ~550 ms (may vary with data)
- Very low memory footprint (~10 MB)
- Supports parsing from file, string or buffers
- Supports streaming output
- Passes CSV Acid Test suite csv-spectrum
Parsing the following bit of data
id,firstName,lastName,jobTitle
1,Jed,Hoppe,Customer Markets Supervisor
2,Cristian,Miller,Principal Division Specialist
3,Kenyatta,Schimmel,Product Implementation Executive
will produce
[ { id: '1',
firstName: 'Jed',
lastName: 'Hoppe',
jobTitle: 'Customer Markets Supervisor' },
{ id: '2',
firstName: 'Cristian',
lastName: 'Miller',
jobTitle: 'Principal Division Specialist' },
{ id: '3',
firstName: 'Kenyatta',
lastName: 'Schimmel',
jobTitle: 'Product Implementation Executive' } ]
npm install text2json --save
- Parse the entire file into JSON
'use strict'
let Parser = require('text2json').Parser
let rawdata = './data/file_100.txt'
let parse = new Parser({hasHeader : true})
parse.text2json (rawdata, (err, data) => {
if (err) {
console.error (err)
} else {
console.log(data)
}
})
- If parsing a large file, stream the output
'use strict'
let Parser = require('text2json').Parser
let rawdata = './data/file_500000.txt'
let parse = new Parser({hasHeader : true})
parse.text2json (rawdata)
.on('error', (err) => {
console.error (err)
})
.on('headers', (headers) => {
console.log(headers)
})
.on('row', (row) => {
console.log(row)
})
.on('end', () => {
console.log('Done')
})
The parser accepts following options through its constructor (all are optional)
{
hasHeader?: boolean,
headers?: string[],
newline?: string,
separator?: string,
quote?: string,
encoding?: string,
skipRows?: number,
filters?: Filters,
headersOnly?: boolean
}
hasHeader
- If true, first line is treated as header row.
- Defaults to
false
.
headers
- An array of strings to be used as headers.
- If specified, overrides header row in data.
- Defaults is an empty array
newline
- Choose between Unix and Windows line endings (
\n
or\r\n
) - Defaults to
\n
- Choose between Unix and Windows line endings (
separator
- Specify column separator
- Defaults is
,
(comma)
quote
- Specify quote character
- Default is
"
(double quotes)
encoding
(see https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings)- Use a different encoding while parsing
- Defaults to
utf-8
skipRows
- Number of rows to skip from top (excluding header)
- Default is zero rows
filters
- Filter columns based on index or header name
headersOnly
- Parse only header row
If hasHeader
is false and custom headers are not specified, parser will generate headers using a zero based index of the columns. i.e. when data has 5 columns, generated headers will be ['_1', '_2', '_3', '_4', '_5']
Header fill will also occur when number of headers given in custom headers array is less than the actual numbers of columns in the data.
headers
- emitted after parsing the header row or once header fill has completed. The payload contains an array of header names.row
- emitted once for every row parsed. Payload is an object with properties corresponding to the header row.error
- emitted once for the first error encountered. Payload is an Error object with an indicative description of the problem.end
- emitted once, when the parser is done parsing. No payload is provided with this event.
- Return columns selectively (either by column index or header name)
- Ignore header row in data and use custom header names provided in options
- Skip rows (start parsing from a given row number)