lukaswagner/csv-parser

improve and simplify interface

Closed this issue · 2 comments

While it's nice to have a completely promise-based interface, the current implemetation has usability problems.

Current State

The minimal usage is still quite complex:

const detectedColumns = await loader.open(id);
const [columns, dispatch] = loader.load({
    columns: detectedColumns.map(({ type }) => type),
    generatedColumns: [],
});
for await (const _ of dispatch());
return columns;

For users that just want their data, this is quite a lot of code. "Why do I need this loop? Why do I need to pass an empty array? What's with that redundant map call?"

But even for advanced users, the interface is unintuitive: The dispatchers done event will always be awaited before the columns are returned. Thus, the done event is redundant, as it will always be directly followed by the function returning.

Requirements

What events does the interface have to support? This is separated into the core functionality and the per-chunk updating:

  • [core] finished data: After all data is loaded, the columns should be emitted.
  • [chunks] columns initialized: Directly after the columns are created, they should be emitted.
  • [chunks] progress: Every time a chunk is parsed, an update callback should be invoked.

Proposed Interface

As the basic interface should be as simple as possible, only one promise should be returned. This promise should resolve when all data is parsed.

The two other event types should be handled using optional callbacks.

Also, the columns/generatedColumns options to load should be changed slightly:

  • columns should be of type ColumnHeader instead of DataType. This would allow renaming of columns, as well as a simpler interface (passing the detected column headers directly instead of mapping).

  • generatedColumns should be optional. This is only a quality-of-life improvement to avoid passing an empty array.

The minimal usage could look like this:

const detectedColumns = await loader.open(id);
const data = await loader.load({ columns: detectedColumns });

A more involved usage example, using custom columns and the chunk approach:

const detectedColumns = await loader.open(id);
// don't care about return value, it's already handled in init()
await loader.load({
    columns: detectedColumns
        .filter((column) => column.type === 'number')
        .map((column, i) => ({ name: `num_${i}`, type: column.type })),
    // callback function names are up to debate
    init: (columns) => renderer.initColumns(columns),
    update: (progress) => {
        console.log('progress:', progress);
        renderer.handleUpdate();
    }
});

@scheibel @XR4Z0R Would be great to hear your thoughts on this 🙂

I like your API proposal. It simplifies the minimal usage example as well as more complex scenarios. Also, I think that those two callback methods are more usable and easier to understand than a Promise-only approach.

I'm fine with the terms "init" and "update", at least no other terms come to my mind that would be more appropriate. But I would suggest to name the callback properties onInit and onUpdate, to be more explicit that these are event callbacks.