Use `frictionless.js` to detect csv dialect and schema and to validate data

Question

Opened this issue 2 years ago · 1 comment

The frictionless.js package can detect a CSV file's dialect (cell separator, escape character, etc.). The detected dialect metadata can then be recorded in a standard JSON file as part of a data package descriptor.
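To make the idea of dialect detection concrete, here is a minimal sketch in Python. It does not use the frictionless.js or Frictionless Framework API; it uses the standard library's `csv.Sniffer` as a stand-in. The helper name `detect_dialect`, the sample data, and the file name `example.csv` are made up for illustration. The output keys follow the property names I understand the CSV Dialect specification to use.

```python
import csv
import json


def detect_dialect(sample: str) -> dict:
    """Guess the dialect of a CSV text sample (hypothetical helper)."""
    # Restrict the candidate delimiters so ordinary letters are never picked.
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return {
        "delimiter": dialect.delimiter,
        "quoteChar": dialect.quotechar,
        "doubleQuote": dialect.doublequote,
        "skipInitialSpace": dialect.skipinitialspace,
    }


# Made-up sample data using a semicolon as the cell separator.
sample = "id;name;price\n1;apple;0.5\n2;pear;0.75\n"

# Record the detected dialect in a data-package-style descriptor.
descriptor = {
    "name": "example",
    "resources": [{"path": "example.csv", "dialect": detect_dialect(sample)}],
}
print(json.dumps(descriptor, indent=2))
```

The real packages detect more than this (encoding and header rows, for example), so treat the sketch only as a picture of what "dialect metadata" means.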

The package can also detect the schema (column types) and record it in the data package as well. This schema metadata can be used not only to describe the data better, but also to validate it.
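As a rough illustration of schema detection and schema-based validation, the sketch below infers a column type per field and then checks another file against the result. It is plain Python, not the Frictionless API. The function names `infer_type`, `infer_schema` and `validate` are hypothetical, and only three Table Schema types (`integer`, `number`, `string`) are considered.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of integer/number/string that fits every value."""
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(text, delimiter=","):
    """Build a Table-Schema-like dict from the header and the data rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    columns = zip(*body)  # transpose rows into columns
    return {
        "fields": [
            {"name": name, "type": infer_type(column)}
            for name, column in zip(header, columns)
        ]
    }


def validate(text, schema, delimiter=","):
    """Return a list of (line number, field name, message) problems."""
    casts = {"integer": int, "number": float, "string": str}
    errors = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader)  # skip the header line
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(schema["fields"]):
            errors.append((line_no, None, "wrong number of cells"))
            continue
        for field, cell in zip(schema["fields"], row):
            try:
                casts[field["type"]](cell)
            except ValueError:
                errors.append((line_no, field["name"], f"expected {field['type']}"))
    return errors


schema = infer_schema("id,name,price\n1,apple,0.5\n2,pear,0.75\n")
problems = validate("id,name,price\n1,apple,0.5\nx,pear\n3,kiwi,cheap\n", schema)
for problem in problems:
    print(problem)
```

A per-problem list like `problems` is the kind of output an editor extension could map onto lint markers, one per offending line.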

It would be nice to have some integration between these tools and the Rainbow CSV VSCode extension. Examples of possible features:

- more detailed CSV linting (the Frictionless Framework can create detailed reports of each problem it finds in a CSV file, e.g. duplicated rows, missing fields, wrong data types, etc.)
- one-click generation of data packages, including a table schema, from a CSV file

By the way, the Frictionless Framework is available not only in JavaScript, but in Python as well. I'm not sure which of the two would be better suited for use in the VSCode extension.

Answer 1 · 2022-09-10T03:09:45.000Z

Thank you, this is an interesting proposal, and so is the whole idea of using "data packages". I think we may all benefit from data standards like this.

That said, I don't think that more detailed CSV linting and data package generation are the right direction for the Rainbow CSV extension, because they would significantly increase the feature surface area and make maintenance harder, which is a very important consideration for me. Perhaps it would make sense to implement the functionality you are proposing as a separate data management extension? Another benefit of a new extension would be that it could support not only CSV but also XLS and the other data formats known to the framework.
I will keep the issue open to see what opinions other folks may have about your suggestion.