eubr-bigsea/citron

[TEST CASES] Test upload of data sets with certain particularities


Note: bugs related to this issue must be reported in Limonero, but this is a good place to track the task.
Users frequently upload data sets with unusual characteristics that break Limonero:

  • File name with spaces
  • File name with accents
  • Content encoded in UTF-8 or UTF-16 (not supported by Lemonade)
  • Non-tabular data (rows have a different number of "columns")
  • Tabular data in which strings are not delimited (e.g., by quotes)
  • Tabular data with line breaks inside a delimited string
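To make these cases repeatable, one option is to generate a small fixture file per case and upload each one. The sketch below uses only the Python standard library; the function name `write_fixtures`, the file names, and the sample contents are hypothetical illustrations, not existing Limonero test data.

```python
import csv
from pathlib import Path

# Hypothetical sample data; the non-ASCII names exercise encoding handling.
SAMPLE_ROWS = [["id", "name", "city"], ["1", "José", "São Paulo"], ["2", "Ana", "Belo Horizonte"]]


def _write_csv(path: Path, rows, encoding: str = "utf-8") -> None:
    # newline="" lets the csv module control line endings, including inside quoted fields.
    with open(path, "w", encoding=encoding, newline="") as fh:
        csv.writer(fh).writerows(rows)


def write_fixtures(directory: str) -> dict:
    """Create one sample file per upload edge case and return {case: path}."""
    out = Path(directory)
    files = {}

    files["name_with_spaces"] = out / "data set with spaces.csv"
    _write_csv(files["name_with_spaces"], SAMPLE_ROWS)

    files["name_with_accents"] = out / "conjunto_de_dados_ação.csv"
    _write_csv(files["name_with_accents"], SAMPLE_ROWS)

    files["utf8"] = out / "utf8.csv"
    _write_csv(files["utf8"], SAMPLE_ROWS, encoding="utf-8")

    # Python's "utf-16" codec writes a byte order mark at the start of the file.
    files["utf16"] = out / "utf16.csv"
    _write_csv(files["utf16"], SAMPLE_ROWS, encoding="utf-16")

    # Rows with a different number of "columns".
    files["non_tabular"] = out / "ragged.csv"
    _write_csv(files["non_tabular"], [["id", "name"], ["1"], ["2", "Ana", "extra", "more"]])

    # Strings are not quoted, so the comma inside "Silva, José" shifts the columns.
    files["unquoted_strings"] = out / "unquoted.csv"
    files["unquoted_strings"].write_text("id,name,city\n1,Silva, José,São Paulo\n", encoding="utf-8")

    # csv.writer quotes this field because it contains a line break.
    files["line_break_in_string"] = out / "multiline.csv"
    _write_csv(files["line_break_in_string"], [["id", "comment"], ["1", "first line\nsecond line"]])

    return {case: str(path) for case, path in files.items()}
```

Each returned path can then be uploaded through Limonero as a separate test case.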

There is a known bug in Spark when tabular data with line breaks inside strings is combined with UTF-8 encoding: in this case, the encoding is not handled correctly.
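Because this combination fails in Spark, it may help to flag affected uploads before they reach the Spark reader (where the relevant CSV options are `multiLine` and `encoding`). The sketch below is a stdlib-only check under stated assumptions: the name `hits_known_spark_case` is hypothetical, and it only detects the combination described in this issue; it neither reproduces nor works around the Spark behavior.

```python
import csv
import io


def hits_known_spark_case(raw: bytes) -> bool:
    """Return True if CSV bytes combine quoted line breaks with non-ASCII UTF-8 text.

    Hypothetical helper: flags the combination reported in this issue only.
    """
    try:
        # utf-8-sig also accepts plain UTF-8 and strips an optional BOM.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return False  # not UTF-8, so outside the reported case
    if text.isascii():
        return False  # no multi-byte characters, so the encoding is not exercised
    # newline="" keeps line breaks inside quoted fields intact for the csv module.
    reader = csv.reader(io.StringIO(text, newline=""))
    return any("\n" in field or "\r" in field for row in reader for field in row)
```

For example, a file whose quoted `comment` field contains `"São\nPaulo"` is flagged, while the same content without the line break, or an ASCII-only file with a multi-line field, is not.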