CSV without doubts!
PSV is a text-based format for tabular data. It is similar to CSV in function, but more strictly defined. Although RFC 4180 exists for CSV, in practice there are many incompatibilities between CSV implementations. For example, opening a CSV file in a spreadsheet program typically presents a plethora of (mostly technical) options. The goal of PSV is to simplify reading and writing tabular data by defining as many technical aspects as possible.
- PSV files are text files encoded with UTF-8
- Every line represents a row delimited by LF or CRLF characters
- The last line has no line ending
- Every row consists of one or more fields separated by a pipe character
- There are no leading or trailing pipe characters
- Some characters are escaped by a leading backslash
  - carriage return: \r
  - line feed: \n
  - backslash: \\
  - pipe character: \|
- All other characters preceded by a backslash are treated as is
CSV:
aaa,bbb,ccc
PSV:
aaa|bbb|ccc
CSV:
"aaa","bbb","ccc"<CRLF>
zzz,yyy,xxx
PSV:
aaa|bbb|ccc<CRLF>
zzz|yyy|xxx
CSV:
"aaa","b<CRLF>
bb","ccc"<CRLF>
zzz,yyy,xxx
PSV:
aaa|b\nbb|ccc<CRLF>
zzz|yyy|xxx
CSV:
"aaa","b""bb","ccc"<CRLF>
zzz,"yy,y",xxx
PSV:
aaa|b"bb|ccc<CRLF>
zzz|yy\|y|xxx
There are no doubts about the encoding. It's always UTF-8. No BOM!
Since carriage return and line feed characters inside a field are escaped, a parser can assume that an unescaped LF or CRLF sequence always represents the end of a row.
A carriage return can therefore simply be ignored when parsing a PSV stream.
If the last line in a PSV stream ends with a CRLF or LF line ending, the parser will create another row with one empty field.
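The row handling described above can be sketched in a few lines of Python. This is only an illustration and not a reference implementation: the function name `split_rows` is made up for this example, and the input is assumed to be text that has already been decoded from UTF-8.

```python
def split_rows(stream: str) -> list[str]:
    """Split a decoded PSV stream into raw (still escaped) row strings."""
    # A literal CR never belongs to field content (inside a field it is
    # written as backslash + "r"), so every CR can simply be dropped.
    stream = stream.replace("\r", "")
    # Every remaining LF ends a row. str.split keeps a trailing empty
    # string, which matches the rule that a final line ending produces
    # one more row with a single empty field.
    return stream.split("\n")


print(split_rows("aaa|bbb\r\nzzz|yyy"))  # ['aaa|bbb', 'zzz|yyy']
print(split_rows("aaa|bbb\n"))           # ['aaa|bbb', '']
```

The rows returned here still contain pipe separators and escape sequences; splitting into fields and unescaping are separate steps.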
Every row has at least one field. Multiple fields are separated by the pipe character "|".
A leading or trailing pipe character in a line represents an additional empty field.
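As a rough sketch of the field rules above, a row can be split at every pipe that is not preceded by an escaping backslash. The name `split_fields` is hypothetical, and the function deliberately leaves escape sequences untouched so that splitting and unescaping stay separate.

```python
def split_fields(line: str) -> list[str]:
    """Split one raw PSV row into raw (still escaped) fields."""
    fields = []
    current = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Copy the backslash and the character it escapes as a pair,
            # so an escaped pipe is never mistaken for a separator.
            current.append(line[i:i + 2])
            i += 2
        elif ch == "|":
            # An unescaped pipe closes the current field.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    # The text after the last pipe is always a field, even if it is empty.
    fields.append("".join(current))
    return fields


print(split_fields("aaa|bbb|ccc"))  # ['aaa', 'bbb', 'ccc']
print(split_fields("|aaa|"))        # ['', 'aaa', '']
```

An empty line yields a list with one empty string, which agrees with the rule that every row has at least one field.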
The following characters in a field's content are special and are escaped with a leading backslash:
- carriage return: \r
- line feed: \n
- backslash: \\
- pipe character: \|
This is one of the core aspects of PSV!
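On the writing side, the escaping rules could look like the following Python sketch. The helper names `escape_field` and `format_row` are assumptions made for this example; only the four replacements themselves come from the list above.

```python
def escape_field(value: str) -> str:
    """Escape the four special characters in a single field value."""
    # The backslash must be handled first. Otherwise the backslashes
    # introduced by the later replacements would be doubled as well.
    return (
        value.replace("\\", "\\\\")
        .replace("\r", "\\r")
        .replace("\n", "\\n")
        .replace("|", "\\|")
    )


def format_row(fields: list[str]) -> str:
    """Join escaped fields into one PSV row (without a line ending)."""
    return "|".join(escape_field(field) for field in fields)


print(format_row(["aaa", "b\nbb", "ccc"]))  # aaa|b\nbb|ccc
print(format_row(["zzz", "yy|y", "xxx"]))   # zzz|yy\|y|xxx
```

Double quotes need no treatment at all, which is why the CSV field "b""bb" becomes plain b"bb in PSV.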
A backslash can theoretically occur anywhere in a stream. If the following character is none of the ones listed in Rule 6, the backslash is ignored and the character is taken as is.
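Reading a field back could be sketched as below. The name `unescape_field` is hypothetical. One detail is an assumption of this example and not stated by the rules: a lone backslash at the very end of a field has no following character, and the sketch keeps it unchanged.

```python
def unescape_field(raw: str) -> str:
    """Turn one raw PSV field back into its original content."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # \r and \n map to the control characters. For every other
            # character (including backslash and pipe) the backslash is
            # dropped and the character itself is kept.
            out.append({"r": "\r", "n": "\n"}.get(nxt, nxt))
            i += 2
        else:
            # Ordinary character, or a trailing lone backslash
            # (kept as is; this case is an assumption of the sketch).
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(unescape_field("b\\nbb")))   # 'b\nbb'
print(repr(unescape_field("yy\\|y")))   # 'yy|y'
print(repr(unescape_field("a\\xb")))    # 'axb'
```

Scanning left to right and consuming two characters per escape keeps sequences such as an escaped backslash followed by the letter n from being misread as a line feed.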