CSV Strictness
There are 3 levels of strictness at which CSV could be specified.
Level 1, strictest (similar to JSON; no options for the parser)
row terminator: LF
column separator: ,
escape character: \
enclosure: none
encoding: UTF-8
All files must be in UTF-8, use LF as the line terminator and , as the column separator, and contain no insignificant whitespace.
Any special characters (, or LF) in field values must be escaped with the escape character (i.e. as \, and \n).
All escaped characters must be de-escaped on parsing, in the same way as for JSON.
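The level 1 escaping rule can be sketched as a pair of helpers. This is a minimal sketch, assuming the escape set is \, plus a subset of JSON's backslash escapes (\\, \n, \r, \t, \"). The names escape_field and unescape_field are hypothetical, and JSON's \uXXXX, \b, \f and \/ forms are left out.

```python
# Hypothetical level 1 escaping helpers (names are illustrative only).
# Assumption: "\," plus a subset of JSON's backslash escapes.
ESCAPE = {"\\": "\\\\", ",": "\\,", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
UNESCAPE = {"\\": "\\", ",": ",", "n": "\n", "r": "\r", "t": "\t", '"': '"'}


def escape_field(value: str) -> str:
    """Escape a raw value for writing into a level 1 field."""
    return "".join(ESCAPE.get(ch, ch) for ch in value)


def unescape_field(field: str) -> str:
    """De-escape a level 1 field, in the same spirit as JSON string parsing."""
    out = []
    chars = iter(field)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)          # character following the backslash
        if nxt not in UNESCAPE:          # also covers a trailing lone backslash
            raise ValueError(f"invalid escape sequence in {field!r}")
        out.append(UNESCAPE[nxt])
    return "".join(out)


print(escape_field("a, b\nc"))            # a\, b\nc
print(repr(unescape_field(r"a\, b\nc")))  # 'a, b\nc'
```

Escaping the backslash itself is what makes the round trip lossless: without it, a value containing a literal "\n" could not be told apart from an escaped LF.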
There must be a single header row containing column names.
There must be no non-data rows or columns.
Blank rows (e.g. at the end of the file) will be ignored.
The number of columns must be the same in all rows.
Dates and times must be in ISO 8601 format.
Example:
header_1,header_2,header_3
simple value,value with a \, in,value with a \n in
value with whitespace at the end ,value with a " in,value with a \t in
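Because level 1 has no options, a reader for it can be very small. The sketch below assumes the rules above: UTF-8 input, LF rows, a , separator, JSON-style backslash escapes (only a subset is handled), a single header row, equal column counts and blank rows skipped. The names parse_strict and split_strict_row are hypothetical.

```python
# Hypothetical level 1 reader; the rules it enforces come from the list above.
ESCAPES = {"\\": "\\", ",": ",", "n": "\n", "r": "\r", "t": "\t", '"': '"'}


def split_strict_row(line: str) -> list[str]:
    """Split a row on unescaped commas and de-escape every field."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt not in ESCAPES:
                raise ValueError(f"invalid escape sequence in row {line!r}")
            current.append(ESCAPES[nxt])
        elif ch == ",":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_strict(data: bytes) -> list[dict[str, str]]:
    """Parse level 1 CSV bytes into one dict per data row."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError for non-UTF-8 input
    rows = [split_strict_row(line) for line in text.split("\n") if line != ""]
    if not rows:
        raise ValueError("missing header row")
    header, records = rows[0], rows[1:]
    for n, row in enumerate(records, start=1):
        if len(row) != len(header):
            raise ValueError(f"data row {n} has {len(row)} columns, expected {len(header)}")
    return [dict(zip(header, row)) for row in records]


example = "header_1,header_2,header_3\nsimple value,value with a \\, in,value with a \\n in\n"
print(parse_strict(example.encode("utf-8")))
# [{'header_1': 'simple value', 'header_2': 'value with a , in', 'header_3': 'value with a \n in'}]
```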
Level 2, intermediate (covers most current uses; dialect options for the parser)
row terminator: LF or CRLF
column separator: , or ;
enclosure: " or '
encoding: any
If the encoding, row terminator, column separator or enclosure is not specified, it may be auto-detected.
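Auto-detection can be as simple as counting candidate characters in a sample. The heuristic below is an illustrative assumption, not a standard algorithm. It prefers CRLF if any CRLF occurs. It then picks whichever of , or ; appears more often in the first non-comment line, and whichever of " or ' appears more often overall, falling back to , and " on ties. Encoding detection is left out, and detect_dialect is a hypothetical name.

```python
# Hypothetical level 2 dialect detection heuristic (illustrative, not standard).
def detect_dialect(sample: str) -> dict[str, str]:
    """Guess row terminator, column separator and enclosure from a sample."""
    terminator = "\r\n" if "\r\n" in sample else "\n"
    lines = sample.split(terminator)
    # Use the first non-comment, non-blank line (normally a header or data row).
    first = next((ln for ln in lines if ln and not ln.startswith("#")), "")
    # max() returns the first candidate on ties, so "," and '"' are the fallbacks.
    separator = max([",", ";"], key=first.count)
    enclosure = max(['"', "'"], key=sample.count)
    return {"terminator": terminator, "separator": separator, "enclosure": enclosure}


sample = '# a comment line\r\nheader_1;header_2;header_3\r\nsimple value;"value with a ; in";x\r\n'
print(detect_dialect(sample))
# {'terminator': '\r\n', 'separator': ';', 'enclosure': '"'}
```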
Any field may be enclosed with the enclosure character. Fields containing special characters must be enclosed.
If the enclosure character occurs in a field value, it must either be doubled (e.g. "", as in Excel) or preceded by a backslash (e.g. \", as in Unix).
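Python's standard csv module can read both conventions. doublequote=True handles the Excel style, while doublequote=False with escapechar set to a backslash handles the Unix style. The sample rows below are made up for illustration.

```python
import csv
import io

excel_style = '"value with a "" in";plain\n'
unix_style = '"value with a \\" in";plain\n'

# Excel convention: a literal enclosure character is written twice.
excel_rows = list(csv.reader(io.StringIO(excel_style), delimiter=";",
                             quotechar='"', doublequote=True))
# Unix convention: a literal enclosure character is preceded by a backslash.
unix_rows = list(csv.reader(io.StringIO(unix_style), delimiter=";",
                            quotechar='"', doublequote=False, escapechar="\\"))

print(excel_rows)  # [['value with a " in', 'plain']]
print(unix_rows)   # [['value with a " in', 'plain']]
```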
There may be zero, one or multiple header rows.
There may be zero, one or multiple header columns.
Rows with fewer columns than the header will be padded with empty values.
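Once the header width is known, padding short rows takes a single list comprehension. pad_rows is a hypothetical helper. The level 2 rules do not say what happens to rows longer than the header, so this sketch leaves them unchanged.

```python
def pad_rows(rows: list[list[str]], width: int) -> list[list[str]]:
    """Pad rows shorter than `width` with empty strings (hypothetical helper)."""
    # Rows already at or above `width` are returned unchanged.
    return [row + [""] * (width - len(row)) for row in rows]


header = ["header_1", "header_2", "header_3"]
print(pad_rows([["a"], ["b", "c", "d"]], len(header)))  # [['a', '', ''], ['b', 'c', 'd']]
```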
All whitespace is significant.
Comment lines must be at the start of the file and prefixed with #.
Dates and times must be in RFC 2822 or ISO 8601 format.
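In Python, the standard library can try both date formats in turn: datetime.fromisoformat for ISO 8601 and email.utils.parsedate_to_datetime for RFC 2822. parse_timestamp is a hypothetical helper. Note that fromisoformat accepts the full range of ISO 8601 forms only from Python 3.11; older versions accept a narrower subset.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_timestamp(value: str) -> datetime:
    """Try ISO 8601 first, then RFC 2822 (hypothetical helper)."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        # Raises an exception if the value is not valid RFC 2822 either.
        return parsedate_to_datetime(value)


print(parse_timestamp("2024-03-01T12:30:00+00:00"))        # 2024-03-01 12:30:00+00:00
print(parse_timestamp("Fri, 01 Mar 2024 12:30:00 +0000"))  # 2024-03-01 12:30:00+00:00
```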
Example:
# a comment line
header_1;header_2;header_3
simple value;"value with a ; in";"value with a
in"
value with whitespace at the end ;"value with a "" in";"value with a in"
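Python's csv module can read the level 2 example once the leading comment lines are removed. read_level2 is a hypothetical helper. It assumes the dialect (here ; and ") is already known, for example from auto-detection, and that enclosure characters are doubled rather than backslash-escaped.

```python
import csv
import io


def read_level2(text: str, separator: str = ";", enclosure: str = '"') -> list[list[str]]:
    """Read level 2 CSV text with a known dialect (hypothetical helper)."""
    lines = io.StringIO(text, newline="").readlines()
    # Comment lines are only allowed at the start of the file, so skip just those.
    start = 0
    while start < len(lines) and lines[start].startswith("#"):
        start += 1
    body = io.StringIO("".join(lines[start:]), newline="")
    reader = csv.reader(body, delimiter=separator, quotechar=enclosure, doublequote=True)
    return [row for row in reader if row]  # csv.reader yields [] for blank lines


example = (
    "# a comment line\n"
    "header_1;header_2;header_3\n"
    'simple value;"value with a ; in";"value with a\nin"\n'
    'value with whitespace at the end ;"value with a "" in";"value with a in"\n'
)
for row in read_level2(example):
    print(row)
```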
Level 3, loosest (covers most possible uses; dialect and parsing options for the parser)
row terminator: any, default CRLF
column separator: any, default ,
escape character: any, default \
enclosure: any, default "
encoding: any, default UTF-8
Trim unenclosed whitespace: optional
Remove leading/trailing rows/columns: optional
Convert date formats: optional
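Level 3 trimming only applies to unenclosed whitespace, so a reader must remember whether each field was enclosed. The tokenizer below is a minimal sketch with hypothetical names. It handles one physical line at a time: a multi-line enclosed field, such as the one in the example, would need the caller to join lines first. It accepts both doubled enclosures and the escape character.

```python
def _finish(buf: list[str], enclosed: bool, trim: bool) -> str:
    """Join a field's characters, trimming only if it was not enclosed."""
    value = "".join(buf)
    return value if enclosed or not trim else value.strip()


def split_row(line: str, separator: str = ",", enclosure: str = '"',
              escape: str = "\\", trim_unenclosed: bool = False) -> list[str]:
    """Split one physical line into fields using level 3 dialect options."""
    fields: list[str] = []
    buf: list[str] = []
    enclosed = False   # did the current field start with the enclosure?
    inside = False     # are we currently between enclosure characters?
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            buf.append(line[i + 1])            # escaped character is literal
            i += 2
            continue
        if inside:
            if ch == enclosure and line[i + 1:i + 2] == enclosure:
                buf.append(enclosure)          # doubled enclosure is literal
                i += 2
                continue
            if ch == enclosure:
                inside = False                 # closing enclosure
            else:
                buf.append(ch)
        elif ch == enclosure and not buf:
            inside = enclosed = True           # opening enclosure at field start
        elif ch == separator:
            fields.append(_finish(buf, enclosed, trim_unenclosed))
            buf, enclosed = [], False
        else:
            buf.append(ch)
        i += 1
    fields.append(_finish(buf, enclosed, trim_unenclosed))
    return fields


print(split_row("header_1|header_2| header_3", "|", trim_unenclosed=True))
print(split_row('"value with whitespace at the end "|"value with a "" in"|x', "|",
                trim_unenclosed=True))
```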
Example:
free text at the top of the file
some more free text
header_1|header_2| header_3
simple value|"value with a | in"|"value with a
in"
"value with whitespace at the end "|"value with a "" in"|"value with a in"
summary row
some more rows at the end of the file
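Removing leading and trailing non-data rows needs a heuristic. One possible assumption, sketched below, is that data rows are the widest rows in the file, so the reader keeps everything from the first to the last row with the maximum column count. Only rows are handled; leading or trailing columns are not. strip_non_data_rows is a hypothetical helper, and the example is parsed with Python's csv module using | as the separator.

```python
import csv
import io


def strip_non_data_rows(rows: list[list[str]]) -> list[list[str]]:
    """Keep rows from the first to the last row of maximum width (heuristic)."""
    rows = [row for row in rows if row]   # drop blank rows first
    if not rows:
        return []
    width = max(len(row) for row in rows)
    wide = [i for i, row in enumerate(rows) if len(row) == width]
    return rows[wide[0]:wide[-1] + 1]


example = (
    "free text at the top of the file\n"
    "some more free text\n"
    "header_1|header_2| header_3\n"
    'simple value|"value with a | in"|"value with a\nin"\n'
    '"value with whitespace at the end "|"value with a "" in"|"value with a in"\n'
    "summary row\n"
    "some more rows at the end of the file\n"
)
rows = list(csv.reader(io.StringIO(example, newline=""), delimiter="|"))
for row in strip_non_data_rows(rows):
    print(row)
```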