CSV Parsing

Aside: in many places where a character or boolean is specified, it could alternatively be a list, regular expression, function or class.

Open the file as a stream, with a specific encoding.
```
 Parameter: encoding
 Default: UTF-8
```
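A minimal sketch of this step in Python, assuming the raw input is available as a binary stream. The helper name `decode_stream` is hypothetical; `io.TextIOWrapper` is the standard-library class that does the decoding.

```python
import io

def decode_stream(raw, encoding="utf-8"):
    """Wrap a binary stream in a text stream that decodes with `encoding`.

    newline="" switches off Python's newline translation, so \r, \n and
    \r\n reach the row-finding step unchanged.
    """
    return io.TextIOWrapper(raw, encoding=encoding, newline="")

# Usage: decode Latin-1 bytes held in memory.
stream = decode_stream(io.BytesIO("a,é\r\n".encode("latin-1")), encoding="latin-1")
text = stream.read()
```

Leaving newline translation off is a design choice here, so that `row_terminator` stays under the parser's control.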
Find the rows.

Need to define the escape character first, as it changes the meaning of all other special characters:
```
 Parameter: escape
 Default: \
 Options: character, false
 Note: not supported in all parsers
```
Need to define the enclosure character early, as row terminators and column separators within enclosures do not count as special characters:
```
 Parameter: enclosure
 Default: "
```
Note: two consecutive enclosure characters inside a field count as one literal enclosure character, unless they are escaped (Excel legacy).
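The doubled-enclosure rule could be sketched as follows. This assumes the escape character is disabled and that the cell has already been isolated from its row; `unenclose` is a hypothetical helper name.

```python
def unenclose(cell, enclosure='"'):
    """Strip the surrounding enclosure and collapse doubled enclosures.

    Cells that are not enclosed are returned unchanged.
    """
    n = len(enclosure)
    if len(cell) >= 2 * n and cell.startswith(enclosure) and cell.endswith(enclosure):
        inner = cell[n:-n]
        # Two consecutive enclosure characters stand for one literal character.
        return inner.replace(enclosure * 2, enclosure)
    return cell
```

For example, `unenclose('"say ""hi"""')` yields `say "hi"`.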

Note: cells may be enclosed always, sometimes (e.g. only when a value contains special characters or is not an integer), or never.
```
 Parameter: row_terminator 
 Default: \n
 Examples: \n, \r, \r\n
 Examples: LF, CR, CRLF
 Examples: unix, windows, mac, unicode
```
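One possible way to find the rows is a single pass over the text that tracks whether the scanner is currently inside an enclosure. This sketch ignores the escape character. The alias table and the name `split_rows` are assumptions, and the `unicode` option is left out because its meaning is not defined here.

```python
# Hypothetical mapping from the named options to literal terminators.
ROW_TERMINATOR_ALIASES = {
    "LF": "\n", "unix": "\n",
    "CR": "\r", "mac": "\r",
    "CRLF": "\r\n", "windows": "\r\n",
}

def split_rows(text, row_terminator="\n", enclosure='"'):
    """Split text into rows, ignoring terminators inside enclosures."""
    terminator = ROW_TERMINATOR_ALIASES.get(row_terminator, row_terminator)
    rows, start, i, enclosed = [], 0, 0, False
    while i < len(text):
        if text.startswith(enclosure, i):
            # A doubled enclosure toggles twice, so the state is unchanged.
            enclosed = not enclosed
            i += len(enclosure)
        elif not enclosed and text.startswith(terminator, i):
            rows.append(text[start:i])
            i += len(terminator)
            start = i
        else:
            i += 1
    if start < len(text):
        # Final row without a trailing terminator.
        rows.append(text[start:])
    return rows
```

Rows are returned with their enclosures intact, so later steps can still tell enclosed cells from bare ones.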
Skip n rows.
```
 Parameter: skip_rows
 Default: 0
```
These rows could be any data at the start of the file, but are often comments (see below), a free-text description of the table, provenance and/or other metadata.

(optional) In the skipped rows, find comment rows.
```
 Parameter: comment_prefix
 Default: #
```
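A short sketch of skipping rows and collecting the comments among them, assuming the rows have already been separated. The function name and the choice to return the comment text without its prefix are assumptions.

```python
def skip_leading_rows(rows, skip_rows=0, comment_prefix="#"):
    """Return (remaining_rows, comments) after skipping `skip_rows` rows."""
    skipped = rows[:skip_rows]
    comments = [
        row[len(comment_prefix):].strip()
        for row in skipped
        if row.startswith(comment_prefix)
    ]
    return rows[skip_rows:], comments

# Usage: two leading rows, one of which is a comment.
remaining, comments = skip_leading_rows(
    ["# source: survey", "Table 1", "a,b", "1,2"], skip_rows=2
)
```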
Recognise n header rows.
```
 Parameter: header_rows
 Default: 0
```
Split each row into columns.
```
 Parameter: column_separator
 Default: ,
```
Note: this character inside an enclosure does not count as a separator.

Skip n columns of each row.
```
 Parameter: skip_columns
 Default: 0
```
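Python's standard `csv` module already applies the "separator inside an enclosure" rule, so splitting and skipping columns could be sketched with it. Mapping `column_separator` to `delimiter` and `enclosure` to `quotechar` is an assumption about how the parameters correspond, and `csv.reader` only accepts a single-character delimiter.

```python
import csv

def split_columns(row, column_separator=",", enclosure='"', skip_columns=0):
    """Split one row into cells, then drop the first `skip_columns` cells."""
    # csv.reader expects an iterable of lines, so wrap the row in a list.
    reader = csv.reader([row], delimiter=column_separator, quotechar=enclosure)
    cells = next(reader, [])
    return cells[skip_columns:]
```

Note that `csv.reader` also removes the enclosures and collapses doubled enclosure characters, so a parser that needs to trim before unenclosing would have to split the row by hand instead.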
Recognise n header columns.
```
 Parameter: header_columns
 Default: 0
```
Read the values of each header cell.

Remove insignificant whitespace:
```
 Parameter: trim
 Default: true
 Options: true, false, start, end
```
Note: whitespace inside an enclosure is always significant
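The four `trim` options might be implemented like this. The sketch assumes trimming is applied to the raw cell text while it is still enclosed, which leaves whitespace inside the enclosure untouched; `trim_cell` is a hypothetical name.

```python
def trim_cell(raw_cell, trim=True):
    """Apply the trim option to a raw (still enclosed) cell."""
    if trim is True:
        return raw_cell.strip()
    if trim is False:
        return raw_cell
    if trim == "start":
        return raw_cell.lstrip()
    if trim == "end":
        return raw_cell.rstrip()
    raise ValueError(f"unsupported trim option: {trim!r}")
```

For example, `trim_cell(' " a " ')` returns `" a "` with the inner spaces preserved.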

Build a key for each column, using values from the header row(s).

Note: header columns are skipped when building column keys.

Parameter: fields
Default: [] (read from header row or column index)

Parameter: column_prefix
Default: null

* If field names are provided, use them as the key.
* If no field names but a header row is present, use the header cell values as keys.
* If there are multiple header rows, use an array of cell values.
* If no field names are provided and there is no header row, use the column index (with optional column prefix).
* If the same key is repeated multiple times, add an incrementing suffix to the key.
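The rules above can be sketched as one function. Several details are assumptions, since the text does not fix them: column indexes start at 0, keys from multiple header rows are tuples, and the incrementing suffix takes the form `_2`, `_3`, and so on.

```python
def build_column_keys(n_columns, fields=None, header_rows=None, column_prefix=None):
    """Build one key per column following the precedence rules."""
    if fields:
        keys = list(fields)
    elif header_rows:
        if len(header_rows) == 1:
            keys = list(header_rows[0])
        else:
            # One tuple per column, holding that column's cell from each header row.
            keys = [tuple(column) for column in zip(*header_rows)]
    elif column_prefix:
        keys = [f"{column_prefix}{i}" for i in range(n_columns)]
    else:
        keys = list(range(n_columns))

    # Add an incrementing suffix to repeated keys. This does not guard
    # against a generated key colliding with a key that already exists.
    seen, unique = {}, []
    for key in keys:
        seen[key] = seen.get(key, 0) + 1
        count = seen[key]
        if count == 1:
            unique.append(key)
        elif isinstance(key, tuple):
            unique.append(key + (count,))
        else:
            unique.append(f"{key}_{count}")
    return unique
```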

Read each non-header row in the table and build a key/value data table.

Skip blank rows, optionally:
```
Parameter: skip_blank_rows
Default: true
```
Blank rows may be significant, if they are used to demarcate table sections or represent missing measurements.
```
Parameter: fill_rows
Default: true
Description: If there are fewer cells in the row than fields, add empty values.

Parameter: trim_rows
Default: false
Description: Remove empty cells from the end of the row
Note: is this needed?
```
If there are more cells in the row than fields, throw an exception.
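A sketch of the row-level options, assuming cells are strings and an empty cell is `""`. The function names, the use of `ValueError` as the exception, and applying `trim_rows` before `fill_rows` are all assumptions.

```python
def is_blank_row(cells, empty=""):
    """A row is blank when it has no cells or every cell is empty."""
    return all(cell == empty for cell in cells)

def normalise_row(cells, n_fields, fill_rows=True, trim_rows=False, empty=""):
    """Make a row fit the field count, or raise if it has too many cells."""
    cells = list(cells)
    if trim_rows:
        # Remove empty cells from the end of the row.
        while cells and cells[-1] == empty:
            cells.pop()
    if len(cells) > n_fields:
        raise ValueError(f"row has {len(cells)} cells but only {n_fields} fields")
    if fill_rows:
        # Pad short rows with empty values.
        cells.extend([empty] * (n_fields - len(cells)))
    return cells

def read_rows(rows, n_fields, skip_blank_rows=True, **options):
    """Yield normalised rows, optionally dropping blank ones."""
    for cells in rows:
        if skip_blank_rows and is_blank_row(cells):
            continue
        yield normalise_row(cells, n_fields, **options)
```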

To generate the key for each row, use the value from any header column(s), or the row index.

Combine column and row keys to make the cell key.
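One way to combine the keys, assuming a cell key is a `(row_key, column_key)` tuple and the table is a plain dictionary. Neither choice is specified above, and `build_table` is a hypothetical name.

```python
def build_table(rows, column_keys, header_columns=0):
    """Map (row_key, column_key) to each cell value."""
    table = {}
    for index, cells in enumerate(rows):
        if header_columns == 0:
            row_key = index                      # fall back to the row index
        elif header_columns == 1:
            row_key = cells[0]
        else:
            row_key = tuple(cells[:header_columns])
        # Header columns are not data, so values start after them.
        for column_key, value in zip(column_keys, cells[header_columns:]):
            table[(row_key, column_key)] = value
    return table

# Usage: one header column supplies the row keys.
table = build_table([["r1", "1", "2"], ["r2", "3", "4"]], ["a", "b"], header_columns=1)
```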
