Help with CSV parsing errors #18

janimo · 2017-12-01T13:10:15Z

Pandas read_csv throws an exception when it encounters a line that seems to have too many fields. It can be made to skip these bad lines by passing error_bad_lines=False, and with warn_bad_lines=True (the default) it then reports each skipped line on stderr. While Pandas does not make it easy to deal with these lines (pandas-dev/pandas#5686), it would be nice if csvs-to-sqlite could offer something. Maybe it could parse the read_csv output, then traverse the file and save the bad lines separately so the user can fix and reprocess them?
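A minimal sketch of the "save the bad lines separately" idea, using only Python's standard csv module rather than pandas. It assumes the first row is the header and treats any row whose field count differs from the header's as bad. The helper name split_bad_rows and the good/bad output split are hypothetical, not part of csvs-to-sqlite or pandas.

```python
import csv
import io


def split_bad_rows(src, good_out, bad_out):
    """Hypothetical helper: copy each row of src to good_out or bad_out.

    A row is "bad" when its field count differs from the header's.
    Returns the 1-based row numbers of the bad rows (header = row 1).
    Row numbers equal file line numbers only when no quoted field
    spans several lines.
    """
    reader = csv.reader(src)
    good = csv.writer(good_out)
    bad = csv.writer(bad_out)
    header = next(reader)
    good.writerow(header)
    bad.writerow(header)  # keep the header so the bad file can be fixed and reprocessed
    bad_rows = []
    for rownum, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines, as read_csv does by default
        if len(row) == len(header):
            good.writerow(row)
        else:
            bad.writerow(row)
            bad_rows.append(rownum)
    return bad_rows


if __name__ == "__main__":
    data = "a,b\n1,2\n3,4,5\n6,7\n"
    good_buf, bad_buf = io.StringIO(), io.StringIO()
    print(split_bad_rows(io.StringIO(data), good_buf, bad_buf))  # [3]
```

With real files, open them with newline="" as the csv module documentation recommends. The "good" file can then go through the normal import, while the "bad" file is kept for manual fixing and a second pass.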

janimo · 2017-12-08T18:40:27Z

With --skip-errors, one can take the stderr output, use sed/grep to pull those lines out of the CSV, and fix them up separately. It would still be helpful if this tool dumped the bad lines somewhere.
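The stderr-plus-grep workflow can also be scripted. This sketch assumes the captured stderr contains pandas-style messages of the form "Skipping line N: expected X fields, saw Y" and that N is a 1-based line number in the CSV file; the exact wording may differ between pandas versions, so check it against your own output. The function names bad_line_numbers and extract_lines are hypothetical.

```python
import re

# Assumed message shape, e.g. "Skipping line 3: expected 2 fields, saw 3"
SKIP_RE = re.compile(r"Skipping line (\d+):")


def bad_line_numbers(stderr_text):
    """Return the sorted, de-duplicated line numbers reported in stderr_text."""
    return sorted({int(m.group(1)) for m in SKIP_RE.finditer(stderr_text)})


def extract_lines(csv_lines, line_numbers):
    """Return (line_number, text) pairs for the requested 1-based line numbers."""
    wanted = set(line_numbers)
    return [(n, line) for n, line in enumerate(csv_lines, start=1) if n in wanted]


if __name__ == "__main__":
    log = (
        "Skipping line 3: expected 2 fields, saw 3\n"
        "Skipping line 5: expected 2 fields, saw 4\n"
    )
    csv_text = "a,b\n1,2\n3,4,5\n6,7\n8,9,10,11\n"
    for n, line in extract_lines(csv_text.splitlines(), bad_line_numbers(log)):
        print(n, line)
```

Writing the returned pairs to a separate file gives roughly the "dump the bad lines somewhere" behaviour without changes to the tool itself, at the cost of depending on the warning text format.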

Do you see csvs-to-sqlite as a tool that should eventually handle most scenarios by itself (error handling, remote files, compressed formats), or as one meant to be used alongside other established command-line tools?
