Help with CSV parsing errors #18

Open
janimo opened this issue Dec 1, 2017 · 1 comment

Comments

@janimo
Contributor

janimo commented Dec 1, 2017

Pandas read_csv throws an exception when it encounters a line that seems to have too many fields, but it can be made to skip these bad lines and report them on stderr if passed error_bad_lines=False (with warn_bad_lines left at its default of True). While Pandas does not make it easy to deal with these lines (pandas-dev/pandas#5686), it would be nice if csvs-to-sqlite could offer something. Maybe it could parse the read_csv output, then traverse the file and save the bad lines separately so the user can fix and reprocess them?
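To make the "save the bad lines separately" idea concrete, here is a minimal sketch that uses only the standard-library `csv` module rather than pandas. The function name `split_bad_lines` is hypothetical and nothing like it exists in csvs-to-sqlite; it assumes the first row is the header and treats any row with more fields than the header as bad, which roughly mirrors the case where read_csv complains.

```python
import csv
import io


def split_bad_lines(source, good, bad):
    """Copy rows from `source` to `good`, diverting over-long rows to `bad`.

    A row counts as bad when it has more fields than the header row.
    Returns (good_count, bad_count), excluding the header.
    Hypothetical helper, not part of csvs-to-sqlite or pandas.
    """
    reader = csv.reader(source)
    good_writer = csv.writer(good)
    bad_writer = csv.writer(bad)

    header = next(reader)
    good_writer.writerow(header)

    good_count = bad_count = 0
    for row in reader:
        if len(row) > len(header):
            # Too many fields: set the row aside for manual repair.
            bad_writer.writerow(row)
            bad_count += 1
        else:
            good_writer.writerow(row)
            good_count += 1
    return good_count, bad_count


if __name__ == "__main__":
    source = io.StringIO("a,b\n1,2\n3,4,5\n6,7\n")
    good, bad = io.StringIO(), io.StringIO()
    print(split_bad_lines(source, good, bad))  # (2, 1)
    print(bad.getvalue())
```

The cleaned output could then be fed to the import as usual, and the bad rows fixed by hand and reprocessed in a second run.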

@janimo
Contributor Author

janimo commented Dec 8, 2017

With --skip-errors one can take stderr output and sed/grep those lines from the csv and fix them up separately. It would still be helpful if this tool dumped the lines somewhere.
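A rough Python equivalent of that sed/grep step might look like the sketch below. It assumes the stderr output contains pandas-style messages of the form `Skipping line N: expected X fields, saw Y` and that N is a 1-based physical line number in the file; both are assumptions about the output, and the helper names are made up for illustration. Quoted fields that span several physical lines would throw the numbering off.

```python
import re

# Assumed message shape: "Skipping line 3: expected 2 fields, saw 3"
SKIP_RE = re.compile(r"Skipping line (\d+)")


def skipped_line_numbers(log_text):
    """Return the sorted, de-duplicated line numbers mentioned in the log."""
    return sorted({int(n) for n in SKIP_RE.findall(log_text)})


def extract_lines(csv_lines, line_numbers):
    """Return the lines of the CSV whose 1-based position is in line_numbers."""
    wanted = set(line_numbers)
    return [
        line
        for number, line in enumerate(csv_lines, start=1)
        if number in wanted
    ]


if __name__ == "__main__":
    log = "b'Skipping line 3: expected 2 fields, saw 3\\n'"
    csv_lines = ["a,b\n", "1,2\n", "3,4,5\n", "6,7\n"]
    print(extract_lines(csv_lines, skipped_line_numbers(log)))  # ['3,4,5\n']
```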

Do you see csvs-to-sqlite as a tool that should eventually handle most scenarios by itself (error handling, remote files, compressed formats), or as one meant to be used alongside other established command-line tools?
