feat(import,csv,psv): Add support for importing CSV and PSV files without header rows #9204
Conversation
(force-pushed from fda0260 to bd40600)
Thank you, David! This is a very clean fix.
It seems like the newly added tests aren't currently passing though, and I left a couple of comments pertaining to documentation, tests, and code duplication. After that, this should be good to merge.
Don't worry about the "Benchmark SQL Correctness" check failing: that's a known issue for PRs that come from forks. If the other checks are green, we'll be good.
(force-pushed from 4c86051 to 700a067)
hi @nicktobey I think I got it in a better position, so it's "ready for review". FYI, I realized that trying to use no-headers and columns while streaming in the CSV was panicking, so I added error handling there. Thanks for reviewing.
(force-pushed from 700a067 to 597b19a)
// CreateCSVInfo creates a CSVInfo object based on the provided options.
// This is a helper function that extracts and processes CSV-related options
// from the generic options interface.
func CreateCSVInfo(opts interface{}, defaultDelim string) *csv.CSVFileInfo {
Not sure how y'all feel about this file name. I wasn't sure whether to name it after the helper func or keep it as a generic helper file. 🤷♂️
@@ -325,6 +330,14 @@ func validateImportArgs(apr *argparser.ArgParseResults) errhand.VerboseError {
	return errhand.BuildDError("parameters %s and %s are mutually exclusive", allTextParam, schemaParam).Build()
}

if apr.Contains(noHeaderParam) && !apr.Contains(columnsParam) {
similar handling here, related to the issues I had testing with stdin streaming of csv
@nicktobey FYI, this ^ comment is somewhat confusing. I meant that I was similarly handling configs based on table import options, not panic handling from stdin streaming of CSV.
This feature allows users to import CSV files where the first row contains data rather than column names, matching the behavior found in MySQL and SQLite:
- Add new flag --no-header to treat first row as data instead of column headers
- Add new option --columns to specify column names when no header is present
- Require --columns when creating new tables with --no-header
- Provide helpful error messages with usage guidance
- Update validation to ensure proper combination of options
- Add comprehensive tests for the new functionality

Closes: dolthub#7831
(force-pushed from 86b031d to ceb25a7)
Good catch noticing the panic, and thanks for including the fix for that in your PR. I left one comment about some confusion on my part, but otherwise I like this. I'm running the CI workflows now.
This commit fixes an issue where importing CSV data from stdin using --create-table would result in a nil pointer panic. It adds proper error handling for this case by:
1. Improving error checking in validateImportArgs to verify a schema file is provided when using stdin with --create-table
2. Adding a nil check for rdSchema in newImportSqlEngineMover before attempting to access it
3. Updating the parameter description for --columns to clarify that it overrides header names when used without --no-header
4. Adding tests to verify error messages and behavior with stdin imports

The improved error handling provides clear, helpful error messages instead of panicking, and documents the workaround of creating the table first and then using -u (update) mode for importing.

Closes: dolthub#7831
(force-pushed from ceb25a7 to 4a1c1f6)
This looks good! I forgot to mention: since the newly added bats test files are testing functionality that only works when run against a local database, we need to make sure that they don't get run when testing remote functionality. (That's why the "Test Bats Unix Remote" test is failing.) We can address that by adding entries to the

After that, I think this PR is good to go!

(As a side note, I noticed that you've been rebasing your branch in response to PR feedback. While rebasing creates a cleaner commit history, it makes the PR process more confusing because it makes it harder to see what changed, and comments left on the old diff may not be visible on the new diff. Once a PR has comments on it, we try to avoid rebasing.)
Thank you. I'll change that tonight. Sorry about the rebases. Also, I'm sorry if I missed that in the Contributing markdown. Thanks again
This update ensures that the new import-no-header-csv.bats and import-no-header-psv.bats tests are skipped when running remote tests, as they only work against a local database.
Added, thanks again.
I've merged the PR. Thank you for implementing this. I've had to work around this in the past, so I'm relieved that it's a built-in feature now.
Awesome! Glad to hear that. Thank you for letting me contribute and reviewing my first Dolt contribution. I really appreciate it. cheers |
Summary
In short, this feature makes Dolt more compatible with MySQL/SQLite workflows and provides users with more flexibility when importing data.
Problem
Previously, Dolt always expected the first row of CSV/PSV files to contain column names. This differs from MySQL and SQLite which support importing files where the first row contains data. Users migrating from these systems or working with headerless data files couldn't import them without modifying their files.
Additionally, when users attempted to import data from stdin using --create-table, they would encounter a nil pointer panic instead of receiving an error message.
Solution
The implementation adds:
Testing