DataFrame.LoadCsv can not load CSV with duplicate column names #6182
Comments
If anyone has the same problem: the renamed header names can be passed to LoadCsv as a parameter. This solves my issue. I do not know whether LoadCsv should have this functionality built in (or whether this issue should be closed). LoadCsv with renamed columns:
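A minimal sketch of this kind of workaround (not the commenter's original snippet): read the header line, make repeated names unique, and hand the result to LoadCsv. `CsvHeaderHelper`, `MakeUnique`, and `ReadHeader` are hypothetical names, the `.1`, `.2` suffix scheme is an arbitrary choice, and the LoadCsv call in the trailing comment assumes that a supplied `columnNames` array replaces the names read from the header row.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of Microsoft.Data.Analysis.
public static class CsvHeaderHelper
{
    // Returns a copy of `names` in which each repeated name gets a numeric
    // suffix (".1", ".2", ...) so that every returned name is unique.
    public static string[] MakeUnique(IEnumerable<string> names)
    {
        var seen = new HashSet<string>();
        var lastSuffix = new Dictionary<string, int>();
        var result = new List<string>();

        foreach (var name in names)
        {
            var candidate = name;
            if (!seen.Add(candidate))
            {
                // n starts at 0 when the name has no suffix recorded yet.
                lastSuffix.TryGetValue(name, out var n);
                do
                {
                    n++;
                    candidate = $"{name}.{n}";
                } while (!seen.Add(candidate));
                lastSuffix[name] = n;
            }
            result.Add(candidate);
        }
        return result.ToArray();
    }

    // Reads only the first line of the file and splits it on the separator.
    // Naive split: quoted fields that contain the separator are not handled.
    public static string[] ReadHeader(string path, char separator)
    {
        using var reader = new StreamReader(path);
        var line = reader.ReadLine() ?? string.Empty;
        return line.Split(separator);
    }
}

// Assumed usage with the call from this issue (requires Microsoft.Data.Analysis):
//   var names = CsvHeaderHelper.MakeUnique(
//       CsvHeaderHelper.ReadHeader(TrainDatasetPath, ';'));
//   IDataView trainData = DataFrame.LoadCsv(TrainDatasetPath, separator: ';',
//       header: true, columnNames: names, guessRows: 100);
```

Only the first line of the file is read, so the cost of building the renamed header does not grow with the size of the CSV.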
@luisquintanilla @torronen is this something we think should be built into the load? I can see it going both ways, honestly. I would probably lean towards having it built in somehow.
@michaelgsharp by built in, do you mean that duplicate columns are automatically renamed, as in the snippet @torronen shared?
+1 - I also experienced pain with large data files in which multiple columns share names. I had fewer than 50 columns and it was still troublesome to deal with.
Code:
IDataView trainData = DataFrame.LoadCsv(TrainDatasetPath, separator: ';', header: true, guessRows: 100);
Gives exception:
DataFrame already contains a column called Target20 (Parameter 'column')
Suggestion:
It would be nice if LoadCsv had an option to ignore or auto-rename duplicate columns.
For small CSV files this is not a big problem, but for huge CSV files renaming the headers by hand is a hassle.