DataFrame.iat indexing with duplicate columns #11754
Comments
|
The dataset was too big, and trying to reduce the example proved to be harder than I thought (the dataset is merged over and over in several loops). Any hint as to what could cause this, given the traceback? Or what to look for in the resulting df so that I can come up with an example? |
pls post |
The dataset has too many columns (>2k) for data.head() to be of any use here.
Notice the first/second column types, which are pasted here literally (looks like some memory corruption). Indeed, I also noticed now that if I break out of the loop with an exit(), python segfaults... |
it looks like you have an actual pandas object (DataFrame maybe) inside each cell in the TYPE column. not surprised it doesn't work then. This in theory is ok, except that indexing does not work very well with this. show |
They work:
|
show the |
It's just a constant string:
I actually drop it later. To me it looks like it's the second column that could be potentially doubtful. But I'll have to dig into this more closely now that I noticed the segfault. It's quite reproducible, and there are a few modules (such as xlrd) in my test program that I can remove by going through a few more hoops. |
Turns out, |
Indeed, that seems to be the only issue I had. By ensuring names are unique, I also don't get any more segmentation faults. Would it make sense to check the value provided directly in the constructor, or is it too expensive? |
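The workaround described above (making column names unique before indexing) could be sketched roughly as follows. This is only an illustration, not what the DataFrame constructor does: `dedupe_columns` is a hypothetical helper, and the numeric-suffix naming scheme is an arbitrary assumption.

```python
import pandas as pd


def dedupe_columns(df):
    """Return a copy of df whose column labels are unique.

    Hypothetical helper: repeated labels get a numeric suffix
    (name, name.1, name.2, ...). It does not guard against a
    suffixed name colliding with a label that already exists.
    """
    seen = {}
    new_names = []
    for name in df.columns:
        count = seen.get(name, 0)
        new_names.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    out = df.copy()
    out.columns = new_names
    return out


df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "A", "B"])

# Index.is_unique detects the problem up front.
if not df.columns.is_unique:
    # duplicated() flags every repeat after the first occurrence.
    dupes = df.columns[df.columns.duplicated()].tolist()
    print("duplicate column names:", dupes)
    df = dedupe_columns(df)

print(list(df.columns))
```

Checking `df.columns.is_unique` right after building or merging a frame is one way to catch accidental duplicates early, before they surface as indexing errors further down.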
so can you post a short repro? |
|
@wavexx thanks!
|
pull-requests are welcome! |
What should be the desired outcome here? Handling duplicate columns in indexing through-out, or refusing to handle duplicate columns? |
duplicate indexing is handled pretty well, see how |
Ok. But there's probably some other bug which I still didn't figure out.
That second column turned out to be named as a single space ( I couldn't reproduce it succinctly yet, but I think pd.merge might also have some other edge cases with duplicated columns. |
@wavexx maybe, but this is a clear easy-to-repro bug. if you can find the source of the other then pls open a new report. |
Since there's intention to handle duplicated indexing, I opened a couple of issues for some cases I think should be improved. |
Related for
|
Has anybody had a shot at fixing this? I still get bitten by this from time to time. I wouldn't say duplicate column names are "well" handled while plain ordinal indexing doesn't even work :( |
This is fixed on master
|
fixed in #32089 which is not yet released, so could also add a whatsnew to 1.1 for this issue. dafec63 is the first new commit
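For anyone wanting to confirm the fix locally, a quick check along these lines may help. It is a sketch that assumes a pandas version containing the fix (1.1 or later, per the comment above); the tiny frame is made up for illustration and is not the reporter's data.

```python
import pandas as pd

# Two columns share the label "A"; their positions are still unambiguous.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "A"])

# .iat is purely positional, so it should agree with .iloc
# for every cell regardless of duplicate labels.
for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        assert df.iat[row, col] == df.iloc[row, col]

print(df.iat[0, 1])
```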
|
I have some weird issue in a DataFrame I'm creating from a row-based array.
Using python3 and pandas 0.17.1 (from debian unstable), I get:
Interestingly, I can otherwise manage the dataframe just fine.
The same code, running under python2.7 shows no issue.
What could be the cause of such an error?