-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: agg
with dictlike and non-unique col will return wrong type
#52115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
agg
with dictlike and non-unique col will return wrong type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can at least partially avoid some perf regression here, suggestions below. Can you run DataFrame / Groupby ASVs on this.
Some changes have been made, but sorry that I’m not familiar with ASVs. I’m still learning and may need some time.
|
run
run Given the above, it appears that there is a need for further performance improvement.
Therefore, it means we should replace the usage of the dict-like Additionally, considering the improvement of the code's features, it may be helpful to use list comprehension and cc @rhshadrach |
For now,
|
@luke396 - I don't understand how the changes here could increase performance. When I run ASVs on this branch in groupby, I get:
Can you confirm? |
@rhshadrach - It is possible that the differing ASV results are due to the use of different commit versions. I apologize for any confusion caused by my imprecise language, as I am not very familiar with this topic. I attempted to compare the different commit versions of the branch (ade2efe and 2cba11a) with the main branch (3c44745). The results showed a significant difference, with minimal differences between ade2efe and main, and a significant decrease with 2cba11a compared to main (even much more than the differences shown below). However, I do not believe that this significant performance change can be attributed solely to the code changes made in this PR. It could be due to changes made in the main branch's code. In any case, if we compare the results from the main branch (3c44745) to latest of this branch (5daa349), we can see the differences below.
|
The main changes I've made are replacing the use of the dictionary |
I don't believe any of the results from the ASVs posted here hit the code that is changing in this PR. Correct me if you think this is wrong. In order to get accurate results, you should not have any other processes running in the foreground (e.g. a web browser) when running the ASVs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - just some minor requests.
I think you're right! I didn't notice that other processes running might be influencing the results. I definitely need to learn more about ASVs in depth. Do you think we should run a new ASV analysis instead of relying on the previous one I posted? It's possible that the previous analysis (the last one) wasn't accurate either. |
@luke396 - I'm satisfied with the ASVs I posted; they were run without any other foreground processes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Fine! 😄 |
Thanks @luke396 |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.