Issue Description:
Hello.
I have found a performance regression in .loc in pandas versions below 2.1 when it is used on a large DataFrame with a non-unique index. With more than 4 indexes, the execution time of .loc can increase by up to about 1000 times. Some parts of this repository depend on pandas versions below 2.1. For example, ML Projects/NLP_WebAPP_Twitter_Sentiment_Analysis_knowledge_graph/venv/Lib/site-packages/pandas/_version.py shows that the committed virtual environment ships pandas 2.0.0, and ML Projects/Multi Class News Classification Project/requirements.txt depends on pandas 2.0.3.
Many files in ML Projects/NLP_WebAPP_Twitter_Sentiment_Analysis_knowledge_graph use the affected API, and other files may use it as well. I am not sure whether this pandas performance problem affects this repository. Related discussions on GitHub include #54550 and #54746.
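One way to check whether this repository is affected is to time .loc on a DataFrame with a non-unique index under the pandas version it pins. The sketch below is a rough micro-benchmark, not a reproduction taken from the linked issues: the row count, the number of distinct labels, and the list of five looked-up labels are all assumptions, and the function name time_loc_lookup is hypothetical.

```python
import time

import numpy as np
import pandas as pd


def time_loc_lookup(n_rows=1_000_000, n_keys=10_000, labels=(0, 1, 2, 3, 4), seed=0):
    """Time a single df.loc[list_of_labels] call on a non-unique index.

    Hypothetical helper: sizes and labels are illustrative assumptions,
    not values from the pandas issues mentioned above.
    """
    rng = np.random.default_rng(seed)
    # Drawing n_rows labels from only n_keys distinct values (n_keys < n_rows)
    # guarantees that labels repeat, i.e. the index is non-unique.
    index = rng.integers(0, n_keys, size=n_rows)
    df = pd.DataFrame({"value": rng.random(n_rows)}, index=index)

    start = time.perf_counter()
    # With a non-unique index, .loc returns every row matching each label.
    result = df.loc[list(labels)]
    elapsed = time.perf_counter() - start
    return elapsed, result, df


if __name__ == "__main__":
    seconds, rows, _ = time_loc_lookup()
    print(f"pandas {pd.__version__}: {len(rows)} rows in {seconds:.4f} s")
```

Running it once in an environment with pandas 2.0.x and once with pandas 2.1 or later gives a rough comparison; a large gap would suggest that code paths like the ones in this repository are affected.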
Suggestion
I would recommend upgrading to pandas 2.1 or later, or exploring other ways to optimize the performance of .loc.
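A minimal sketch of both options follows, under stated assumptions. pandas_at_least is a hypothetical helper that warns when the installed pandas predates 2.1. sorted_for_loc is a hypothetical helper that stable-sorts the index: with a monotonic, non-unique index, pandas can locate a scalar label by binary search and return a slice instead of scanning a boolean mask. Whether sorting avoids the specific regression discussed in #54550 and #54746 has not been verified.

```python
import re
import warnings

import pandas as pd


def pandas_at_least(major: int, minor: int) -> bool:
    """Return True if the installed pandas version is at least major.minor."""
    # Parse only the leading "X.Y" so suffixes such as "rc0" or "+dev" are ignored.
    match = re.match(r"(\d+)\.(\d+)", pd.__version__)
    if match is None:
        raise ValueError(f"Unrecognized pandas version: {pd.__version__}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def sorted_for_loc(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df sorted by index, keeping duplicates in original order.

    A stable sort leaves rows that share a label in their original relative
    order, so df.loc[label] returns the same rows in the same order.
    """
    return df.sort_index(kind="stable")


if not pandas_at_least(2, 1):
    warnings.warn(
        f"pandas {pd.__version__} predates 2.1; .loc on non-unique indexes may be slow."
    )
```

Sorting has a one-time cost, so this option mainly pays off when the same DataFrame is queried with .loc many times.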
Any other workarounds or solutions would be greatly appreciated.
Thank you!