Performance of sum vs mean on Bool arrays is 10x different #19133
Comments
Could you post the timings you get? And can you try both with and without bottleneck? If that doesn't explain the difference, then when you get a chance, could you upgrade to pandas master and maybe profile https://github.com/pandas-dev/pandas/blob/master/pandas/core/nanops.py a bit?
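One way to do the suggested profiling is sketched below with the standard-library `cProfile` and `pstats` modules. It is a minimal sketch, not a procedure from this thread. The smaller `K` and the helper name `profile` are illustrative choices, and the functions that appear in the output depend on the pandas version installed.

```python
import cProfile
import io
import pstats

import pandas as pd

# Smaller than the K used later in the thread so the profile runs quickly (assumption).
K = 1_000_000
df = pd.DataFrame(list(range(K)))
mask = df[0] > K / 2


def profile(func, top=10):
    """Hypothetical helper: run func under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buf = io.StringIO()
    # Sort by cumulative time so the nanops reduction path shows up near the top.
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


print(profile(mask.sum))
print(profile(mask.mean))
```

Comparing the two reports shows where `mean` spends the extra time relative to `sum`.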
Spot on!

```python
import pandas as pd

K = 100000000
df = pd.DataFrame(list(range(K)))
mask = df[0] > K/2
```

Without bottleneck:

With bottleneck, the timings are much more in line:

So would the solution be to use Bottleneck? Thanks
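The timings quoted here did not survive. A sketch like the one below can regenerate them by toggling pandas' `compute.use_bottleneck` option, which has existed since pandas 0.20. The smaller `K`, the best-of-three repeat scheme, and the helper name `time_reductions` are assumptions made for illustration. If bottleneck is not installed, pandas ignores `True` for this option, so both runs then take the same code path.

```python
import timeit

import pandas as pd

# The report used K = 100000000; a smaller K keeps memory use modest (assumption).
K = 1_000_000
df = pd.DataFrame(list(range(K)))
mask = df[0] > K / 2


def time_reductions(use_bottleneck, number=10):
    """Hypothetical helper: best-of-three seconds per call for mask.sum() and mask.mean()."""
    results = {}
    with pd.option_context("compute.use_bottleneck", use_bottleneck):
        for name in ("sum", "mean"):
            method = getattr(mask, name)
            best = min(timeit.repeat(method, number=number, repeat=3))
            results[name] = best / number
    return results


for flag in (False, True):
    print(f"use_bottleneck={flag}:", time_reductions(flag))
```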
I'm seeing indistinguishable timings both with and without bottleneck. This is on a pre-M1 Mac. Could use confirmation.
Same here, closing.
Code Sample
Problem description
Calling `sum` and `mean` on a boolean pandas mask differs in speed by 10x! This should not be the case: the mean of a boolean mask is just its sum divided by its length, so the two operations should cost about the same.
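The relationship between the two reductions can be checked directly. The sketch below assumes a boolean Series with no missing values and at least one element. The helper name `fraction_true` is hypothetical, and it is offered only as an illustration of the identity, not as a fix adopted in this issue.

```python
import pandas as pd


def fraction_true(mask: pd.Series) -> float:
    """Hypothetical helper: share of True values, assuming no NaNs and len(mask) > 0."""
    # Summing booleans counts the True values; dividing by the length gives the mean.
    return float(mask.sum()) / len(mask)


mask = pd.Series([True, False, True, True])
print(fraction_true(mask), mask.mean())  # 0.75 0.75
```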
Expected Output
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 33.1.1
Cython: 0.25.2
numpy: 1.12.0
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: 2.7.3 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None