CLN: ASV index_object benchmark #18758

Merged

Conversation

mroeschke
Member

  • Removed star imports and checked with flake8

  • Moved some index constructor benchmarks to ctors.py (probably should rename this file in the future)

  • Moved some index indexing benchmarks to indexing.py

asv dev -b ^index_object
· Discovering benchmarks
· Running 32 total benchmarks (1 commits * 1 environments * 32 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ··· Setting up /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/index_object.py:160
[  3.12%] ··· Running index_object.MultiIndexValues.time_datetime_level_values_copy                     30.1ms
[  6.25%] ··· Running index_object.MultiIndexValues.time_datetime_level_values_sliced                    544μs
[  9.38%] ··· Running index_object.Datetime.time_is_dates_only                                           357μs
[ 12.50%] ··· Running index_object.Duplicated.time_duplicated                                            228ms
[ 15.62%] ··· Running index_object.IndexAppend.time_append_int_list                                      285ms
[ 18.75%] ··· Running index_object.IndexAppend.time_append_obj_list                                      299ms
[ 21.88%] ··· Running index_object.IndexAppend.time_append_range_list                                    310ms
[ 25.00%] ··· Running index_object.Ops.time_add                                                             ok
[ 25.00%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   6.22ms 
                 int    5.53ms 
               ======= ========

[ 28.12%] ··· Running index_object.Ops.time_divide                                                          ok
[ 28.12%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.59ms 
                 int    20.7ms 
               ======= ========

[ 31.25%] ··· Running index_object.Ops.time_modulo                                                          ok
[ 31.25%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   15.8ms 
                 int    19.1ms 
               ======= ========

[ 34.38%] ··· Running index_object.Ops.time_multiply                                                        ok
[ 34.38%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.55ms 
                 int    5.82ms 
               ======= ========

[ 37.50%] ··· Running index_object.Ops.time_subtract                                                        ok
[ 37.50%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.53ms 
                 int    5.51ms 
               ======= ========

[ 40.62%] ··· Running index_object.Range.time_max                                                       65.6ms
[ 43.75%] ··· Running index_object.Range.time_max_trivial                                               61.5ms
[ 46.88%] ··· Running index_object.Range.time_min                                                       62.4ms
[ 50.00%] ··· Running index_object.Range.time_min_trivial                                               63.3ms
[ 53.12%] ··· Running index_object.SetOperations.time_datetime_difference                               14.2ms
[ 56.25%] ··· Running index_object.SetOperations.time_datetime_difference_disjoint                      8.42ms
[ 59.38%] ··· Running index_object.SetOperations.time_datetime_intersection                             1.35ms
[ 62.50%] ··· Running index_object.SetOperations.time_datetime_symmetric_difference                     18.5ms
[ 65.62%] ··· Running index_object.SetOperations.time_datetime_union                                     895μs
[ 68.75%] ··· Running index_object.SetOperations.time_index_datetime_intersection                       6.96ms
[ 71.88%] ··· Running index_object.SetOperations.time_index_datetime_union                              6.91ms
[ 75.00%] ··· Running index_object.SetOperations.time_int64_difference                                  13.6ms
[ 78.12%] ··· Running index_object.SetOperations.time_int64_intersection                                6.33ms
[ 81.25%] ··· Running index_object.SetOperations.time_int64_symmetric_difference                        20.0ms
[ 84.38%] ··· Running index_object.SetOperations.time_int64_union                                       12.1ms
[ 87.50%] ··· Running index_object.SetOperations.time_str_difference                                    6.12ms
[ 90.62%] ··· Running index_object.SetOperations.time_str_symmetric_difference                          11.8ms
[ 93.75%] ··· Running index_object.Sortlevel.time_sortlevel_int64                                        775ms
[ 96.88%] ··· Running index_object.Sortlevel.time_sortlevel_one                                         18.2ms
[100.00%] ··· Running index_object.Sortlevel.time_sortlevel_zero                                        20.5ms

@pep8speaks

pep8speaks commented Dec 13, 2017

Hello @mroeschke! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 24, 2017 at 06:27 Hours UTC

@codecov

codecov bot commented Dec 13, 2017

Codecov Report

Merging #18758 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18758      +/-   ##
==========================================
- Coverage   91.59%   91.57%   -0.03%     
==========================================
  Files         150      150              
  Lines       48959    48959              
==========================================
- Hits        44843    44833      -10     
- Misses       4116     4126      +10

Flag        Coverage Δ
#multiple   89.93% <ø> (-0.03%) ⬇️
#single     41.13% <ø> (ø) ⬆️

Impacted Files                  Coverage Δ
pandas/plotting/_converter.py   65.22% <0%> (-1.74%) ⬇️
pandas/util/testing.py          84.9% <0%> (+0.21%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cdebcf3...add0d71. Read the comment docs.

@jreback added the "Benchmark: Performance (ASV) benchmarks" label Dec 13, 2017
self.idx_rng2 = self.idx_rng[:(-1)]
fmt = '%Y-%m-%d %H:%M:%S'
self.date_str_left = Index(self.dates_left.strftime(fmt))
self.date_str_right = self.date_str_left[:-1]
Contributor

You could parametrize these over the actual operations (e.g. difference, intersection, etc.), if you think it's worth it.

Member Author

There is one black sheep benchmark, time_datetime_difference_disjoint, that makes it a little more work to parametrize, but if I separate it out into a new class the rest should be easy to parametrize.
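
For illustration, the parametrization discussed in this thread could look roughly like the sketch below. It is not the code from this PR: it only relies on asv's documented `params` / `param_names` / `setup` conventions, and the index size `N`, the start date, and the string keys are hypothetical choices made for the example. The disjoint case lives in its own class, as suggested above.

```python
import numpy as np
import pandas as pd


class SetOperations:
    # asv calls setup() and each time_* method once per (dtype, method) pair.
    params = (["datetime", "int", "strings"],
              ["intersection", "union", "symmetric_difference"])
    param_names = ["dtype", "method"]

    def setup(self, dtype, method):
        N = 10**4  # hypothetical size, chosen only for this sketch
        indexes = {
            "datetime": pd.date_range("2011-01-01", periods=N, freq="min"),
            "int": pd.Index(np.arange(N)),
            "strings": pd.Index(["key_{}".format(i) for i in range(N)]),
        }
        self.left = indexes[dtype]
        self.right = self.left[:-1]  # overlaps with left except the last item

    def time_operation(self, dtype, method):
        # Look the set operation up by name so one method covers all of them.
        getattr(self.left, method)(self.right)


class SetDisjoint:
    # Kept separate: its inputs do not overlap, so it does not fit the grid.
    def setup(self):
        N = 10**4  # hypothetical size
        self.left = pd.date_range("2011-01-01", periods=N, freq="min")
        self.right = self.left + pd.Timedelta(days=365)  # no shared values

    def time_datetime_difference_disjoint(self):
        self.left.difference(self.right)
```

With this layout asv reports a single `time_operation` entry as a dtype-by-method table instead of one line per hand-written benchmark.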

@jreback added this to the 0.22.0 milestone Dec 13, 2017
@mroeschke
Member Author

Parametrized the SetOperations benchmark and standardized the inputs a little more.

asv dev -b ^index_object
· Discovering benchmarks
· Running 21 total benchmarks (1 commits * 1 environments * 21 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ··· Setting up /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/index_object.py:132
[  4.76%] ··· Running index_object.MultiIndexValues.time_datetime_level_values_copy                                     25.5ms
[  9.52%] ··· Running index_object.MultiIndexValues.time_datetime_level_values_sliced                                    569μs
[ 14.29%] ··· Running index_object.Datetime.time_is_dates_only                                                           359μs
[ 19.05%] ··· Running index_object.Duplicated.time_duplicated                                                            200ms
[ 23.81%] ··· Running index_object.IndexAppend.time_append_int_list                                                      286ms
[ 28.57%] ··· Running index_object.IndexAppend.time_append_obj_list                                                      301ms
[ 33.33%] ··· Running index_object.IndexAppend.time_append_range_list                                                    301ms
[ 38.10%] ··· Running index_object.Ops.time_add                                                                             ok
[ 38.10%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.47ms 
                 int    5.42ms 
               ======= ========

[ 42.86%] ··· Running index_object.Ops.time_divide                                                                          ok
[ 42.86%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.49ms 
                 int    20.6ms 
               ======= ========

[ 47.62%] ··· Running index_object.Ops.time_modulo                                                                          ok
[ 47.62%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   15.8ms 
                 int    19.1ms 
               ======= ========

[ 52.38%] ··· Running index_object.Ops.time_multiply                                                                        ok
[ 52.38%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.51ms 
                 int    5.63ms 
               ======= ========

[ 57.14%] ··· Running index_object.Ops.time_subtract                                                                        ok
[ 57.14%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.45ms 
                 int    7.11ms 
               ======= ========

[ 61.90%] ··· Running index_object.Range.time_max                                                                       73.6ms
[ 66.67%] ··· Running index_object.Range.time_max_trivial                                                               78.6ms
[ 71.43%] ··· Running index_object.Range.time_min                                                                       77.0ms
[ 76.19%] ··· Running index_object.Range.time_min_trivial                                                               77.1ms
[ 80.95%] ··· Running index_object.SetDisjoint.time_datetime_difference_disjoint                                        10.4ms
[ 85.71%] ··· Running index_object.SetOperations.time_operation                                                             ok
[ 85.71%] ···· 
               ============= ============== ======== ======================
               --                                method                    
               ------------- ----------------------------------------------
                   dtype      intersection   union    symmetric_difference 
               ============= ============== ======== ======================
                  datetime       7.12ms      6.45ms          41.3ms        
                date_string      73.6ms      77.9ms          163ms         
                    int          3.33ms      2.74ms          21.8ms        
                  strings        66.0ms      246ms           89.6ms        
               ============= ============== ======== ======================

[ 90.48%] ··· Running index_object.Sortlevel.time_sortlevel_int64                                                        775ms
[ 95.24%] ··· Running index_object.Sortlevel.time_sortlevel_one                                                         18.2ms
[100.00%] ··· Running index_object.Sortlevel.time_sortlevel_zero                                                        18.6ms

@@ -369,3 +369,47 @@ def time_assign_with_setitem(self):
self.df[i] = np.random.randn(self.N)


class Float64(object):
Member

I would personally keep those in the index benchmarks, as indexing an index is something completely different (implementation-wise) than indexing a series/dataframe.
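
A small illustration of the distinction, showing only user-visible behaviour and not the internals the comment refers to: `[]` on an Index selects by position, much like a NumPy array, while `.loc` on a Series resolves labels through its index first. The labels and values here are made up for the example.

```python
import numpy as np
import pandas as pd

idx = pd.Index(["a", "b", "c", "d"])
ser = pd.Series([10, 20, 30, 40], index=idx)

# Indexing the Index itself: positional, array-like.
first_label = idx[0]       # a scalar element of the index
sliced = idx[1:3]          # a new Index; the end position is excluded
masked = idx[np.array([True, False, True, False])]  # boolean mask

# Indexing the Series by label: the lookup goes through the index.
value = ser.loc["b"]
label_slice = ser.loc["b":"c"]  # label slices include both endpoints
```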

@mroeschke
Member Author

@jorisvandenbossche I consolidated those index indexing benchmarks back into index_object.py.

asv dev -b ^index_object
[  0.00%] ··· Setting up /home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/index_object.py:133
[  3.85%] ··· Running index_object.MultiIndexValues.time_datetime_level_values_copy                                       25.7ms
[  7.69%] ··· Running index_object.MultiIndexValues.time_datetime_level_values_sliced                                      531μs
[ 11.54%] ··· Running index_object.Datetime.time_is_dates_only                                                             361μs
[ 15.38%] ··· Running index_object.Duplicated.time_duplicated                                                              196ms
[ 19.23%] ··· Running index_object.IndexAppend.time_append_int_list                                                        285ms
[ 23.08%] ··· Running index_object.IndexAppend.time_append_obj_list                                                        303ms
[ 26.92%] ··· Running index_object.IndexAppend.time_append_range_list                                                      300ms
[ 30.77%] ··· Running index_object.Indexing.time_boolean_array                                                                ok
[ 30.77%] ···· 
               ======== ========
                dtype           
               -------- --------
                String   35.1ms 
                Float    7.03ms 
                 Int     6.96ms 
               ======== ========

[ 34.62%] ··· Running index_object.Indexing.time_boolean_series                                                               ok
[ 34.62%] ···· 
               ======== ========
                dtype           
               -------- --------
                String   35.1ms 
                Float    7.29ms 
                 Int     7.00ms 
               ======== ========

[ 38.46%] ··· Running index_object.Indexing.time_get                                                                          ok
[ 38.46%] ···· 
               ======== ========
                dtype           
               -------- --------
                String   26.4μs 
                Float    27.8μs 
                 Int     27.4μs 
               ======== ========

[ 42.31%] ··· Running index_object.Indexing.time_slice                                                                        ok
[ 42.31%] ···· 
               ======== ========
                dtype           
               -------- --------
                String   55.7μs 
                Float    54.8μs 
                 Int     56.5μs 
               ======== ========

[ 46.15%] ··· Running index_object.Indexing.time_slice_step                                                                   ok
[ 46.15%] ···· 
               ======== ========
                dtype           
               -------- --------
                String   69.7μs 
                Float    54.0μs 
                 Int     56.7μs 
               ======== ========

[ 50.00%] ··· Running index_object.Ops.time_add                                                                               ok
[ 50.00%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.51ms 
                 int    5.48ms 
               ======= ========

[ 53.85%] ··· Running index_object.Ops.time_divide                                                                            ok
[ 53.85%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.55ms 
                 int    20.6ms 
               ======= ========

[ 57.69%] ··· Running index_object.Ops.time_modulo                                                                            ok
[ 57.69%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   15.8ms 
                 int    19.1ms 
               ======= ========

[ 61.54%] ··· Running index_object.Ops.time_multiply                                                                          ok
[ 61.54%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.36ms 
                 int    5.51ms 
               ======= ========

[ 65.38%] ··· Running index_object.Ops.time_subtract                                                                          ok
[ 65.38%] ···· 
               ======= ========
                dtype          
               ------- --------
                float   5.57ms 
                 int    5.50ms 
               ======= ========

[ 69.23%] ··· Running index_object.Range.time_max                                                                         62.2ms
[ 73.08%] ··· Running index_object.Range.time_max_trivial                                                                 62.7ms
[ 76.92%] ··· Running index_object.Range.time_min                                                                         62.9ms
[ 80.77%] ··· Running index_object.Range.time_min_trivial                                                                 62.0ms
[ 84.62%] ··· Running index_object.SetDisjoint.time_datetime_difference_disjoint                                          7.53ms
[ 88.46%] ··· Running index_object.SetOperations.time_operation                                                               ok
[ 88.46%] ···· 
               ============= ============== ======== ======================
               --                                method                    
               ------------- ----------------------------------------------
                   dtype      intersection   union    symmetric_difference 
               ============= ============== ======== ======================
                  datetime       6.16ms      5.93ms          30.9ms        
                date_string      69.5ms      69.3ms          134ms         
                    int          2.69ms      2.88ms          21.9ms        
                  strings        66.7ms      250ms           88.0ms        
               ============= ============== ======== ======================

[ 92.31%] ··· Running index_object.Sortlevel.time_sortlevel_int64                                                          771ms
[ 96.15%] ··· Running index_object.Sortlevel.time_sortlevel_one                                                           18.8ms
[100.00%] ··· Running index_object.Sortlevel.time_sortlevel_zero                                                          18.4ms

@jreback
Contributor

jreback commented Dec 23, 2017

can you rebase

@mroeschke force-pushed the asv_clean_index_object branch from 9da1480 to add0d71 on December 24, 2017 06:27
@mroeschke
Member Author

rebased and all green.

@jorisvandenbossche merged commit e85f432 into pandas-dev:master Dec 26, 2017
@jorisvandenbossche
Member

Thanks!

hexgnu pushed a commit to hexgnu/pandas that referenced this pull request Dec 28, 2017
@mroeschke deleted the asv_clean_index_object branch December 31, 2017 04:49