open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 #1301

Closed
friedrichknuth opened this issue Mar 7, 2017 · 17 comments

@friedrichknuth

I noticed a big speed discrepancy between xarray versions 0.8.2 and 0.9.1 when using open_mfdataset() on a ~1.2 GB dataset consisting of 3 files, with netcdf4 as the engine. 0.8.2 was run first, so this is probably not a disk caching issue.

Test

import xarray as xr
import time

start_time = time.time()
ds0 = xr.open_mfdataset('./*.nc')
print("--- %s seconds ---" % (time.time() - start_time))

Result

xarray==0.8.2, dask==0.11.1, netcdf4==1.2.4

--- 0.736030101776 seconds ---

xarray==0.9.1, dask==0.13.0, netcdf4==1.2.4

--- 52.2800869942 seconds ---
@jhamman
Member

jhamman commented Mar 8, 2017

I've also noticed that we have a bottleneck here.

@shoyer - any idea what we changed that could impact this? Could this be coming from a change upstream in dask?

@shoyer
Member

shoyer commented Mar 8, 2017

Wow, that is pretty bad.

Try setting compat='broadcast_equals' in the open_mfdataset call, to restore the default value of that parameter prior to v0.9.

If that doesn't help, try downgrading dask to see if it's responsible. Profiling results from %prun in IPython would also be helpful in tracking down the culprit.
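
For reference, a minimal sketch of those two checks, using the glob pattern from the original test (a diagnostic, not a fix):

import xarray as xr

# Restore the pre-0.9 default comparison behaviour when combining files
ds = xr.open_mfdataset('./*.nc', compat='broadcast_equals')

# In IPython, profile the call to see where the time goes:
# %prun -l 10 xr.open_mfdataset('./*.nc')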

@jhamman
Member

jhamman commented Mar 8, 2017

This is what I'm seeing for my %prun profiling:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      204   19.783    0.097   19.783    0.097 {method 'acquire' of '_thread.lock' objects}
89208/51003    2.524    0.000    5.553    0.000 indexing.py:361(shape)
        1    1.359    1.359   37.876   37.876 <string>:1(<module>)
71379/53550    1.242    0.000    3.266    0.000 utils.py:412(shape)
   538295    0.929    0.000    1.317    0.000 {built-in method builtins.isinstance}
24674/13920    0.836    0.000    4.139    0.000 _collections_abc.py:756(update)
        9    0.788    0.088    0.803    0.089 netCDF4_.py:178(_open_netcdf4_group)

Weren't there some recent changes to the thread lock related to dask distributed?

@shoyer
Member

shoyer commented Mar 8, 2017

Hmm. It might be interesting to try lock=threading.Lock() to revert to the old version of the thread lock as well.
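
A minimal sketch of that experiment, assuming the 0.9-era lock keyword of open_mfdataset and the same files as in the original test:

import threading
import xarray as xr

# Pass an explicit threading.Lock to fall back to the old locking behaviour
ds = xr.open_mfdataset('./*.nc', lock=threading.Lock())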

@mangecoeur
Contributor

My 2 cents: I've found that with big files, any %prun tends to show method 'acquire' of '_thread.lock' objects as one of the highest entries, but that's not necessarily indicative of where the performance issue comes from, because it's effectively just time spent waiting for I/O, which is always slow. One thing that helps get a better profile is setting the dask backend to the non-parallel sync option, which gives cleaner profiles.

@shoyer
Member

shoyer commented Mar 8, 2017

One thing that helps get a better profile is setting the dask backend to the non-parallel sync option, which gives cleaner profiles.

Indeed, this is highly recommended, see http://dask.pydata.org/en/latest/faq.html
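
This is the scheduler setting used in the profiling runs later in this thread; roughly, with the dask 0.13-era API (newer dask uses dask.config.set(scheduler='synchronous') instead):

import dask as da

# Single-threaded (synchronous) scheduler, so %prun output is not
# dominated by thread-lock waits
da.set_options(get=da.async.get_sync)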

@rabernat
Contributor

rabernat commented Mar 8, 2017

I just tried this on a few different datasets, comparing python 2.7, xarray 0.7.2, dask 0.7.1 (an old environment I had on hand) with python 2.7, xarray 0.9.1-28-g1cad803, dask 0.13.0 (my current "production" environment), and I could not reproduce the slowdown. The up-to-date stack was faster, by a factor of less than 2.

@najascutellatus

Data: Five files that are approximately 450 MB each.

venv1
dask 0.13.0 py27_0 conda-forge
xarray 0.8.2 py27_0 conda-forge
1.51642394066 seconds to load using open_mfdataset

venv2:
dask 0.13.0 py27_0 conda-forge
xarray 0.9.1 py27_0 conda-forge
279.011202097 seconds to load using open_mfdataset

I ran the same code in the OP on two conda envs with the same version of dask but two different versions of xarray. There was a significant difference in load time between the two conda envs.

I've posted the data on my work site if anyone wants to double check: https://marine.rutgers.edu/~michaesm/netcdf/data/

@rabernat
Contributor

There is definitely something funky with these datasets that is causing xarray to be very slow.

This is fast:

>>> %time dsets = [xr.open_dataset(fname) for fname in glob('*.nc')]
CPU times: user 1.1 s, sys: 664 ms, total: 1.76 s
Wall time: 1.78 s

But even just printing the repr is slow:

>>> %time print(dsets[0])
CPU times: user 3.66 s, sys: 3.49 s, total: 7.15 s
Wall time: 7.28 s

Maybe some of this has to do with the change in 0.9.0 to allow index-less dimensions (i.e. coordinates are optional). All of these datasets have such a dimension, e.g.

<xarray.Dataset>
Dimensions:                                     (obs: 7247697)
Coordinates:
    lon                                         (obs) float64 -124.3 -124.3 ...
    lat                                         (obs) float64 44.64 44.64 ...
    time                                        (obs) datetime64[ns] 2014-11-10T00:00:00.011253 ...
Dimensions without coordinates: obs
Data variables:
    oxy_calphase                                (obs) float64 3.293e+04 ...
    quality_flag                                (obs) |S2 'ok' 'ok' 'ok' ...
    ctdbp_no_seawater_conductivity_qc_executed  (obs) uint8 29 29 29 29 29 ...
...

@rabernat
Contributor

And the length of obs is different in each dataset.

>>> for myds in dsets:
...     print(myds.dims)
Frozen(SortedKeysDict({u'obs': 7537613}))
Frozen(SortedKeysDict({u'obs': 7247697}))
Frozen(SortedKeysDict({u'obs': 7497680}))
Frozen(SortedKeysDict({u'obs': 7661468}))
Frozen(SortedKeysDict({u'obs': 5750197}))

@friedrichknuth
Author

Looks like the issue might be that xarray 0.9.1 is decoding all timestamps on load.

xarray==0.9.1, dask==0.13.0

da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

         167305 function calls (160352 primitive calls) in 59.688 seconds

   Ordered by: internal time
   List reduced from 625 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       18   57.057    3.170   57.057    3.170 {pandas.tslib.array_to_timedelta64}
       39    0.860    0.022    0.863    0.022 {operator.getitem}
       48    0.402    0.008    0.402    0.008 {numpy.core.multiarray.arange}
4341/2463    0.257    0.000    0.273    0.000 utils.py:412(shape)
       88    0.245    0.003    0.245    0.003 {pandas.algos.ensure_object}
       48    0.158    0.003    0.561    0.012 indexing.py:318(_index_indexer_1d)
    36/18    0.135    0.004   57.509    3.195 timedeltas.py:150(_convert_listlike)
       18    0.126    0.007    0.130    0.007 nanops.py:815(_checked_add_with_arr)
       51    0.070    0.001    0.070    0.001 {method 'astype' of 'numpy.ndarray' objects}
  676/475    0.047    0.000   58.853    0.124 {numpy.core.multiarray.array}

pandas.tslib.array_to_timedelta64 appears to be the most expensive item on the list, and isn't being run when using xarray 0.8.2.

xarray==0.8.2, dask==0.13.0

da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')


         140668 function calls (136769 primitive calls) in 0.766 seconds

   Ordered by: internal time
   List reduced from 621 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2571/1800    0.178    0.000    0.184    0.000 utils.py:387(shape)
       18    0.174    0.010    0.174    0.010 {numpy.core.multiarray.arange}
       16    0.079    0.005    0.079    0.005 {numpy.core.multiarray.concatenate}
  483/420    0.077    0.000    0.125    0.000 {numpy.core.multiarray.array}
       15    0.054    0.004    0.197    0.013 indexing.py:259(_index_indexer_1d)
        3    0.041    0.014    0.043    0.014 netCDF4_.py:181(__init__)
      105    0.013    0.000    0.057    0.001 netCDF4_.py:196(open_store_variable)
       15    0.012    0.001    0.013    0.001 {operator.getitem}
2715/1665    0.007    0.000    0.178    0.000 indexing.py:343(shape)
     5971    0.006    0.000    0.006    0.000 collections.py:71(__setitem__)

The version of dask is held constant in each test.

@najascutellatus

@rabernat This data is computed on demand from the OOI (http://oceanobservatories.org/cyberinfrastructure-technology/). Datasets can be massive, so they seem to be split into ~500 MB files when the data gets too big. That is why obs changes for each file. Would having obs be consistent across all files potentially make open_mfdataset faster?

@rabernat
Contributor

rabernat commented Apr 4, 2017

My understanding is that you are concatenating along the obs dimension, so no, it wouldn't make sense to have obs be the same in all the datasets.

My tests showed that it's not necessarily the concat step that is slowing this down. Your profiling suggests that it's a netCDF datetime decoding issue.
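
For example, one way to time the open step and the concat step separately (a sketch of this kind of check, using the files linked above, not the exact test run here):

from glob import glob
import time
import xarray as xr

start = time.time()
dsets = [xr.open_dataset(f) for f in sorted(glob('*.nc'))]
print('open: %.2f s' % (time.time() - start))

start = time.time()
combined = xr.concat(dsets, dim='obs')
print('concat: %.2f s' % (time.time() - start))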

I wonder if @shoyer or @jhamman have any ideas about how to improve performance here.

@acrosby

acrosby commented Apr 12, 2017

@friedrichknuth Did you try testing with the most recent version, comparing decode_times=True/False on a single-file read?

@friedrichknuth
Author

decode_times=False significantly reduces read time, but the proportional performance discrepancy between xarray 0.8.2 and 0.9.1 remains the same.
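
A minimal sketch of that single-file comparison (the filename is a placeholder for one of the files linked above):

import time
import xarray as xr

for decode in (True, False):
    start = time.time()
    ds = xr.open_dataset('example_file.nc', decode_times=decode)  # placeholder filename
    print('decode_times=%s: %.2f seconds' % (decode, time.time() - start))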

@jhamman
Copy link
Member

jhamman commented Nov 14, 2017

@friedrichknuth, any chance you can take a look at this with the latest v0.10 release candidate?

@friedrichknuth
Author

friedrichknuth commented Nov 16, 2017

Looks like it has been resolved! Tested with the latest pre-release v0.10.0rc2 on the dataset linked by najascutellatus above: https://marine.rutgers.edu/~michaesm/netcdf/data/

da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

xarray==0.10.0rc2-1-g8267fdb
dask==0.15.4

         194381 function calls (188429 primitive calls) in 0.869 seconds

   Ordered by: internal time
   List reduced from 469 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       50    0.393    0.008    0.393    0.008 {numpy.core.multiarray.arange}
       50    0.164    0.003    0.557    0.011 indexing.py:266(_index_indexer_1d)
        5    0.083    0.017    0.085    0.017 netCDF4_.py:185(_open_netcdf4_group)
      190    0.024    0.000    0.066    0.000 netCDF4_.py:256(open_store_variable)
      190    0.022    0.000    0.022    0.000 netCDF4_.py:29(__init__)
       50    0.018    0.000    0.021    0.000 {operator.getitem}
5145/3605    0.012    0.000    0.019    0.000 indexing.py:493(shape)
2317/1291    0.009    0.000    0.094    0.000 _abcoll.py:548(update)
    26137    0.006    0.000    0.013    0.000 {isinstance}
      720    0.005    0.000    0.006    0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects}

xarray==0.9.1
dask==0.13.0


         241253 function calls (229881 primitive calls) in 98.123 seconds

   Ordered by: internal time
   List reduced from 659 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       30   87.527    2.918   87.527    2.918 {pandas._libs.tslib.array_to_timedelta64}
       65    7.055    0.109    7.059    0.109 {operator.getitem}
       80    0.799    0.010    0.799    0.010 {numpy.core.multiarray.arange}
7895/4420    0.502    0.000    0.524    0.000 utils.py:412(shape)
       68    0.442    0.007    0.442    0.007 {pandas._libs.algos.ensure_object}
       80    0.350    0.004    1.150    0.014 indexing.py:318(_index_indexer_1d)
    60/30    0.296    0.005   88.407    2.947 timedeltas.py:158(_convert_listlike)
       30    0.284    0.009    0.298    0.010 algorithms.py:719(checked_add_with_arr)
      123    0.140    0.001    0.140    0.001 {method 'astype' of 'numpy.ndarray' objects}
 1049/719    0.096    0.000   96.513    0.134 {numpy.core.multiarray.array}
