This is a nice option for working with in-file HDF5/netCDF4 compression: #1128 (comment)
Mixed multi-threading/multi-processing could also be interesting, if anyone wants to revive that: dask/dask#457 (I think it would work now that xarray data stores are pickle-able)
Can you remind me the motivation to use a spawning multiprocessing pool instead of a fork or forkserver solution?
For mixed multi-threading/multi-processing would a local "distributed" scheduler suffice? This would be several single-threaded processes on a single machine. The scheduler would be aware of data locality and avoid inter-node communication when possible.
Actually, I just tested it and it appears that forking also works, as long as you create the pool before opening any files. Otherwise, the netCDF library crashes (#1128 (comment)).
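To make the ordering concrete, here is a minimal sketch of "create the pool first, open files afterwards" using only the standard library. It is not what xarray or dask do internally: the helper names (`read_length`, `total_bytes`, `demo`) are hypothetical, and plain binary files stand in for netCDF files so the snippet runs without extra dependencies. The "fork" start method is only available on POSIX systems.

```python
import multiprocessing as mp
import os
import tempfile


def read_length(path):
    # Hypothetical worker: each process opens the file itself, after the fork,
    # so no handle is inherited from the parent.
    with open(path, "rb") as f:
        return len(f.read())


def total_bytes(paths, processes=2):
    # Create the pool while the parent holds no open data files.
    ctx = mp.get_context("fork")  # assumption: running on a POSIX system
    with ctx.Pool(processes) as pool:
        return sum(pool.map(read_length, paths))


def demo():
    # Plain files stand in for netCDF files here (assumption for illustration).
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(3):
            path = os.path.join(tmp, f"part{i}.bin")
            with open(path, "wb") as f:
                f.write(b"x" * (i + 1))
            paths.append(path)
        return total_bytes(paths)


if __name__ == "__main__":
    print(demo())
```

With real netCDF files the same shape should apply: build the pool up front, pass paths (not open datasets) to the workers, and let each worker open what it needs.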
A local "distributed" scheduler might indeed also work, but at least when operating on a single machine it makes sense to bring all data into a single process once it's been loaded for multi-threaded data analysis.
Dask.distributed now creates a forkserver at startup. This seems to be working well so far. It nicely balances a well-defined environment with fast startup time.
How much inter-worker data transfer would you expect? It might be worth running through a few classic algorithms with it instead of the threaded scheduler and looking at performance changes. The diagnostic pages would be a nice bonus here and might help to highlight some performance issues.
If anyone is interested in trying this, the thing to do is:
$ conda install -c conda-forge dask distributed
>>> from dask.distributed import Client
>>> c = Client() # sets global scheduler by default
CC @mrocklin