
Debug "Slicing is producing a large chunk" warning #300


Open
eric-czech opened this issue Oct 5, 2020 · 1 comment

@eric-czech
Collaborator

eric-czech commented Oct 5, 2020

I see this warning when running the function mentioned in https://github.com/pystatgen/sgkit/issues/299 on 1000 Genomes (1KG) data:

```
/home/eczech/miniconda3/envs/sgkit-dev/lib/python3.8/site-packages/xarray/core/indexing.py:1361: PerformanceWarning:
Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]

To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]
  return self.array[key]
```

We should figure out how this is possible, given that the functions applied to the dataset do nothing other than filter within chunks. Filtering within a chunk can only make that chunk smaller, which is the opposite of what the warning suggests.

I haven't been able to reproduce this on simulated data yet.
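The expectation that filtering within chunks can only shrink them can be sketched in plain Python. This is a simplified model under stated assumptions, not dask's actual slicing code: the helper `filtered_chunk_sizes`, the chunk sizes, and the mask are all hypothetical.

```python
from itertools import accumulate


def filtered_chunk_sizes(chunks, mask):
    """Chunk sizes along one axis after keeping only positions where mask is True.

    Hypothetical helper for illustration: `chunks` is a tuple of chunk lengths
    (like one entry of a dask array's `.chunks`), and `mask` holds one boolean
    per element along that axis. Each output chunk keeps only the selected
    elements of its input chunk, so no chunk can grow.
    """
    if sum(chunks) != len(mask):
        raise ValueError("mask length must equal the total length of the chunks")
    # Chunk boundaries: [0, c0, c0 + c1, ...]
    bounds = [0, *accumulate(chunks)]
    # Count the kept elements inside each [start, stop) chunk range.
    return tuple(sum(mask[start:stop]) for start, stop in zip(bounds, bounds[1:]))


chunks = (4, 4, 2)
mask = [True, False, True, True, False, False, True, False, True, True]
result = filtered_chunk_sizes(chunks, mask)
print(result)  # (3, 1, 2)
# Every output chunk is no larger than the input chunk it came from.
assert all(out <= inp for out, inp in zip(result, chunks))
```

If dask still warns about a large chunk for a selection like this, the warning is probably not explained by the filtering itself and more likely comes from how dask sizes or plans the indexed chunks.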

@tomwhite
Collaborator

I noticed that I get the same warning (on MalariaGEN data) for

ds.isel(samples=ds.sample_cohort != -1)

but not for

ds.isel(samples=ds.sample_cohort >= 0)
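One quick check when comparing these two calls is whether the masks select the same samples at all. On integer codes, `!= -1` and `>= 0` agree exactly when no code is below -1; otherwise `!= -1` keeps a strict superset of the samples, which could push a selection over whatever size threshold triggers the warning. A minimal plain-Python sketch of that check, using a hypothetical helper and made-up codes (not the MalariaGEN data):

```python
def filters_agree(cohort_codes):
    """Return True if `code != -1` and `code >= 0` keep exactly the same samples.

    Hypothetical helper for illustration. The two filters differ only for
    codes below -1: those pass `!= -1` but fail `>= 0`.
    """
    keep_ne = [code != -1 for code in cohort_codes]
    keep_ge = [code >= 0 for code in cohort_codes]
    return keep_ne == keep_ge


print(filters_agree([0, 1, -1, 2]))   # True
print(filters_agree([0, -1, -2, 1]))  # False: -2 passes `!= -1` but not `>= 0`
```

On the real dataset, the equivalent check is whether `bool((ds.sample_cohort < -1).any())` is true.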
