[RFC] Add ElementWiseMult #983

Merged · 10 commits merged into master from ky-add-field-flatness on Mar 2, 2019
Conversation

@kevinyamauchi (Collaborator) commented Feb 5, 2019

Objective

This PR introduces a pipeline component for field flatness correction and image normalization via a user-supplied correction image.

Overview

Per our discussion in #945, I did the following:

For illumination correction, the user supplies the full correction matrix. This method uses the Filter PipelineComponent. The user-supplied matrix is a 5-D tensor, and any singleton dimension is broadcast. For example, to correct illumination uniformly across (x, y), the user would pass a matrix of shape (1, 1, 1, x, y). To scale channels uniformly, the user would pass (1, c, 1, 1, 1), where the c dimension has length equal to the number of channels.
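
For intuition, a minimal pure-numpy sketch of the broadcasting behaviour described above (the array names and shapes are illustrative, not part of the PR):

import numpy as np

# illustrative stack with shape (r, c, z, y, x) = (1, 3, 5, 4, 4)
stack = np.ones((1, 3, 5, 4, 4))

# correct illumination uniformly across (x, y): singleton r, c, z
xy_corr = np.random.rand(1, 1, 1, 4, 4)
corrected_xy = stack * xy_corr  # broadcasts over r, c, z

# scale each channel uniformly: singleton everywhere except c
ch_corr = np.array([1.0, 0.5, 2.0]).reshape(1, 3, 1, 1, 1)
corrected_ch = stack * ch_corr  # broadcasts over r, z, y, x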

Usage

import numpy as np
import xarray as xr

from starfish.imagestack.imagestack import ImageStack
import starfish

# Make the ImageStack
im = np.ones((1, 3, 5, 2, 2))
stack = ImageStack.from_numpy_array(im)

# Make the correction matrix: singleton in every dim except c, which matches the 3 channels
corr_mat = np.ones((1, 1, 1, 1, 3))
corr_mat[0, 0, 0, 0, 1] = 0.5  # scale channel 1 by 0.5
# label the axes; the dims need not be in the ImageStack's order
corr_mat_xr = xr.DataArray(corr_mat, dims=('r', 'x', 'z', 'y', 'c'))

# Correct...
filter_mult = starfish.image.Filter.ElementWiseMult(mult_mat=corr_mat_xr)
stack2 = filter_mult.run(stack, in_place=False, verbose=True)

To do

  • Add tests (we can probably use variants of the usage example above)

Questions

  • I am not familiar with click. Can I supply an ndarray as a default parameter?
  • For the _DEFAULT_TESTING_PARAMETERS = {"corr_mat": 0}: I assume these are passed as kwargs into some test. Where are those tests?
  • Currently, I use group_by to chunk by ROUND, CH, and Z (where appropriate) and numpy broadcasting for X and Y; a conceptual sketch of both options follows this list. I think this is preferable, as it is compatible with the multiprocessing. We could also just multiply the mult_mat with the image directly, thanks to the magic of numpy broadcasting, but I don't think that would use the multiprocessing as implemented. Thoughts?
  • Is there a better way to get the group_by axes (see the run() method)?
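
A minimal sketch of the two options weighed in the third question above, in plain numpy (the chunked loop stands in for the group_by path; all names and shapes are illustrative):

import numpy as np

stack = np.ones((1, 3, 5, 8, 8))        # (r, c, z, y, x)
corr = np.full((1, 3, 1, 1, 1), 0.5)    # per-channel correction matrix

# Option 1: chunk by (round, channel, z) and multiply each (y, x) plane;
# each chunk could be handed off to a separate worker process.
out_chunked = np.empty_like(stack)
for r in range(stack.shape[0]):
    for c in range(stack.shape[1]):
        for z in range(stack.shape[2]):
            # singleton correction dims index at 0; the (1, 1) slice broadcasts over (y, x)
            out_chunked[r, c, z] = stack[r, c, z] * corr[0, c, 0]

# Option 2: a single direct multiply; numpy broadcasts the singleton dims.
out_direct = stack * corr

assert np.allclose(out_chunked, out_direct)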

@codecov-io commented Feb 5, 2019

Codecov Report

Merging #983 into master will increase coverage by 0.07%.
The diff coverage is 90%.


@@            Coverage Diff             @@
##           master     #983      +/-   ##
==========================================
+ Coverage   88.48%   88.56%   +0.07%     
==========================================
  Files         164      167       +3     
  Lines        6139     6216      +77     
==========================================
+ Hits         5432     5505      +73     
- Misses        707      711       +4
Impacted Files Coverage Δ
starfish/image/_filter/element_wise_mult.py 90% <90%> (ø)
starfish/test/image/filter/test_linear_unmixing.py 100% <0%> (ø)
starfish/image/_filter/linear_unmixing.py 96.77% <0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ae5311a...e9e99ef.

@kevinyamauchi (Collaborator, Author)

@ambrosejcarr Do you think we should pass in the option to rescale or clip (i.e., pass rescale to preserve_float_range())? I think we should.
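
For context, a tiny numpy sketch of the two behaviours being weighed here; this illustrates clip-versus-rescale semantics in general and is not the preserve_float_range() implementation:

import numpy as np

vals = np.array([0.2, 0.8, 1.6])   # an element-wise multiply can push values above 1

clipped = np.clip(vals, 0, 1)      # clip: saturate at 1     -> [0.2, 0.8, 1.0]
rescaled = vals / vals.max()       # rescale: divide by max  -> [0.125, 0.5, 1.0]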

@ttung (Collaborator) left a comment

I think a lot of your code assumes a certain ordering of the axes in ImageStack, which is explicitly not guaranteed.

@kevinyamauchi (Collaborator, Author)

Ah, yes it does. I did not realize that wasn't guaranteed. Is there a way to get the order of the axes in the image passed to _mult()?

@ambrosejcarr (Member) commented Feb 9, 2019

This is great, @kevinyamauchi.

> Ah, yes it does. I did not realize that wasn't guaranteed. Is there a way to get the order of the axes in the image passed to _mult()?

I think the right approach here would be to accept an xarray as the matrix argument, with the axes labeled. In the below example, you can think of xx as similar to the ImageStack and yy as the illumination correction input.

Here's my best attempt. It gets stuck on something in xarray, which I've asked the xarray maintainers about. Depending on what they say, we may be able to implement the functionality or a workaround.

import starfish
import xarray as xr
import numpy as np

# 3d array of ones
x = np.ones([10, 10, 10])
# 3d [0, 9] range over 2nd axis, singletons in 1st and 3rd axes
y = np.arange(10)[None, :, None]

# create DataArrays
xx = xr.DataArray(x, dims=('x', 'y', 'z'))
# note reversal of (x, y) dimensions
yy = xr.DataArray(y, dims=('y', 'x', 'z'))

# if xx is our ImageStack, this provides the aligned-axes guarantee
yy_aligned = yy.transpose(*xx.dims)

# I wish this worked, see pydata/xarray#2171
xx * yy_aligned

[Edit: the workaround isn't as gross as I thought it would be. Multiplying by the bare .values array bypasses xarray's dim-name alignment, and the transpose above has already put the axes in matching positional order.]

xx * yy_aligned.values

@kevinyamauchi (Collaborator, Author) commented Feb 9, 2019

I was trying to use the ImageStack.apply() method to leverage the multiprocessing support that comes with it. As I understand it, ImageStack.apply() uses _processing_workflow(), which passes an ndarray of the selected tiles to the filter function. Is there a way to force it to pass an xarray instead?

@kevinyamauchi (Collaborator, Author)

@ambrosejcarr, I made the changes we discussed in our call. Would you mind taking a look? I updated the usage snippet in the PR description.

@ambrosejcarr (Member)

> I was trying to use the ImageStack.apply() method to leverage the multiprocessing support that comes with it. As I understand it, ImageStack.apply() uses _processing_workflow(), which passes an ndarray of the selected tiles to the filter function. Is there a way to force it to pass an xarray instead?

#1035

@ambrosejcarr (Member)

@kevinyamauchi I ran some tests at scale to try to understand whether we need the multiprocessing.

In [0]: import starfish
   ...: import os
   ...: import xarray as xr
   ...: import numpy as np
   ...: from starfish.imagestack.parser.crop import CropParameters  # noqa 
   ...: experiment = starfish.Experiment.from_json(os.path.expanduser('~/scratch/seqfish/experiment.json')) 
   ...: fov = experiment['fov_000'] 
   ...: crop_params = CropParameters(x_slice=slice(0, 1024), y_slice=slice(1024, 2048)) 
   ...: image = fov.get_image('primary', crop_params)

In [1]: mult_array = xr.DataArray(np.arange(12)[None, :, None, None, None], dims=('r', 'c', 'z', 'y', 'x'))

In [2]: ewm = starfish.image.Filter.ElementWiseMultiply(mult_array)

In [3]: %timeit -n1 -r1 ewm.run(image, n_processes=8)
38.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [4]: %timeit -n1 -r1 image.xarray * mult_array.values
24.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [5]: res_mp = ewm.run(image)

In [6]: res_direct = image.xarray * mult_array.values

While it does eventually get faster if you add enough processes (20), I suspect it's better to just leverage numpy's vectorization. Do you mind if I simplify the method to use a single process?
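
For reference, a minimal sketch of what the single-process path could look like, following the direct multiply in [4]/[6] above (this builds on the earlier xarray examples and is not the merged implementation; shapes are reduced to keep it runnable):

import numpy as np
import xarray as xr

# stand-ins for image.xarray and the correction tensor, in (r, c, z, y, x) order
stack = xr.DataArray(np.ones((4, 12, 27, 64, 64)), dims=('r', 'c', 'z', 'y', 'x'))
mult_array = xr.DataArray(np.arange(12)[None, :, None, None, None],
                          dims=('r', 'c', 'z', 'y', 'x'))

# align the correction tensor to whatever axis order the stack uses,
# then multiply by the bare ndarray so plain numpy broadcasting does the work
aligned = mult_array.transpose(*stack.dims)
result = stack * aligned.values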

@kevinyamauchi (Collaborator, Author)

My intuition was that for our current tile size this would be the case. I am less sure how this works if people bring really large tiles (e.g., a stitched image of a piece of tissue). What do you think about that? Is that in scope for near-term starfish goals? I'm fine with either approach!

@ambrosejcarr (Member)

> My intuition was that for our current tile size this would be the case. I am less sure how this works if people bring really large tiles (e.g., a stitched image of a piece of tissue). What do you think about that? Is that in scope for near-term starfish goals? I'm fine with either approach!

For larger tiles, @ttung is working on a workflow runner that will break them up such that they fit in memory and send them off to individual compute nodes. I think the single-process case here makes more sense.

I omitted the size from the test above, but it is (4, 12, 27, 1024, 1024), so it's a good-sized test for a single FoV.

@ambrosejcarr force-pushed the ky-add-field-flatness branch from d801353 to c98afee on March 2, 2019 05:53
@ambrosejcarr merged commit 0b909d2 into master on Mar 2, 2019
@ambrosejcarr deleted the ky-add-field-flatness branch on March 2, 2019 14:58