[RFC] Add ElementWiseMult #983
Conversation
Codecov Report

```
@@            Coverage Diff            @@
##           master     #983     +/-   ##
=========================================
+ Coverage   88.48%   88.56%   +0.07%
=========================================
  Files         164      167       +3
  Lines        6139     6216      +77
=========================================
+ Hits         5432     5505      +73
- Misses        707      711       +4
=========================================
```

Continue to review full report at Codecov.
@ambrosejcarr Do you think we should pass in the option to rescale or clip (i.e., pass
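For reference, a minimal sketch of the two post-multiplication options being discussed, clipping back into [0, 1] versus rescaling the result to span [0, 1]. Both function names here are hypothetical, not starfish API:

```python
import numpy as np

def clip_01(a):
    # Clamp out-of-range values; in-range values are untouched.
    return np.clip(a, 0.0, 1.0)

def rescale_01(a):
    # Linearly map the full range of the result onto [0, 1].
    lo, hi = a.min(), a.max()
    return (a - lo) / (hi - lo) if hi > lo else np.zeros_like(a)

vals = np.array([-0.2, 0.5, 1.4])
clipped = clip_01(vals)      # -> [0.0, 0.5, 1.0]
rescaled = rescale_01(vals)  # endpoints map to 0.0 and 1.0
```

The choice matters: clipping discards information above/below the range, while rescaling preserves relative intensities but changes absolute values.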
I think a lot of your code assumes a certain ordering of the axes in ImageStack, which is explicitly not guaranteed.
Ah, yes it does. I did not realize that wasn't guaranteed. Is there a way to get the order of the axes in
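If the axis order can't be assumed, one option (assuming the underlying data is an `xarray.DataArray`, as the `image.xarray` accessor used later in this thread suggests) is to query the dimension names rather than rely on position:

```python
import numpy as np
import xarray as xr

# Illustrative array; dimension names and sizes are made up.
arr = xr.DataArray(np.zeros((2, 3, 4)), dims=("r", "y", "x"))

axis_order = arr.dims              # tuple of dimension names, in storage order
y_axis = arr.get_axis_num("y")     # positional index of a named axis
```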
This is great @kevinyamauchi
I think the right approach here would be to accept an xarray as the matrix argument, with the axes labeled. In the below example, you can think of … Here's my best attempt. It gets stuck on something in xarray, which I've reached out to them to ask about. Depending on what they say, we might be able to implement the functionality or a workaround.
[Edit: the workaround isn't as gross as I thought it would be]
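A minimal sketch of why labeled axes sidestep the ordering problem (shapes here are illustrative, not starfish's): xarray aligns operands by dimension name, so the multiplier applies correctly no matter how the image's axes happen to be ordered.

```python
import numpy as np
import xarray as xr

# Illustrative 5-D image with starfish-style axis names; sizes are made up.
image = xr.DataArray(np.ones((2, 3, 4, 8, 8)), dims=("r", "c", "z", "y", "x"))

# Per-channel multiplier: xarray matches the 'c' dimension by name,
# so no assumption about axis position is needed.
mult = xr.DataArray(np.arange(1, 4, dtype=float), dims=("c",))

result = image * mult

# The same multiplier works even if the image's axes are transposed.
same = (image.transpose("x", "y", "z", "c", "r") * mult).transpose(*result.dims)
```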
I was trying to use the
@ambrosejcarr, I made the changes we discussed on our call. Would you mind taking a look? I updated the usage snippet in the PR description.
@kevinyamauchi I ran some tests at scale to try to understand whether we need the multiprocessing.

```python
In [0]: import starfish
   ...: import os
   ...: import xarray as xr
   ...: import numpy as np
   ...: from starfish.imagestack.parser.crop import CropParameters  # noqa
   ...: experiment = starfish.Experiment.from_json(os.path.expanduser('~/scratch/seqfish/experiment.json'))
   ...: fov = experiment['fov_000']
   ...: crop_params = CropParameters(x_slice=slice(0, 1024), y_slice=slice(1024, 2048))
   ...: image = fov.get_image('primary', crop_params)

In [1]: mult_array = xr.DataArray(np.arange(12)[None, :, None, None, None], dims=('r', 'c', 'z', 'y', 'x'))

In [2]: ewm = starfish.image.Filter.ElementWiseMultiply(mult_array)

In [3]: %timeit -n1 -r1 ewm.run(image, n_processes=8)
38.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [4]: %timeit -n1 -r1 image.xarray * mult_array.values
24.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [5]: res_mp = ewm.run(image)

In [6]: res_direct = image.xarray * mult_array.values
```

While it does eventually get faster if you add enough processes (20), I suspect it's better to just leverage numpy's vectorization. Do you mind if I simplify the method to use a single process?
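The direct multiply in the benchmark works because of plain numpy broadcasting; a stripped-down sketch with made-up sizes:

```python
import numpy as np

# A per-channel vector reshaped to (1, C, 1, 1, 1) broadcasts against an
# (R, C, Z, Y, X) stack without materializing a full-size multiplier.
stack = np.ones((4, 12, 27, 8, 8), dtype=np.float32)
mult = np.arange(12, dtype=np.float32)[None, :, None, None, None]

out = stack * mult  # each channel c is scaled by mult[0, c, 0, 0, 0]
```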
My intuition was that this would be the case for our current tile size. I am less sure how this works if people bring really large tiles (e.g., a stitched image of a piece of tissue). What do you think about that? Is that in scope for near-term starfish goals? I'm fine with either approach!
For larger tiles, @ttung is working on a workflow runner that will break them up so that they fit in memory and send them off to individual compute nodes. I think the single-process case here makes more sense. I omitted the size from the test above, but it is (4, 12, 27, 1024, 1024), so it's a good-sized test for a single FoV.
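A hypothetical sketch of the splitting idea for the large-tile case (function name and block size are made up, not the workflow runner's API): process the image in y/x blocks so each block fits in memory, letting the multiplier broadcast against every block.

```python
import numpy as np

def multiply_in_blocks(image, mult, block=64):
    # 'mult' must broadcast against each block, e.g. shape (1, C, 1, 1, 1).
    out = np.empty_like(image)
    for y0 in range(0, image.shape[-2], block):
        for x0 in range(0, image.shape[-1], block):
            blk = (..., slice(y0, y0 + block), slice(x0, x0 + block))
            out[blk] = image[blk] * mult
    return out

# Toy data: a small (R, C, Z, Y, X) stack and a per-channel multiplier.
img = np.random.rand(2, 3, 2, 128, 128).astype(np.float32)
m = np.arange(1, 4, dtype=np.float32)[None, :, None, None, None]
blocked = multiply_in_blocks(img, m)
```

Since elementwise multiplication is independent per pixel, the blocked result is identical to the all-at-once product `img * m`.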
Objective
This PR introduces a pipeline component for field-flatness correction and image normalization via a user-supplied correction image.
Overview
Per our discussion in #945, I did the following:
Usage
To do
Questions
- `_DEFAULT_TESTING_PARAMETERS = {"corr_mat": 0}`: I assume these are passed as kwargs into some test. Where are those tests?
- I used `group_by` to chunk by ROUND, CH, Z (where appropriate) and numpy broadcasting for X, Y. I think this is preferred, as it is compatible with the multiprocessing. We can also just directly multiply the `mult_mat` with the `image` thanks to the magic of numpy, but I don't think this will use the multiprocessing, as implemented. Thoughts?
- … the `group_by` axes (see the `run()` method)?
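As a rough sketch of the `group_by` approach described above (all shapes and variable names here are illustrative, not starfish's API): chunk over round/channel/z, and let numpy broadcasting handle x and y within each chunk.

```python
import numpy as np

# Toy (R, C, Z, Y, X) stack and a 2-D correction image that broadcasts over y, x.
stack = np.random.rand(2, 3, 4, 16, 16)
corr = np.random.rand(16, 16)

# Chunk by (round, channel, z): each chunk is an independent y/x plane,
# so chunks could be farmed out to worker processes if needed.
out = np.empty_like(stack)
for r in range(stack.shape[0]):
    for c in range(stack.shape[1]):
        for z in range(stack.shape[2]):
            out[r, c, z] = stack[r, c, z] * corr
```

The chunked loop and the direct broadcasted multiply `stack * corr` compute the same result; the difference is only in how the work can be scheduled.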