Skip to content

Conversation

@LucaMarconato
Copy link
Member

@LucaMarconato LucaMarconato commented Jan 8, 2023

This pr improves stability and ergonomics of coordinate systems, and in particular helps with #39

Detailed list of what will be implemented.

Some tick boxes are deleted because I have decided to perform a deeper code restructuring (introducing new transformations classes); if I had the item already implemented I then deleted the code.

For the moment I allow only one transformation per element and I am not using at all the coordinate systems. I already set the code ready to do that but will do this in the next PR to keep this PR a bit more contained.

Major (updated plan: new transformation classes)

  • implement new transformation classes which add more flexibility in terms of composability.
  • transform elements:
    • SpatialData objects
    • pa.Table objects (Points)
    • GeoDataFrame objects (Polygons)
    • AnnData objects (Shapes`
    • SpatialImage (Image and Labels)
    • MultiscaleSpatialImage (Image and Labels)
  • general case: given two elements, a points in the coordinate system of one of the two, give the coordinate in the coordinate system of the other (this deals both with intrisic and extrinsic coordinate systems).
  • update the parser to use the new transformations
  • write conversion functions "new transformations" <> "ngff transformations"
  • update the io to save and load using the old NGFF classes
  • fix the transformation with multiscale images and multiscale labels: no more transformation in the "parernt" object, but only at each level of the pyramid. Each transformation should work independently of the other. set_transform and get_transform, when applied to the whole object. Specifically: set_transform will update all of them (taking as input the one on the largest level) and get_transform will get the one of the largest level.

Major (original plan)

the following crossed out points are not needed anymore with the new transform classes
- [X] simplify the logic of inference of coordinate systems in Sequence (function check_and_infer_coordinate_systems())
- [x] make Sequence also accept the concatenation of transformations with axes mismatch (eg. xy with cyx), by adding automatically the appropriate axis permutations, insertion/deletion
- [x] setter for elements to check that the transfomration makes sense (if not, suggests how to fix) (see my next comment for a downside of this)
- [ ] save the affine transformation to the file to avoid verbosity

  • better __repr__ for coordinate transformations
  • helper function to replace a transformation within an element (on disk, in-memory is already possible) without having to rewrite the element data
    • shapes
    • points
    • polygons
    • images and labels
      • SpatialImage
      • MultiscaleSpatialImage

Minor

  • improved Sequence transformation: make input and output in contatenation inferred from init
  • better tests for Sequence, covering the case of inference of the intermediate coordinate system for nested Sequence transformations
  • coordinate systems type simplified. Before we were planning to make possible to have input_coordinate_system and output_coordinate_system as a string inside a transformation object, to deal with some edge cases of implicit coordinate systems. Now this complexity is not needed because thanks to the schema mechanism, the spatial elements are more structured, and we can easily infer from them what is the implicit coordinate system.
  • helper function to get affine matrices between coordinate systems (Affine.from_input_output_coordinate_systems())

Bonus

  • multiple transformations per element
  • make a function "transform to coordinate system"
    • to work on points
    • to work on elements
      • images
      • labels
      • shapes
      • points
      • polygons
    • to work on spatialdata objects
  • general case: given two elements, a points in the coordinate system of one of the two, give the coordinate in the coordinate system of the other (this deals both with intrisic and extrinsic coordinate systems).

…ansform from input and output coordinate system; 2. fixed but in sequence (now inferring coordinate systems from its components); 3. added missing test for inferencen of intermediate coordinate systems for nested sequences
@codecov
Copy link

codecov bot commented Jan 8, 2023

Codecov Report

Attention: Patch coverage is 93.19556% with 135 lines in your changes missing coverage. Please review.

Project coverage is 87.76%. Comparing base (0a455dc) to head (458ba9d).
Report is 604 commits behind head on main.

Files with missing lines Patch % Lines
spatialdata/_core/ngff/ngff_coordinate_system.py 79.05% 31 Missing ⚠️
spatialdata/_core/transformations.py 94.32% 27 Missing ⚠️
spatialdata/_core/_transform_elements.py 86.84% 20 Missing ⚠️
spatialdata/_core/_spatialdata_ops.py 85.00% 18 Missing ⚠️
spatialdata/_core/ngff/ngff_transformations.py 96.95% 15 Missing ⚠️
spatialdata/_core/_spatialdata.py 94.30% 9 Missing ⚠️
spatialdata/_core/core_utils.py 96.12% 5 Missing ⚠️
spatialdata/_io/read.py 93.84% 4 Missing ⚠️
spatialdata/_core/models.py 96.34% 3 Missing ⚠️
spatialdata/utils.py 95.91% 2 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #100      +/-   ##
==========================================
+ Coverage   84.22%   87.76%   +3.54%     
==========================================
  Files          19       22       +3     
  Lines        1914     3105    +1191     
==========================================
+ Hits         1612     2725    +1113     
- Misses        302      380      +78     
Files with missing lines Coverage Δ
spatialdata/__init__.py 100.00% <100.00%> (ø)
spatialdata/_core/_spatial_query.py 77.21% <100.00%> (+0.67%) ⬆️
spatialdata/_io/format.py 87.62% <100.00%> (+8.24%) ⬆️
spatialdata/_io/write.py 96.27% <98.79%> (+0.75%) ⬆️
spatialdata/utils.py 84.33% <95.91%> (+15.76%) ⬆️
spatialdata/_core/models.py 85.53% <96.34%> (+1.71%) ⬆️
spatialdata/_io/read.py 97.66% <93.84%> (+0.20%) ⬆️
spatialdata/_core/core_utils.py 95.32% <96.12%> (+5.50%) ⬆️
spatialdata/_core/_spatialdata.py 78.67% <94.30%> (+6.34%) ⬆️
spatialdata/_core/ngff/ngff_transformations.py 96.95% <96.95%> (ø)
... and 4 more

@LucaMarconato LucaMarconato changed the title 3 improviements to transforms: 1. helper function to get an affine tr… improviements to transforms: robustness and ergonomics Jan 9, 2023
@LucaMarconato
Copy link
Member Author

LucaMarconato commented Jan 11, 2023

EDIT: this post referst to the first tentative implementation. I removed the code which generated these complex transformation in favor of the new transformation classes.

As mentioned in my previous post, a downside of "setter for elements to check that the transfomration makes sense (if not, tries to fix it)" is that very verbose transformations are generated.

For instance this code here:

def test_assign_xyz_scale_to_cyx_image():
    xyz_cs = get_default_coordinate_system(("x", "y", "z"))
    scale = Scale(np.array([2, 3, 4]), input_coordinate_system=xyz_cs, output_coordinate_system=xyz_cs)
    image = Image2DModel.parse(np.zeros((10, 10, 10)), dims=("c", "y", "x"))
    set_transform(image, scale)
    t = get_transform(image)
    pprint(t.to_dict())
    print(t.to_affine().affine)

Leads to this transformation here:

ByDimension (c, y, x -> c, y, x)
    Sequence (x, y -> x, y, z)
        ByDimension (x, y -> x, y, z)
            MapAxis (y, x -> x, y)
                y <- y
                x <- x
            Affine (y -> z)
                [0. 0.]
                [0. 1.]
        Scale (x, y, z -> x, y, z)
            [2. 3. 4.]
    Identity (c -> c)

Which saved to json is insanely verbose:

{'input': {'axes': [{'name': 'c', 'type': 'channel'},
                    {'name': 'y', 'type': 'space', 'unit': 'unit'},
                    {'name': 'x', 'type': 'space', 'unit': 'unit'}],
           'name': 'cyx'},
 'output': {'axes': [{'name': 'c', 'type': 'channel'},
                     {'name': 'y', 'type': 'space', 'unit': 'unit'},
                     {'name': 'x', 'type': 'space', 'unit': 'unit'}],
            'name': 'cyx'},
 'transformations': [{'input': {'axes': [{'name': 'x',
                                          'type': 'space',
                                          'unit': 'unit'},
                                         {'name': 'y',
                                          'type': 'space',
                                          'unit': 'unit'}],
                                'name': "xyz_subset ['x', 'y']"},
                      'output': {'axes': [{'name': 'x',
                                           'type': 'space',
                                           'unit': 'unit'},
                                          {'name': 'y',
                                           'type': 'space',
                                           'unit': 'unit'},
                                          {'name': 'z',
                                           'type': 'space',
                                           'unit': 'unit'}],
                                 'name': 'xyz'},
                      'transformations': [{'input': {'axes': [{'name': 'x',
                                                               'type': 'space',
                                                               'unit': 'unit'},
                                                              {'name': 'y',
                                                               'type': 'space',
                                                               'unit': 'unit'}],
                                                     'name': "xyz_subset ['x', "
                                                             "'y']"},
                                           'output': {'axes': [{'name': 'x',
                                                                'type': 'space',
                                                                'unit': 'unit'},
                                                               {'name': 'y',
                                                                'type': 'space',
                                                                'unit': 'unit'},
                                                               {'name': 'z',
                                                                'type': 'space',
                                                                'unit': 'unit'}],
                                                      'name': 'xyz'},
                                           'transformations': [{'input': {'axes': [{'name': 'y',
                                                                                    'type': 'space',
                                                                                    'unit': 'unit'},
                                                                                   {'name': 'x',
                                                                                    'type': 'space',
                                                                                    'unit': 'unit'}],
                                                                          'name': 'cyx_subset '
                                                                                  "['y', "
                                                                                  "'x']"},
                                                                'mapAxis': {'x': 'x',
                                                                            'y': 'y'},
                                                                'output': {'axes': [{'name': 'x',
                                                                                     'type': 'space',
                                                                                     'unit': 'unit'},
                                                                                    {'name': 'y',
                                                                                     'type': 'space',
                                                                                     'unit': 'unit'}],
                                                                           'name': 'xyz_subset '
                                                                                   "['x', "
                                                                                   "'y']"},
                                                                'type': 'mapAxis'},
                                                               {'affine': [[0.0,
                                                                            0.0]],
                                                                'input': {'axes': [{'name': 'y',
                                                                                    'type': 'space',
                                                                                    'unit': 'unit'}],
                                                                          'name': 'cyx_subset '
                                                                                  "['y']"},
                                                                'output': {'axes': [{'name': 'z',
                                                                                     'type': 'space',
                                                                                     'unit': 'unit'}],
                                                                           'name': 'xyz_subset '
                                                                                   "['z']"},
                                                                'type': 'affine'}],
                                           'type': 'byDimension'},
                                          {'input': {'axes': [{'name': 'x',
                                                               'type': 'space',
                                                               'unit': 'unit'},
                                                              {'name': 'y',
                                                               'type': 'space',
                                                               'unit': 'unit'},
                                                              {'name': 'z',
                                                               'type': 'space',
                                                               'unit': 'unit'}],
                                                     'name': 'xyz'},
                                           'output': {'axes': [{'name': 'x',
                                                                'type': 'space',
                                                                'unit': 'unit'},
                                                               {'name': 'y',
                                                                'type': 'space',
                                                                'unit': 'unit'},
                                                               {'name': 'z',
                                                                'type': 'space',
                                                                'unit': 'unit'}],
                                                      'name': 'xyz'},
                                           'scale': [2.0, 3.0, 4.0],
                                           'type': 'scale'}],
                      'type': 'sequence'},
                     {'input': {'axes': [{'name': 'c', 'type': 'channel'}],
                                'name': "cyx_subset ['c']"},
                      'output': {'axes': [{'name': 'c', 'type': 'channel'}],
                                 'name': "cyx_subset ['c']"},
                      'type': 'identity'}],
 'type': 'byDimension'}

The corresponding affine matrix is the following:

Affine (c, y, x -> c, y, x)
    [1. 0. 0. 0.]
    [0. 3. 0. 0.]
    [0. 0. 2. 0.]
    [0. 0. 0. 1.]

So maybe we should just save the affine matrix to disk, which is something reasonable:

{'affine': [[1.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]],
 'input': {'axes': [{'name': 'c', 'type': 'channel'},
                    {'name': 'y', 'type': 'space', 'unit': 'unit'},
                    {'name': 'x', 'type': 'space', 'unit': 'unit'}],
           'name': 'cyx'},
 'output': {'axes': [{'name': 'c', 'type': 'channel'},
                     {'name': 'y', 'type': 'space', 'unit': 'unit'},
                     {'name': 'x', 'type': 'space', 'unit': 'unit'}],
            'name': 'cyx'},
 'type': 'affine'}

@LucaMarconato
Copy link
Member Author

LucaMarconato commented Jan 13, 2023

After discussing with @giovp I will refactor into separate classes to divide what is purely a NGFF transform from what is built on top of it for ergonomics. The reasons are the following:

  1. the NGFF classes can be in principle moved to ome-zarr-py or an object-oriented representation of the storage, which is something that we want;
  2. this approach will greatly simplify the code complexity. In the previous comments I described the implications of using the approach proposed here Influence on the order of axes when declaring different affine transformations #39. With the new classes the problem of the issue will be solved and what is written to file will be cleared (e.g. if the user takes a cyx Scale from a cyx image and assigns it to a xy points, what is saved to disk will be a xy Scale, and not a complex transformation or its affine representation). The NGFF classes will be used only when reading or writing. The in-memory representation will take advantage of the new classes.
  3. The new classes will be based on xarray like described here Using xarray to make transformations less ambiguous #47 and the order of axes will be irrelevant. Also, all the transformations will have non-ambiguous information for the axis (on the contrary NGFF Sequence transform allow for containing transformations which don't specify the axes. This increases the code complexity.
  4. New functionalities that will be easily supported will be the possibility to concatenate any sequence of transformations independently of the axes they operate on, and assign any transformation to any element, again independently of the axes they work on.

The old version of the code can be found in the previous commit, as well as in a separate branch created from it. If the new code works as planned I'll delete the branch (which I already tagged for archiving purposes and can be found here).

@LucaMarconato
Copy link
Member Author

LucaMarconato commented Feb 2, 2023

So one major comment I have so far (by just checking the _spatialdata.py file) is the type of API we want to have. Copying it from the design doc

import spatialdata as sd
from spatialdata import SpatialData

sdata = SpatialData(...)
points = sd.transform(sdata.points["image1"], tgt="tgt_space")
sdata = sd.transform(sdata, tgt="tgt_space")
I think we should have 1 way to apply transform to either spatialdata or elements, and maybe we also shouldnt' allow user to pass directly the element, but only the name and type (e.g. {"labels":"sample1"}. I think this would prevent mistakes in copy/viewes of elements from the ones already in spatialdata and the ones transformed. wdyt ?

I think that we need two functions:

  1. one for getting a transformation between one target and one source, and this would need to be a method of SpatialData since it needs to build the digraph of the transformations of the elements that are inside. This is currently the function map_coordinate_systems(). We can change the name, it's a bit confusing. Maybe map()?
  2. one method to actually transform one element to a coordinate system. Again this needs to be a method of SpatialData because it needs to call the function above internally.

The need to have methods of SpatialData is the reason why I couldn't go with this idea from the design doc:

points = sd.transform(sdata.points["image1"], tgt="tgt_space")

Still we could change the second function to this:

points = sd.transform(sdata, element_type='labels', element_name='cells', tgt="tgt_space")

Here are the pro and cons of this second approach.
Pro:

  • we can rewrite and set_transformation(), remove_transformation() and remove_all_transformations() to take as first argument the sdata object and then the string description of the element like above. In this way we can remove the *_in_memory() methods since all the functions can be static methods (or just global functions). I really like this.

Con:

My proposal is to require unique names for elements (separate PR) and after that to change the second function as described (another PR). For the moment, I don't think we will have problems of copy/views, things should work.

Copy link
Collaborator

@kevinyamauchi kevinyamauchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LucaMarconato ! I think this is a solid step in the right direction. I am approving so that we can keep things moving forward. I think we will have to continue to iterate on API, etc., but it will be easier to do so when we actually use it to make vignettes, etc.

one for getting a transformation between one target and one source, and this would need to be a method of SpatialData since it needs to build the digraph of the transformations of the elements that are inside. This is currently the function map_coordinate_systems(). We can change the name, it's a bit confusing. Maybe map()?

Perhaps it could be called something like get_transformation_between_coordinate_systems()? Probably too long, but more descriptive. In any case, I agree that map_coordinate_systems() is probably not the most clear name.

Does this method actually need to be on SpatialData? It seems to me that one could still construct the directed graph of the transformations on a given element. Perhaps I'm missing something though.

@LucaMarconato
Copy link
Member Author

Thanks for the comments @kevinyamauchi, I have removed the functions get/set/remove transformations (total of 8 functions) and replaced with 3 functions living outside the SpatialData class. I have also moved map_coordinate_systems out of the SpatialData class and renamed it to get_transformation_between_coordinate_systems().

@LucaMarconato
Copy link
Member Author

I will now merge the new points io pr, it will take some times to fix all the tests, but then we'll be ready to merge!

@giovp
Copy link
Member

giovp commented Feb 13, 2023

@LucaMarconato quick thing I noticed from working in #132 : the PointsModel.parse still requires an annotation to be passed in the case of nd.array input. I think this should not be enforced since the feature_key is not mandatory. Was it fixed here or shall I push a quick fix in main?

@LucaMarconato
Copy link
Member Author

Hi, true I didn't fix it actually. A quick fix would be great thanks.

@LucaMarconato LucaMarconato merged commit d0f0fa0 into main Feb 13, 2023
@LucaMarconato LucaMarconato deleted the feature/transform_ergonomics branch February 22, 2023 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants