Skip to content

BUG: Calling pm.set_data() on MutableData with dims causes error #6497

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
twiecki opened this issue Feb 3, 2023 · 8 comments
Closed

BUG: Calling pm.set_data() on MutableData with dims causes error #6497

twiecki opened this issue Feb 3, 2023 · 8 comments

Comments

@twiecki
Copy link
Member

twiecki commented Feb 3, 2023

Describe the issue:

For some reaoson, having a pm.MutableData() that sets dims seems to turn it into a ConstantData? At least when calling pm.set_data() to update it, along with updating coords, raises an error. Not having rely it on dims makes it work.

Reproduceable code example:

import pymc as pm

coords = {"x": [1, 2, 3]}
# breaks:
with pm.Model(coords=coords):
    pm.MutableData("y", [0, 0, 0], dims="x")
    pm.set_data({"y": range(5)}, coords={"x": range(5)})

# this works:
coords = {"x": [1, 2, 3]}
with pm.Model(coords=coords):
    pm.MutableData("y", [0, 0, 0])
    pm.set_data({"y": range(5)}, coords={"x": range(5)})

Error message:

ShapeError: Resizing dimension 'x' is impossible, because a `TensorConstant` stores its length. To be able to change the dimension length, pass `mutable=True` when registering the dimension via `model.add_coord`, or define it via a `pm.MutableData` variable.

PyMC version information:

5.0.1

Context for the issue:

Happened during a workshop, confusing for beginners.

@twiecki twiecki added the bug label Feb 3, 2023
@michaelosthege
Copy link
Member

michaelosthege commented Feb 4, 2023

Not being able to resize dimension "x" in your first example is by design.

Your second example works because pm.set_data simply ignores the coords as they are not relevant for the y variable.

Maybe the exception message is not clear enough:

To be able to change the dimension length, pass mutable=True when registering the dimension via model.add_coord,
or define it via a pm.MutableData variable.

In the first example, the "x" dimension was defined when passing pm.Model(coords=...) which defaults to creating immutable dimensions since a95dd71 from #5763 with was released in v4.0.0.

More concerning is that running the second example on latest main gives a TypeError:

Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:51:29) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymc as pm
>>> coords = {"x": [1, 2, 3]}
>>> with pm.Model(coords=coords):
...     pm.MutableData("y", [0, 0, 0])
...     pm.set_data({"y": range(5)}, coords={"x": range(5)})
...
y
Traceback (most recent call last):
  File "C:\Users\zufal\miniconda3\envs\pmdev\lib\site-packages\pytensor\link\basic.py", line 111, in __set__
    self.storage[0] = self.type.filter_inplace(
  File "C:\Users\zufal\miniconda3\envs\pmdev\lib\site-packages\pytensor\graph\type.py", line 130, in filter_inplace
    raise NotImplementedError()
NotImplementedError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "E:\Source\Repos\pymc\pymc\model.py", line 1887, in set_data
    model.set_data(variable_name, new_value, coords=coords)
  File "E:\Source\Repos\pymc\pymc\model.py", line 1283, in set_data
    shared_object.set_value(values)
  File "C:\Users\zufal\miniconda3\envs\pmdev\lib\site-packages\pytensor\compile\sharedvalue.py", line 134, in set_value
    self.container.value = copy.deepcopy(new_value)
  File "C:\Users\zufal\miniconda3\envs\pmdev\lib\site-packages\pytensor\link\basic.py", line 115, in __set__
    self.storage[0] = self.type.filter(value, **kwargs)
  File "C:\Users\zufal\miniconda3\envs\pmdev\lib\site-packages\pytensor\tensor\type.py", line 199, in filter
    raise TypeError(err_msg)
TypeError: ('TensorType(int32, (?,)) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to int32, or 2) set "allow_input_downcast=True" when calling "function". Value: "array([0., 1., 2., 3., 4.])"', 'Container name "y"')
>>>

@ricardoV94
Copy link
Member

ricardoV94 commented Feb 4, 2023

IMO, we should do one of the following:

  1. Add a mutable_coords to pm.Model
  2. Revert dims to being mutable by default and freeze them in compile_pymc (i.e., replace them by constants). We can use a subclass of SharedVariable to identify them easily. We already freeze SharedVariables when using JAX samplers for example.

The current API is not discoverable nor intuitive.

@michaelosthege
Copy link
Member

We considered a pm.Model(mutable_coords=...) or pm.Model(coords_mutable=...) before and it's extremely simple to add, so I'd definitely favor that!

@twiecki
Copy link
Member Author

twiecki commented Feb 5, 2023

Is there a performance problem if coords are mutable by default? Or maybe we can just check if coords are used inside of a MutableData container and then set it mutable? Sounds a bit hackish but 🤷

@ricardoV94
Copy link
Member

ricardoV94 commented Feb 5, 2023

There's no performance penalty with option 2

Retracted there might be issues with value variables not having static shapes...

@Dhruvanshu-Joshi
Copy link
Member

Dhruvanshu-Joshi commented Feb 9, 2023

Hey I would like to take up this issue.
I understand that here we want to add a mutable_coords to pm.Model by adding something like pm.Model(mutable_coords=...).
I would like to know what this mutable_coords is going to do when we already have coords.
By my understanding, what we want mutable_coords to do is to set the dimensions of the coordinates to be mutable that can be changed by using set data.
What we can do now to make the coordinates mutable is setting 'mutable = True" when adding the coordinates to the model.
On using pm.Model(coords = coords), mutable is set to 'False' by default for pymc version > 4.1.0
So if we add mutable_coords, one can simply use pm.Model(mutable_coords = coords) to set mutable = true when the context of the model is created

Have a look at this example:

import pymc as pm
coords = {"x": range(2)}
with pm.Model(coords = coords):
    pm.MutableData("y", [0, 0, 0])
    model.add_coord("x", coords, mutable=True)
    pm.set_data({"y": [1, 2, 3, 4, 5]}, coords={"x": range(5)})

what we want to achieve is this:

import pymc as pm
coords = {"x": range(2)}
with pm.Model(mutable_coords = coords):
    pm.MutableData("y", [0, 0, 0])
    pm.set_data({"y": [1, 2, 3, 4, 5]}, coords={"x": range(5)})

However, not setting the mutable = 'True' does not produce any error

import pymc as pm
coords = {"x": range(2)}
with pm.Model(coords = coords):
    pm.MutableData("y", [0, 0, 0])
    pm.set_data({"y": [1, 2, 3, 4, 5]}, coords={"x": range(5)})

@michaelosthege
Copy link
Member

@Dhruvanshu-Joshi you can focus on this line:

self.add_coords(coords)

...and make it for iterate to call self.add_coord(..., mutable=True) similar to how add_coords does it.

@ricardoV94
Copy link
Member

Closed via #6515

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants