More safety nets for resizing mutable dimensions #5817
Codecov Report
@@ Coverage Diff @@
## main #5817 +/- ##
==========================================
+ Coverage 89.40% 89.98% +0.58%
==========================================
Files 74 73 -1
Lines 13769 13221 -548
==========================================
- Hits 12310 11897 -413
+ Misses 1459 1324 -135
dd0a661 to 9800783
Thanks, @michaelosthege. I think that we need to use coords values that are supplied to set_data to call set_dim internally, instead of expecting the users to set_dim before they set_data.
pymc/model.py
Outdated
elif isinstance(length_tensor, ScalarSharedVariable):
    # The dimension is mutable, but was defined without being linked
    # to a shared variable. This is allowed, but slightly dangerous.
    warnings.warn(
I think that the key thing that is missing here is that users can also pass coords when they set_data. Ideally, we should use the supplied coords internally to call set_dim and then update the values of the MutableData instance. This warning should only be raised if the users haven't passed coords along to set_data.
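The flow proposed in this comment can be sketched with a small framework-free stand-in. This is not PyMC code: ToyModel, its attributes, and this ShapeWarning class are hypothetical names made up for illustration, and the coords keyword on set_data is the behaviour being requested, not a confirmed API.

```python
import warnings


class ShapeWarning(UserWarning):
    """Hypothetical stand-in for the warning category used in the PR's test."""


class ToyModel:
    """Hypothetical toy model with named dimensions (not the PyMC Model)."""

    def __init__(self):
        self.coords = {}       # dim name -> tuple of coordinate values (or None)
        self.dim_lengths = {}  # dim name -> current length
        self.data = {}         # variable name -> list of values
        self.data_dims = {}    # variable name -> dim name

    def add_coord(self, name, values):
        self.coords[name] = tuple(values)
        self.dim_lengths[name] = len(self.coords[name])

    def add_data(self, name, values, dim):
        self.data[name] = list(values)
        self.data_dims[name] = dim

    def set_dim(self, name, new_length, coord_values=None):
        if coord_values is not None and len(coord_values) != new_length:
            raise ValueError("coord_values must have exactly new_length entries")
        self.dim_lengths[name] = new_length
        self.coords[name] = None if coord_values is None else tuple(coord_values)

    def set_data(self, name, values, coords=None):
        values = list(values)
        dim = self.data_dims[name]
        if coords is not None and dim in coords:
            # Proposed behaviour: resize the dimension from the supplied coords.
            new_coords = tuple(coords[dim])
            self.set_dim(dim, len(new_coords), new_coords)
        if self.dim_lengths[dim] != len(values):
            # Only reached when the dimension was not brought in line above.
            warnings.warn(
                f"'{name}' now has length {len(values)} but dim '{dim}' has "
                f"length {self.dim_lengths[dim]}; update the dimension length "
                "with set_dim or pass coords to set_data.",
                ShapeWarning,
                stacklevel=2,
            )
        self.data[name] = values
```

With this sketch, set_data("mdata", [1, 2, 3, 4], coords={"mdim": range(4)}) resizes "mdim" silently, while the same call without coords emits the warning.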
Hmm.. You're welcome to try, but I think this could become more complicated than the current way with the warning.
The set_data method doesn't actually care if the ScalarSharedVariable corresponds to a dimension in Model.dim_lengths or not!
Most important will be to add test cases that showcase the intended call order.
What would be the expected flow to change data and make predictions? Before we had to do:
with pm.Model(coords={"A": range(10)}) as m:
    x = pm.MutableData("x", x_values, dims="A")
    y = pm.MutableData("y", y_values, dims="A")
    a = pm.Normal("a", 0, 1)
    b = pm.Normal("b", 0, 1)
    c = pm.HalfNormal("c", 1)
    obs = pm.Normal("obs", mu=a + b * x, sigma=c, observed=y)
    idata = pm.sample()
with m:
    pm.set_data({"x": np.linspace(-2, 3, 100), "y": np.full(100, np.nan)})
    ppc = pm.sample_posterior_predictive(idata)
I would like us to be able to do this:
with pm.Model(coords={"A": range(10)}) as m:
    x = pm.Data("x", x_values, dims="A")
    y = pm.Data("y", y_values, dims="A")
    a = pm.Normal("a", 0, 1)
    b = pm.Normal("b", 0, 1)
    c = pm.HalfNormal("c", 1)
    obs = pm.Normal("obs", mu=a + b * x, sigma=c, observed=y)
    idata = pm.sample()
with m:
    pm.set_data({"x": np.linspace(-2, 3, 100), "y": np.full(100, np.nan)}, coords={"A": range(10, 110)})
    ppc = pm.sample_posterior_predictive(idata)
But due to the choice of default Mutable/Constant coords, we need to:
- Manually add_coord for "A" and say it is mutable
- Call set_dim before calling set_data
Or we have to create the coordinate values using MutableData, which I don't really know how to do.
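The call order described in the two bullets above can be illustrated with a toy stand-in. Everything here is hypothetical: ToyModel is not the PyMC Model, the rule that constructor coords are fixed is an assumption taken from this discussion, and raising an error on a length mismatch is a simplification chosen for the sketch.

```python
class ToyModel:
    """Hypothetical toy model; mirrors only the call order under discussion."""

    def __init__(self, coords=None):
        self.coords = {}     # dim name -> tuple of coordinate values
        self.mutable = {}    # dim name -> whether the dim may be resized
        self.data = {}       # variable name -> list of values
        self.data_dims = {}  # variable name -> dim name
        for name, values in (coords or {}).items():
            # Assumption for this sketch: constructor coords are not resizable.
            self.add_coord(name, values, mutable=False)

    def add_coord(self, name, values, mutable=False):
        self.coords[name] = tuple(values)
        self.mutable[name] = mutable

    def add_data(self, name, values, dim):
        values = list(values)
        if len(values) != len(self.coords[dim]):
            raise ValueError(f"'{name}' does not match the length of dim '{dim}'")
        self.data[name] = values
        self.data_dims[name] = dim

    def set_dim(self, name, new_length, coord_values):
        if not self.mutable[name]:
            raise ValueError(f"Dimension '{name}' is not mutable")
        coord_values = tuple(coord_values)
        if len(coord_values) != new_length:
            raise ValueError("coord_values must have exactly new_length entries")
        self.coords[name] = coord_values

    def set_data(self, name, values):
        values = list(values)
        dim = self.data_dims[name]
        if len(values) != len(self.coords[dim]):
            # Toy simplification: refuse instead of warning.
            raise ValueError(f"Resize dim '{dim}' with set_dim before set_data")
        self.data[name] = values


# Step 1: register "A" manually and mark it as mutable.
m = ToyModel()
m.add_coord("A", range(10), mutable=True)
m.add_data("x", [0.0] * 10, dim="A")

# Step 2: resize the dimension first, then swap in the new data.
m.set_dim("A", 100, range(10, 110))
m.set_data("x", [1.0] * 100)
```

In this toy, a model built with ToyModel(coords={"A": range(10)}) rejects the set_dim call, which is the friction the comment is pointing at.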
pymc/tests/test_model.py
Outdated
# a warning should be emitted.
with pytest.warns(ShapeWarning, match="update the dimension length"):
    pmodel.set_data("mdata", [1, 2, 3, 4])
pass
We should add the test where pmodel.set_data("mdata", [1, 2, 3, 4], coords={"mdim": range(4)}) is called.
Ideally, we want to catch the coords fed into set_data and call set_dim with them.
Thanks for the proposed fix @michaelosthege! I agree with @lucianopaz that it'd be great if the user didn't have to set_dim before they set_data.
Since #5763 (7ec106c to be exact) one can pass So with this, you'd create the
with pm.Model() as m:
    x = pm.MutableData("x", x_values, dims="A", coords={"A": range(10)})
    y = pm.MutableData("y", y_values, dims="A")
Then, if you add the
I'll add the test suggested above and look into adding the
331aab2 to 6ecc6d7
@lucianopaz please review the new test cases - they should govern which usage patterns are now supported. The only thing where I'm hesitant is introducing mutable dims through the
Please continue on this branch as you see fit. Let's get it over the finish line today.
This is a vague hand-waving comment, so apologies for that. I feel the internal code to manage dims is becoming a bit too convoluted for its own good. Suggestion:
The idea of defining Thinking this further, with #5796 we could do this:
Note that this would work for multi-dimensional For working with GPs this would be really useful, because one has to carry data and grid coordinates around quite a lot. I agree that the code is a little convoluted - it would probably benefit from a little extraction refactoring. But the downsides of removing If we add a kwarg to the
Any decision regarding the default Since the commits here don't affect the thing we're still in discussion about, can we go ahead and merge this in the interest of closing #5812 and increasing overall safety?
Suggestion
Some suggestions and questions :)
@ricardoV94 @twiecki I won't have time to do any commits here until Monday.
As in "I'm out, please take over the rest here and get this merged".
What's missing here, can we merge?
Waiting for tests to pass to make sure I didn't break anything accidentally
- Model.set_dim method for resizing dimensions that were created by add_coord(..., mutable=True).
- Model.set_data to anticipate that data-induced resizing can target dimensions that were created through add_coord(..., mutable=True) which are not symbolically linked to the data variables.
Closes #5812.
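The distinction between a dimension that is symbolically linked to a data variable and one that is not can be sketched as follows. This is a toy under stated assumptions, not the PyMC implementation: ToyModel, linked_to, standalone, and dim_length are hypothetical names, and the rule that set_dim refuses linked dimensions is a choice made only for this sketch.

```python
import warnings


class ToyModel:
    """Hypothetical toy contrasting linked and standalone dimension lengths."""

    def __init__(self):
        self.data = {}        # variable name -> list of values
        self.data_dims = {}   # variable name -> dim name
        self.linked_to = {}   # dim name -> data variable that defines its length
        self.standalone = {}  # dim name -> length stored independently of data

    def add_coord(self, dim, values):
        # Registered up front: the length is stored on its own.
        self.standalone[dim] = len(list(values))

    def add_data(self, name, values, dim):
        self.data[name] = list(values)
        self.data_dims[name] = dim
        if dim not in self.standalone and dim not in self.linked_to:
            # Dim introduced by the data: its length is read from the data.
            self.linked_to[dim] = name

    def dim_length(self, dim):
        if dim in self.linked_to:
            return len(self.data[self.linked_to[dim]])
        return self.standalone[dim]

    def set_dim(self, dim, new_length):
        if dim in self.linked_to:
            # Toy choice: linked dims are resized through their data variable.
            raise ValueError(f"Dim '{dim}' follows '{self.linked_to[dim]}'")
        self.standalone[dim] = new_length

    def set_data(self, name, values):
        self.data[name] = list(values)
        dim = self.data_dims[name]
        if self.dim_length(dim) != len(self.data[name]):
            # Reached when the dim length does not follow the new data.
            warnings.warn(
                f"Dim '{dim}' was not resized with '{name}'; "
                "update the dimension length with set_dim.",
                UserWarning,
                stacklevel=2,
            )
```

A linked dimension follows its data variable automatically, so only the standalone case needs the extra safety net.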