
Add support for symbolic initval using a singledispatch approach #4912


Merged: 1 commit merged into pymc-devs:main on Sep 2, 2021

Conversation

@kc611 (Contributor) commented on Aug 8, 2021:

This PR was built upon #4867 as a simpler singledispatch-based alternative.

Also Fixes: #4911

cc @ricardoV94 @michaelosthege
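
For readers new to the thread, here is a minimal sketch of the pattern under discussion (illustrative only, not the PR's actual code; the function name and the choice of returning the mean are assumptions): initial-value logic is registered per RandomVariable Op type with functools.singledispatch rather than living on a Distribution classmethod.

```python
from functools import singledispatch

from aesara.tensor.random.basic import NormalRV


@singledispatch
def _initval(op, rv, *dist_params):
    # Fallback when no Op-specific rule has been registered.
    raise NotImplementedError(f"No initial value rule for {op}")


@_initval.register(NormalRV)
def _initval_normal(op, rv, mu, sigma):
    # A stable default starting point for a normal RV: its mean.
    return mu
```

Calling `_initval(rv.owner.op, rv, *params)` then picks the rule based on the Op found in the variable's graph.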

@michaelosthege (Member):

I appreciate you looking into this too, @kc611, but I see several disadvantages of the singledispatch approach:

  1. It's not simpler than the classmethod route:
    The implementation of the singledispatch strategy is much more spread all over the place. For example, there is some stuff inside an if clause in Distribution.__new__, and two functions outside of the Distribution class where previously one classmethod was enough.
  2. The singledispatch based initval picking acts on TensorVariables:
    For logp/logcdf I can follow that argument, because those relate to statistical properties, but initial values for the MCMCs are a PyMC3 Model thing.
    Those TensorVariables can come from the Distribution.dist() API (as you did in the test) which doesn't even support initial values.

Unrelated to the concerns above, I will now cherry-pick some changes from #4867 into a separate PR so these diffs become easier to read.

@ricardoV94 (Member) commented on Aug 8, 2021:

It's not simpler than the classmethod route

I agree. There is one advantage, however: the method can be (re)used in other places. For example, I think the old pm.Mixture needed to access the initval of its component distributions in order to set its own initval.

@ricardoV94 ricardoV94 marked this pull request as ready for review August 10, 2021 17:51
@ricardoV94 ricardoV94 marked this pull request as draft August 10, 2021 18:50
@brandonwillard (Contributor):

  1. It's not simpler than the classmethod route:
    The implementation of the singledispatch strategy is much more spread all over the place.

What is this measure of "spread all over the place" and what makes it relevant?

From what I can tell, this is putting the initial value computation logic within the types/classes to which it corresponds. Would you rather have all this logic implemented as a bunch of fixed if-elif statements within Model or Distribution?

If you add methods to Distribution that compute/determine initial values, then you're doing effectively the same thing as this PR—except with far less run-time configurability (short of monkey-patching).

If you're not aware, dispatching/generic functions like these are another approach to the same functionality that class methods provide—albeit with different forms of flexibility.
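
To illustrate that equivalence with a toy sketch (not code from either PR): the same default choice can live on the class as a classmethod or in a registry keyed by type, and only the latter can be reconfigured at run-time without monkey-patching the class.

```python
from functools import singledispatch


class Distribution:
    @classmethod
    def default_initval(cls, *params):
        raise NotImplementedError


class Normal(Distribution):
    @classmethod
    def default_initval(cls, mu, sigma):
        return mu  # fixed on the class; changing it means monkey-patching


@singledispatch
def default_initval(dist, *params):
    raise NotImplementedError


@default_initval.register(Normal)
def _(dist, mu, sigma):
    return mu  # can be re-registered at run-time to change the default
```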

For example, there is some stuff inside an if clause in Distribution.__new__, and two functions outside of the Distribution class where previously one classmethod was enough.

Is this "one" class method approach able to accomplish exactly the same things? It looks like these changes allow one to set the default initial values for Distributions quite flexibly and succinctly, and make those defaults easily configurable at run-time.

2. The singledispatch based initval picking acts on TensorVariables:
For logp/logcdf I can follow that argument, because those relate to statistical properties, but initial values for the MCMCs are a PyMC3 Model thing.

In what sense does it "act on TensorVariables"? The dispatches of this feature and logp/logcdf are not on TensorVariables.

If you're referring to how default_initval takes TensorVariable arguments, then you're not seeing that this is simply a convenience. Distributions are currently one-to-one with RandomVariable types, and this means that no matter how you want to implement this you're necessarily using a RandomVariable type to determine how an initial value is computed. Whether you get that type from a TensorVariable's Op or type(self) is immaterial.
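
Concretely (a short illustrative snippet): the RandomVariable type is always recoverable from a variable's graph, so dispatching on the Op is equivalent to dispatching on the corresponding Distribution class.

```python
import aesara.tensor as at

rv = at.random.normal(0, 1)
# The Op carried by the graph identifies the distribution:
print(type(rv.owner.op))  # <class 'aesara.tensor.random.basic.NormalRV'>
```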

Those `TensorVariable`s can come from the `Distribution.dist()` API (as you did in the test) which doesn't even support initial values.

We're trying to assign initial values to random variable/distribution types—both at a conceptual level and at the type-level; that's the objective, and that's what this approach is doing. How are you proposing it be done without associating initial value computations/choices with Distributions/RandomVariable types?

All these arguments are phrased as "cons", but I'm not seeing any real implementation or design issues being raised. If you don't like the approach used in this PR, that's fine, but stating so in this way isn't very helpful. If you don't understand what these changes are doing or why, asking questions is better.

@ricardoV94 (Member) commented on Aug 11, 2021:

Just a note that I refactored (and broke) the code quite a lot after @michaelosthege's comment above, so the discussion could be a bit misplaced.

@brandonwillard (Contributor):

Just a note that I refactored (and broke) the code quite a lot after @michaelosthege's comment above, so the discussion could be a bit misplaced.

It's not misplaced, since all the relevant comments are basically only about using a dispatch-based approach, and that hasn't changed.

The codecov bot commented on Aug 11, 2021:

Codecov Report

Merging #4912 (46cb30a) into main (389f818) will increase coverage by 0.00%.
The diff coverage is 93.75%.


```
@@           Coverage Diff           @@
##             main    #4912   +/-   ##
=======================================
  Coverage   74.13%   74.13%
=======================================
  Files          86       86
  Lines       13882    13898     +16
=======================================
+ Hits        10291    10303     +12
- Misses       3591     3595      +4

Impacted Files                        Coverage Δ
pymc3/distributions/distribution.py   82.66% <91.66%> (+0.78%) ⬆️
pymc3/distributions/continuous.py     96.26% <100.00%> (+0.01%) ⬆️
pymc3/backends/report.py              89.51% <0.00%> (-2.10%) ⬇️
```

@kc611 (Contributor, Author) commented on Aug 14, 2021:

Re-running tests to check if the failure is flaky.

@kc611 kc611 closed this Aug 14, 2021
@kc611 kc611 reopened this Aug 14, 2021
@kc611 kc611 force-pushed the initval_dispatch branch from f6cb942 to ba93578 Compare August 14, 2021 12:13
@michaelosthege (Member):

@kc611, just a heads-up: yesterday @ricardoV94, @twiecki and I discussed the ongoing initval efforts, and we found a way to disentangle and resolve some of the implementation difficulties we're having with the different approaches.

We came up with the idea of treating a singledispatch-based re-implementation as something independent from initial values.
Thomas proposed naming that dispatched function something like "representative_point" and implementing it without touching anything initial-value related.
Independently, we can then work on refactoring how/when initial values are numerified, and in a third step we can change the Distribution.__new__(initval=...) signature to allow selecting between "random" and "moment".

@ricardoV94 @twiecki please comment/edit if I mixed up something.
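
To make the proposal concrete, here is a hypothetical sketch of such a dispatched function (the name follows the proposal above; the signature is an assumption):

```python
from functools import singledispatch


@singledispatch
def _representative_point(op, rv, *dist_params):
    # Fallback when no distribution-specific rule has been registered;
    # implementations are added per RandomVariable Op type via .register().
    raise NotImplementedError(f"No representative point defined for {op}")
```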

@twiecki (Member) commented on Aug 18, 2021:

Is this still required with #4942? If it is, we should probably change the naming here a bit like @michaelosthege advised above.

@ricardoV94 (Member) commented on Aug 18, 2021:

Is this still required with #4942? If it is, we should probably change the naming here a bit like @michaelosthege advised above.

Yeah, you still need this (i.e. the "moment" thing, not the symbolic initval). For transformed unconstrained variables we can do as Stan does (uniform on [-1, 1]), but unlike Stan we also sample constrained variables, in which case we still need a way to specify a stable starting point for those distributions.

The diff under review:

```python
    Parameters are the same as for the `.dist()` method.
    """

    return _representative_point(rv.owner.op, rv, *rv.owner.inputs[2:])
```
@ricardoV94 (Member) commented on this diff on Aug 19, 2021:

Actually, size is `inputs[1]` and `inputs[2]` is the dtype.
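
For context, Aesara's RandomVariable nodes follow the input convention `(rng, size, dtype, *dist_params)`, so slicing from index 2 picks up the dtype as well:

```python
import aesara.tensor as at

rv = at.random.normal(0, 1)
# Standard RandomVariable input layout:
rng, size, dtype, *dist_params = rv.owner.inputs
# `rv.owner.inputs[2:]` therefore includes the dtype alongside the
# distribution parameters; the parameters alone start at index 3.
```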

@kc611 kc611 force-pushed the initval_dispatch branch from cfdfc5e to e4f03e5 Compare August 20, 2021 17:26
The test under review:

```python
class TestMoment:
    def test_basic(self):
        rv = pm.Flat.dist()
        assert get_moment(rv).eval() == np.zeros(1)
```
@michaelosthege (Member) commented on this diff:
A random draw from a scalar RV has a shape == (). Shouldn't the moment have the same scalar shape?

@twiecki (Member) commented on Aug 30, 2021:

@kc611 do you know what's up with the unit tests here?

@kc611 (Contributor, Author) commented on Aug 30, 2021:

Ah, these failures are due to #4961 (comment); rebasing should fix them. In fact, this PR should be marked ready to merge (if the initval work is being done separately somewhere else).

@twiecki twiecki marked this pull request as ready for review August 30, 2021 15:44
@twiecki (Member) commented on Aug 30, 2021:

@kc611 want to rebase then?

@kc611 kc611 force-pushed the initval_dispatch branch 2 times, most recently from fee76c1 to e89684f Compare August 30, 2021 15:55
@michaelosthege (Member):

Asking the unpleasant question: Which moment will LogNormal get? Mean or mode?

(Or will there be a mechanism where this can be configured in some way?)

Also please check my comment from above - I don't know if you fixed it, but IIRC I saw something that looked like a shape problem around the corner.

@kc611 (Contributor, Author) commented on Aug 30, 2021:

Yeah, there will be some other mechanism according to #4912 (comment); this PR just implements the get_moment part of it.

Also please check my comment from above

Just saw it. The shape problem seems to be a limitation caused by TensorVariable/TensorConstant-type shapes being passed to aesara.tensor.zeros. Just fixed it and addressed your comment too. (Not sure if this .data approach will work every time though.)

@kc611 kc611 force-pushed the initval_dispatch branch from e89684f to 65b0680 Compare August 30, 2021 16:45
@kc611 (Contributor, Author) commented on Aug 30, 2021:

Not sure if this .data approach will work every-time though

Turns out it doesn't :-P

Looks like the .data approach won't work on Windows systems. Plus it'll cause problems with TensorVariable-type shapes.

And I was wondering why I (mentally) put a green check on this in the first place. It turns out the newly added tests pass without the .data on aesara==2.2.0, so I think we'd want to update that before this goes in. Otherwise we'll need to somehow change how shapes are being passed to those aet.zeros/aet.ones calls. However, sooner or later we'd like to switch to this particular approach.

Meanwhile we can use this PR to plan and implement the new functionality that was being planned in #4942
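
To make the shape discussion concrete, a small illustration (the workaround and the version note come from the thread above):

```python
import aesara.tensor as at

# `.data` exists only on TensorConstants, where it holds the numpy value:
const_shape = at.constant([3])
print(const_shape.data)  # -> array([3])

# A symbolic shape has no `.data`; passing it to `at.zeros` directly is the
# more general route, which per the note above works on aesara >= 2.2.0:
x = at.vector("x")
zeros = at.zeros(x.shape)
```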

@ricardoV94 (Member) commented on Aug 30, 2021:

Asking the unpleasant question: Which moment will LogNormal get? Mean or mode?

(Or will there be a mechanism where this can be configured in some way?)

I could not quite figure out what the point of multiple moments was. I think it was more a question of having at least one good moment for each distribution, and perhaps a discrete alternative for continuous distributions. For the latter, it should be enough to just round the output if a discrete initval from a continuous variable is needed.

@michaelosthege (Member):

Asking the unpleasant question: Which moment will LogNormal get? Mean or mode?
(Or will there be a mechanism where this can be configured in some way?)

I could not quite figure out what the point of multiple moments was. I think it was more a question of having at least one good moment for each distribution, and perhaps a discrete alternative for continuous distributions. For the latter, it should be enough to just round the output if a discrete initval from a continuous variable is needed.

My point is kinda that I don't want to make that decision. In v3 the user did that.

If the get_moment method could take a moment: str kwarg (in addition to *rv_inputs), the implementation of LogNormal.get_moment, for example, could be parametrized.
Not sure if that's a good design though.
Just continuing that thought process: if moment: str is a standard kwarg for that method, we could make the initval kwarg also optionally be a string such as random | mean | mode | median and just forward it through get_moment.
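
A hypothetical sketch of that parametrized design (the dispatch signature is an assumption; the mean/mode formulas are the standard lognormal ones):

```python
from functools import singledispatch

import aesara.tensor as at
from aesara.tensor.random.basic import LogNormalRV


@singledispatch
def get_moment(op, rv, *dist_params, moment: str = "mean"):
    raise NotImplementedError(f"No moment rule for {op}")


@get_moment.register(LogNormalRV)
def _(op, rv, mu, sigma, moment: str = "mean"):
    if moment == "mean":
        return at.exp(mu + 0.5 * sigma**2)  # lognormal mean
    if moment == "mode":
        return at.exp(mu - sigma**2)  # lognormal mode
    raise ValueError(f"Unsupported moment: {moment}")
```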

@ricardoV94 (Member):

I don't think the multiple moments were there for the user's convenience. They were not even advertised in the docs.

@twiecki twiecki merged commit 9e15b20 into pymc-devs:main Sep 2, 2021
@twiecki (Member) commented on Sep 2, 2021:

Thanks @kc611!

The merged code under discussion:

```python
def get_moment(rv: TensorVariable) -> TensorVariable:
    """Fallback method for creating an initial value for a random variable.

    Parameters are the same as for the `.dist()` method.
```
A Member commented on this diff:

I know this was merged already, but this part of the docstrings is wrong

Another Member replied:

OK, we should fix that then. CC @kc611

@kc611 (Contributor, Author) replied on Sep 2, 2021:

Ah yes, I missed that; those docstrings were supposed to be removed.

I'm not sure what (docstring) will go in its place though. Maybe I should just remove them for now? We can add a proper explanation when we give get_moment a proper entry point in the initval framework (if that's being planned).

A Member replied:

Yes, then just remove them for now.

@kc611 (Contributor, Author) replied:

Did it in #4979


Merging this pull request may close: Can't set symbolic initval.