Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Drop scalar parameters for multivariate dists #5447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
canyon289 opened this issue Feb 5, 2022 · 8 comments
Closed

Drop scalar parameters for multivariate dists #5447

canyon289 opened this issue Feb 5, 2022 · 8 comments

Comments

@canyon289
Copy link
Member

canyon289 commented Feb 5, 2022

See PR for reference
Originally posted by @ricardoV94 in #5437 (comment)

mu=[0, 0]

Technically, it's incorrect to specify a scalar mu for the multivariate normal, although for backwards compatibility we reshape mu behind the scenes


Reply via ReviewNB

@ricardoV94
Copy link
Member

Related to #5440

Also original PR was requesting discussion: #5386

@canyon289
Copy link
Member Author

canyon289 commented Feb 5, 2022

Issue already opened in #5446.

Edit: That discussion is subtly different. Reopening

@canyon289 canyon289 reopened this Feb 5, 2022
@fonnesbeck
Copy link
Member

But we broadcast elsewhere, right? e.g. Normal('x', 0, np.ones(2)) Seems like if we allow that we should allow it for multivariates.

@ricardoV94
Copy link
Member

ricardoV94 commented Feb 6, 2022

But we broadcast elsewhere, right? e.g. Normal('x', 0, np.ones(2)) Seems like if we allow that we should allow it for multivariates.

That's allowed (by numpy.random.normal) because its an extension from the base parameter dimensions. The multivariate case is not allowed (by numpy.random.multivariate_normal) because there's no such thing as a multivariate with scalar mean.

One is a broadcast from valid dimensions, the other is a broadcast to valid dimensions.

Another way of thinking is that we are basically doing a numpy vectorize from a base case. This fails:

import numpy as np

def func(mu, cov):
    return np.random.multivariate(mu, cov).rvs()

vfunc = np.vectorize(func, signature='(n),(n,n)->(n)')
vfunc(np.ones(1), np.eye(3))

ValueError: inconsistent size for core dimension 'n': 3 vs 1

Having said that, it doesn't bother me much if we do this helper broadcast because I don't think there's ever an ambiguity as to the intention. But we have to make our minds :)

@michaelosthege
Copy link
Member

I also expect many people running into this problem, including myself.
So if we take the broadcast out, we should most definitely replace it with an instructive error:

ShapeError: This is a multivariate distribution with {ndim_supp} support dimensions. Its mu parameter must be at least {ndim_supp}-dimensional and is not broadcasted automatically.

Could the error be added automatically for any RV just given it's ndim_supp and ndims_params?

Continuing this line of thought, does the situation apply to time series distributions too?

@canyon289
Copy link
Member Author

For Time Series distributions I currently have no strong opinion or direction. it should eventually match whatever the decision ends up being here

@Sayam753
Copy link
Member

Sayam753 commented Feb 14, 2022

If it counts, I do think that removing the auto broadcasting step and adding an error message will be helpful.

Coming to the error message:

ShapeError: This is a multivariate distribution with {ndim_supp} support dimensions. Its mu parameter must be at least {ndim_supp}-dimensional and is not broadcasted automatically.

mu parameter is 1-dimensional as per #5447 (comment). But that 1-dimensional vector has to be of length 3, to match the shape of cov.

Furthermore, if we decide on removing broadcasting step, will we be having enough information to add an error message? I am wondering because at the time of distribution initialization in .dist method, can we perform:

assert mu.shape[-1].eval() == cov.shape[-1].eval()

?

@ricardoV94
Copy link
Member

ricardoV94 commented Feb 14, 2022

ShapeError: This is a multivariate distribution with {ndim_supp} support dimensions. Its mu parameter must be at least {ndim_supp}-dimensional and is not broadcasted automatically.

Could the error be added automatically for any RV just given it's ndim_supp and ndims_params?

I guess this could be done in the baseclass of RandomVariable, in make_node where we usually make ndims checks.

Furthermore, if we decide on removing broadcasting step, will we be having enough information to add an error message? I am wondering because at the time of distribution initialization in .dist method, can we perform:

assert mu.shape[-1].eval() == cov.shape[-1].eval()

I don't think we should do that. We usually only validate broadcasting shapes at runtime. Also, inputs may be shared variables, in which case the shapes (but not ndims) can change between model evaluations.

@pymc-devs pymc-devs locked and limited conversation to collaborators Mar 12, 2022
@ricardoV94 ricardoV94 converted this issue into discussion #5585 Mar 12, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Projects
None yet
Development

No branches or pull requests

5 participants