Skip to content

Added Multinomial distribution #1478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

benjamin-lieser
Copy link
Member

Motivation

The distribution can often be useful and it's not straight forward to sample from it

Details

It is a multidimensional distribution, so there are some design considerations.

I decided to have two structs MultinomialConst and MultinomialDyn for compile time known number of categories and for runtime known number of categories.

There is a struct Multinomial with two functions which construct the distributions.

The dynamic version is only available with the alloc feature.

I am happy to receive some feedback regarding this decision. I guess this will influence how we will deal with other multidimensional distributions like the Dirichlet which only supports the const generic version at the moment.

@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Add plots for `rand_distr` distributions to documentation (#1434)
- Add `PertBuilder`, fix case where mode ≅ mean (#1452)
- Add `Multinomail` distribution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo here "ail" vs "ial"

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution and for imagining what multi-dimensional distributions should look like.

Adding another distribution is not my top priority (before rand v0.9 is released), but some quick review points are below.

Comment on lines 39 to 41
pub struct Multinomial {}

impl Multinomial {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct is just a stub used to construct the const/dyn variants? Then I'd prefer you just move Multinomial::new_constMultinomialConst::new etc.; it's more predictable and one less import for common usage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that each distribution would have one common struct to construct all its variants.
Taking #494 into account, we might end up with a lot of distribution structs. Most people would never care about the exact type returned.
Having all the structs in one common doc page would probably do the same.

Comment on lines 108 to 115
pub struct MultinomialConst<'a, const K: usize, I> {
/// number of draws
n: I,
/// weights for the multinomial distribution
weights: &'a [f64; K],
/// sum of the weights
sum: f64,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the major advantages of "distribution objects" is that they can be stored in structs; usage of a lifetime parameter here makes that difficult (except in the case 'static) since Rust doesn't properly support self-referential structs (yes, I know about Ouroboros).

In other words, I think we should usually prefer copying parameters into the struct implementing the distribution (no lifetime parameter), especially in this case where we don't need alloc. It does depend on expected usage, but in this case K should normally be reasonably small.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the problem. In my usecase I would only sample from each distribution object once, so copying might matter, but probably not significantly so to make up for the additional lifetime.
This also allows us to take an iterator and make it more general

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamin-lieser another possibility here is a special MultinomialRef<'a, I>.

In that case, the Distribution trait is not appropriate however; we'd want something like your fn sample which accepts an output buffer as an argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking about that and I would say a second Distribution trait which takes a buffer instead of returning the sample would be good. No idea for a good name yet.

This would even be useful for the const generic case.

Maybe this is premature optimization and the way to go is just use owned memory everywhere and ditch the const generics also.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone else wanting to follow this discussion, look here.

Comment on lines 121 to 128
pub struct MultinomialDyn<'a, I> {
/// number of draws
n: I,
/// weights for the multinomial distribution
weights: &'a [f64],
/// sum of the weights
sum: f64,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we just use weights: Vec<f64> or Box<[f64]>, though we could use tinyvec.

@dhardy dhardy added D-changes Do: changes requested E-question Participation: opinions wanted labels Jul 28, 2024
@benjamin-lieser
Copy link
Member Author

I will make a new PR for this on rand_distr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
D-changes Do: changes requested E-question Participation: opinions wanted
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants