-
Couldn't load subscription status.
- Fork 326
Adding Multinomial and Nested Logit Models for Consumer Choice #1654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…erface Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1654 +/- ##
==========================================
+ Coverage 90.08% 92.01% +1.92%
==========================================
Files 60 62 +2
Lines 6819 7385 +566
==========================================
+ Hits 6143 6795 +652
+ Misses 676 590 -86 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
| "alphas_": alphas, | ||
| "betas": betas, | ||
| "betas_fixed_": betas_fixed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do some have suffix of _ and others do not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to 0 out one of the alternatives alphas_ --> alpha and I was doing the same for betas_fixed_ for parameter identification.
e.g alphaz = pm.Deterministic('alphas', pt.set_subtensor(alphas_, -1, 0))....
|
|
||
| return result | ||
|
|
||
| def parse_formula(self, df, formula, depvar): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pull out this into a function instead of method. A good separation might be:
- parse_formula(formula) -> tuple[target, alt_covariates, fixed_covariates]
- check_columns(covariates, df)
- check_dependent_variable(target, ser)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, can do... probably not tonight though. Wrecked. Will try and pick it up during the week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no worries. take your time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pulled out the checks and made them staticmethods that are then invoked inside parse_formula. Not sure if that's exactly what you meant?
|
Ok folks. Making progress. Added a bunch of type-hints and improved doc-strings to the nested logit class. Also added a light write up to both notebooks for multinomial and nested logit on the whys and hows. Still missing a few doc-strings and will address @williambdean 's request soon too. Thanks for the feedback @juanitorduz , @williambdean . |
|
great! (also, the notebooks are missing the watermarks and the end :) ) |
Signed-off-by: Nathaniel <[email protected]>
|
Added the watermarks too @juanitorduz : |
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
|
Pretty happy with this now @juanitorduz , @williambdean . Think i addressed the concerns above. |
| @@ -0,0 +1,2488 @@ | |||
| { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, these should be handled here:
| if "|" in covariates: |
| @@ -0,0 +1,2488 @@ | |||
| { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we used dims in the various places here. I'm assuming that the 900 for P_central, P_room, and denom_top are same as obs (900)
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's straightforward to add dims to the marginal P_central etc... it's not as straightforward to dynamically create the coords for every potential nest at each layer. Like it can be done, just a large headache. Trying not to overcomplicate this function if i can avoid it:
| def _prepare_coords(df, alternatives, covariates, f_covariates, nests): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Then let's leave out for now
|
Really cool stuff. I liked those videos that were linked |
| """Do not use, required by parent class. Prefer make_model().""" | ||
| return super().build_model(X, y, **kwargs) | ||
|
|
||
| def make_model(self, X, F, y) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to put X and F in a single xr.Dataset and pass as "X"
Or would that be unituitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that wouldn't be in the spirit of the model, the decomposition to fixed covariates and alternative specific covariates is pretty foundational to the literature and write up. Would avoid bundling them if i could.
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
|
Thanks @NathanielF ! Once @williambdean gives the last ✅ we can merge and release to get feedback from the community! 🤘 |
|
Exciting @juanitorduz ! Real curious to see if these models are useful for the community. There is quite a few directions we could go with them. |
Signed-off-by: Nathaniel <[email protected]>
|
Actually, this PR looks great, and I am sure we can improve in future iterations. So let's merge and iterate :) |
|
Awesome! Thanks @juanitorduz |

Description
I'm adding two new model classes for discrete choice style models that I intend to be part of the consumer choice module.
As it stands i'm opening the PR as a draft for discussion around the implementation choices and API design I have for these models.
Related to this issue: #1653. There is a lot of potential in the discrete choice style models for Bayesian modelling in particular, the state of the art models in this domain involves a mixed logit parameterisation for which "vanilla" implementations are pretty straightforward using Bayesian hierarchical parameterisations.
Two New Models
Main things to flag: There are now two new model files in the consumer choice folder. The simple Multinomial Logit and the Nested Logit. As outlined in the issue i've restricted the nested logit to no more than two layers of nesting. I believe this will bring us to beyond parity with packages like
mlogitin R andpylogitin python which allow for only 1 level deep nesting structures.API Discussion
The API i'm suggesting for these models differs from the typical X,y inputs on the models in pymc marketing in general. Mostly this is because I feel the use of Wilkinson style notation here is important. For instance this is how you specify the Nested Logit Model currently:
We assume a wide-data input as well:
Causal Inference and Counterfactuals
The value that these models bring is their focus on causal inference. The entire history of discrete choice models stems effectively from the observation that multinomial logit models cannot support plausible counterfactuals around market interventions (due to IIA) and more sophisticated discrete choice models like the nested logit models are able to solve this. See for instance here how a pricing intervention on a multinomial logit results in proportional re-allocation of market share to the rest of the market.
We demonstrate this problem and solution by adding 2 new notebooks to the gallery.
In the second notebook for nested logit we show how the IIA is solved by this extra nesting structure:
Fixed Attributes and Alternative Specific Attributes
One thing i've done is to ensure that the models can identify parameters for the alternative specific attributes (e.g. price) and the individually fixed attributes e.g. (income). I've done my best to benchmark the parameter identification and recovery against R's
mlogitpackage:How to Proceed?
I have not done an extensive write up of the math behind these types of models and some of the functions need more documentation and tests. But I wanted to share what I have so far to generate discussion and maybe decide on how to proceed. One immediate improvement i could think of would be to remove duplication from the nested logit and multinomial logit model classes, making them instances of a more general "DISCRETE CHOICE" class where we could re-use e.g. the formula parsing functions. Additionally i'd like to benchmark the parameter identification with a second data set and example.
Longer term i think there is room for adding a vanilla mixed-logit example too.
Anyway, open to feedback. Adding a draft PR now to check which linting, and testing failures i have.
Related Issue
Checklist
pre-commit.ci autofixto auto-fix.📚 Documentation preview 📚: https://pymc-marketing--1654.org.readthedocs.build/en/1654/