
Heteroskedastic GPs notebook #161


Merged · 3 commits into pymc-devs:main on Jun 9, 2021

Conversation

@JohnGoertz (Contributor)

Description

Several other GP frameworks (namely GPflow and GPy) have examples of how to model heteroskedasticity with GPs, but PyMC3 doesn't. There are hints scattered throughout the Discourse on how to do so via latent GPs, through sparse approximations and/or coregionalization, but these largely rely on external links and aren't collected in one place.

I've put together a notebook that works through various ways to achieve this. Let me know if there are any changes or extensions you'd like to see!
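For readers skimming this thread, the core idea the notebook explores can be sketched in a few lines of PyMC3: one latent GP models the mean and a second latent GP models the log of the noise scale, so the likelihood's sigma varies with the input. The snippet below is only a minimal sketch of that pattern; the data, priors, and variable names are placeholders and are not taken from the notebook itself.

```python
import numpy as np
import pymc3 as pm

# Toy data with input-dependent noise (illustrative only, not the notebook's dataset)
rng = np.random.default_rng(2021)
X = np.linspace(0, 10, 50)[:, None]
y = np.sin(X.ravel()) + rng.normal(size=50) * (0.1 + 0.1 * X.ravel())

with pm.Model() as model:
    # Latent GP for the mean function
    ell_mu = pm.Gamma("ell_mu", alpha=2, beta=1)
    eta_mu = pm.HalfNormal("eta_mu", sigma=1)
    gp_mu = pm.gp.Latent(cov_func=eta_mu**2 * pm.gp.cov.ExpQuad(1, ell_mu))
    mu = gp_mu.prior("mu", X=X)

    # Second latent GP for the log noise scale, making sigma input-dependent
    ell_sd = pm.Gamma("ell_sd", alpha=2, beta=1)
    eta_sd = pm.HalfNormal("eta_sd", sigma=1)
    gp_sd = pm.gp.Latent(cov_func=eta_sd**2 * pm.gp.cov.ExpQuad(1, ell_sd))
    log_sd = gp_sd.prior("log_sd", X=X)

    pm.Normal("y", mu=mu, sigma=pm.math.exp(log_sd), observed=y)
    trace = pm.sample(target_accept=0.95, return_inferencedata=True)
```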

@twiecki (Member) commented May 21, 2021

I just skimmed but this looks awesome - thanks so much!

@drbenvincent or @bwengals: do you have time to take a look?

@drbenvincent (Contributor)

I can take a look. I've not used GPs, so my comments would probably be restricted to clarity / accessibility.

@review-notebook-app bot commented May 21, 2021 (conversation on ReviewNB)

drbenvincent commented on 2021-05-21T16:08:35Z
----------------------------------------------------------------

result -> results


JohnGoertz commented on 2021-05-22T18:52:58Z
----------------------------------------------------------------

Fixed!

@review-notebook-app bot commented May 21, 2021 (conversation on ReviewNB)

drbenvincent commented on 2021-05-21T16:08:36Z
----------------------------------------------------------------

Would _, axs = ... be clearer than having the trailing [1]? Just an idle comment, feel free to ignore.


JohnGoertz commented on 2021-05-21T16:21:06Z
----------------------------------------------------------------

Yeah, true, I got in the habit of the [1] notation because I rarely ever use the fig handle, but you're right, the other way is clearer.
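For concreteness, the two styles being compared are equivalent matplotlib calls; the names here are just illustrative:

```python
import matplotlib.pyplot as plt

axs = plt.subplots(1, 2)[1]   # trailing-index style: keep only the axes array
_, axs = plt.subplots(1, 2)   # unpacking style: discard the figure handle explicitly
```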

@review-notebook-app bot commented May 21, 2021 (conversation on ReviewNB)

drbenvincent commented on 2021-05-21T16:08:36Z
----------------------------------------------------------------

It's probably worth increasing target_accept to something like 0.9 (in the sampling options) to see if that avoids the divergences.


JohnGoertz commented on 2021-05-21T16:21:33Z
----------------------------------------------------------------

I'll give that a shot, good call.

JohnGoertz commented on 2021-05-22T18:55:42Z
----------------------------------------------------------------

Yeah, I had to go up to 0.95 to get down to only a couple dozen divergences for each Latent model. Unfortunately that roughly doubled sampling time, so I made a note at the end pointing out the trade-off.

drbenvincent commented on 2021-05-22T19:59:41Z
----------------------------------------------------------------

Sounds good

@review-notebook-app bot commented May 21, 2021 (conversation on ReviewNB)

drbenvincent commented on 2021-05-21T16:08:37Z
----------------------------------------------------------------

Any reason to manually define rather than use the commented out line?


JohnGoertz commented on 2021-05-21T16:23:37Z
----------------------------------------------------------------

Initially, I hadn't figured out how to make the kmeans initialization reproducible, but I've found out it's controlled through numpy's legacy random.seed(), so I'll change to that now.

OriolAbril commented on 2021-05-22T11:11:42Z
----------------------------------------------------------------

This is also probably worth an issue.

JohnGoertz commented on 2021-05-22T19:06:52Z
----------------------------------------------------------------

Yeah, it should be easy enough to fix; I can raise an issue and try to fix it myself by just inserting a call to np.random.seed() within the wrapper.

I found that the LMC implementation was actually quite sensitive to the placement of inducing points. Many initializations from kmeans led to zero-gradient errors (in particular in u_rotated) that I couldn't figure out how to fix by tweaking the priors. I couldn't decide if that was worth pointing out or if it was somewhat specific to this dataset. I found that if I just selected inducing points from the original inputs it worked fine, perhaps because the data was evenly spaced to begin with. To be honest I need to read more into what the prior on u_rotated is doing.
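As a concrete note on the reproducibility point: PyMC3's inducing-point helper wraps scipy's kmeans, which draws from numpy's legacy global RNG, so seeding that RNG is what makes the initialization repeatable. A minimal sketch, where the inputs, seed, and inducing-point count are all placeholders:

```python
import numpy as np
import pymc3 as pm

X = np.linspace(0, 10, 100)[:, None]  # placeholder inputs, not the notebook's data

# kmeans_inducing_points uses scipy's kmeans, which relies on numpy's legacy
# global RNG, so seeding it makes the initialization reproducible.
np.random.seed(2021)  # placeholder seed
Xu = pm.gp.util.kmeans_inducing_points(20, X)  # 20 inducing points, illustrative count

# Alternative mentioned above: when the inputs are evenly spaced to begin with,
# simply subsampling them avoids the sensitivity to the kmeans initialization.
Xu_alt = X[:: len(X) // 20]
```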

@review-notebook-app bot commented May 21, 2021 (conversation on ReviewNB)

drbenvincent commented on 2021-05-21T16:08:37Z
----------------------------------------------------------------

Same point... it's probably worth increasing target_accept to something like 0.9 (in the sampling options) to see if that avoids the divergences.


JohnGoertz commented on 2021-05-22T19:08:03Z
----------------------------------------------------------------

I got this down to 20 divergences with target_accept=0.95, but that ~tripled sampling time, so we probably don't want to go much higher.

@review-notebook-app bot commented May 21, 2021 (conversation on ReviewNB)

drbenvincent commented on 2021-05-21T16:08:38Z
----------------------------------------------------------------

Same point again about divergences


OriolAbril commented on 2021-05-22T11:15:07Z
----------------------------------------------------------------

Also, given the ESS warning, we should check bulk and quantile ESS and probably use a less extreme CI probability. Estimating the 95% interval with fewer than 200 ESS on some parameters is not a good idea.

JohnGoertz commented on 2021-05-22T19:16:50Z
----------------------------------------------------------------

Increasing target_accept to 0.95 improved this for all of them. The lowest ess_bulk for this model was 300 (one of the v's), though ℓ and η were only 370 and 383. The ess_tail for those three ended up being 131, 141, and 128. The non-correlated sparse latent did much better, with the smallest ess_bulk being 850 (η) and ess_tail all over 1100. r_hat was 1.01 or less all around, though. Do you think we should point that out and/or narrow the CI range? My only concern with a narrower CI is that it might obscure the difference in CI width for the two noise regimes, which is kind of the whole point. Maybe just a note at the end discussing the need for longer sampling of htsc to have good confidence in the CI?
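For reference, a quick way to inspect the diagnostics being discussed here with ArviZ, assuming the trace is an InferenceData object and the hyperparameters are named ℓ and η as above:

```python
import arviz as az

# Bulk/tail ESS and r_hat for the key hyperparameters
az.summary(trace, var_names=["ℓ", "η"], kind="diagnostics")

# Quantile ESS near the upper CI edge; a reliable 95% interval needs adequate tail ESS
az.ess(trace, var_names=["ℓ", "η"], method="quantile", prob=0.975)
```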

@review-notebook-app bot commented May 21, 2021 (conversation on ReviewNB)

drbenvincent commented on 2021-05-21T16:08:38Z
----------------------------------------------------------------

This feels a bit out of place here, especially without a brief explanation. It looks like it's not just a larger version of the last right hand subplot?


JohnGoertz commented on 2021-05-21T17:05:27Z
----------------------------------------------------------------

Admittedly, this was a bit of a hacky way for me to get around this issue with the way thumbnails are generated for the gallery page. The current behavior probably would have caused the thumbnail to crop the final three-panel figure down to just the middle panel, which I felt wasn't the most representative. I can try to find a way around it...

@review-notebook-app bot commented May 21, 2021 (conversation on ReviewNB)

drbenvincent commented on 2021-05-21T16:08:39Z
----------------------------------------------------------------

I don't know how strict people are about conforming to the suggested watermark in the style guide


JohnGoertz commented on 2021-05-21T16:30:22Z
----------------------------------------------------------------

Whoops, added the -p xarray bit.
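For anyone following along, the style guide's watermark cell looks roughly like the following; the exact flag combination here is an assumption based on this comment, with -p xarray being the addition discussed:

```python
%load_ext watermark
%watermark -n -u -v -iv -w -p xarray
```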

@bwengals (Collaborator)

this is REALLY cool

@review-notebook-app bot commented May 22, 2021 (conversation on ReviewNB)

OriolAbril commented on 2021-05-22T11:19:56Z
----------------------------------------------------------------

I would probably remove the jedi config and let users choose whatever they want


JohnGoertz commented on 2021-05-22T19:17:51Z
----------------------------------------------------------------

Oh, good point. Jedi wasn't behaving for me, so that's become boilerplate code in my notebooks; I'll take it out.
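The config in question is presumably the common IPython completion workaround, something along these lines (an assumption, since the cell isn't quoted here):

```python
%config Completer.use_jedi = False
```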

@review-notebook-app bot (conversation on ReviewNB)

OriolAbril commented on 2021-05-22T11:19:56Z
----------------------------------------------------------------

I am not too familiar with GP functions, so I don't know how good or bad dims integration is with them. We should use dims (either at model definition or via idata_kwargs, or both) and, if this turns out to be too complicated, open issues to signal the areas that need work. When using dims, I would then add the posterior predictive samples to the InferenceData object and average, select, etc. using labeled dims; i.e. instead of var(axis=0), which people reading the notebook may not know the meaning of, it would become var(dim=("chain", "draw")). This will probably break a lot of plotting code, but I can help with that, and it will also serve to indicate missing functionality in xarray, probably along the lines of https://github.com/arviz-devs/arviz/issues/1668, like random sample generation.

It may also be worth tackling this in a follow-up PR; right now I'm mostly trying to gauge how much work that will be, as well as how much (if any) extra clarity it can provide (the notebook is already amazing).
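To illustrate the kind of change being proposed, a minimal sketch; the variable name "f", the dim name "x_obs", and the model object are placeholders, not the notebook's code:

```python
import pymc3 as pm

with model:
    # Label the GP output with a named dimension when converting to InferenceData
    idata = pm.sample(
        return_inferencedata=True,
        idata_kwargs={"dims": {"f": ["x_obs"]}},
    )

# Labeled reductions state their intent instead of relying on positional axes:
f_mean = idata.posterior["f"].mean(dim=("chain", "draw"))
f_var = idata.posterior["f"].var(dim=("chain", "draw"))
```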


@JohnGoertz (Contributor, Author)

Okay, I've re-run with a higher target_accept, which reduced but didn't eliminate divergences. However, upon attempting to demonstrate how to get the learned correlation coefficient out of the LMC, it seems it's not really working as expected. It appears to have learned a slight negative correlation (-0.1), albeit with very wide 95% CI bounds (-0.9, 0.9). I added a note about this, as well as a discussion of the sampler statistics (particularly ess_tail), but maybe it's best to omit the LMC from this notebook? Perhaps this isn't the best example of its use, and it may be more confusing than helpful.
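For context on the correlation coefficient mentioned above, this is roughly the calculation involved for a two-output coregionalization (LMC) model; the variable names and shapes below are assumptions, not the notebook's code:

```python
import numpy as np

def lmc_correlation(W_draw, kappa_draw):
    # Each posterior draw of W and kappa implies an output covariance
    # B = W @ W.T + diag(kappa); the learned correlation between the two
    # outputs is the off-diagonal of B rescaled by the output std devs.
    B = W_draw @ W_draw.T + np.diag(kappa_draw)
    return B[0, 1] / np.sqrt(B[0, 0] * B[1, 1])

# Assuming posterior draws W with shape (n_draws, 2, 1) and kappa with shape
# (n_draws, 2), pulled from the trace under those (assumed) names:
# rhos = np.array([lmc_correlation(w, k) for w, k in zip(W, kappa)])
# np.percentile(rhos, [2.5, 50, 97.5])  # posterior median and 95% CI of the correlation
```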

@drbenvincent (Contributor)

All my points were addressed; this all looks great to me. But I'm not experienced with GPs, so I think this notebook just needs a final check-over from someone with more GP knowledge?

@twiecki merged commit cc5350b into pymc-devs:main on Jun 9, 2021
@twiecki (Member) commented Jun 9, 2021

Thanks @JohnGoertz and @drbenvincent!

@JohnGoertz (Contributor, Author)

That's great, thanks for looking it over and glad to contribute!
