Skip to content

Add icdf functions for Lognormal, Half Cauchy and Half Normal distributions #6766

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 26, 2023

Conversation

amyoshino
Copy link
Member

@amyoshino amyoshino commented Jun 9, 2023

What is this PR about?
Adds ICDF functions to Lognormal Distribution

Issue #6612
Comment on issues found while working on Issue #6747

References:
Lognormal:

HalfCauchy: matches formulas used in either SciPy and R extraDistr

HalfNormal: matches formulas used in either SciPy and R extraDistr

Checklist

Major / Breaking Changes

  • ...

New features

  • LogNormal icdf function

Bugfixes

  • ...

Documentation

  • ...

Maintenance

  • ...

📚 Documentation preview 📚: https://pymc--6766.org.readthedocs.build/en/6766/

@codecov
Copy link

codecov bot commented Jun 9, 2023

Codecov Report

Merging #6766 (cdc104c) into main (14e673f) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #6766   +/-   ##
=======================================
  Coverage   91.89%   91.90%           
=======================================
  Files          95       95           
  Lines       16185    16197   +12     
=======================================
+ Hits        14874    14886   +12     
  Misses       1311     1311           
Impacted Files Coverage Δ
pymc/distributions/continuous.py 97.78% <100.00%> (+0.02%) ⬆️

@amyoshino
Copy link
Member Author

I am having an issue when developing the test for this function.
The implementation of the inverse CDF for the lognormal should be very straightforward, from Wikipedia we can see that the lognormal quantile function is just the exponential of the normal quantile function:
Normal:
image
Lognormal:
image

So, I just got the already implemented icdf function for the normal and added np.exp() to it.

But it looks like the implemented function for the normal distribution icdf has a tiny difference if compared to the scipy implementation :
For mu = 2.1, sigma = 20.0, p = 0.99
np.exp(mu + sigma * -np.sqrt(2.0) * pt.erfcinv(2 * value)) - st.norm.ppf(value, mu, sigma) = 7.105427357601002e-15
(passes the test of 6 digits precision)

But when this is passed to the exponential function, the difference becomes larger than the tolerance of 6 decimals and the test fail.

image

Using the values of the failed test above, I get the same values of error locally, and the result for the icdf differs both from scipy and R stats (qlnorm function) starting on the 8th digit:
image

Investigating it further, the Scipy implementation uses a different way to compute the icdf by approximating it with a routine described by this script: https://github.com/scipy/scipy/blob/2f3831503aff159994eafa75745c9537f8db060f/scipy/special/cephes/ndtri.c#L1

Which is also different from the R stats qnorm function:
https://github.com/wch/r-source/blob/a6d764783f5010268a33e3610189983e9bb778db/src/nmath/qnorm.c#L23
That is used to get qlnorm by exponentiating it: https://github.com/wch/r-source/blob/a6d764783f5010268a33e3610189983e9bb778db/src/nmath/qlnorm.c#L29

Hence the difference in the results. Any ideas on how to handle this divergences on implementations during the tests?

@ricardoV94
Copy link
Member

ricardoV94 commented Jun 12, 2023

Small numerical differences are fine. You can get around them by either tuning decimal or choosing different set of parameters domains so that more extreme combinations don't show up in the tests.

Those are pretty extreme values you're getting with q=0.99 so it's not surprising tiny differences blow up.

For testing locally, don't forget to set n_samples=-1 so that all combinations are tested.

We have been thinking about changing the precision criteria because the one used now is pretty dumb: #6159

@amyoshino
Copy link
Member Author

Ricardo, thank you for the comments! I will follow your suggestions and submit a PR soon.

@amyoshino amyoshino marked this pull request as ready for review June 19, 2023 21:46
@amyoshino
Copy link
Member Author

amyoshino commented Jun 19, 2023

@ricardoV94, thank you for the comments on handling the precision issues. I have successfully added tests for the lognormal icdf function.
I also added the icdf functions for the HalfCauchy and HalfNormal distributions but didn't manage to use the function you pointed out in #6612.

def icdf(value, *args)
  return icdf(Full.dist(*args), value)

Instead I used the formulas consistent with the implementations found in SciPy and R extraDistr packages.

@amyoshino amyoshino changed the title adding icdf function and tests for Lognormal distribution Add icdf function for Lognormal distribution Jun 19, 2023
@amyoshino amyoshino changed the title Add icdf function for Lognormal distribution Add icdf function for Lognormal, Half Cauchy and Half Normal distributions Jun 20, 2023
@amyoshino amyoshino changed the title Add icdf function for Lognormal, Half Cauchy and Half Normal distributions Add icdf functions for Lognormal, Half Cauchy and Half Normal distributions Jun 20, 2023
Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, only some small changes needed

@ricardoV94
Copy link
Member

@ricardoV94, thank you for the comments on handling the precision issues. I have successfully added tests for the lognormal icdf function. I also added the icdf functions for the HalfCauchy and HalfNormal distributions but didn't manage to use the function you pointed out in #6612.

def icdf(value, *args)
  return icdf(Full.dist(*args), value)

Instead I used the formulas consistent with the implementations found in SciPy and R extraDistr packages.

You're right, thanks for checking. Opened related PR to fail explicitly for the icdf

@@ -856,6 +856,15 @@ def logcdf(value, loc, sigma):
msg="sigma > 0",
)

def icdf(value, loc, sigma):
res = Normal.icdf((value + 1.0) / 2.0, loc, sigma)
Copy link
Member

@ricardoV94 ricardoV94 Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit annoying because we don't allow users to create a HalfNormal with loc != 0 but in theory they could call icdf or reach this function with a HalfNormal that has a custom non-zero loc.

My question is, will this expression work for non-zero loc?

This is also a question for the HalfCauchy.

I wonder if the best solution is to reimplement these RandomVariables directly in PyMC in a way that they don't accept a loc argument (just like the HalfStudentTRV in this file). This way we don't have to worry about loc in the logp/logcdf/icdf/moment functions.

If we go down that path, we can remove the current HalfNormal and HalfCauchy RandomVariables from PyTensor as they aren't really needed there.

Copy link
Member Author

@amyoshino amyoshino Jun 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implemented formula work well even when loc != 0 and is consistent with SciPy results:

Testing HalfNormal:
image

Testing HalfCauchy:
image

Allow me some more time to bring some screenshots calling the implemented functions with pm.HalfNormal.icdf() and pm.HalfCauchy.icdf()

If you think the best solution is still to reimplement RandomVariables directly in PyMC in a way that they don't accept a loc argument, let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to do it in this PR, but the fact that it isn't being tested in our suite (and not just the new icdf) is already a good argument to drop it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good news that it works though!

@ricardoV94 ricardoV94 merged commit 7b08fc1 into pymc-devs:main Jun 26, 2023
@ricardoV94
Copy link
Member

Thanks @amyoshino!

@amyoshino amyoshino deleted the lognorm_icdf_2 branch June 26, 2023 13:09
@amyoshino
Copy link
Member Author

amyoshino commented Jun 26, 2023

Thanks for your guidance @ricardoV94 !

@ricardoV94
Copy link
Member

My pleasure @amyoshino. Looking forward to you next PRs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants