Add icdf functions for Lognormal, Half Cauchy and Half Normal distributions #6766

amyoshino · 2023-06-09T20:59:04Z

What is this PR about?
Adds icdf functions to the Lognormal, HalfCauchy and HalfNormal distributions

Issue #6612
Comment on issues found while working on Issue #6747

References:
Lognormal:

HalfCauchy: matches the formulas used in both SciPy and R extraDistr

HalfNormal: matches the formulas used in both SciPy and R extraDistr

Checklist

Explain important implementation details 👆
Make sure that the pre-commit linting/style checks pass.
Link relevant issues (preferably in nice commit messages)
Are the changes covered by tests and docstrings?
Fill out the short summary sections 👇

Major / Breaking Changes

...

New features

icdf functions for the LogNormal, HalfCauchy and HalfNormal distributions

Bugfixes

...

Documentation

...

Maintenance

...

📚 Documentation preview 📚: https://pymc--6766.org.readthedocs.build/en/6766/

codecov · 2023-06-09T21:13:51Z

Codecov Report

Merging #6766 (cdc104c) into main (14e673f) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #6766   +/-   ##
=======================================
  Coverage   91.89%   91.90%           
=======================================
  Files          95       95           
  Lines       16185    16197   +12     
=======================================
+ Hits        14874    14886   +12     
  Misses       1311     1311

| Impacted Files | Coverage Δ | |
|---|---|---|
| pymc/distributions/continuous.py | `97.78% <100.00%> (+0.02%)` | ⬆️ |

amyoshino · 2023-06-10T11:40:43Z

I am having an issue when developing the test for this function.
The implementation of the inverse CDF for the lognormal should be very straightforward: according to Wikipedia, the lognormal quantile function is just the exponential of the normal quantile function.
Normal:

Lognormal:

So I took the already implemented icdf function for the normal and wrapped it in np.exp().
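For reference, the relationship described above can be written out explicitly. These are the standard closed forms, given here as a reconstruction rather than copied from the original comment; $p$ is the probability, and $\operatorname{erf}^{-1}$ and $\operatorname{erfc}^{-1}$ are the inverse error function and inverse complementary error function:

$$
\begin{aligned}
F^{-1}_{\text{Normal}}(p) &= \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}(2p - 1) = \mu - \sigma\sqrt{2}\,\operatorname{erfc}^{-1}(2p),\\
F^{-1}_{\text{Lognormal}}(p) &= \exp\!\left(F^{-1}_{\text{Normal}}(p)\right).
\end{aligned}
$$

The two forms of the normal quantile agree because $\operatorname{erfc}^{-1}(2p) = -\operatorname{erf}^{-1}(2p - 1)$; the second form corresponds to an `erfcinv`-based expression.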

But it looks like the implemented icdf for the normal distribution differs slightly from the SciPy implementation.
For mu = 2.1, sigma = 20.0, p = 0.99:
(mu + sigma * -np.sqrt(2.0) * pt.erfcinv(2 * value)) - st.norm.ppf(value, mu, sigma) = 7.105427357601002e-15
(passes the test at 6 decimals of precision)

But when this is passed to the exponential function, the difference becomes larger than the tolerance of 6 decimals and the test fails.
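The effect can be reproduced without PyMC or SciPy. The sketch below is only an illustration: it uses the standard library's `statistics.NormalDist` as a stand-in for the normal icdf and simulates a second implementation by adding the difference quoted above. The variable names are made up for this example.

```python
import math
from statistics import NormalDist

mu, sigma, p = 2.1, 20.0, 0.99

# Normal quantile from the standard library (a stand-in, not PyMC's erfcinv formula).
x = NormalDist(mu, sigma).inv_cdf(p)

# Pretend a second implementation disagrees by the amount quoted in the thread.
delta = 7.105427357601002e-15
x_other = x + delta

# Exponentiate both values to get the two lognormal quantiles.
q = math.exp(x)
q_other = math.exp(x_other)

abs_diff_normal = abs(x_other - x)
abs_diff_lognormal = abs(q_other - q)
rel_diff_lognormal = abs_diff_lognormal / q

print(f"normal quantile:      {x:.6f}")
print(f"abs diff (normal):    {abs_diff_normal:.3e}")
print(f"abs diff (lognormal): {abs_diff_lognormal:.3e}")
print(f"rel diff (lognormal): {rel_diff_lognormal:.3e}")
```

The relative difference stays at floating-point noise level, while the absolute difference is scaled by the size of the quantile itself, which is why a check based on a fixed number of decimals fails for these parameters.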

Using the values of the failed test above, I get the same error values locally, and the result for the icdf differs from both SciPy and R stats (qlnorm function) starting at the 8th digit:

Investigating further, I found that SciPy computes the icdf differently, approximating it with the routine in this file: https://github.com/scipy/scipy/blob/2f3831503aff159994eafa75745c9537f8db060f/scipy/special/cephes/ndtri.c#L1

That is also different from the R stats qnorm function:
https://github.com/wch/r-source/blob/a6d764783f5010268a33e3610189983e9bb778db/src/nmath/qnorm.c#L23
which R uses to get qlnorm by exponentiating it: https://github.com/wch/r-source/blob/a6d764783f5010268a33e3610189983e9bb778db/src/nmath/qlnorm.c#L29

Hence the difference in the results. Any ideas on how to handle these divergences between implementations in the tests?

ricardoV94 · 2023-06-12T08:23:25Z

Small numerical differences are fine. You can get around them by either tuning decimal or choosing a different set of parameter domains, so that the more extreme combinations don't show up in the tests.
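As a rough way to see which parameter combinations are "extreme" in this sense, the sketch below flags grid points where float64-level disagreement alone would break an absolute check. It rests on assumptions that are not from this thread: a relative disagreement of `1e-14` between implementations, a NumPy-style criterion `abs(a - b) < 1.5 * 10**(-decimal)`, and an illustrative parameter grid. The helper name is hypothetical.

```python
import math
from itertools import product
from statistics import NormalDist

REL_ERR = 1e-14  # assumed relative disagreement between two float64 implementations
DECIMAL = 6      # number of decimals required by the absolute check


def passes_decimal_check(quantile, rel_err=REL_ERR, decimal=DECIMAL):
    """True if an error of rel_err * |quantile| stays below 1.5 * 10**(-decimal)."""
    return rel_err * abs(quantile) < 1.5 * 10.0 ** (-decimal)


# Illustrative grid only; not the domains used in the PyMC test suite.
mus = [-1.0, 0.0, 2.1]
sigmas = [0.5, 1.0, 20.0]
ps = [0.01, 0.5, 0.99]

safe, unsafe = [], []
for mu, sigma, p in product(mus, sigmas, ps):
    # Lognormal quantile as the exponential of the normal quantile.
    q = math.exp(NormalDist(mu, sigma).inv_cdf(p))
    (safe if passes_decimal_check(q) else unsafe).append((mu, sigma, p))

print("unsafe combinations:", unsafe)
```

On this grid only the combinations with `sigma = 20.0` and `p = 0.99` are flagged, which matches the failing case discussed above.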

Those are pretty extreme values you're getting with q=0.99 so it's not surprising tiny differences blow up.

For testing locally, don't forget to set n_samples=-1 so that all combinations are tested.

We have been thinking about changing the precision criteria because the one used now is pretty dumb: #6159

amyoshino · 2023-06-13T18:19:32Z

Ricardo, thank you for the comments! I will follow your suggestions and submit a PR soon.

amyoshino · 2023-06-19T21:48:21Z

@ricardoV94, thank you for the comments on handling the precision issues. I have successfully added tests for the lognormal icdf function.
I also added the icdf functions for the HalfCauchy and HalfNormal distributions but didn't manage to use the function you pointed out in #6612.

def icdf(value, *args):
    return icdf(Full.dist(*args), value)

Instead I used the formulas consistent with the implementations found in SciPy and R extraDistr packages.

ricardoV94

Looks great, only some small changes needed

pymc/distributions/continuous.py

ricardoV94 · 2023-06-23T14:42:05Z

> @ricardoV94, thank you for the comments on handling the precision issues. I have successfully added tests for the lognormal icdf function. I also added the icdf functions for the HalfCauchy and HalfNormal distributions but didn't manage to use the function you pointed out in #6612.
> def icdf(value, *args):
>     return icdf(Full.dist(*args), value)
> Instead I used the formulas consistent with the implementations found in SciPy and R extraDistr packages.

You're right, thanks for checking. Opened related PR to fail explicitly for the icdf

ricardoV94 · 2023-06-23T14:54:19Z

pymc/distributions/continuous.py

@@ -856,6 +856,15 @@ def logcdf(value, loc, sigma):
            msg="sigma > 0",
        )

+    def icdf(value, loc, sigma):
+        res = Normal.icdf((value + 1.0) / 2.0, loc, sigma)


This is a bit annoying: we don't allow users to create a HalfNormal with loc != 0, but in theory they could call icdf, or reach this function, with a HalfNormal that has a custom non-zero loc.

My question is, will this expression work for non-zero loc?

This is also a question for the HalfCauchy.

I wonder if the best solution is to reimplement these RandomVariables directly in PyMC in a way that they don't accept a loc argument (just like the HalfStudentTRV in this file). This way we don't have to worry about loc in the logp/logcdf/icdf/moment functions.

If we go down that path, we can remove the current HalfNormal and HalfCauchy RandomVariables from PyTensor as they aren't really needed there.

amyoshino · 2023-06-24

The implemented formula works well even when loc != 0 and is consistent with SciPy results:

Testing HalfNormal:

Testing HalfCauchy:

Allow me some more time to bring some screenshots calling the implemented functions with pm.HalfNormal.icdf() and pm.HalfCauchy.icdf()

If you think the best solution is still to reimplement RandomVariables directly in PyMC in a way that they don't accept a loc argument, let me know.

ricardoV94 · 2023-06-24

We don't need to do it in this PR, but the fact that it isn't being tested in our suite (and not just the new icdf) is already a good argument to drop it.

Good news that it works though!
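The non-zero loc question can also be checked independently of PyMC with a round trip through the CDF. The sketch below is a standard-library illustration, not the PR's code: the half-normal quantile mirrors the `Normal.icdf((value + 1.0) / 2.0, loc, sigma)` expression from the diff, while the half-Cauchy quantile is the textbook closed form and is an assumption here. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def half_normal_icdf(p, loc, sigma):
    """Quantile of a half-normal shifted by loc: normal icdf evaluated at (p + 1) / 2."""
    return NormalDist(loc, sigma).inv_cdf((p + 1.0) / 2.0)


def half_normal_cdf(x, loc, sigma):
    """CDF of the same shifted half-normal, valid for x >= loc."""
    return math.erf((x - loc) / (sigma * math.sqrt(2.0)))


def half_cauchy_icdf(p, loc, beta):
    """Textbook half-Cauchy quantile (assumed form, not copied from the PR)."""
    return loc + beta * math.tan(math.pi * p / 2.0)


def half_cauchy_cdf(x, loc, beta):
    """CDF of the shifted half-Cauchy, valid for x >= loc."""
    return (2.0 / math.pi) * math.atan((x - loc) / beta)


# Round trip cdf(icdf(p)) == p for zero and non-zero loc.
checks = []
for loc in (0.0, 1.5, -3.0):
    for p in (0.1, 0.5, 0.9):
        x_hn = half_normal_icdf(p, loc, 2.0)
        x_hc = half_cauchy_icdf(p, loc, 2.0)
        checks.append(math.isclose(half_normal_cdf(x_hn, loc, 2.0), p, abs_tol=1e-9))
        checks.append(math.isclose(half_cauchy_cdf(x_hc, loc, 2.0), p, abs_tol=1e-9))

print(all(checks))
```

Because loc only shifts the support, the same expressions hold for any loc, which is consistent with the results reported in this thread.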

pymc/distributions/continuous.py

ricardoV94 · 2023-06-26T06:08:09Z

Thanks @amyoshino!

amyoshino · 2023-06-26T13:11:23Z

Thanks for your guidance @ricardoV94 !

ricardoV94 · 2023-06-26T13:12:38Z

My pleasure @amyoshino. Looking forward to your next PRs!

adding icdf function and tests

711ee4b

ricardoV94 mentioned this pull request Jun 16, 2023

Add icdf functions for distributions #6612

Open

38 tasks

amyoshino and others added 3 commits June 19, 2023 09:23

Merge branch 'pymc-devs:main' into lognorm_icdf_2

59938f7

adding adjustments on tests for the lognormal icdf function

c9e48ec

fixing changes wrong

1035cc6

amyoshino marked this pull request as ready for review June 19, 2023 21:46

amyoshino changed the title ~~adding icdf function and tests for Lognormal distribution~~ Add icdf function for Lognormal distribution Jun 19, 2023

add icdf functions to halfcauchy and half normal

8129cbd

amyoshino changed the title ~~Add icdf function for Lognormal distribution~~ Add icdf function for Lognormal, Half Cauchy and Half Normal distributions Jun 20, 2023

amyoshino changed the title ~~Add icdf function for Lognormal, Half Cauchy and Half Normal distributions~~ Add icdf functions for Lognormal, Half Cauchy and Half Normal distributions Jun 20, 2023

Merge branch 'pymc-devs:main' into lognorm_icdf_2

63680c2

ricardoV94 added the enhancements label Jun 23, 2023

ricardoV94 requested changes Jun 23, 2023

ricardoV94 reviewed Jun 23, 2023

amyoshino added 2 commits June 24, 2023 07:39

obtaining icdf from full distribution

63fd6ba

retrigger tests

08f6974

ricardoV94 reviewed Jun 25, 2023

fixing lognormal

cdc104c

ricardoV94 approved these changes Jun 26, 2023

ricardoV94 merged commit 7b08fc1 into pymc-devs:main Jun 26, 2023

amyoshino deleted the lognorm_icdf_2 branch June 26, 2023 13:09
