Skip to content

Add sum-zero constrained intrinsic conditional autoregressive (ICAR) variable #4851

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

zoj613
Copy link
Contributor

@zoj613 zoj613 commented Jul 10, 2021

THis PR is a followup on #4518 (comment) . It impelements the sum-zero constrained intrinsic conditonal autoregressive (ICAR) model as described in [1].

References

[1] Keefe M.J., Ferreira M.A.R., Franck C.T. On the formal specification of sum-zero constrained intrinsic conditional autoregressive models Spat. Stat., 24 (2018), pp. 54-65

Depending on what your PR does, here are a few things you might want to address in the description:

@zoj613 zoj613 changed the title Add sum-zero constrained intrinsic conditional autoregressive model Add sum-zero constrained intrinsic conditional autoregressive (ICAR) model Jul 10, 2021
@zoj613
Copy link
Contributor Author

zoj613 commented Jul 10, 2021

Copy of the article can be found here: #4518 (comment)

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening the PR. I left some suggestions.

I suggest you check this developer guide for the next steps (docstrings, tests, etc...): https://www.github.com/pymc-devs/pymc3/tree/main/docs%2Fsource%2Fdeveloper_guide_implementing_distribution.md

@zoj613 zoj613 changed the title Add sum-zero constrained intrinsic conditional autoregressive (ICAR) model Add sum-zero constrained intrinsic conditional autoregressive (ICAR) variable Jul 11, 2021
@codecov
Copy link

codecov bot commented Jul 17, 2021

Codecov Report

Merging #4851 (b0a5853) into main (06d1412) will decrease coverage by 2.58%.
The diff coverage is 81.81%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4851      +/-   ##
==========================================
- Coverage   73.09%   70.51%   -2.59%     
==========================================
  Files          86       86              
  Lines       13856    13888      +32     
==========================================
- Hits        10128     9793     -335     
- Misses       3728     4095     +367     
Impacted Files Coverage Δ
pymc3/distributions/__init__.py 100.00% <ø> (ø)
pymc3/distributions/multivariate.py 54.75% <81.81%> (-16.70%) ⬇️
pymc3/printing.py 22.22% <0.00%> (-63.64%) ⬇️
pymc3/distributions/discrete.py 79.25% <0.00%> (-19.73%) ⬇️
pymc3/distributions/continuous.py 84.33% <0.00%> (-11.93%) ⬇️
pymc3/distributions/dist_math.py 80.52% <0.00%> (-6.85%) ⬇️
pymc3/distributions/bound.py 28.44% <0.00%> (-3.67%) ⬇️
pymc3/distributions/logprob.py 94.52% <0.00%> (-1.37%) ⬇️
pymc3/distributions/transforms.py 91.30% <0.00%> (-1.09%) ⬇️
... and 6 more

@zoj613
Copy link
Contributor Author

zoj613 commented Jul 18, 2021

@ricardoV94 is there infrastructure to generate adjecency matrices for testing CAR variables? I'm having trouble creating input for the test_distributions.py test im supposed to write.

Normally, i'd use something like the following, assuming the area of interest is made up of a rectangular lattice:

import libpysal 

lattice_rows = 2
lattice_cols = 5
A = libpysal.weights.lat2SW(lattice_rows, lattice_cols, criterion='rook', row_st=False)

which gives:

>>> A.toarray()
array([[0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
       [1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
       [0, 1, 0, 1, 0, 0, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]], dtype=int8)

@ricardoV94
Copy link
Member

You can hardcode some predefined matrices if it's too complicated to generate then dynamically. If it's too many you can put them in a separate file.

@zoj613
Copy link
Contributor Author

zoj613 commented Jul 19, 2021

You can hardcode some predefined matrices if it's too complicated to generate then dynamically. If it's too many you can put them in a separate file.

I added a simple function to generate such, though it's far from being robust. I am now getting weird errors locally for the logp tests. I think maybe my understanding of the Domain objects for multivariate distributions is lacking. I am getting IndexError: list index out of range and
`

Details
x = A

    def _is_sparse(x):
        """
    
        Returns
        -------
        boolean
            True iff x is a L{scipy.sparse.spmatrix} (and not a L{numpy.ndarray}).
    
        """
        if not isinstance(x, (scipy.sparse.spmatrix, np.ndarray, tuple, list)):
>           raise NotImplementedError(
                "this function should only be called on "
                "sparse.scipy.sparse.spmatrix or "
                "numpy.ndarray, not,",
                x,
            )
E           NotImplementedError: ('this function should only be called on sparse.scipy.sparse.spmatrix or numpy.ndarray, not,', A)

depending on how I define the input to check_logp

@zoj613
Copy link
Contributor Author

zoj613 commented Jul 20, 2021

@ricardoV94 any suggestions regarding the above errors?

@ricardoV94
Copy link
Member

ricardoV94 commented Jul 20, 2021

@ricardoV94 any suggestions regarding the above errors?

The edge entries in a Domain are excluded. If you have less than three entries (e.g., Domain([0, 1]), there will be nothing usable in the check_logp. Could this be your issue? For instance, of your tau domain only the middle value will ever be tested.

@zoj613
Copy link
Contributor Author

zoj613 commented Jul 20, 2021

@ricardoV94 any suggestions regarding the above errors?

The edge entries in a Domain are excluded. If you have less than three entries (e.g., Domain([0, 1]), there will be nothing usable in the check_logp. Could this be your issue? For instance, of your tau domain only the middle value will ever be tested.

I tried that using the following:

        adj_mat = Domain([random_adjacency_matrix(n, rng) for _ in range(5)])
        dom = Domain([generate_vals(n, rng) for _ in range(5)])
        tau = Domain([0.1, 0.25, 1., 1.5, 2.])

But raises the first error I got, which is: NotImplementedError: ('this function should only be called on sparse.scipy.sparse.spmatrix or numpy.ndarray, not,', A). It appears that the aesara.tensor.sharedvar.TensorSharedVariable arguments are not passed in correctly to the dist function.

See the full trace below:

Details
...
        adj_mat = Domain([random_adjacency_matrix(n, rng) for _ in range(5)])
        dom = Domain([generate_vals(n, rng) for _ in range(5)])
        tau = Domain([0.1, 0.25, 1., 1.5, 2.])
    
>       self.check_logp(
            ICAR,
            dom,
            {"A": adj_mat, "tau": tau},
            ref_icar_logp,
        )

pymc3/tests/test_distributions.py:2088: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pymc3/tests/test_distributions.py:657: in check_logp
    model, param_vars = build_model(pymc3_dist, domain, paramdomains, extra_args)
pymc3/tests/test_distributions.py:249: in build_model
    distfam(
pymc3/distributions/distribution.py:207: in __new__
    rv_out = cls.dist(*args, rng=rng, initval=None, **kwargs)
pymc3/distributions/multivariate.py:2130: in dist
    if aesara.sparse._is_sparse(A):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

x = A

    def _is_sparse(x):
        """
    
        Returns
        -------
        boolean
            True iff x is a L{scipy.sparse.spmatrix} (and not a L{numpy.ndarray}).
    
        """
        if not isinstance(x, (scipy.sparse.spmatrix, np.ndarray, tuple, list)):
>           raise NotImplementedError(
                "this function should only be called on "
                "sparse.scipy.sparse.spmatrix or "
                "numpy.ndarray, not,",
                x,
            )
E           NotImplementedError: ('this function should only be called on sparse.scipy.sparse.spmatrix or numpy.ndarray, not,', A)

../.pyenv/versions/miniconda3-4.7.12/envs/pymc3/lib/python3.9/site-packages/aesara/sparse/type.py:18: NotImplementedError
=========================================================================================================== short test summary info ===========================================================================================================
FAILED pymc3/tests/test_distributions.py::TestMatchesScipy::test_icar[3] - NotImplementedError: ('this function should only be called on sparse.scipy.sparse.spmatrix or numpy.ndarray, not,', A)
FAILED pymc3/tests/test_distributions.py::TestMatchesScipy::test_icar[20] - NotImplementedError: ('this function should only be called on sparse.scipy.sparse.spmatrix or numpy.ndarray, not,', A)
FAILED pymc3/tests/test_distributions.py::TestMatchesScipy::test_icar[100] - NotImplementedError: ('this function should only be called on sparse.scipy.sparse.spmatrix or numpy.ndarray, not,', A)
FAILED pymc3/tests/test_distributions.py::TestMatchesScipy::test_icar[500] - NotImplementedError: ('this function should only be called on sparse.scipy.sparse.spmatrix or numpy.ndarray, not,', A)

EDIT: if I comment out the lines:

+        #if aesara.sparse._is_sparse(A):
+        #    A = aesara.sparse.dense_from_sparse(floatX(A))
+        #else:
+        #    A = at.as_tensor_variable(floatX(A))
+        A = at.as_tensor_variable(floatX(A))

the the tests run as expected. Is there a subtle bug in _is_sparse that cannot handle certain input?

EDIT2: I decided to use _is_sparse_variable instead and things work now.

@zoj613 zoj613 marked this pull request as ready for review July 20, 2021 16:19
@zoj613
Copy link
Contributor Author

zoj613 commented Jul 20, 2021

I think this is ready for a round of review @brandonwillard @fonnesbeck @ricardoV94

The generation of the adjacency matrix can probably be improved by a lot to prevent the weird failures in tests.

Copy link
Contributor

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this needs some updates in order for the tests to pass and to get coverage for the logp implementation.

+ 0.5 * at.log(d).sum()
- 0.5 * tau * at.dot(value, at.dot(W, value))
)
return bound(out, at.math.abs_(at.sum(value)) < 1e-06)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this effectively being clipped for numerical reasons?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the sum of the values have to be 0 so I set the precision to 6 digits.

@ricardoV94
Copy link
Member

Closing this as stale.

@zoj613 let us know if you want to pick this up at another time. Probably pymc-experimental would be a better target

@ricardoV94 ricardoV94 closed this Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants