Fix Enzyme sparse matrix sparsity pattern corruption (issue #835) #860

ChrisRackauckas-Claude · 2025-12-20T13:48:57Z

Summary

Fixes the issue where Enzyme AD with sparse matrices corrupts the primal matrix's sparsity pattern (rowval, colptr)
Adds sparse-safe helper functions that operate directly on the nzval arrays instead of using broadcast operations that can change the sparsity pattern
For dense matrices, falls back to standard broadcast operations

Root Cause

Enzyme.make_zero creates shadow sparse matrices that share the structural arrays (rowval, colptr) with the primal matrix. When the reverse rule executes broadcast operations like dA .-= z * transpose(y), these can change the sparsity pattern of the shadow matrix, and because the structural arrays are shared, this inadvertently corrupts the primal matrix's structure.

Evidence from investigation:

Primal rowval === shadow rowval? true
Primal colptr === shadow colptr? true
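
To make the aliasing hazard concrete, here is a minimal sketch that does not involve Enzyme at all. It builds a stand-in "shadow" by hand that reuses the primal's structural arrays (an assumption for illustration; `Enzyme.make_zero` is not called here), then performs a structure-changing write on the shadow. A scalar `setindex!` is used as the structure-changing operation in place of the broadcast from the reverse rule.

```julia
using SparseArrays
using SparseArrays: getcolptr  # not exported, so import it by name

# 3x3 diagonal primal with three stored entries.
A = sparse([1, 2, 3], [1, 2, 3], [1.0, 2.0, 3.0])

# Hand-built stand-in for a shadow (illustrative assumption, not Enzyme's code path).
# The five-argument constructor does not copy, so colptr and rowval are aliased,
# while the shadow owns a fresh values array.
shadow = SparseMatrixCSC(size(A, 1), size(A, 2), getcolptr(A), rowvals(A), zeros(nnz(A)))

shares_structure = getcolptr(A) === getcolptr(shadow) && rowvals(A) === rowvals(shadow)

# Writing outside the stored pattern grows the shadow's rowval and nzval in place
# and increments the shared colptr in place.
shadow[2, 1] = 1.0

# The primal now sees four row indices but still owns only three values.
primal_consistent = length(rowvals(A)) == length(nonzeros(A))
```

After the write, `shares_structure` is `true` and `primal_consistent` is `false`: the primal was never touched directly, yet its buffers no longer agree with each other.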

Solution

Add sparse-safe helper functions that dispatch on AbstractSparseMatrix:

_safe_add!(dst, src): For sparse matrices, adds values via dst.nzval .+= src.nzval
_safe_zero!(A): For sparse matrices, zeros via fill!(A.nzval, 0)
_sparse_outer_sub!(dA, z, y): For sparse matrices, only accumulates gradients into existing non-zero positions using a direct loop over the CSC structure

These preserve the sparsity pattern by operating only on the nzval array.
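
As a rough sketch of the first two helpers, assuming the method bodies described above and a plain broadcast for the dense fallback (the code in the PR itself may differ in signatures and details):

```julia
using SparseArrays

# Dense fallback: ordinary broadcasting is safe because there is no pattern to disturb.
_safe_add!(dst::AbstractArray, src::AbstractArray) = (dst .+= src; dst)
_safe_zero!(A::AbstractArray) = fill!(A, zero(eltype(A)))

# Sparse methods: touch only the stored values. This assumes dst and src share one
# sparsity pattern, which holds when the shadow was derived from the primal.
function _safe_add!(dst::SparseMatrixCSC, src::SparseMatrixCSC)
    nonzeros(dst) .+= nonzeros(src)
    return dst
end

function _safe_zero!(A::SparseMatrixCSC)
    fill!(nonzeros(A), zero(eltype(A)))
    return A
end

dst = sparse([1, 3], [1, 2], [1.0, 2.0], 3, 3)
src = sparse([1, 3], [1, 2], [10.0, 20.0], 3, 3)

_safe_add!(dst, src)
sums = copy(nonzeros(dst))   # stored values after the add
_safe_zero!(dst)             # values become zero, stored positions stay
```

Because neither sparse method reads or writes `rowval` or `colptr`, the number of stored entries is the same before and after each call.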

Test plan

Verify sparsity pattern is preserved after AD (rowval, colptr, nnz unchanged)
Verify matrix can still be displayed without AssertionError: _goodbuffers(S) crash
Verify dense matrix Enzyme tests still pass (note: some of these tests already fail on main, independent of this change)
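
The first two checks could be sketched along these lines. `pattern_preserved` is a hypothetical helper introduced here for illustration, and the mutating closures stand in for the actual AD call, which is not reproduced.

```julia
using SparseArrays
using SparseArrays: getcolptr  # not exported, so import it by name

# Hypothetical helper: run a mutating function and report whether A's pattern survived.
function pattern_preserved(f!, A::SparseMatrixCSC)
    # Copies are essential: the structural arrays are mutated in place when the
    # pattern changes, so comparing against the live arrays would always pass.
    colptr_before = copy(getcolptr(A))
    rowval_before = copy(rowvals(A))
    nnz_before = nnz(A)
    f!(A)
    return getcolptr(A) == colptr_before &&
           rowvals(A) == rowval_before &&
           nnz(A) == nnz_before
end

A = sparse([1, 3, 2], [1, 1, 3], [1.0, 2.0, 3.0], 3, 3)

value_only = pattern_preserved(M -> (nonzeros(M) .*= 2; M), A)  # touches nzval only
structural = pattern_preserved(M -> (M[2, 2] = 5.0; M), A)      # inserts a new entry

# Display check: rendering a matrix with inconsistent buffers is where the
# _goodbuffers assertion would surface, so a clean render is a cheap sanity test.
rendered = sprint(show, MIME("text/plain"), A)
```

Here `value_only` comes out `true` and `structural` comes out `false`, which is the distinction the test plan is meant to catch.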

Fixes #835

🤖 Generated with Claude Code

ChrisRackauckas-Claude · 2025-12-21T01:45:43Z

Updated based on feedback

Addressed two concerns:

1. GPU Safety

Changed the implementation from nested loops with scalar indexing to vectorized operations:

# Build column indices vector from colptr
col_indices = _expand_colptr_to_col_indices(colptr, n_cols, nnz_count)

# Vectorized update instead of nested loop
vals .-= z[rows] .* y[col_indices]

The _expand_colptr_to_col_indices helper still uses a loop internally (dispatched on Vector{Ti}). That loop only ever runs on the CPU, because SparseMatrixCSC itself is a CPU-only type. GPU sparse matrices (like CuSparseMatrixCSC) are different types and would need handling in their respective extensions (LinearSolveCUDAExt, etc.).
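
The helper's body is not shown in this comment. A minimal sketch of what it might do, assuming the three-argument call shown above, is given below; the name `expand_colptr_to_col_indices` (without the leading underscore) marks it as a hypothetical stand-in rather than the PR's code.

```julia
using SparseArrays
using SparseArrays: getcolptr  # not exported, so import it by name

# Hypothetical stand-in: expand CSC column pointers into one column index per stored entry.
function expand_colptr_to_col_indices(colptr::Vector{Ti}, n_cols::Integer,
                                      nnz_count::Integer) where {Ti<:Integer}
    col_indices = Vector{Ti}(undef, nnz_count)
    @inbounds for col in 1:n_cols
        for idx in colptr[col]:(colptr[col + 1] - 1)
            col_indices[idx] = col   # stored entry idx lives in column col
        end
    end
    return col_indices
end

dA = sparse([1, 3, 2], [1, 1, 3], [1.0, 2.0, 3.0], 3, 3)
z = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

rows = rowvals(dA)
vals = nonzeros(dA)
col_indices = expand_colptr_to_col_indices(getcolptr(dA), size(dA, 2), nnz(dA))

# Vectorized update; z[rows] and y[col_indices] are temporaries of length nnz.
vals .-= z[rows] .* y[col_indices]
```

For this matrix the stored entries sit at (1,1), (3,1) and (2,3), so `col_indices` is `[1, 1, 3]` and each stored value has `z[i] * y[j]` subtracted from it.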

2. Non-zeros concern

The dense fallback dA .-= z * transpose(y) only applies to truly dense AbstractArray types. Sparse matrices dispatch to the SparseMatrixCSC method, which only updates positions already stored in the sparsity pattern. This is the intended behavior for sparse AD: structural zeros are treated as fixed, so gradients are accumulated only at positions the matrix can actually store.

ChrisRackauckas-Claude · 2025-12-21T02:19:28Z

Updated implementation

Based on feedback, the implementation has been revised to use a non-allocating loop for sparse matrix operations:

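# getcolptr is not exported by SparseArrays, so it is imported by name
using SparseArrays: SparseMatrixCSC, rowvals, nonzeros, getcolptr

function _sparse_outer_sub!(dA::SparseMatrixCSC, z::AbstractVector, y::AbstractVector)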
    rows = rowvals(dA)
    vals = nonzeros(dA)
    colptr = getcolptr(dA)

    # Non-allocating loop over CSC structure
    # This is efficient and cache-friendly (column-major order)
    @inbounds for col in 1:size(dA, 2)
        y_col = y[col]
        for idx in colptr[col]:(colptr[col + 1] - 1)
            vals[idx] -= z[rows[idx]] * y_col
        end
    end

    return dA
end

Key points:

Zero allocations: The loop operates directly on the existing arrays
Cache-friendly: Column-major order iteration matches CSC storage
CPU-only by design: SparseMatrixCSC is a CPU-only type; GPU sparse matrices (like CuSparseMatrixCSC) have their own types and would need handling in their respective extensions (CUDA, AMDGPU, etc.)
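
One way to sanity-check the loop is to compare it against a dense reference in which the outer product is masked by the stored pattern. The sketch below repeats the function in compact form so that it runs on its own; the input data is made up for illustration.

```julia
using SparseArrays
using SparseArrays: getcolptr  # not exported, so import it by name

# Compact copy of the loop above so the check is self-contained.
function _sparse_outer_sub!(dA::SparseMatrixCSC, z::AbstractVector, y::AbstractVector)
    rows, vals, colptr = rowvals(dA), nonzeros(dA), getcolptr(dA)
    @inbounds for col in 1:size(dA, 2), idx in colptr[col]:(colptr[col + 1] - 1)
        vals[idx] -= z[rows[idx]] * y[col]
    end
    return dA
end

dA = sparse([1, 3, 2], [1, 1, 3], [1.0, 2.0, 3.0], 3, 3)
z = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

# Dense reference: subtract the outer product only where dA stores an entry.
# Using `.!= 0` as the mask is valid here because every stored value is non-zero.
mask = Matrix(dA) .!= 0
expected = Matrix(dA) .- (z * transpose(y)) .* mask

nnz_before = nnz(dA)
_sparse_outer_sub!(dA, z, y)

matches = Matrix(dA) == expected && nnz(dA) == nnz_before
```

`matches` is `true` for this input: the stored entries receive the same update the dense formula would give them, and no new entries are created.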

The SparseArrays package has been added as a dependency for the Enzyme extension.

Note: The dense matrix Enzyme tests are failing on main as a pre-existing issue (unrelated to this PR).

The issue: Enzyme.make_zero shares structural arrays (rowval, colptr) between primal and shadow sparse matrices. Broadcast operations like `dA .-= z * y'` can change the sparsity pattern, corrupting both shadow AND primal matrices.

The fix: Add sparse-safe helper functions that operate directly on the nonzeros array to preserve the sparsity pattern:
- _safe_add!: Add arrays preserving sparsity pattern
- _safe_zero!: Zero arrays preserving sparsity pattern
- _sparse_outer_sub!: Compute outer product subtraction preserving sparsity pattern

Supports both CPU and GPU sparse matrices:
- SparseMatrixCSC (CPU): Uses non-allocating loop for efficiency
- AbstractSparseMatrixCSC (GPU): Uses vectorized operations with O(nnz) allocation for GPU-compatible broadcasting

Also added SparseArrays as a dependency for the LinearSolveEnzymeExt extension.

Note: The dense matrix Enzyme tests are failing on main (pre-existing issue).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

ChrisRackauckas-Claude · 2025-12-21T18:26:25Z

Updated: Added GPU sparse matrix support

Now supports both CPU and GPU sparse matrices:

CPU sparse (`SparseMatrixCSC`)

Uses non-allocating loop:

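# getcolptr is not exported by SparseArrays, so it is imported by name
using SparseArrays: SparseMatrixCSC, rowvals, nonzeros, getcolptr

function _sparse_outer_sub!(dA::SparseMatrixCSC, z::AbstractVector, y::AbstractVector)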
    rows = rowvals(dA)
    vals = nonzeros(dA)
    colptr = getcolptr(dA)

    @inbounds for col in 1:size(dA, 2)
        y_col = y[col]
        for idx in colptr[col]:(colptr[col + 1] - 1)
            vals[idx] -= z[rows[idx]] * y_col
        end
    end
    return dA
end

GPU sparse (`AbstractSparseMatrixCSC` - CuSparseMatrixCSC, ROCSparseMatrixCSC, etc.)

Uses vectorized operations with O(nnz) allocation:

function _sparse_outer_sub!(dA::AbstractSparseMatrixCSC, z::AbstractVector, y::AbstractVector)
    rows = rowvals(dA)
    vals = nonzeros(dA)
    colptr = getcolptr(dA)
    n_cols = size(dA, 2)

    # Build column indices (copies colptr to CPU, builds indices, copies back to GPU)
    col_indices = _expand_colptr_to_col_indices(rows, colptr, n_cols)

    # Vectorized update - works on GPU via broadcasting
    vals .-= z[rows] .* y[col_indices]
    return dA
end

The GPU path:

Copies colptr to CPU (one GPU→CPU transfer)
Builds column indices on CPU (fast)
Copies result to GPU (one CPU→GPU transfer)
Uses vectorized broadcasting (GPU-native)

That is two host-device transfers per call, plus the O(nnz) temporary allocation noted above, which should be acceptable for gradient computation.

ChrisRackauckas-Claude mentioned this pull request Dec 20, 2025

Enzyme + SparseArrays + LinearSolve = Not Happy #835

Open

ChrisRackauckas-Claude force-pushed the fix-enzyme-sparse-sparsity-pattern-corruption branch from e2cf5a5 to 98949fb Compare December 21, 2025 01:45

ChrisRackauckas-Claude force-pushed the fix-enzyme-sparse-sparsity-pattern-corruption branch from 98949fb to 223cc46 Compare December 21, 2025 02:19

ChrisRackauckas-Claude force-pushed the fix-enzyme-sparse-sparsity-pattern-corruption branch from 223cc46 to eb1d270 Compare December 21, 2025 18:26
