
Lower polynomial to mod_arith for add/sub/mul_scalar#995

Closed
ZenithalHourlyRate wants to merge 1 commit into google:main from ZenithalHourlyRate:polynomia-to-mod-arith-add

Conversation

@ZenithalHourlyRate
Collaborator

This is a draft and not cleaned up yet (e.g. BUILD/CMake), just demoing the concept of lowering polynomial to mod_arith. Some functionality, like reducing modulo the polynomial modulus (PolynomialToStandard::generateOpImplementations), is not included yet.

Related to #990 and #749

Should we add a separate pass for it (polynomial-to-mod-arith, lots of duplicate code) and gradually replace polynomial-to-standard, or should we just place it inside the polynomial-to-standard pass?

Extra note: with mod-arith-to-arith, the result for the test case is elegant:

  func.func @polysub(%arg0: tensor<4xi32>, %arg1: tensor<4xi32>) -> tensor<4xi32> {
    %c-1_i32 = arith.constant -1 : i32
    %cst = arith.constant dense<-1> : tensor<4xi32>
    %0 = arith.muli %arg1, %cst : tensor<4xi32>
    %cst_0 = arith.constant dense<7681> : tensor<4xi32>
    %1 = arith.remui %0, %cst_0 : tensor<4xi32>
    %2 = arith.addi %arg0, %1 : tensor<4xi32>
    %cst_1 = arith.constant dense<7681> : tensor<4xi32>
    %3 = arith.remui %2, %cst_1 : tensor<4xi32>
    return %3 : tensor<4xi32>
  }

@ZenithalHourlyRate
Collaborator Author

It turned out the lowering above is incorrect: mod_arith requires operands to be signless, whereas polynomial coefficients are signed.

  • The SubAsAdd canonicalization rule explicitly requires it to be signed (though when mod is a power of two at least as large as the coefficient type's natural modulus, signed and unsigned are the same)
  • polynomial.FromTensorOp implicitly interprets the input as signed. We should define this explicitly.
  • polynomial-to-standard currently uses remsi, and range keeping is complex (check
    // When our coefficient modulus (cmod) is not the same power of two
    // corresponding to the integer coefficient type size, we can not rely on
    // the natural overflow behaviour during the computation of acc = a[i] *
    // b[j] + c[i+j]. If we compute acc mod cmod then we are able to ensure
    // that we will not overflow the intermediate type sized integer. This is
    // because a[i] * b[j] + c[i+j] < cmod^2 + cmod < maximum of intermediate
    // type. Which lets us not have to compute the modulus after the
    // multiplication step and the addition step.
    //
    // Further, when we truncate from the intermediate type back to the
    // polynomial type, we can't rely on the truncation behaviour to ensure that
    // the value congruent with respect to cmod. So we will have to manually
    // convert by computing acc + cmod mod cmod.
    //
    // Eg: Let acc = -12. Then arith.remsi acc 7 = -5 : i6, and so,
    // arith.trunci -5 : i6 -> i3 = 3. -5 is not congruent with 3 mod 7. So we
    // will need to compute -5 + 7 mod 7 = 2, such that acc is in [0,7) before
    // truncating.
    )
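The quoted comment's example can be checked numerically. Here is a minimal Python sketch (the helper names are mine, not HEIR's) that models arith.remsi as a sign-following remainder and arith.trunci as two's-complement truncation:

```python
def remsi(a, m):
    # signed remainder, like arith.remsi: result takes the sign of the dividend
    r = abs(a) % abs(m)
    return -r if a < 0 else r

def trunci(x, bits):
    # two's-complement truncation, like arith.trunci
    v = x & ((1 << bits) - 1)
    return v - (1 << bits) if v >= (1 << (bits - 1)) else v

acc, cmod = -12, 7
r = remsi(acc, cmod)                   # -5: congruent to acc mod 7, but negative
naive = trunci(r, 3)                   # 3: NOT congruent to acc (acc mod 7 == 2)
fixed = trunci((r + cmod) % cmod, 3)   # 2: (acc + cmod) mod cmod lands in [0, 7)
```

This reproduces the comment's numbers: truncating -5 directly gives 3, which is not congruent to -12 mod 7, while adding cmod first gives the correct 2.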

I agree that any signed integer can be a coefficient, given the semantics of the polynomial dialect (e.g. -1, or a number inside [mod, 2^31-1)). However, we should keep some invariant internally. AFAIK the current lowering (like ConvertAdd, ConvertMul) tries to keep the invariant that polynomial coefficients are in (-mod, +mod); however, ConvertFromTensor does not keep this invariant, as the user may provide any input and it does not remsi it.

Or should we keep it in [-mod/2, +mod/2) (for L2/L-infinity norm/noise reasons, as commonly seen in FHE papers)?

In the definition of the polynomial dialect there should be a canonical form of the polynomial, i.e. one already reduced modulo both coeffModulus and the polynomial modulus. Then we can have a stable output for ToTensor (if coefficients are merely in (-mod, +mod), there are two possible outputs for one residue) and we can safely lower to mod_arith.
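To make the representative question concrete, here is a small Python sketch of the two candidate canonical forms (function names are illustrative, not from the codebase). In (-mod, +mod) each residue class has two representatives, whereas [0, mod) and the centered form each pick a unique one:

```python
q = 7

def nonneg(x):
    # canonical representative in [0, q)
    return x % q

def centered(x):
    # centered representative in [-q//2, q//2], common in FHE noise analysis
    r = x % q
    return r - q if r > q // 2 else r

# 5 and -2 lie in the same residue class mod 7: both are valid values in
# (-q, +q), but each canonical form maps them to a single representative.
print(nonneg(5), nonneg(-2))      # 5 5
print(centered(5), centered(-2))  # -2 -2
```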

I'm curious how other FHE libraries deal with this issue.

@AlexanderViand-Intel

AlexanderViand-Intel commented Sep 28, 2024

Should we add a separate pass for it (polynomial-to-mod-arith, lots of duplicate code) and gradually replace polynomial-to-standard or just placing it inside the polynomial-to-standard pass?

What we’ve done in the past is to update the existing pipeline “in-place”, so we’d (a) create a pass polynomial-to-modarith (b) create a modarith-to-standard pass, and finally (c) switch polynomial-to-standard from being a Pass to being a pipeline consisting of the two passes above.

@ZenithalHourlyRate
Collaborator Author

I tried to keep the invariant inside FromTensor using mod_arith.reduce, leading to the following behavior when the modulus is large.

    %cst = arith.constant dense<2> : tensor<1024xi32>
    %0 = arith.extsi %cst : tensor<1024xi32> to tensor<1024xi33>
    // 2 ** 32 - 1
    %1 = mod_arith.reduce %0 {modulus = 4294967295 : i64} : tensor<1024xi33>
    %2 = arith.trunci %1 : tensor<1024xi33> to tensor<1024xi32>
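The widening is needed because the reduce result lies in [0, q) with q = 2**32 - 1, which exceeds the signed i32 range. A Python sketch of the numeric effect (helper name is mine; it assumes mod_arith.reduce produces the non-negative representative):

```python
W = 32
q = 2**32 - 1

def reduce_via_ext(x):
    # extsi i32 -> i33 does not change the mathematical value;
    # Python's unbounded ints model the widened type
    r = x % q                   # non-negative representative, needs up to 32 bits
    return r & ((1 << W) - 1)   # trunci i33 -> i32 keeps the low 32 bits

# For the constant 2 above the whole pipeline is a no-op, but a negative
# coefficient shows why the headroom matters:
reduce_via_ext(-3)  # 4294967292, which an i32 reads (signed) as -4
```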

@ZenithalHourlyRate
Collaborator Author

https://mlir.llvm.org/docs/Dialects/PolynomialDialect/#ringattr

I cannot handle the case of a coefficient modulus larger than the underlying type (e.g. i32). There are two cases.

  • The modulus is a power of two like 2**33 or 2**64, which is effectively the same as modulus = 0, so why specify a modulus like this at all? We do have examples like this inside HEIR, caused by the polynomial-to-standard implementation, and I think they should be fixed.
  • The modulus is another number like 2**32 + 1 or a 64-bit prime. This has ill-defined semantics, as the underlying type cannot fully represent all residues and the arithmetic is not what we want, because two modulo operations happen: (mod 2**32) and (mod p).
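The second case can be demonstrated numerically. A sketch (assuming wrap-around overflow semantics for the i32-like storage):

```python
W = 2**32          # natural modulus of the underlying 32-bit storage
p = 2**32 + 1      # intended coefficient modulus, wider than the type

a = 2**32          # a valid residue mod p that 32 bits cannot represent
stored = a % W     # what actually ends up in the storage: 0

# Arithmetic on `stored` effectively computes (mod 2**32) first and
# (mod p) second, which is not arithmetic mod p:
print(a % p)       # 4294967296  (a itself is already reduced mod p)
print(stored % p)  # 0           (information was lost at the type boundary)
```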

Currently the upstream documentation does not define any constraint on this, and there is no verifier for it.

@j2kun
Collaborator

j2kun commented Oct 2, 2024

I think one complication is that MLIR does not specify any overflow semantics by default. So omitting a modulus that equals 2^32 when the type is i32 could result in lowerings that treat overflow differently from how we want. I thought I could handle that by allowing the type of the modulus to differ from the underlying type of the coefficients, so lowerings could check for these edge cases. But since I'm not working for another month (on baby leave), I won't have the time to dive deep enough to recommend a fix.

j2kun pushed a commit to llvm/llvm-project that referenced this pull request Oct 5, 2024
…or ringAttr (#111016)

Currently the semantic of coefficientModulus is unclear and a lowering
of it faces uncertainty, for example,
google/heir#995 (comment)

Also, it lacks a verifier which should conform to the definition in the
document.

This PR tries to further define the semantic of coefficientModulus and
adds a verifier for it.

Cc @j2kun for review and suggestions.
Kyvangka1610 pushed a commit to Kyvangka1610/llvm-project that referenced this pull request Oct 5, 2024
…or ringAttr (llvm#111016)

@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch 2 times, most recently from 7cab15e to 2efcd5d Compare October 10, 2024 16:47
@ZenithalHourlyRate ZenithalHourlyRate changed the title [WIP] Implement polynomial-to-mod-arith Lower polynomial to mod_arith for add/sub/mul_scalar Oct 10, 2024
@ZenithalHourlyRate ZenithalHourlyRate marked this pull request as ready for review October 10, 2024 16:48
@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from 2efcd5d to b9f83f2 Compare October 10, 2024 16:51
@ZenithalHourlyRate
Collaborator Author

I think the lowering for add/sub/mul_scalar, along with the auxiliary ops from_tensor/constant, is self-contained enough for review. #990 can be fixed by this PR.

Each of mul, ntt, and intt is large enough for a separate PR, and including them all in this one would make the review process more burdensome.

Things that need discussion:

  • behavior change of from_tensor/constant, as the result tensor would get reduced to [0, cmod)
    • this may make consecutive from_tensor -> to_tensor -> from_tensor -> to_tensor chains inefficient; maybe canonicalization can handle this
    • or we could use other ops like from_tensor_raw or from_tensor_reduced
  • weird semantics of the last case of these runner tests involving a large prime (note that runner tests use --heir-polynomial-to-llvm)
  • The canonicalization rule SubAsAdd mentioned in Fix upstream polynomial canonicalization rules #749 is now correct, though we also have a lowering for poly.sub now; which way should we go?

@AlexanderViand-Intel

Awesome, thanks! Lots of good questions in your last post, so let me pick the easy one to answer xD

Since lots of libraries/HW targets have an explicit notion of polynomial subtraction, I think we can drop the upstream SubAsAdd pattern from the set of canonicalization patterns. If we don't want to have separate poly->(mod)arith lowerings for both add and sub, we can simply add the SubAsAdd pattern to the pattern set of that lowering pass.
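The identity behind SubAsAdd is that, mod q, subtracting b is the same as adding (q - 1) * b, since q - 1 ≡ -1. A quick Python check using the modulus from the test case earlier in this thread:

```python
q = 7681  # coefficient modulus used in the polysub test case above

def sub_mod(a, b):
    return (a - b) % q

def sub_as_add(a, b):
    # a - b  ≡  a + (q - 1) * b  (mod q), since q - 1 ≡ -1 (mod q)
    return (a + (q - 1) * b) % q

# spot-check the identity over a grid of residues
assert all(sub_mod(a, b) == sub_as_add(a, b)
           for a in range(0, q, 97) for b in range(0, q, 89))
```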

@ZenithalHourlyRate
Collaborator Author

Seems that I need to put elementwise-to-affine before polynomial-to-mod-arith as mentioned in #769

@AlexanderViand-Intel

AlexanderViand-Intel commented Oct 10, 2024

Seems that I need to put elementwise-to-affine before polynomial-to-mod-arith as mentioned in #769

Yeah, I think that's a good idea. Not many things use the tensor-valued versions of polynomial, but the lowerings from BGV/CKKS make use of them, as it's very convenient there. We made an (undocumented, oops) decision at some point to only support "scalar-valued" polynomial ops in polynomial-to-x lowerings, and the elementwise-to-affine pass bridges the gap.

@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from b9f83f2 to 04f87f9 Compare October 10, 2024 17:40
@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from 04f87f9 to a692f80 Compare October 21, 2024 12:41
@ZenithalHourlyRate
Collaborator Author

Rebased onto the current directory organization.


@AlexanderViand-Intel AlexanderViand-Intel left a comment


Thanks for putting this together! I think the final output looks good, though I'm not a huge fan of how much "modular arithmetic" logic this lowering includes. For example, the pass should, imho, not care about things such as power-of-two moduli - worrying about how to efficiently realize modular arithmetic is exactly what ModArith was introduced for, so this logic should be moved to ModArithToArith.

However, I think working code beats no code, so I'd be in favor of merging this and creating an issue for refactoring this out. Would be interested to hear @inbelic 's opinion on this, too.

@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from a692f80 to 384af07 Compare October 22, 2024 03:35
@inbelic
Contributor

inbelic commented Oct 22, 2024

I agree, I think we should be able to postpone any of the modulus based logic to a later lowering. I will set aside time this evening to do a full review.

Contributor

@inbelic inbelic left a comment


Thanks for contributing this and lots of great work here!
Mostly nits and preferential comments.

Value modArithReduceOp(ConvertCommon<Op> &c, ImplicitLocOpBuilder &b,
ConversionPatternRewriter &rewriter, Value v,
int64_t shape = 0) {
// why not c.resultTensorType here
Contributor


Is this answered?

Collaborator Author


I rephrased this comment so it is less confusing now.

};

template <typename Op, typename ModArithOp, typename ArithIOp,
typename ArithFOp>
Contributor


Sorry, not up to date about adding different coefficient types for polynomials. Can we add tests for floating point?

Collaborator Author

@ZenithalHourlyRate ZenithalHourlyRate Oct 23, 2024


I will add it in the next round of push.

c.resultElementWidth > c.coefficientModulusWidth) {
v = b.create<mod_arith::ReduceOp>(resultType, v, c.coefficientModulus);
}
// else float or natural power of two modulus larger than underlying type.
Contributor


Why don't we need the op if it is a float? Should we really reduce it but can't because it doesn't support floats at the moment?

Collaborator Author


The polynomial dialect does not define coefficientModulus when the coefficientType is a float:

if the coefficient type is integral, whose coefficients are taken modulo some statically known modulus (coefficientModulus).

}
}

// polynomial type related
Contributor


We don't use some of the following variables outside of the above function. Do we anticipate using them in the future? Otherwise I think it would be best to keep them local.

Collaborator Author


For input* (e.g. inputTensor, inputElementWidth), it is indeed FromTensor-specific and is now moved to ConvertFromTensor. Some other unused variables have also been eliminated; I think the remaining variables are essential and may be used in the future.

@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from 384af07 to 9004489 Compare October 23, 2024 16:11
Collaborator Author

@ZenithalHourlyRate ZenithalHourlyRate left a comment


Talking about moving some logic inside mod_arith. I think the handling of modArithReduceOp needs some change to mod_arith.reduce.

Note this will interpret x as a signed integer. The bitwidth of q is required to be smaller than that of x; this requirement forces the lowering pass to manually extsi the input. Should we allow a q (interpreted as unsigned) whose bitwidth is the same as that of x (signed)? This semantic is weird.



typename ArithFOp>
Value modArithBinaryOp(ConvertCommon<Op> &c, ImplicitLocOpBuilder &b, Value lhs,
Value rhs) {
if (c.coefficientModulusIsNotNatural) {
Collaborator Author


I have now migrated the logic of natural modulus handling into mod_arith. I agree with the idea of handling all the integer stuff in mod_arith, but I think handling floating point inside mod_arith seems a violation of its name.

Also, the PolynomialToModArith code must contain some floating-point-specific things, like ConvertConstant, where FloatAttr is explicitly required.

@AlexanderViand-Intel

Talking about moving some logic inside mod_arith. I think the handling of modArithReduceOp needs some change to mod_arith.reduce.

Note this will interpret x as a signed integer. The bitwidth of q is required to be smaller than that of x; this requirement forces the lowering pass to manually extsi the input. Should we allow a q (interpreted as unsigned) whose bitwidth is the same as that of x (signed)? This semantic is weird.

I think it'd be a good idea to have a general mod_arith.reduce operation that has absolutely no preconditions on the inputs and will fix things as needed as part of the lowering. So I'd be happy with moving the extsi to the -to-arith lowering of mod_arith.reduce

@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from 9004489 to c141b07 Compare October 31, 2024 14:33
@ZenithalHourlyRate
Collaborator Author

I think it'd be a good idea to have a general mod_arith.reduce operation that has absolutely no preconditions on the inputs and will fix things as needed as part of the lowering. So I'd be happy with moving the extsi to the -to-arith lowering of mod_arith.reduce

Now mod_arith.reduce can handle a modulus with the same bitwidth as the underlying type, as well as the natural modulus. The corresponding modArithReduceOp helper has been removed.

@j2kun
Collaborator

j2kun commented Nov 21, 2024

Oh no! I did not realize this PR was in flight when I was working on migrating the polynomial-to-standard pass to use mod_arith types (which is now almost done, although I think I still have a lot of cleanup work once tests are passing). Is there a particular part of this PR that you think would be worth incorporating into that work?

@ZenithalHourlyRate
Collaborator Author

Oh no! I did not realize this PR was in flight when I was working on migrating the polynomial-to-standard pass to use mod_arith types (which is now almost done, although I think I still have a lot of cleanup work once tests are passing). Is there a particular part of this PR that you think would be worth incorporating into that work?

I think most of it has been covered by your PR. One thing worth noting is that I used a CommonConversionInfo helper to simplify type manipulation, so that each ConvertXXX class can worry less about types and focus on constructing the lowered operations.



Development

Successfully merging this pull request may close these issues.

polynomial-to-standard: error on integer width for poly.add when ring coefficientModulus width is less than coefficient width
