
Lower polynomial to mod_arith for add/sub/mul_scalar#995

Closed
ZenithalHourlyRate wants to merge 1 commit into google:main from ZenithalHourlyRate:polynomia-to-mod-arith-add

Conversation

@ZenithalHourlyRate
Collaborator

This is a draft and not cleaned up yet (e.g. BUILD/CMake), just demoing the concept of lowering polynomial to mod_arith. Some functionality, like reducing modulo the polynomial modulus (PolynomialToStandard::generateOpImplementations), is not included yet.

Related to #990 and #749

Should we add a separate pass for it (polynomial-to-mod-arith, lots of duplicate code) and gradually replace polynomial-to-standard, or should we just place it inside the polynomial-to-standard pass?

Extra note: with mod-arith-to-arith, the result for the test case is elegant:

  func.func @polysub(%arg0: tensor<4xi32>, %arg1: tensor<4xi32>) -> tensor<4xi32> {
    %c-1_i32 = arith.constant -1 : i32
    %cst = arith.constant dense<-1> : tensor<4xi32>
    %0 = arith.muli %arg1, %cst : tensor<4xi32>
    %cst_0 = arith.constant dense<7681> : tensor<4xi32>
    %1 = arith.remui %0, %cst_0 : tensor<4xi32>
    %2 = arith.addi %arg0, %1 : tensor<4xi32>
    %cst_1 = arith.constant dense<7681> : tensor<4xi32>
    %3 = arith.remui %2, %cst_1 : tensor<4xi32>
    return %3 : tensor<4xi32>
  }

@ZenithalHourlyRate
Collaborator Author

It turned out the lowering above is incorrect: mod_arith requires operands to be signless, whereas polynomial coefficients are signed.

  • The SubAsAdd canonicalization rule explicitly requires it to be signed (though when mod is a power of two at least as large as the coefficient type's natural modulus, signed and unsigned are the same)
  • polynomial.FromTensorOp implicitly interprets the input as signed. We should define this explicitly.
  • polynomial-to-standard currently uses remsi, and range keeping is complex (check
    // When our coefficient modulus (cmod) is not the same power of two
    // corresponding to the integer coefficient type size, we can not rely on
    // the natural overflow behaviour during the computation of acc = a[i] *
    // b[j] + c[i+j]. If we compute acc mod cmod then we are able to ensure
    // that we will not overflow the intermediate type sized integer. This is
    // because a[i] * b[j] + c[i+j] < cmod^2 + cmod < maximum of intermediate
    // type. Which lets us not have to compute the modulus after the
    // multiplication step and the addition step.
    //
    // Further, when we truncate from the intermediate type back to the
    // polynomial type, we can't rely on the truncation behaviour to ensure that
    // the value congruent with respect to cmod. So we will have to manually
    // convert by computing acc + cmod mod cmod.
    //
    // Eg: Let acc = -12. Then arith.remsi acc 7 = -5 : i6, and so,
    // arith.trunci -5 : i6 -> i3 = 3. -5 is not congruent with 3 mod 7. So we
    // will need to compute -5 + 7 mod 7 = 2, such that acc is in [0,7) before
    // truncating.
    )
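The quoted comment's example can be checked numerically. Here is a minimal Python sketch (the helper names are mine, not HEIR's) that models arith.remsi as a sign-following remainder and arith.trunci as two's-complement truncation:

```python
def remsi(a, m):
    # signed remainder, like arith.remsi: result takes the sign of the dividend
    r = abs(a) % abs(m)
    return -r if a < 0 else r

def trunci(x, bits):
    # two's-complement truncation, like arith.trunci
    v = x & ((1 << bits) - 1)
    return v - (1 << bits) if v >= (1 << (bits - 1)) else v

acc, cmod = -12, 7
r = remsi(acc, cmod)                   # -5: congruent to acc mod 7, but negative
naive = trunci(r, 3)                   # 3: NOT congruent to acc (acc mod 7 == 2)
fixed = trunci((r + cmod) % cmod, 3)   # 2: (acc + cmod) mod cmod lands in [0, 7)
```

This reproduces the comment's numbers: truncating -5 directly gives 3, which is not congruent to -12 mod 7, while adding cmod first gives the correct 2.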

I agree that any signed integer can be a coefficient, given the semantics of the polynomial dialect (e.g. -1, or a number inside [mod, 2^31-1)). However, we should keep some invariant internally. AFAIK the current lowering (like ConvertAdd, ConvertMul) tries to keep the invariant that polynomial coefficients are in (-mod, +mod); however, ConvertFromTensor does not keep this invariant, as the user may provide any input and it does not remsi it.

Or should we keep it in [-mod/2, +mod/2) (for L2/L-infinity norm/noise reasons, as commonly seen in FHE papers)?

In the definition of the polynomial dialect there should be a canonical form of the polynomial, i.e. one already reduced modulo both coeffModulus and the polynomial modulus. Then we can have a stable output for ToTensor (if coefficients are merely in (-mod, +mod), there are two possible outputs for one residue) and we can safely lower to mod_arith.
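To make the representative question concrete, here is a small Python sketch of the two candidate canonical forms (function names are illustrative, not from the codebase). In (-mod, +mod) each residue class has two representatives, whereas [0, mod) and the centered form each pick a unique one:

```python
q = 7

def nonneg(x):
    # canonical representative in [0, q)
    return x % q

def centered(x):
    # centered representative in [-q//2, q//2], common in FHE noise analysis
    r = x % q
    return r - q if r > q // 2 else r

# 5 and -2 lie in the same residue class mod 7: both are valid values in
# (-q, +q), but each canonical form maps them to a single representative.
print(nonneg(5), nonneg(-2))      # 5 5
print(centered(5), centered(-2))  # -2 -2
```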

I'm curious how other FHE libraries deal with this issue.

@AlexanderViand-Intel

AlexanderViand-Intel commented Sep 28, 2024

Should we add a separate pass for it (polynomial-to-mod-arith, lots of duplicate code) and gradually replace polynomial-to-standard or just placing it inside the polynomial-to-standard pass?

What we’ve done in the past is to update the existing pipeline “in-place”, so we’d (a) create a pass polynomial-to-modarith (b) create a modarith-to-standard pass, and finally (c) switch polynomial-to-standard from being a Pass to being a pipeline consisting of the two passes above.

@ZenithalHourlyRate
Collaborator Author

I tried to keep the invariant inside FromTensor using mod_arith.reduce, leading to the following behavior when the modulus is large.

    %cst = arith.constant dense<2> : tensor<1024xi32>
    %0 = arith.extsi %cst : tensor<1024xi32> to tensor<1024xi33>
    // 2 ** 32 - 1
    %1 = mod_arith.reduce %0 {modulus = 4294967295 : i64} : tensor<1024xi33>
    %2 = arith.trunci %1 : tensor<1024xi33> to tensor<1024xi32>
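The widening is needed because the reduce result lies in [0, q) with q = 2**32 - 1, which exceeds the signed i32 range. A Python sketch of the numeric effect (helper name is mine; it assumes mod_arith.reduce produces the non-negative representative):

```python
W = 32
q = 2**32 - 1

def reduce_via_ext(x):
    # extsi i32 -> i33 does not change the mathematical value;
    # Python's unbounded ints model the widened type
    r = x % q                   # non-negative representative, needs up to 32 bits
    return r & ((1 << W) - 1)   # trunci i33 -> i32 keeps the low 32 bits

# For the constant 2 above the whole pipeline is a no-op, but a negative
# coefficient shows why the headroom matters:
reduce_via_ext(-3)  # 4294967292, which an i32 reads (signed) as -4
```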

@ZenithalHourlyRate
Collaborator Author

https://mlir.llvm.org/docs/Dialects/PolynomialDialect/#ringattr

I cannot handle the case of a coefficient modulus larger than the underlying type (e.g. i32). There are two cases.

  • The modulus is a power of two like 2**33 or 2**64, which is effectively the same as modulus = 0, so why specify a modulus like this at all? We do have examples like this inside HEIR, caused by the polynomial-to-standard implementation, and I think they should be fixed.
  • The modulus is another number like 2**32 + 1 or a 64-bit prime. This has ill-defined semantics, as the underlying type cannot fully represent all residues and the arithmetic is not what we want, because two modulo operations happen: (mod 2**32) and (mod p).
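The second case can be demonstrated numerically. A sketch (assuming wrap-around overflow semantics for the i32-like storage):

```python
W = 2**32          # natural modulus of the underlying 32-bit storage
p = 2**32 + 1      # intended coefficient modulus, wider than the type

a = 2**32          # a valid residue mod p that 32 bits cannot represent
stored = a % W     # what actually ends up in the storage: 0

# Arithmetic on `stored` effectively computes (mod 2**32) first and
# (mod p) second, which is not arithmetic mod p:
print(a % p)       # 4294967296  (a itself is already reduced mod p)
print(stored % p)  # 0           (information was lost at the type boundary)
```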

Currently the upstream documentation does not define any constraint on this, and there is no verifier for it.

@j2kun
Collaborator

j2kun commented Oct 2, 2024

I think one complication is that MLIR does not specify any overflow semantics by default. So omitting a modulus that equals 2^32 when the type is i32 could result in lowerings that treat overflow differently from how we want. I thought I could handle that by allowing the type of the modulus to differ from the underlying type of the coefficients, so lowerings could check for these edge cases. But since I'm not working for another month (on baby leave), I won't have the time to dive deep enough to recommend a fix.

j2kun pushed a commit to llvm/llvm-project that referenced this pull request Oct 5, 2024
…or ringAttr (#111016)

Currently the semantic of coefficientModulus is unclear and a lowering
of it faces uncertainty, for example,
google/heir#995 (comment)

Also, it lacks a verifier which should conform to the definition in the
document.

This PR tries to further define the semantic of coefficientModulus and
adds a verifier for it.

Cc @j2kun for review and suggestions.
Kyvangka1610 pushed a commit to Kyvangka1610/llvm-project that referenced this pull request Oct 5, 2024
…or ringAttr (llvm#111016)

@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch 2 times, most recently from 7cab15e to 2efcd5d Compare October 10, 2024 16:47
@ZenithalHourlyRate ZenithalHourlyRate changed the title [WIP] Implement polynomial-to-mod-arith Lower polynomial to mod_arith for add/sub/mul_scalar Oct 10, 2024
@ZenithalHourlyRate ZenithalHourlyRate marked this pull request as ready for review October 10, 2024 16:48
@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from 2efcd5d to b9f83f2 Compare October 10, 2024 16:51
@ZenithalHourlyRate
Collaborator Author

I think the lowering for add/sub/mul_scalar, along with the auxiliary ops from_tensor/constant, is self-contained enough for review. #990 can be fixed by this PR.

Each of mul, ntt, and intt is large enough for a separate PR, and including them all in this one would make the review process more burdensome.

Things that need discussion:

  • behavior change of from_tensor/constant, as the result tensor would get reduced to [0, cmod)
    • this may make consecutive from_tensor -> to_tensor -> from_tensor -> to_tensor chains inefficient; maybe canonicalization can handle this
    • or we could use other ops like from_tensor_raw or from_tensor_reduced
  • weird semantics of the last case of these runner tests involving a large prime (note that runner tests use --heir-polynomial-to-llvm)
  • The canonicalization rule SubAsAdd mentioned in Fix upstream polynomial canonicalization rules #749 is now correct, though we also have a lowering for poly.sub now; which way should we go?

@AlexanderViand-Intel

Awesome, thanks! Lots of good questions in your last post, so let me pick the easy one to answer xD

Since lots of libraries/HW targets have an explicit notion of polynomial subtraction, I think we can drop the upstream SubAsAdd pattern from the set of canonicalization patterns. If we don't want to have separate poly->(mod)arith lowerings for both add and sub, we can simply add the SubAsAdd pattern to the pattern set of that lowering pass.
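The identity behind SubAsAdd is that, mod q, subtracting b is the same as adding (q - 1) * b, since q - 1 ≡ -1. A quick Python check using the modulus from the test case earlier in this thread:

```python
q = 7681  # coefficient modulus used in the polysub test case above

def sub_mod(a, b):
    return (a - b) % q

def sub_as_add(a, b):
    # a - b  ≡  a + (q - 1) * b  (mod q), since q - 1 ≡ -1 (mod q)
    return (a + (q - 1) * b) % q

# spot-check the identity over a grid of residues
assert all(sub_mod(a, b) == sub_as_add(a, b)
           for a in range(0, q, 97) for b in range(0, q, 89))
```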

@ZenithalHourlyRate
Collaborator Author

Seems that I need to put elementwise-to-affine before polynomial-to-mod-arith as mentioned in #769

@AlexanderViand-Intel

AlexanderViand-Intel commented Oct 10, 2024

Seems that I need to put elementwise-to-affine before polynomial-to-mod-arith as mentioned in #769

Yeah, I think that's a good idea. Not many things use the tensor-valued versions of polynomial, but the lowerings from BGV/CKKS make use of them, as it's very convenient there. We made an (undocumented, oops) decision at some point to only support "scalar-valued" polynomial ops in polynomial-to-x lowerings, and the elementwise-to-affine pass bridges the gap.

@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from b9f83f2 to 04f87f9 Compare October 10, 2024 17:40
@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from 04f87f9 to a692f80 Compare October 21, 2024 12:41
@ZenithalHourlyRate
Collaborator Author

Rebased onto the current directory organization.


@AlexanderViand-Intel AlexanderViand-Intel left a comment


Thanks for putting this together! I think the final output looks good, though I'm not a huge fan of how much "modular arithmetic" logic this lowering includes. For example, the pass should, imho, not care about things such as power-of-two moduli - worrying about how to efficiently realize modular arithmetic is exactly what ModArith was introduced for, so this logic should be moved to ModArithToArith.

However, I think working code beats no code, so I'd be in favor of merging this and creating an issue for refactoring this out. Would be interested to hear @inbelic 's opinion on this, too.

@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from a692f80 to 384af07 Compare October 22, 2024 03:35
@inbelic
Contributor

inbelic commented Oct 22, 2024

I agree, I think we should be able to postpone any of the modulus based logic to a later lowering. I will set aside time this evening to do a full review.

Contributor

@inbelic inbelic left a comment


Thanks for contributing this and lots of great work here!
Mostly nits and preferential comments.

Value modArithReduceOp(ConvertCommon<Op> &c, ImplicitLocOpBuilder &b,
ConversionPatternRewriter &rewriter, Value v,
int64_t shape = 0) {
// why not c.resultTensorType here
Contributor


Is this answered?

Collaborator Author


I rephrased this comment so it is less confusing now.

};

template <typename Op, typename ModArithOp, typename ArithIOp,
typename ArithFOp>
Contributor


Sorry, not up to date about adding different coefficient types for polynomials. Can we add tests for floating point?

Collaborator Author

@ZenithalHourlyRate ZenithalHourlyRate Oct 23, 2024


I will add it in the next round of push.

c.resultElementWidth > c.coefficientModulusWidth) {
v = b.create<mod_arith::ReduceOp>(resultType, v, c.coefficientModulus);
}
// else float or natural power of two modulus larger than underlying type.
Contributor


Why don't we need the op if it is a float? Should we really reduce it but can't because it doesn't support floats at the moment?

Collaborator Author


The polynomial dialect does not define coefficientModulus when the coefficientType is a float:

if the coefficient type is integral, whose coefficients are taken modulo some statically known modulus (coefficientModulus).

}
}

// polynomial type related
Contributor


We don't use some of the following variables outside of the above function. Do we anticipate using them in the future? Otherwise I think it would be best to keep them local.

Collaborator Author


For input* (e.g. inputTensor, inputElementWidth), it is indeed FromTensor-specific and is now moved to ConvertFromTensor. Some other unused variables have also been eliminated; I think the remaining variables are essential and may be used in the future.

@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from 384af07 to 9004489 Compare October 23, 2024 16:11
Collaborator Author

@ZenithalHourlyRate ZenithalHourlyRate left a comment


Talking about moving some logic inside mod_arith. I think the handling of modArithReduceOp needs some change to mod_arith.reduce.

Note this will interpret x as a signed integer. The bitwidth of q is required to be smaller than that of x; this requirement forces the lowering pass to manually extsi the input. Should we allow a q (interpreted as unsigned) whose bitwidth is the same as that of x (signed)? This semantic is weird.



typename ArithFOp>
Value modArithBinaryOp(ConvertCommon<Op> &c, ImplicitLocOpBuilder &b, Value lhs,
Value rhs) {
if (c.coefficientModulusIsNotNatural) {
Collaborator Author


I have now migrated the logic of natural modulus handling into mod_arith. I agree with the idea of handling all the integer stuff in mod_arith, but I think handling floating point inside mod_arith seems a violation of its name.

Also, the PolynomialToModArith code must contain some floating-point-specific things, like ConvertConstant, where FloatAttr is explicitly required.

@AlexanderViand-Intel

Talking about moving some logic inside mod_arith. I think the handling of modArithReduceOp needs some change to mod_arith.reduce.

Note this will interpret x as a signed integer. The bitwidth of q is required to be smaller than that of x; this requirement forces the lowering pass to manually extsi the input. Should we allow a q (interpreted as unsigned) whose bitwidth is the same as that of x (signed)? This semantic is weird.

I think it'd be a good idea to have a general mod_arith.reduce operation that has absolutely no preconditions on the inputs and will fix things as needed as part of the lowering. So I'd be happy with moving the extsi to the -to-arith lowering of mod_arith.reduce

@ZenithalHourlyRate ZenithalHourlyRate force-pushed the polynomia-to-mod-arith-add branch from 9004489 to c141b07 Compare October 31, 2024 14:33
@ZenithalHourlyRate
Collaborator Author

I think it'd be a good idea to have a general mod_arith.reduce operation that has absolutely no preconditions on the inputs and will fix things as needed as part of the lowering. So I'd be happy with moving the extsi to the -to-arith lowering of mod_arith.reduce

Now mod_arith.reduce can handle a modulus with the same bitwidth as the underlying type, as well as the natural modulus. The corresponding modArithReduceOp helper has been removed.

@j2kun
Collaborator

j2kun commented Nov 21, 2024

Oh no! I did not realize this PR was in flight when I was working on migrating the polynomial-to-standard pass to use mod_arith types (which is now almost done, although I think I still have a lot of cleanup work once tests are passing). Is there a particular part of this PR that you think would be worth incorporating into that work?

@ZenithalHourlyRate
Collaborator Author

Oh no! I did not realize this PR was in flight when I was working on migrating the polynomial-to-standard pass to use mod_arith types (which is now almost done, although I think I still have a lot of cleanup work once tests are passing). Is there a particular part of this PR that you think would be worth incorporating into that work?

I think most of it has been covered by your PR. One thing worth noting is that I used a CommonConversionInfo helper to simplify type manipulation, so that each ConvertXXX class can worry less about types and focus on constructing the lowered operations.



Development

Successfully merging this pull request may close these issues.

polynomial-to-standard: error on integer width for poly.add when ring coefficientModulus width is less than coefficient width
