Use the extened euclidian algorithm to compute scalar inverse in variable time #730

deadalnix · 2020-03-29T01:54:36Z

This is faster than the multiple exponentiation method that is used now. It remains way slower than libgmp, so probably not a full replacement, but it provides a nice speedup when libgmp is not available.

These are the benchmark time I measure.

scalar_inverse: min 14.4us / avg 16.4us / max 20.0us
scalar_inverse_var: min 11.5us / avg 11.6us / max 11.6us

While there is a fair bag of trick that is used to make this fast, there is still a lot of untapped potential. For instance, doing numerous additions before reducing x0 and x1. While more could be done, this is already an improvement, so it is time for a PR.

elichai · 2020-03-29T07:20:20Z

This seem like a simpler approach than #290
even though it's slower and doesn't have the jacobi implementation, but the bignum approach is quite hard to implement correctly.

(haven't reviewed the actual logic here)

deadalnix · 2020-03-29T15:14:34Z

I wasn't aware of #290 , this is unfortunate. A few comments.

The approach is slightly different. This uses an approach with no divisions - technically, divisions by powers of 2 only, but these can be performed as simple shifts and additions while #290 tries to limit the number of bigint divisions using lehmer's optimization. I do not know which is best and it is hard to compare because the implementations details are quite different.

I was considering using a 5x64bits or 9x32bits big int for x0, x1 and negx going forward, just like #290. A lot of time is spent keeping them between 0 and p, but we could do it only once at the end. Everything else do not really needs big int. This is the next logical step here.

On a more high level note, I think proceeding is small steps is preferable. It is easier to review, avoids rebase hell and delivers value to users now. In addition, it allows for better fan out down the road, as it is possible to work and experiment with the various optimisations and variations discussed independently on top of a common base.

Implementing jacobi, further optimizing this and making it work with field element in addition to scalars is on my todo list, but I think it is better to agree on some approach before investing that much effort into this.

real-or-random · 2020-03-30T18:11:00Z

Let me just note for now that I haven't had a closer look so far but this looks very interesting. Is this the algorithm described in pseudocode here? https://crypto.stackexchange.com/a/54623/12020
A pseudocode variant will make it easier to review this.

deadalnix · 2020-03-30T18:51:03Z

Not quite, but it is similar. The way i approached is starting with algorithm 2.23 from "Guide to Elliptic Curve Cryptography" by Hankerson, Menezes and Vanstone for which I added a few optimizations:

a can only be even early on, and at that time, x1 is always zero, so no need to double it.
when b is even, they accumulate more factors 2 in x0 and x1, but I eliminate them right away - it is worth it due to the optimized division by a power of 2 routine.

Pseudocode is as follow:

inverse a:
  b = p
  x0 = 1
  x1 = 0
  k = 0

  while a is even:
    a /= 2
    k++

  while a > 1:
    b -= a
    x1 -= x0
    while b is even:
      b /= 2
      x1 = x1 / 2 if x1 is even else (x1 + p) / 2
    if a > b:
      swap a, b
      swap x0, x1

  repeat k times:
    x0 = x0 / 2 if x0 is even else (x0 + p) / 2

  return x0

This is the hist of it, but there are various optimizations done along the way.

Not adding it to scalar.h because it is not meant to be a public API. Scalars do not have ordering per se, but this is meant to be able to implement the extended euclidian algorithm in order to compute inverse in variable time effisciently.

…scalars in variable time. This is faster than the exponentiation method. It is still significantly slower than GMP, so we keep that option as well.

deadalnix · 2020-04-15T17:05:37Z

I updated secp256k1_scalar_pow2_div to add zero instea of skipping the fixup with a branch as it turns out it is faster.

mratsim · 2020-04-15T17:32:17Z

FYI, EEA can be made constant-time with very reasonable speed (for a constant-time impl, it would still be an order of magnitude slower than variable time GCD like in GMP) following the algorithm 5 from paper: https://link.springer.com/content/pdf/10.1007%2F978-3-642-40588-4_10.pdf

Implementation in C are available in GMP (mirror) and Nettle, in C++ in Botan and in Nim in my own library Constantine

There is a lengthy discussion on constant-time modular inverse in Botan repo: randombit/botan#1479

In particular there is discussion on a recent paper by Bernstein and Yang that claims a very fast constant time GCD and inversion (https://eprint.iacr.org/2019/266).

deadalnix · 2020-04-16T01:13:51Z

That's interesting, but considering the benchmark I have, I highly doubt this can be made faster that the exponentiation technique that is used today.

mratsim · 2020-04-16T09:18:48Z

The Bernstein-Yang paper claims that their GCD-based inversion is faster than using Little Fermat on Curve25519. Similarly, the Möller algorithm for inversion is faster than the addition-chains from secp256k1 in my own benchmark though I use a generic Montgomery mul/square algorithm and the difference is about 20% so faster/specialized mul/square will probably change the balance). Anyway if there is a dedicated thread on constant-time inversion I would be happy to discuss there instead of in this variable time EEA thread.

roconnor-blockstream · 2020-04-21T15:29:22Z

I've made a PR with some suggested amendments @ deadalnix#1

roconnor-blockstream@2446b1e replaces the scalar_complement and scalar_binadd with a scalar_sub function. I've been told that my scalar_sub function has managed to coax the compiler into emitting a SBB (subtract with borrow) instruction which is very good. I haven't verified that fact myself yet. I believe this substitution is an objective improvement.

I suppose there is an argument over which style is better, using subtraction or addition with complement as is done in scalar_negate. My opinion is that subtraction is marginally better and arguably the scalar_negate should be using this subtraction technique.

roconnor-blockstream@13e07ab revises the main secp256k1_scalar_eea_inverse a little. My main goal there is to remove the use of goto. My reasons are a little selfish. The Verifiable C project cannot reason about C code with gotos, and I would like to be able to do a formal verification of this code at some point in the future. That said, to date I've only done a very limited about of verification of libsecp256k1 using Verifiable C, only enough to evaluate the tool. Otherwise this would be the only use of goto in the entire codebase. I think this revision to secp256k1_scalar_eea_inverse is a subjective improvement.

That all said, it would be a huge improvement to generalize this so that it can be used to compute field inversions in addition to scalar inversions. ECDSA verification requires a single scalar inversion, and Schnorr verification requires no scalar inversions at all, whereas field inversion occurs in a moderate number of places during verification.

I think we can generalize this by building a static table of "remainders", as the lookup table found in secp256k1_scalar_pow2_div, for both the field order and group order, and pass a pointer to required table into a generalized eea_inverse function. For field inversion, we can compact the field value into the scalar type (or perhaps better, a new equivalent 256-bit type), perform the eea inversion, and copy the result back into the field limb format.

Let me know if I've made any errors.

deadalnix · 2020-04-21T16:23:10Z

Awesome!

I'm not sure if it is best to comment here or in your PR, I'll do it here and we can move the discussion there if need be.

I was working myself on the next iteration of this thing and I got some more nice speedup - mostly by avoiding to reduce at every steps. Unfortunately, we stepped on each others toe a bit, but that's okay, there are no way you'd be able to know.

You move some variable declarations to have smaller scope. This is awesome. I was uncertain about the C89 rules and don't wanted to fight the compiler over this so I went extra large.
I'm not sure about the goto removal. I get it, goto is frowned upon, and for the most part for good reasons. But once in a while comes this situation where a goto really is better and I think we are in this case. This is because the goto effectively skips a good chunk of the first loop iteration - and in fact it can skip more. If you have a solution to achieve this without goto, then I'm taking, but it is unclear to me if this is possible at all.
Your implementation of scalar sub should be called binsub instead, because it skips the modulo reduction. It should also return the leftover so that the caller can do something with it. If nothing else, check that expectations about overflow are maintained.
Please do not remove the binadd operation. There are optimizations of secp256k1_scalar_pow2_div that can greatly benefit from it. It is also useful as a building block to avoid reduction at every step. I have code for all this, but not in a shape I think is good enough for review ATM. I can upload it anyways if you want to check.
I expect that there won't be any major roadblock to generalize this to field element and I intend to do this if there is interest. I made the choice of going for scalar only so that I can focus on extracting all the juice from there - there is a lot more to extract. Doing everything at once would make review more difficult, as well as iterating on various ways to do things to see which is faster slower, both of which are undesirable initially.

roconnor-blockstream · 2020-04-21T16:54:58Z

Regarding variable scoping. I wasn't sure either. Jonas suggested to me that it was okay to narrow the scope, so I guess we should do it unless we hear an objection.

Regarding goto: I believe my revision does no more work than your goto version. Roughly speaking the call secp256k1_scalar_negate(&v, a); was eliminated in favour of entering the loop from the top an running secp256k1_scalar_sub which does more or less the identical amount of work.

In light of your overall comments, I suggest closing my PR against your repo and I can revisit it once you have some new code. Feel free to incorporate any of my suggestions that you agree with into your next update. I have a couple of other, more questionable tricks in there, such as using *r as scratch space.

roconnor-blockstream · 2021-03-24T12:58:30Z

I believe this PR is now obsolete and can be closed since https://github.com/bitcoin-core/secp256k1/pull/831/files has now been merged.

deadalnix · 2021-03-24T15:15:57Z

Yes. Closing.

deadalnix force-pushed the scalarinversevar branch 2 times, most recently from 966f173 to be1ce35 Compare March 29, 2020 02:36

deadalnix force-pushed the scalarinversevar branch from be1ce35 to cc651da Compare April 15, 2020 14:06

deadalnix added 4 commits April 15, 2020 18:58

Add the capability to divide scalars by a power of 2.

b0ae4ed

Move secp256k1_scalar_is_even to make things clearer

9055761

Implement the extended euclidian algorithm to compute the inverse of …

0a4741a

…scalars in variable time. This is faster than the exponentiation method. It is still significantly slower than GMP, so we keep that option as well.

deadalnix force-pushed the scalarinversevar branch from cc651da to 0a4741a Compare April 15, 2020 17:04

roconnor-blockstream mentioned this pull request Apr 21, 2020

Scalarinversevar deadalnix/secp256k1#1

Closed

real-or-random mentioned this pull request Jul 15, 2020

WIP: "safegcd" field and scalar inversion #767

Closed

deadalnix closed this Mar 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use the extened euclidian algorithm to compute scalar inverse in variable time #730

Use the extened euclidian algorithm to compute scalar inverse in variable time #730

Uh oh!

deadalnix commented Mar 29, 2020

Uh oh!

elichai commented Mar 29, 2020 •

edited

Loading

Uh oh!

deadalnix commented Mar 29, 2020

Uh oh!

real-or-random commented Mar 30, 2020

Uh oh!

deadalnix commented Mar 30, 2020 •

edited

Loading

Uh oh!

deadalnix commented Apr 15, 2020

Uh oh!

mratsim commented Apr 15, 2020 •

edited

Loading

Uh oh!

deadalnix commented Apr 16, 2020

Uh oh!

mratsim commented Apr 16, 2020

Uh oh!

roconnor-blockstream commented Apr 21, 2020 •

edited

Loading

Uh oh!

deadalnix commented Apr 21, 2020

Uh oh!

roconnor-blockstream commented Apr 21, 2020 •

edited

Loading

Uh oh!

roconnor-blockstream commented Mar 24, 2021

Uh oh!

deadalnix commented Mar 24, 2021

Uh oh!

Uh oh!

Use the extened euclidian algorithm to compute scalar inverse in variable time #730

Use the extened euclidian algorithm to compute scalar inverse in variable time #730

Uh oh!

Conversation

deadalnix commented Mar 29, 2020

Uh oh!

elichai commented Mar 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deadalnix commented Mar 29, 2020

Uh oh!

real-or-random commented Mar 30, 2020

Uh oh!

deadalnix commented Mar 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deadalnix commented Apr 15, 2020

Uh oh!

mratsim commented Apr 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deadalnix commented Apr 16, 2020

Uh oh!

mratsim commented Apr 16, 2020

Uh oh!

roconnor-blockstream commented Apr 21, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deadalnix commented Apr 21, 2020

Uh oh!

roconnor-blockstream commented Apr 21, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roconnor-blockstream commented Mar 24, 2021

Uh oh!

deadalnix commented Mar 24, 2021

Uh oh!

Uh oh!

elichai commented Mar 29, 2020 •

edited

Loading

deadalnix commented Mar 30, 2020 •

edited

Loading

mratsim commented Apr 15, 2020 •

edited

Loading

roconnor-blockstream commented Apr 21, 2020 •

edited

Loading

roconnor-blockstream commented Apr 21, 2020 •

edited

Loading