Use the extended Euclidean algorithm to compute scalar inverse in variable time #730
(force-pushed from 966f173 to be1ce35)
This seems like a simpler approach than #290 (I haven't reviewed the actual logic here).
I wasn't aware of #290; this is unfortunate. A few comments.

The approach is slightly different. This uses an approach with no divisions (technically, divisions by powers of 2 only, but these can be performed as simple shifts and additions), while #290 tries to limit the number of bigint divisions using Lehmer's optimization. I do not know which is best, and it is hard to compare because the implementation details are quite different. I was considering using a 5x64-bit or 9x32-bit bigint for

On a more high-level note, I think proceeding in small steps is preferable. It is easier to review, avoids rebase hell, and delivers value to users now. In addition, it allows for better fan-out down the road, as it makes it possible to work and experiment with the various optimizations and variations discussed independently on top of a common base. Implementing Jacobi, further optimizing this, and making it work with field elements in addition to scalars is on my todo list, but I think it is better to agree on some approach before investing that much effort into this.
Let me just note for now that I haven't had a closer look yet, but this looks very interesting. Is this the algorithm described in pseudocode here? https://crypto.stackexchange.com/a/54623/12020
Not quite, but it is similar. The way I approached it was to start with Algorithm 2.23 from "Guide to Elliptic Curve Cryptography" by Hankerson, Menezes and Vanstone, to which I added a few optimizations:
Pseudocode is as follows:
That is the gist of it, but there are various optimizations done along the way.
(force-pushed from be1ce35 to cc651da)
Not adding it to scalar.h because it is not meant to be a public API. Scalars do not have an ordering per se, but this is meant to make it possible to implement the extended Euclidean algorithm in order to compute inverses in variable time efficiently.
…scalars in variable time. This is faster than the exponentiation method. It is still significantly slower than GMP, so we keep that option as well.
(force-pushed from cc651da to 0a4741a)
I updated secp256k1_scalar_pow2_div to add zero instead of skipping the fixup with a branch, as it turns out to be faster.
FYI, EEA can be made constant time with very reasonable speed (a constant-time implementation would still be an order of magnitude slower than a variable-time GCD like in GMP) following Algorithm 5 from this paper: https://link.springer.com/content/pdf/10.1007%2F978-3-642-40588-4_10.pdf. Implementations in C are available in GMP (mirror) and Nettle, in C++ in Botan, and in Nim in my own library Constantine. There is a lengthy discussion on constant-time modular inverse in the Botan repo: randombit/botan#1479. In particular, there is discussion of a recent paper by Bernstein and Yang that claims a very fast constant-time GCD and inversion (https://eprint.iacr.org/2019/266).
That's interesting, but considering the benchmarks I have, I highly doubt this can be made faster than the exponentiation technique that is used today.
The Bernstein-Yang paper claims that their GCD-based inversion is faster than using Fermat's little theorem on Curve25519. Similarly, Möller's algorithm for inversion is faster than the addition chains from secp256k1 in my own benchmarks (though I use a generic Montgomery mul/square algorithm, and the difference is about 20%, so a faster/specialized mul/square will probably change the balance). Anyway, if there is a dedicated thread on constant-time inversion I would be happy to discuss there instead of in this variable-time EEA thread.
I've made a PR with some suggested amendments @ deadalnix#1. roconnor-blockstream@2446b1e replaces the

I suppose there is an argument over which style is better: using subtraction, or addition with the complement as is done in scalar_negate. My opinion is that subtraction is marginally better, and arguably scalar_negate should be using this subtraction technique.

roconnor-blockstream@13e07ab revises the main

That all said, it would be a huge improvement to generalize this so that it can be used to compute field inversions in addition to scalar inversions. ECDSA verification requires a single scalar inversion, and Schnorr verification requires no scalar inversions at all, whereas field inversion occurs in a moderate number of places during verification. I think we can generalize this by building a static table of "remainders", as the

Let me know if I've made any errors.
Awesome! I'm not sure if it is best to comment here or in your PR; I'll do it here and we can move the discussion there if need be. I was working myself on the next iteration of this thing and I got some more nice speedups, mostly by avoiding reducing at every step. Unfortunately, we stepped on each other's toes a bit, but that's okay, there is no way you could have known.
Regarding variable scoping: I wasn't sure either. Jonas suggested to me that it was okay to narrow the scope, so I guess we should do it unless we hear an objection. Regarding

In light of your overall comments, I suggest closing my PR against your repo, and I can revisit it once you have some new code. Feel free to incorporate any of my suggestions that you agree with into your next update. I have a couple of other, more questionable tricks in there, such as using
I believe this PR is now obsolete and can be closed, since https://github.com/bitcoin-core/secp256k1/pull/831/files has now been merged.
Yes. Closing.
This is faster than the multiple exponentiation method that is used now. It remains way slower than libgmp, so probably not a full replacement, but it provides a nice speedup when libgmp is not available.
These are the benchmark times I measured.
While there is a fair bag of tricks used to make this fast, there is still a lot of untapped potential, for instance doing numerous additions before reducing x0 and x1. While more could be done, this is already an improvement, so it is time for a PR.