Optimization proposal for divisions

The current division algorithm uses a [BIGNUM structure for 'current'](https://github.com/kokke/tiny-bignum-c/blob/e814d2b6b522ecaa67701453c492811a8adbc555/bn.c#L282) while its task can be completed with natural integer arithmetics. In my local fork (sorry, it's still a hot mess), I have changed the algorithm as follows, where several names and small things have been altered, but should still be recognizable fine:
```
  //
  // Use an integer of natural size to store the current Bit index as 'current'
  // is always a 2's potency. While a BIGNUM can theoretically hold a
  // 2's potency eight times larger than what can represent as Bit index with a
  // natural integer (Bytes vs Bits), this cannot happen within this function
  // as 'a' aligned to the next 2's potency would need to be just as big for
  // this to be the case. This cannot happen due to the address space
  // limitation.
  //
  size_t currentBitIndex = 0;                         // int current = 1;         
  bignum_u_assign_const(denom, b);                    // denom = b
  bignum_u_assign_const(tmp, a);                      // tmp   = a

  const DTYPE half_max = 1 + (DTYPE)(((1ULL << (8 * WORD_SIZE)) - 1U) / 2);
  bool overflow = false;
  while (bignum_u_cmp(denom, a) != LARGER)            // while (denom <= a) {
  {
    if (denom->d[BN_U_DLEN(denom) - 1] >= half_max)
    {
      overflow = true;
      break;
    }
    ++currentBitIndex;                                //   current <<= 1;                 
    _lshift_bit_const(denom, 1);                      //   denom <<= 1;
  }
  if (!overflow)
  {
    _rshift_bit_const(denom, 1);                      // denom >>= 1;
    --currentBitIndex;                                // current >>= 1;                 
  }
  bignum_u_init_const(c);                             // int answer = 0;
  //
  // currentBitIndex cannot add-wraparound to reach this value as reasoned in
  // the comment before.
  //
  while(currentBitIndex != (size_t)0U - 1U)           // while (current != 0)
  {
    if (bignum_u_cmp(tmp, denom) != SMALLER)          //   if (dividend >= denom)
    {
      bignum_u_sub(tmp, denom, tmp);                  //     dividend -= denom;            
      bignum_u_or_int (c, 1, currentBitIndex);        //     answer |= current;
    }
    --currentBitIndex;                                //   current >>= 1;
    _rshift_bit_const(denom, 1);                      //   denom >>= 1;
  } 
```
Along with this newly introduced function:
```
void bignum_u_or_int(ubn* a, DTYPE v, size_t nbits)
{
  require(a, "a is null");

  const uint8_t nbits_pr_word = (WORD_SIZE * 8);
  size_t nwords   = nbits / nbits_pr_word;
  uint8_t  nwordbits = nbits % nbits_pr_word;

  if (nwords >= BN_U_DLEN (a)) {
    return;
  }

  a->d[nwords] |= (v << nwordbits);
}
```
BN_U_DLEN returns the array size of the current sturcture instance, which I added because I cannot affort having every BIGNUM at the maximum required size among all.

This new function is also a really important one performance-wise for different tasks as it allows super-fast construction of 2's potencies. For example, I also used it for the calculation of R², used for Montgomery arithmetics, where R is a 2's potency. The BoringSSL algorithm utilizes BIGNUM shift, pow and mul algorithms for its construction, when with this function, it can be achieved easily with just a preceeding BIGNUM init.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimization proposal for divisions #12

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Optimization proposal for divisions #12

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions