Skip to content

Optimization proposal for divisions #12

@mhaeuser

Description

@mhaeuser

The current division algorithm uses a BIGNUM structure for 'current' while its task can be completed with natural integer arithmetics. In my local fork (sorry, it's still a hot mess), I have changed the algorithm as follows, where several names and small things have been altered, but should still be recognizable fine:

  //
  // Use an integer of natural size to store the current Bit index as 'current'
  // is always a 2's potency. While a BIGNUM can theoretically hold a
  // 2's potency eight times larger than what can represent as Bit index with a
  // natural integer (Bytes vs Bits), this cannot happen within this function
  // as 'a' aligned to the next 2's potency would need to be just as big for
  // this to be the case. This cannot happen due to the address space
  // limitation.
  //
  size_t currentBitIndex = 0;                         // int current = 1;         
  bignum_u_assign_const(denom, b);                    // denom = b
  bignum_u_assign_const(tmp, a);                      // tmp   = a

  const DTYPE half_max = 1 + (DTYPE)(((1ULL << (8 * WORD_SIZE)) - 1U) / 2);
  bool overflow = false;
  while (bignum_u_cmp(denom, a) != LARGER)            // while (denom <= a) {
  {
    if (denom->d[BN_U_DLEN(denom) - 1] >= half_max)
    {
      overflow = true;
      break;
    }
    ++currentBitIndex;                                //   current <<= 1;                 
    _lshift_bit_const(denom, 1);                      //   denom <<= 1;
  }
  if (!overflow)
  {
    _rshift_bit_const(denom, 1);                      // denom >>= 1;
    --currentBitIndex;                                // current >>= 1;                 
  }
  bignum_u_init_const(c);                             // int answer = 0;
  //
  // currentBitIndex cannot add-wraparound to reach this value as reasoned in
  // the comment before.
  //
  while(currentBitIndex != (size_t)0U - 1U)           // while (current != 0)
  {
    if (bignum_u_cmp(tmp, denom) != SMALLER)          //   if (dividend >= denom)
    {
      bignum_u_sub(tmp, denom, tmp);                  //     dividend -= denom;            
      bignum_u_or_int (c, 1, currentBitIndex);        //     answer |= current;
    }
    --currentBitIndex;                                //   current >>= 1;
    _rshift_bit_const(denom, 1);                      //   denom >>= 1;
  } 

Along with this newly introduced function:

void bignum_u_or_int(ubn* a, DTYPE v, size_t nbits)
{
  require(a, "a is null");

  const uint8_t nbits_pr_word = (WORD_SIZE * 8);
  size_t nwords   = nbits / nbits_pr_word;
  uint8_t  nwordbits = nbits % nbits_pr_word;

  if (nwords >= BN_U_DLEN (a)) {
    return;
  }

  a->d[nwords] |= (v << nwordbits);
}

BN_U_DLEN returns the array size of the current sturcture instance, which I added because I cannot affort having every BIGNUM at the maximum required size among all.

This new function is also a really important one performance-wise for different tasks as it allows super-fast construction of 2's potencies. For example, I also used it for the calculation of R², used for Montgomery arithmetics, where R is a 2's potency. The BoringSSL algorithm utilizes BIGNUM shift, pow and mul algorithms for its construction, when with this function, it can be achieved easily with just a preceeding BIGNUM init.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions