Multiprecision Long Division: Knuth’s Algorithm D

The computer algorithm for the long division, elementary-school style has been described, analyzed and proven by Prof. Donald Knuth already (TAoCP Vol. 2 Algorithm D), so I don’t have to do that. Many implementations exist in many languages, no need to write one from scratch, just port it. Problem: all of these implementations I could lay my hand on are either for small base-10 values (I used one of those[1] for my base 10^6 implementation), for full 32/64 bit or multiples of it (e.g. 16/32 or even 16/8 bit for some microcontroller), none for these “odd” 52/26 I need for JavaScript. I didn’t expect to find any with the exception of Tom Wu’s implementation which is not very readable.
The Book “Hacker’s Delight” by Henry S. Warren describes an implementation and gives free C code for 32/16 bit and 64/32 bit. It is clean and readable (even with comments!) and last but not least: has test cases for some normal cases and some borderline ones.

I used the 64/32 bit version and made some changes be able to make it a 52/26 bit version. I hope the test cases still have all the special cases in it.

/* This divides an n-word dividend by an m-word divisor, giving an
n-m+1-word quotient and m-word remainder. The bignums are in arrays of
words. Here a "word" is 32 bits. This routine is designed for a 64-bit
machine which has a 64/64 division instruction. */

#include <stdio.h>
#include <stdlib.h>     //To define "exit", req'd by XLC.

#define max(x, y) ((x) > (y) ? (x) : (y))

int nlz(unsigned x) {
   int n;

   if (x == 0) return(32);
   n = 0;
   if (x <= 0x0000FFFF) {n = n +16; x = x <<16;}
   if (x <= 0x00FFFFFF) {n = n + 8; x = x << 8;}
   if (x <= 0x0FFFFFFF) {n = n + 4; x = x << 4;}
   if (x <= 0x3FFFFFFF) {n = n + 2; x = x << 2;}
   if (x <= 0x7FFFFFFF) {n = n + 1;}
   return n;
}

void dumpit(char *msg, int n, unsigned v[]) {
   int i;
   printf(msg);
   for (i = n-1; i >= 0; i--) printf(" %08x", v[i]);
   printf("\n");
}

/* q[0], r[0], u[0], and v[0] contain the LEAST significant words.
(The sequence is in little-endian order).

This is a fairly precise implementation of Knuth's Algorithm D, for a
binary computer with base b = 2**32. The caller supplies:
   1. Space q for the quotient, m - n + 1 words (at least one).
   2. Space r for the remainder (optional), n words.
   3. The dividend u, m words, m >= 1.
   4. The divisor v, n words, n >= 2.
The most significant digit of the divisor, v[n-1], must be nonzero.  The
dividend u may have leading zeros; this just makes the algorithm take
longer and makes the quotient contain more leading zeros.  A value of
NULL may be given for the address of the remainder to signify that the
caller does not want the remainder.
   The program does not alter the input parameters u and v.
   The quotient and remainder returned may have leading zeros.  The
function itself returns a value of 0 for success and 1 for invalid
parameters (e.g., division by 0).
   For now, we must have m >= n.  Knuth's Algorithm D also requires
that the dividend be at least as long as the divisor.  (In his terms,
m >= 0 (unstated).  Therefore m+n >= n.) */

int divmnu(unsigned q[], unsigned r[],
     const unsigned u[], const unsigned v[],
     int m, int n) {

   const unsigned long long b = 67108864LL; // Number base (2**26).
   const unsigned long long mask = 67108863LL; // Number base (2**26)-1.
   unsigned *un, *vn;                         // Normalized form of u, v.
   unsigned long long qhat;                   // Estimated quotient digit.
   unsigned long long rhat;                   // A remainder.
   unsigned long long p;                      // Product of two digits.
   long long t, k;
   int s, i, j;

   if (m < n || n <= 0 || v[n-1] == 0)
      return 1;                         // Return if invalid param.

   if (n == 1) {                        // Take care of
      k = 0;                            // the case of a
      for (j = m - 1; j >= 0; j--) {    // single-digit
         q[j] = (k*b + u[j])/v[0];      // divisor here.
         k = (k*b + u[j]) - q[j]*v[0];
      }
      if (r != NULL) r[0] = k;
      return 0;
   }

   /* Normalize by shifting v left just enough so that its high-order
   bit is on, and shift u left the same amount. We may have to append a
   high-order digit on the dividend; we do that unconditionally. */

   s = nlz(v[n-1])-7;             // 0 <= s <= 25.
   vn = (unsigned *)alloca(4*n);
   for (i = n - 1; i > 0; i--)
      vn[i] = ((v[i] << s)&mask) | (((unsigned long long)v[i-1] >> (26-s))&mask);
   vn[0] = (v[0] << s)&mask;

   un = (unsigned *)alloca(4*(m + 1));
   un[m] = ((unsigned long long)u[m-1] >> (26-s))&mask;
   for (i = m - 1; i > 0; i--)
      un[i] = ((u[i] << s)&mask) | (((unsigned long long)u[i-1] >> (26-s))&mask);
   un[0] = (u[0] << s)&mask;

   for (j = m - n; j >= 0; j--) {       // Main loop.
      // Compute estimate qhat of q[j].
      qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
      rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
again:
      if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
      { qhat = qhat - 1;
        rhat = rhat + vn[n-1];
        if (rhat < b) goto again;
      }

      // Multiply and subtract.
      k = 0;
      for (i = 0; i < n; i++) {
         p = qhat*vn[i];
         t = un[i+j] - k - (p & mask);
         un[i+j] = t&mask;
         k = ((p >> 26)) - ((t >> 26));
      }
      t = un[j+n] - k;
      un[j+n] = t;

      q[j] = qhat;              // Store quotient digit.
      if (t < 0) {              // If we subtracted too
         q[j] = q[j] - 1;       // much, add back.
         k = 0;
         for (i = 0; i < n; i++) {
            t = (unsigned long long)un[i+j] + vn[i] + k;
            un[i+j] = t&mask;
            k = (t >> 26);
         }
         un[j+n] = un[j+n] + k;
      }
   } // End j.
   // If the caller wants the remainder, unnormalize
   // it and pass it back.
   if (r != NULL) {
      for (i = 0; i < n-1; i++)
         r[i] = ((un[i] >> s)&mask) | (((unsigned long long)un[i+1] << (26-s))&mask);
      r[n-1] = (un[n-1] >> s)&mask;
   }
   return 0;
}

int errors;

void check(unsigned q[], unsigned r[],
           unsigned u[], unsigned v[],
           int m, int n,
           unsigned cq[], unsigned cr[]) {
   int i, szq;

   szq = max(m - n + 1, 1);
   for (i = 0; i < szq; i++) {
      if (q[i] != cq[i]) {
         errors = errors + 1;
         dumpit("Error, dividend u =", m, u);
         dumpit("       divisor  v =", n, v);
         dumpit("For quotient,  got:", m-n+1, q);
         dumpit("        Should get:", m-n+1, cq);
         return;
      }
   }
   for (i = 0; i < n; i++) {
      if (r[i] != cr[i]) {
         errors = errors + 1;
         dumpit("Error, dividend u =", m, u);
         dumpit("       divisor  v =", n, v);
         dumpit("For remainder, got:", n, r);
         dumpit("        Should get:", n, cr);
         return;
      }
   }
   return;
}

int main() {
   static unsigned test[] = {
   // m, n, u...,          v...,          cq...,  cr....
      1, 1, 3,             0,             1,      1,            // Error, divide by 0.
      1, 2, 7,             1,3,           0,      7,0,          // Error, n > m.
      2, 2, 0,0,           1,0,           0,      0,0,          // Error, incorrect remainder cr.
      1, 1, 3,             2,             1,      1,
      1, 1, 3,             3,             1,      0,
      1, 1, 3,             4,             0,      3,
      1, 1, 0,             0x3ffffff,    0,      0,
      1, 1, 0x3ffffff,    1,             0x3ffffff, 0,
      1, 1, 0x3ffffff,    0x3ffffff,    1,      0,
      1, 1, 0x3ffffff,    3,             0x01555555, 0,
      2, 1, 0x3ffffff,0x3ffffff, 1,     0x3ffffff,0x3ffffff, 0,
      2, 1, 0x3ffffff,0x3ffffff, 0x3ffffff,        1,1,    0,
      2, 1, 0x3ffffff,0x3fffffe, 0x3ffffff,        0x3ffffff,0, 0x3fffffe,
      2, 1, 0x00005678,0x00001234, 0x00009abc,        0x007876ea,0, 0x3ea0,
      2, 2, 0,0,           0,1,           0,      0,0,
      2, 2, 0,7,           0,3,           2,      0,1,
      2, 2, 5,7,           0,3,           2,      5,1,
      2, 2, 0,6,           0,2,           3,      0,0,
      1, 1, 0x800000,  0x400001, 0x00000001, 0x3fffff,
      2, 1, 0x00000000,0x800000, 0x400001,   0x3ffffe0,0x00000001, 0x00000020,
      2, 2, 0x00000000,0x800000, 0x00000001,0x400000, 0x00000001, 0x3ffffff,0x3fffff,
      2, 2, 0x0000789a,0x0000bcde, 0x0000789a,0x0000bcde,          1,          0,0,
      2, 2, 0x0000789b,0x0000bcde, 0x0000789a,0x0000bcde,          1,          1,0,
      2, 2, 0x00007899,0x0000bcde, 0x0000789a,0x0000bcde,          0, 0x00007899,0x0000bcde,
      2, 2, 0x0000ffff,0x0000ffff, 0x0000ffff,0x0000ffff,          1,          0,0,
      2, 2, 0x0000ffff,0x0000ffff, 0x00000000,0x00000001, 0x0000ffff, 0x0000ffff,0,
      3, 2, 0x000089ab,0x00004567,0x00000123, 0x00000000,0x00000001,   0x00004567,0x00000123, 0x000089ab,0,
      3, 2, 0x0,0xfffe,0x8000, 0xffff,0x8000,   0x3ffffff,0x0, 0xffff,0x7fff, // Shows that first qhat can = b + 1.
      3, 3, 0x3,0x0,0x800000, 0x1,0x0,0x200000,   0x3, 0,0,0x200000, // Adding back step req'd.
      3, 3, 0x3,0x0,0x00008000, 0x1,0x0,0x2000,   0x3, 0,0,0x2000, // Adding back step req'd.
      4, 3, 0,0,0x8000,0x7fff, 1,0,0x8000,   0x3fff800,0, 0x800,0x3ffffff,0x7fff,  // Add back req'd.
      4, 3, 0,0xfffe,0,0x8000, 0xffff,0,0x8000, 0x3ffffff,0, 0xffff,0x3ffffff,0x7fff,  // Shows that mult-sub quantity cannot be treated as signed.
      4, 3, 0,0x3fffffe,0,0x800000, 0xffff,0,0x800000, 0x0,1, 0x0,0x3feffff,0x0,  // Shows that mult-sub quantity cannot be treated as signed.
      4, 3, 0,0x3fffffe,0,0x800000, 0x3ffffff,0,0x800000, 0x3ffffff,0, 0x3ffffff,0x3ffffff,0x07fffff,  // Shows that mult-sub quantity cannot be treated as signed.
   };
   int i, n, m, ncases, f;
   unsigned q[10], r[10];
   unsigned *u, *v, *cq, *cr;

   printf("divmnu:\n");
   i = 0;
   ncases = 0;
   while (i < (int)sizeof(test)/4) {
      m = test[i];
      n = test[i+1];
      u = &test[i+2];
      v = &test[i+2+m];
      cq = &test[i+2+m+n];
      cr = &test[i+2+m+n+max(m-n+1, 1)];

      f = divmnu(q, r, u, v, m, n);
      if (f) {
         dumpit("Error return code for dividend u =", m, u);
         dumpit("                      divisor  v =", n, v);
         errors = errors + 1;
      }
      else
         check(q, r, u, v, m, n, cq, cr);
      i = i + 2 + m + n + max(m-n+1, 1) + n;
      ncases = ncases + 1;
   }

   printf("%d errors out of %d cases; there should be 3.\n", errors, ncases);
   return 0;
}

You can unfold the thingy above by clicking on it.
The main difference beside the adapting of the tests and bitsizes is the mass of masking, especially when shifting a 64 bit number right. It was a hard to find error đŸ˜‰
Cause: if we do full wordsize shifts (16 or 32 bit) any residues left from former operations get dumped. Here some of those get included if they are in the highest six bits (assuming small endian).
Now I can check my own implementation when I have written it.

[1]P. Brinch Hansen, Multiple-length division revisited: A tour of the minefield, Software—Practice and Experience 24, (June 1994), 579­601. Copyright 1994, John Wiley & Sons, Ltd.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s