## Division by Way of Multiplication with the Reciprocal (Newton Division)

For divisions with participants in the range of millions of bits we can use this method. It is probably the simplest method of all four division algorithms but also the one with the largest overhead.

The algorithm itself is probably the most published one (mainly because of its simplicity, I guess) but the version for integers: not so much, I think. A short search at Google brought up: nothing. Especially nothing about the problems you’ll meet.

At first the actual algorithm for large integers with base-2^{k} digits (2^{26} in the case of my implementation) in pseudocode:

divisionNewton(a,b){ // checks and balances omitted // length of numerator in bits alen = bitlength(a); // length of denominator in bits blen = bitlength(b); // length of remainder in bits rlen = alen - blen; // We need some extra precision to avoid // needing an expensive extra round // this value is most probably too high // but doesn't add much to the runtime extra = ilog2(blen) + ilog2(alen) + 1; // at least three extra-bits are needed if(extra < 3){ extra = 3; } // these extra bits get added to the // Numerator by multiplying with 2^extra, that is, // shifting it left as = a * 2^extra; // adjust lengths accordingly alen = alen + extra; rlen = rlen + extra; // We do not need to do every step in full precision // but we must start somewhere. // By choosing the start precision like that we could // do the first approximation purely with normal numbers // but the effect is so abysmally small, it is wa below // measure-noise on my machine startprecision = int (MP_DIGIT_BIT/2); // You can pre-compute the individual precisions needed // in the iterations, like I described in the previous post giantsteps = computeGiantsteps(startprecision,rlen,2); steps = giantsteps.length; // the first approximation // for integers the first approximation can be done directly // by calculating 2^k/b and adjust the precision t1 = 2^(2 * startprecision); bs = b / 2^(blen - startprecision); bs = round(bs, ROUND_UP); t1 = t1 / bs; // These iterations look slightly different from most // descriptions but they are the very same, I thought it // can be done faster that way but I was wrong. // you do not need that much temporary variables, it // is for legibility only for(var i = 0; i < steps ; i++){ gsi = giantsteps[i]; // Adjust numerator (2^k) to new precision t3 = t1 * 2^(gsi - gs0 + 1); // Adjust denominator to new precision t4 = b / 2^(blen - gsi); // Do the squaring of the Newton-Raphson algorithm t1 = t1^2; // Do the multiplication of the Newton-Raphson algorithm t1 = t1 * t4; // The division of N-R gets replaced by a simple shift t4 = t1 / 2^(2 * gs0); // Do the subtraction of the Newton-Raphson algorithm t1 = t3 - t4; gs0 = gsi; } // the reciprocal is in t1, do the final multiplication to get the quotient // Adjust the numerator's precision to the precision of the denominator // but leave the extra bits in place as = as / 2^(blen); // Do the actual multiplication N * 1/D to get the raw quotient q = as * t1; // Get the real quotient by dividing by the 2^k from the top plus the // extra bits q = q / 2^(rlen + extra); // Compute the remainder, we need it r = a - q * b; // The algorithm as it is implemented can be off by one. // do one round of Barrettt reduction as known from the // Barrett division treated in my last post to correct if( r < 0){ r = r + b; q = q -1; } else if( r >= b){ r = r - b; q = q + 1; } return [q, r]; }

More comments than code 😉

The same for my JavaScript implementation of a bigint library:

Bigint.prototype.divisionNewton = function(bint){ var tlen, blen, rlen, extra; var t1, t2, t3, t4, ts, q, r; var giantsteps, steps, gs0, gsi, startprecision; tlen = this.highBit() + 1; blen = bint.highBit() + 1; rlen = tlen - blen; // probably too much and should be adjusted to fill a limb if possible, too. extra = blen.highBit() + tlen.highBit() + 1; // should also have three bits at least if(extra < 3){ extra = 3; } ts = this.lShift(extra); tlen += extra; rlen += extra; // The value of startprecision has been chosen to keep the first // approximation in the Number range but is was found to be of about the // same speed as with Bigints. YMMV, so please try it out yourself. startprecision = 15; // precompute individual precisions to keep the iteration loop legible. giantsteps = computeGiantsteps(startprecision,rlen,2); steps = giantsteps.length; t1 = new Bigint(1); t1.lShiftInplace(2 * giantsteps[0]); t1 = t1.div(bint.rShiftRounded(blen - giantsteps[0])); // the first entry of giantsteps is not necessarily equal to startprecision gs0 = giantsteps[0]; for(var i = 0; i < steps ; i++){ gsi = giantsteps[i]; // Adjust numerator (2^k) to new precision t3 = t1.lShift(gsi - gs0 + 1); // Adjust denominator to new precision t4 = bint.rShift(blen - gsi); // Do the squaring of the Newton-Raphson algorithm t1 = t1.sqr(); // Do the multiplication of the Newton-Raphson algorithm t1 = t1.mul(t4); // The division of N-R gets replaced by a simple shift t4 = t1.rShift(2 * gs0); // Do the subtraction of the Newton-Raphson algorithm t1 = t3.sub(t4); gs0 = gsi; } // the reciprocal is in t1, do the final multiplication to get the quotient // Adjust the numerator's precision to the precision of the denominator ts.rShiftInplace(blen); // Do the actual multiplication N*1/D q = ts.mul(t1); // Divide by 2^k to get the quotient q.rShiftInplace(rlen + extra); // compute the remainder r = this.sub(q.mul(bint)); // The N_R algorithm as implemented can be off by one, correct it if( r.sign == MP_NEG){ r = r.add(bint); q.decr(); } else if( r.cmp(bint) == MP_GT){ r = r.sub(bint); q.incr(); } return [q, r]; };

If you have implemented your bigint with a decimal base (e.g.: 10^{7}) just replace all 2^{k} with 10^{k} and use base 10 integer logarithms where I used base 2 integer logarithms (but both do the same: they count digits).

Next post: a needful little helper to speed up unbalanced multiplication which speeds up all of these fast divsion algorithms.