# Multi-Precision Floats: Radix Conversion with Large Exponents

[UPDATE] bugfix in code listing
The method for the radix conversion of floating point numbers, described more theoretically here and more practically here, has one large drawback: albeit always correct, it is abysmally slow for large magnitudes.

Is there a faster way to do it? Yes, there is!

But you need to have six Bigfloat functions implemented: one to convert small integers, one for addition, one for multiplication, one for exponentiation with an integer exponent (which is nothing but a shortcut for repeated multiplication), one for division, and finally one for normalization.

Converting small integers is simple; I implemented it directly in the constructor of my JavaScript implementation of a Bigfloat:

function Bigfloat(n) {
    // keep an extra sign and save some typing
    // Addendum: after a lot of code writing the author can conclude that the
    //           comment above about the amount of typing is complete and
    //           utter...
    this.sign = MP_ZPOS;
    this.mantissa = new Bigint(0);
    this.exponent = 1;
    this.precision = MPF_PRECISION;
    if (arguments.length == 1 && n.isInt() && Math.abs(n) < MP_INT_MAX) {
        this.mantissa = Math.abs(n).toBigint();
        this.mantissa.lShiftInplace(MPF_PRECISION - Math.abs(n).highBit() - 1);
        this.mantissa.sign = (n < 0) ? MP_NEG : MP_ZPOS;
        this.sign = this.mantissa.sign;
        this.exponent = -this.precision + Math.abs(n).highBit() + 1;
    }
}


No rounding is necessary (but I set the minimum precision to 4 limbs, that is: 104 bits) which makes normalizing simple and straightforward.

Addition is probably the most complicated basic function for a floating point number. The principle is quite simple: align the bits of the mantissas, add them, and normalize the result. The first question you have to answer yourself is how to align the bits. Shift the larger one to the left side (make it larger) until it fits the smaller one or shift the smaller one to the right (make it smaller) until it fits the larger one?

The second seems the most reasonable: make the little one smaller until it fits or vanishes completely. It seems reasonable because it makes not much sense to add one millionth to one if the precision can only hold hundredths: 1.00 + 0.000001 = 1.00. But you will lose the addend completely.

The other way around keeps all information, but the amount of information can get very, very large. It is not much with our example above, but the difference can grow up to the full size of the possible exponent. With my implementation, for example, that would be a difference of $2^{53}-1$ bits or roundabout 5,643,501,183,366,341 decimal digits, which should make clear that such a method would be in urgent need of some limitations. But where to put them? If you have an answer to that, feel free to post it or better: publish it and please send me a copy.

It is a good compromise to use the first method and add some guard digits. How many? Well, it depends, sadly. But you already need to do almost every computation on whole limbs so just add one limb (26 bits for me, 32 or even 64 bits for native implementations) and call it a day.
You can do it globally or inside every elementary function, but if you do it inside the function take care to initialize every variable of the function after raising the precision, and even make copies of the function's arguments. Doing that is in most cases cheaper than having to do, e.g., another iteration of a Newton-like algorithm.
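To make the scheme concrete, here is a minimal sketch of that addition using JavaScript's native BigInt in place of the Bigint mantissa; a value is mantissa times a power of two, and the names `Flt` and `fltAdd` are made up for this example, not the post's actual API.

```javascript
// Minimal sketch of Bigfloat-style addition of positive values.
// A value is m * 2^e with a `precision`-bit mantissa m.
function Flt(m, e) { return { m: m, e: e }; }

function fltAdd(x, y, precision) {
  // make sure x is the operand with the larger exponent
  if (x.e < y.e) { const t = x; x = y; y = t; }
  // second method: shift the smaller operand right until the exponents
  // match; the bits falling off the right edge are lost (this is the
  // loss that guard bits soften in the real implementation)
  const sum = x.m + (y.m >> BigInt(x.e - y.e));
  // normalize: bring the mantissa back down to `precision` bits
  let m = sum, e = x.e;
  while (m >> BigInt(precision) !== 0n) { m >>= 1n; e += 1; }
  return Flt(m, e);
}
```

Run at a precision one limb higher than requested and the truncating shifts above become mostly harmless.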

Multiplication, on the other hand, is probably the simplest function: just multiply the mantissas, add the exponents, and normalize the result.
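A sketch in the same BigInt style (`fltMul` is a made-up name; a value is again mantissa times a power of two):

```javascript
// multiply the mantissas, add the exponents, normalize the result;
// the product of two p-bit mantissas has up to 2p bits, so roughly
// half of them get truncated away again (no rounding in this sketch)
function fltMul(x, y, precision) {
  let m = x.m * y.m;
  let e = x.e + y.e;
  while (m >> BigInt(precision) !== 0n) { m >>= 1n; e += 1; }
  return { m: m, e: e };
}
```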

Exponentiation with an integer exponent is also simple: just port the exponentiation from the Bigint implementation to Bigfloats.
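The scheme to port is plain binary exponentiation (square-and-multiply), sketched here on native numbers for brevity:

```javascript
// exponentiation with a non-negative integer exponent as repeated
// squaring: walk the bits of n from least to most significant
function powInt(base, n) {
  let result = 1;
  let b = base;
  while (n > 0) {
    if (n & 1) {      // bit set: multiply the partial result in
      result *= b;
    }
    b *= b;           // square for the next bit
    n >>>= 1;
  }
  return result;
}
```

With Bigfloats, the two multiplications above become Bigfloat multiplications and everything else stays the same.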

Division gets done by computing the multiplicative inverse, also known as the reciprocal, and multiplying it with the numerator.

Computing the reciprocal gets done with some rounds of Newton's method. The problem here is the same as for all Newton-like algorithms: finding a good initial value.
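For reference, the standard Newton iteration for the reciprocal of $d$ is $x_{k+1} = x_k(2 - d x_k)$; a sketch on plain numbers (the real code runs the same loop on Bigfloats):

```javascript
// Newton iteration for 1/d; each round roughly doubles the number of
// correct digits, provided the initial value x0 is good enough
// (0 < x0 < 2/d, otherwise the iteration diverges)
function reciprocal(d, x0, rounds) {
  let x = x0;
  for (let k = 0; k < rounds; k++) {
    x = x * (2 - d * x); // no division needed, only mul and sub
  }
  return x;
}
```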

If the magnitude of the number is inside something we already have a function implemented for, like JavaScript’s native number representation, it is simple: just use it.
That sounds simpler than it is as I had to find out the hard way.
If the Bigfloat is too large, converting it to a native number will result in zero which itself will result in a division by zero. The first workaround was to replace the zero with a small value but that value might be way off and the Newton rounds might not converge in the given number of rounds.

On second thought I remembered one of the transformation rules from school (do they still teach them?), which is $\frac{1}{ab^x} = \frac{1}{a}b^{-x}$. The Bigfloat type works with a mantissa and an exponent, too, so I am able to apply this rule here.
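In Bigfloat terms: for $d = a \cdot 2^x$ only the mantissa part $a$, which is always of moderate size, ever touches the native division; the possibly huge exponent just gets negated. A toy sketch (the name `reciprocalSeed` is made up):

```javascript
// initial value for 1/(a * 2^x): compute 1/a natively and negate the
// exponent; x never enters a native computation, so the seed cannot
// underflow to zero no matter how large the Bigfloat is
function reciprocalSeed(a, x) {
  return { a: 1 / a, x: -x };
}
```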

This needs two additional functions: one to convert a Bigfloat to a native number and one for the other direction. I implemented them by operating at the gut level directly, but a simple loop (divide by ten and catch the digit that falls out) would do, too.
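The "simple loop" variant, sketched with native BigInt standing in for the big-integer mantissa:

```javascript
// peel decimal digits off by repeated division by ten,
// catching the digit that "falls out" as the remainder
function bigintToDecimalString(n) {
  if (n === 0n) return "0";
  let digits = "";
  while (n > 0n) {
    digits = String(n % 10n) + digits;
    n /= 10n;
  }
  return digits;
}
```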

That leaves the moralizing function… wait… what? Ok, I'll let it stand 😉

Uhm… so the normalizing function is where all the magic happens.
A bit of highly commented code might help, I think:

Bigfloat.prototype.normalize = function() {
    var cb, diff;
    var c;

    // If the current precision differs from the precision of the
    // number to be normalized, set it to the current precision
    // but at least to the minimum precision
    if (this.precision != MPF_PRECISION && this.precision >= MPF_PRECISION_MIN) {
        this.precision = MPF_PRECISION;
    } else if (this.precision < MPF_PRECISION_MIN) {
        this.precision = MPF_PRECISION_MIN;
    }
    // no need to normalize zero because zero is the
    // special value 0e1 as recommended by Prof. D. Knuth.
    if (this.isZero()) {
        return MP_OKAY;
    }
    // size of the mantissa in bits
    // highBit() returns the zero based position of the
    // highest set bit, hence the addition of one
    cb = this.mantissa.highBit() + 1;

    // we have more bits than we need
    if (cb > this.precision) {
        // compute the difference
        diff = cb - this.precision;
        // add the magnitude to the exponent
        this.exponent += diff;
        // a JavaScript specific check for overflow
        if (!this.exponent.isInt()) {
            this.setInf();
            this.sign = MP_ZPOS;
            return MPFE_OVERFLOW;
        }
        // Rounding needs information that gets lost in the next step,
        // so get it beforehand. We need to know if bit diff-1, the
        // highest of the bits that get shifted out, is set
        c = this.mantissa.dp[Math.floor((diff - 1) / MP_DIGIT_BIT)] &
            (1 << ((diff - 1) % MP_DIGIT_BIT));
        // Shift right now, that is: divide by 2^diff, truncate the result
        this.mantissa.rShiftInplace(diff);
        // if the bit was set, round up
        if (c != 0) {
            this.mantissa.incr();
        }
        return MP_OKAY;
    }
    // We have fewer bits than needed
    else if (cb < this.precision) {
        // Make a proper zero.
        // This avoids results like 0e-123 which some
        // people prefer, YMMV
        if (this.mantissa.isZero() == MP_YES) {
            this.exponent = 1;
            return MP_OKAY;
        } else {
            // do the very same as above, just in the
            // opposite direction
            diff = this.precision - cb;
            this.exponent -= diff;
            if (!this.exponent.isInt()) {
                this.setInf();
                this.sign = MP_NEG;
                return MPFE_UNDERFLOW;
            }
            this.mantissa.lShiftInplace(diff);
            return MP_OKAY;
        }
    }
    // if (cb == precision) there is nothing to do
    return MP_OKAY;
};


Another function which is not strictly necessary here but will ease our work is one to compare two Bigfloats. The principle is simple, too, but there is a nasty little caveat: the two Bigfloats must be of the same precision or the comparison will cause some really curious errors, so check for that first and normalize before comparing.

The order of the checks depends on the cost of the individual comparisons. Comparing the signs is quite cheap and comparing the mantissas is the most expensive. With these costs in mind I compare the signs first, together with a check if one operand is zero; if the signs are equal I compare the exponents, and only if those are equal, too, the mantissas.
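A sketch of that ordering, cheapest check first (`fltCmp` and the operand layout are made up for this example; both operands are assumed normalized to the same precision):

```javascript
// returns -1, 0, or 1; operands are { sign: -1 or 1, m: BigInt, e: int }
function fltCmp(x, y) {
  // zero check first: zero sorts below any positive
  // and above any negative number
  const vx = (x.m === 0n) ? 0 : x.sign;
  const vy = (y.m === 0n) ? 0 : y.sign;
  if (vx !== vy) return vx < vy ? -1 : 1;
  if (vx === 0) return 0; // both zero
  // same nonzero sign: the larger exponent wins ...
  if (x.e !== y.e) return (x.e < y.e ? -1 : 1) * x.sign;
  // ... and only if the exponents agree is the (expensive)
  // mantissa comparison needed
  if (x.m !== y.m) return (x.m < y.m ? -1 : 1) * x.sign;
  return 0;
}
```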

Now to the actual radix conversion. To make things simple I restricted the input to base ten, but the only place where this restriction matters is the string parser; the computation is the same for all bases. A string parser for some more bases is shown in the former algorithm linked to at the beginning of this post.

I think highly commented code might be useful here, too. The parser itself differs slightly from the last one, but not by much.

String.prototype.toBigfloatFast = function(base) {
    var ten, a, b, len, k, e, c, str, digit, decimal, expo,
        asign, exposign, ret, fdigs, oldeps, table, bigbase;
    // as said above: raise the precision at the very beginning of the function
    oldeps = epsilon();
    // TODO: may not be enough for very large exponents,
    //       so: compute necessary precision more precisely
    epsilon(oldeps + 10);
    // gathers the digits
    a = new Bigfloat(0);
    len = this.length;
    decimal = -1;
    // flag gets set if an exponent exists
    expo = undefined;
    // sign of significand, also used as a flag
    asign = 0;
    // sign of exponent, also used as a flag
    exposign = 0;
    // number of digits in fraction part
    fdigs = 0;
    // map character to value. ASCII only
    table = [
        -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
        -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
        -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
        -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
        -1,-1,-1, 123,-1, 124, 125,-1,  0,  1, // 123 = "+", 124 = "-", 125 = "."
         2,  3,  4,  5,  6,  7,  8,  9,-1,-1,
        -1,-1,-1,-1,-1, 10, 11, 12, 13, 14,
        15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
        126,-1,-1,-1,-1,-1,-1,-1,-1,-1,        // 126 = uppercase "P"
        -1,-1,-1,-1,-1,-1,-1, 10, 11, 12,
        13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
        23, 24, 126,-1,-1,-1,-1,-1,-1,-1,      // 126 = lowercase "p"
        -1,-1,-1,-1,-1,-1,-1,-1
    ];

    str = this;

    // TODO: checks & balances

    if (arguments.length == 0) {
        base = 10;
    }

    // the maximum base is restricted because the character "p"
    // is reserved for the exponent mark
    if (base < 2 || base > 23) {
        throw new RangeError("Base outside of range in String.toBigfloatFast");
    }
    // base is restricted to ten for this example
    base = 10;

    bigbase = base.toBigfloat();
    // Not needed because we use a table to map
    // the values directly
    // str = str.toLowerCase();
    for (k = 0; k < len; k++) {
        // strip unicode
        // TODO: check if it is a Unicode character
        c = str.charCodeAt(k) & 0xff;
        c = table[c];
        if (c < 0) {
            throw new RangeError("Unknown character in String.toBigfloat");
        }
        switch (c) {
            case 123: // plus sign ("+")
            case 124: // minus sign ("-")
                if (typeof expo === "undefined") {
                    if (asign != 0) {
                        throw new RangeError(
                            "Second decimal sign found in String.toBigfloat");
                    } else {
                        asign = (c == 124) ? -1 : 1;
                    }
                } else {
                    if (exposign != 0) {
                        throw new RangeError(
                            "Second exponent sign found in String.toBigfloat");
                    } else {
                        exposign = (c == 124) ? -1 : 1;
                    }
                }
                break;
            case 0:
            case 1:
            case 2:
            case 3:
            case 4:
            case 5:
            case 6:
            case 7:
            case 8:
            case 9:
                if (typeof expo === "undefined") {
                    // TODO: cache them?
                    digit = new Bigfloat(c);
                    // Horner scheme: multiply the gathered digits by
                    // the base and add the new digit
                    a = a.mul(bigbase).add(digit);
                    if (decimal != -1) {
                        fdigs++;
                    }
                } else {
                    // only exponents in base ten allowed
                    expo = expo * 10 + c;
                }
                break;
            case 125: // decimal mark (".")
                if (decimal != -1) {
                    throw new RangeError(
                        "Second decimal mark found in String.toBigfloat");
                }
                decimal = k;
                break;
            case 14:  // exponent mark ("e"), for base ten input
            case 126: // exponent mark ("p"/"P"), for the other bases
                if (typeof expo !== "undefined") {
                    throw new RangeError(
                        "Second exponent mark found in String.toBigfloat");
                }
                expo = 0;
                break;
            default:
                throw new RangeError(
                    "Unknown character in String.toBigfloat " + str.charAt(k));
        }
    }
    ret = a;
    // Ignore the exponent if the mantissa is zero.
    // This avoids results like 0e-123 which some people
    // might prefer, so YMMV
    if (ret.isZero()) {
        return ret;
    }
    // We have some fractional digits and need to adjust
    // the magnitude by dividing by base raised to the
    // number of fractional digits
    // TODO: for base 2 (two) use shift instead
    if (fdigs != 0) {
        b = bigbase.pow(fdigs);
        ret = ret.div(b);
    }
    // Set the sign
    if (asign == -1) {
        ret = ret.neg();
    }
    // We have an exponent, apply it
    if (typeof expo !== "undefined") {
        if (exposign == -1) {
            expo = -expo;
        }
        // scale by the amount set by the exponent;
        // the pow() function knows what to do with negative
        // exponents but yours might not, in which case you
        // have to check the sign of the exponent and
        // act accordingly
        b = bigbase.pow(expo);
        ret = ret.mul(b);
    }
    // As said above: restore the precision first, then normalize
    epsilon(oldeps);
    ret.normalize();
    return ret;
};


This function is, despite its name, slower than the old algorithm for small input, but the exact definition of small is one of the YMMV cases. I have set the cutoffs to a string length of 100 decimal digits and an exponent larger than ±100.
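Such a cutoff boils down to a one-line dispatch; a sketch with the numbers from above (the regular-expression exponent sniffing and the returned method names are just for illustration):

```javascript
// pick the conversion routine by string length and exponent magnitude;
// the cutoff values 100/100 are the ones mentioned in the text
function pickConversion(str) {
  const m = str.match(/[eE]([+-]?\d+)/);
  const expo = m ? Math.abs(parseInt(m[1], 10)) : 0;
  return (str.length > 100 || expo > 100) ? "toBigfloatFast" : "toBigfloat";
}
```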