Conversion of Binary Floating Point Numbers—Second Try

The conversion of a string to a binary encoded floating point number is not easy, as shown in my last post. The information given there is a bit too much on the theoretical side. I’ll try to change that here with some real code: String to Bigfloat and Bigfloat to String.

The conversion of a string to a float has several stages. The first one is, obviously the parsing of the string. I tried it with JavaScript’s build-in regular expression methods but that made the thing more complicated than necessary. A simple LR parser will do it as well if not even better.

The String Parser

A floating point number according to IEEE-754 consists of seven parts, some of them are optional.

  1. The sign of the whole number. One character, optional
  2. The integer part of the number. An arbitrary number of characters, optional
  3. The decimal point (or -mark. In general: radix point). One Character, optional
  4. The fractional part of the number. An arbitrary number of characters, optional
  5. The exponent mark, One character, optional
  6. The sign of the exponent. One character, optional
  7. The exponent part of the number. An arbitrary number of characters, optional

As it is easy to see: anything is optional, so some conditions must exist otherwise even an empty string would have to be considered a proper floating point number.

The conditions:

  • Any sign belongs to a number/exponent, only one per number/exponent, only in front of the number/exponent
  • An exponent mark belongs to an exponent, only one per exponent
  • A decimal point belongs to a number, only one per number, none in the exponent
  • A number consists of digits, at least one
  • A digit character must not be the same as an exponent mark

Examples:

These are some of correct floating point numbers.

The following numbers are all zero but the sign of the number is relevant and shall
not be dropped. Only minus signs are shown, plus signs are redundant but might exist.

  • 0
  • 0.
  • .0
  • 0e0
  • 0.e0
  • .0e0
  • -0
  • -0.
  • -.0
  • -0e0
  • -0.e0
  • -.0e0
  • 0e-0
  • 0.e-0
  • .0e-0
  • -0e-0
  • -0.e-0
  • -.0e-0
  • 0e-100
  • 0e100
  • -0e-100
  • -0e100

Some so called normal numbers

  • 123
  • 123.
  • 1.23
  • .123
  • 123e3
  • 1.23e3
  • .123e3
  • -123
  • -123.
  • -1.23
  • -.123
  • -123e3
  • -1.23e3
  • -.123e3
  • 123e-3
  • 123.e-3
  • 1.23e-3
  • .123e-3
  • -123e-3
  • -1.23e-3
  • -.123e-3

The existence of subnormal/denormal numbers depend on the minimum value of the exponent. There exists such a value in a multi-precision but the exponent can be a 32 bit number or even a big integer of arbitrary size, so subnormal numbers are hard to find in this case.

Not floating pointer numbers.

  • .
  • 1e
  • “”
  • .e3
  • 1.-e2

Some special values are defined, too.

  • Inf
  • Infinity
  • NaN
  • -Inf
  • -Infinity
  • -NaN

The case of the characters should not matter but it is normally either the above form or all lower-case. The sign does matter!

Note:One example is the rule-set for the input of atan2(y,x) as defined in ECMAScript 5.1 and, with some slight variations here and there every other implementation of the two valued arcus-tangent function. With {  \mathop{\rm NaN} } some symbols {  \mathop{\rm NaN} \not=  \mathop{\rm NaN} } that are not a number and \infty for a signed infinity, the two points added to the real line to make it the extended real line and \pi' a finite approximation to \pi we get the rule-set from ECMAScript 5.1

{\displaystyle{\mathop{\rm atan2}(y,x) = \mathop{\rm NaN} \quad\text{with}\quad y = \mathop{\rm NaN} \lor x = \mathop{\rm NaN}  }}
{\displaystyle{\mathop{\rm atan2}(y,x) = \frac{\pi'}{2}  \quad\text{with}\quad y > 0 \land x = +0  }}
{\displaystyle{\mathop{\rm atan2}(y,x) = \frac{\pi'}{2}  \quad\text{with}\quad y > 0 \land x = -0  }}
{\displaystyle{\mathop{\rm atan2}(y,x) = +0   \quad\text{with}\quad y = +0 \land x > 0}}
{\displaystyle{\mathop{\rm atan2}(y,x) = +0   \quad\text{with}\quad y = +0 \land x = +0  }}
{\displaystyle{\mathop{\rm atan2}(y,x) = +\pi'  \quad\text{with}\quad y = +0 \land x = -0 }}
{\displaystyle{\mathop{\rm atan2}(y,x) = +\pi'  \quad\text{with}\quad y = +0 \land x < 0  }}
{\displaystyle{\mathop{\rm atan2}(y,x) =  -0  \quad\text{with}\quad y = -0 \land x > 0 }}
{\displaystyle{\mathop{\rm atan2}(y,x) =  -0  \quad\text{with}\quad y = -0 \land x = +0   }}
{\displaystyle{\mathop{\rm atan2}(y,x) = -\pi' \quad\text{with}\quad y = -0 \land x = -0   }}
{\displaystyle{\mathop{\rm atan2}(y,x) = -\pi' \quad\text{with}\quad y = -0 \land x < 0   }}
{\displaystyle{\mathop{\rm atan2}(y,x) = -\frac{\pi'}{2} \quad\text{with}\quad y < 0 \land x = +0   }}
{\displaystyle{\mathop{\rm atan2}(y,x) = -\frac{\pi'}{2} \quad\text{with}\quad y < 0 \land x = -0   }}
{\displaystyle{\mathop{\rm atan2}(y,x) = +0 \quad\text{with}\quad y > 0 \land y \not= \infty \land x = +\infty  }}
{\displaystyle{\mathop{\rm atan2}(y,x) = +\pi' \quad\text{with}\quad y > 0 \land y \not= \infty \land x = -\infty   }}.
{\displaystyle{\mathop{\rm atan2}(y,x) = -0 \quad\text{with}\quad y < 0 \land y \not= \infty \land x = +\infty  }}
{\displaystyle{\mathop{\rm atan2}(y,x) = +\pi' \quad\text{with}\quad y < 0 \land y \not= \infty \land x = -\infty  }}
{\displaystyle{ \mathop{\rm atan2}(y,x) = +\frac{\pi'}{4} \quad\text{with}\quad  y = +\infty \land x = +\infty  }}
{\displaystyle{\mathop{\rm atan2}(y,x) = +\frac{3\pi'}{4} \quad\text{with}\quad  y = +\infty \land x = -\infty   }}
{\displaystyle{ \mathop{\rm atan2}(y,x) = -\frac{\pi'}{4} \quad\text{with}\quad  y = -\infty \land x = +\infty }}
{\displaystyle{ \mathop{\rm atan2}(y,x) = -\frac{3\pi'}{4} \quad\text{with}\quad  y = -\infty \land x = -\infty }}
{\displaystyle{  }}

I hope I did it all correct in my implementation.

Nasty Typographical Traps

In most prints, the numbers have thousand-separators. Typographically correct is, as far as I remember, one fifths of a unit: 1 000 000. It is rarely seen today, most use some kind of visual mark, mostly a comma when the decimal point is a period and a period when the decimal point is a comma or even something completely different. From that last link to Wikipedia we must accept the fact that the decimal point is not a period everywhere, so we need to localize or restrict it. Localizing is very complicated, but restricting is not as rude as it sounds in the first place. We restrict it to the period (ASCII 0x2e) and think about using the underbar (ASCII 0x5f) as a thousand separator. The problem with the underbar is that e.g.: _123 is a proper variable name but this gets caught if we insist on declaring all variables as variables: let _10 = "ten"; such that things like zeta(_10) cause a parse error in the line of “Wrong type in argument to zeta()”.

The Exponent

The character used to mark the beginning of the exponent depends on the base of the number. It is also complicated to do it for all cases, restricting is simpler, as it is always if you are not the one at the front line trying to sell the software you wrote.
We restrict the possible input to the bases 2, (8,) 10, and 16. For the bases 2, 8, and 16 the character “p” as the exponent mark indicates that the exponent itself is encoded in base 10. That makes it an easy choice to restrict the exponent to base 10.

All Your Base Are Belong to …uhm…sorry, couldn’t resist

A base of the number other then 10 is usually marked with a leading zero followed by a character or—in case of base 8—a digit (which makes base 8 so awkward to implement). The characters are “x” for base 16 and “b” for base 2. We restrict the acceptable bases to 2, (8,) 10 and 16.

Examples:
Base 2: 0b100101.001101p1239
Base 8: 071234.76125P1239
Base 10: 97854.7239817e1239
Base 16: 978aF4.72aF9817p1239

If it is not already obvious: for such small bases the case of the characters is irrelevant. The encoding of the characters on the other side definitely is. We restrict the encoding to ASCII, that is, the byte values 0x2e, 0x30-0x39, 0x41-0x46, 0x50, 0x61-0x66, 0x70, and the underbar at 0x5f.

Exceptional

The exceptions Infinity, Inf, and NaN together with their signs need some extra handling before the parsing. Well, there is no pressing need to do so but it makes things simpler. So does the extra handling of the base markers at the beginning of the input string and the sign of the number placed at the very beginning of the input string.

The String parser

We have no need for a lot of variables but these few:

// the length of the input string
var slen = s.length;
// a variable holding the currently read character
var c;
// a string holding the exponent part of the number
var exponent_part = "";
// flag to check if the number has one and only one decimal mark
var decimal_point = -1;
// digits are digits, so this flag denotes the case that we are
// parsing the exponent part of the number
var is_expo_part = false;
// all digits but the exponent
var str_number = "";
// the sign of the number
var sign = "+";
// the sign of the exponent
var expo_sign = "+";
// the base of the number (base of exponent is always 10)
var base = 10;
// Index of main loop
var i;
// Index in string (would be the pointer in C)
var k = 0;

We should start with some clean-ups and shortcuts. We do the clean-ups in the main language parser so no clean-ups necessary in the number parser.

The sign, if one exists is always the first symbol, check for it.

// unitary minus/plus comes before base marks (e.g.: "0x")
if (s.charAt(k) == "-" || s.charAt(k) == "+") {
    sign = s.charAt(0);
    // increase pointer to next index of string
    k += 1;
}

The next shortcut is for the exceptions. Shown for brevity are only two of them. It should even be sufficient, the language parser can do the main work and return either Inf of NaN plus the sign. We do the same here.

if (s.charAt(k) == "I" && s.charAt(k + 1) == "n" 
      && s.charAt(k + 1) == "f") {
    return sign + "Infinity";
}

if (s.charAt(k) == "N" && s.charAt(k + 1) == "a" 
      && s.charAt(k + 1) == "N") {
    return sign + "NaN";
}

Having that out of the way we can go to the next shortcut: the detection of the base. It could be done with JavaScript’s build-in regular expressions but I wanted it to have as similar to the C implementation as possible.

// check for a leading zero
if (s.charAt(k) == "0") {
    // "x" marks base 16
    if (s.charAt(k + 1) == "x") {
        base = 16;
        // "0x" are two characters, increment pointer accordingly
        k += 2;
    // "b" marks base 2 ("b" as in "binary"?)
    } else if (s.charAt(k + 1) == "b") {
         base = 2;
         // "0b" are two characters, increment pointer accordingly
         k += 2;
     }
     // no leading zeros allowed for octal numbers, except the first
     // to make things simpler
     else if (s.charAt(k + 1) && s.charAt(k + 1) != "0"
                && s.charAt(k + 1) != ".") {
         base = 8;
         // "0" is only _one_ character, increment pointer accordingly
         k += 1;
    }
    // it is not a leading zero, it is an actual zero
    if (s.length == 1) {
        // return a signed zero (zero is the special number "0e1"")
        return [base, sign, "0", decimal_point, 1];
    }
}

Leading zeros are allowed, no problem, just strip them.

while (s.charAt(k) == "0") {
    k++;
}
// If all we had were zeros it is an actual zero
if (k == slen) {
    // return a signed zero
    return [base, sign, "0", decimal_point, 1];
}

We could do another shortcut to handle things like “0.”. “0.0”, or “.0” but that would only clutter the code.
In the hope that we got all special cases treated we can go to the big loop.

One of the problems with JavaScript/ECMAScript is the fact that the strings are encoded in 16 bit Unicode. We trust the main language parser to handle it correctly and to return ASCII only and plainly ignore it.

The main loop starts at the index given by k in the input string and goes up to the end if no fatal error occurred and picks every single character out for treatment. This treatment gets done by a longer switch. An abbreviated loop first to give you a taste of it.

// loop over the string starting at given index
for (i = 0; i < slen - k; i++) {
    // pick a character, could be a 16-bit character-ignored
    // index offset necessary because we started with i=0
    // we also could have started at k, no difference
    c = s.charAt(i + k);
    // one large switch to handle all cases
    switch (c) {
        case "0":

           /*...*/

        // There are Unicode symbols that look like digits and can be
        // taken as digits, are even meant to be digits but we insist
        // on ASCII only here.
        default:
            return "Parse error: unknown character = \"" + c +  "\"";
   }
}

So it is quite simple, the harder part comes now with the full switch. I use a lot of so called fall-throughs here which could be shortened by a regex but I found out that it is actually shorter this way because you get your parts with very little code with a regular expression but you need more code to handle the individual parts. With such a simple LR-parser you can do it all at once. More or less 😉

for (i = 0; i < slen - k; i++) {
  c = s.charAt(i + k);
  switch (c) {
    case '0':
    case '1':
      // These are binary digits, they fit everywhere.
      // Just check where we are and concat it to the appropriate
      // variable
      if (is_expo_part == true) {
        exponent_part += c;
      } else {
        str_number += c;
      }
      // nothing else to do-next character, please
      break;
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
      // the base 8 digits that are not fit for
      // base 2
      if (base == 2) {
        return 'Parse error: digit > base 2';
      }
      // same spiel as above: choose the right part and
      // concat to variable
      if (is_expo_part == true) {
        exponent_part += c;
      } else {
        str_number += c;
      }
      break;
    case '8':
    case '9':
      // same spiel as above, this time for base 8 and smaller
      if (base <= 8) {
        return 'Parse error: digit > base 8';
      }
      if (is_expo_part == true) {
        exponent_part += c;
      } else {
        str_number += c;
      }
      break;
    case 'a':
    case 'A':
    case 'b':
    case 'B':
    case 'c':
    case 'C':
    case 'd':
    case 'D':
      // These are hexadecimal characters.
      // We restricted the exponent to base 10.
      // The rest is the same as above, this time for
      // base 10 and smaller
      if (is_expo_part == true) {
        return 'Parse error: base in exponent > 10';
      } else if (base <= 10) {
        return 'Parse error: digit > base 10';
      } else {
        str_number += c;
      }
      break;
    case 'e':
    case 'E':
      // check if it is a digit or an exponent mark
      // there can be only one such mark, so raise an error
      // if it is the second occurrence
      if (base != 16 && is_expo_part == true) {
        return 'Parse error: additional exponent part found';
      }
      // We do not have bases larger than 16 so this check
      // is also valid
      if (base == 16) {
        str_number += c;
      } else {
        is_expo_part = true;
      }
      break;
    case 'f':
    case 'F':
      if (base <= 10) {
        return 'Parse error: digit > base 10';
      } else {
        str_number += c;
      }
      break;
    case 'p':
    case 'P':
      // The exponent mark for bases other than 10
      // and an indicator that the exponent is in base 10
      if (base == 16) {
        is_expo_part = true;
      } else {
        return 'Parse error: wrong character: "' + c + '"';
      }
      break;
    case '.':
      // The decimal point/radix point.
      // There is only one of it, so raise an error
      // at the second occurrence
      if (decimal_point >= 0) {
        return 'Parse error: additional decimal point found';
      }
      decimal_point = i;
      break;
    case '+':
    case '-':
      // There is only one sign at the beginning of the exponent part.
      // It does not check the condition that the sign is at
      // the beginning of the exponent. Funny thing: it does not matter
      // here. But the language parser would make an addition out of
      // it and it would never come down here.
      if (is_expo_part == true) {
        expo_sign = c;
      } else {
        return 'Parse error: sign at wrong position ';
      }
      break;
/*
    case '_':
      // ignore underbars
      break;
*/
    default:
      // Either raise an error  or ignore (drop) it.
      // Raising an error is much cleaner.
      return 'Parse error: unknown character = "' + c + '"';
  }
}

We might have nothing at all. Can happen with e.g.: “.” or “.E”, something the main language parser should have caught but did not for some unknown reason.

if (str_number.length == 0) {
     // No error raised, just return zero instead, for simplicity
     // Maybe raise a warning, for debugging purposes
     // console.log("Parse error: number has no digits?");
     return [base, sign, "0", decimal_point, 1];
}

The exponent is a small (in the computing sense, meaning that it can be put in a native data type) integer so we can handle it here before it gets forgotten.

if (is_expo_part == true) {
    exponent_part = parseInt(expo_sign + exponent_part);
} else {
    exponent_part = 0;
}

Handle the fraction part. On itself it can have leading zeros e.g.: “0.00000123” but they count, so count them. It can have trailing zeros e.g.: “0.001230000”. They do not count so do not count them.

// we do have a fraction part in the string
if (decimal_point >= 0) {
  // Part the string at the decimal point.
  // Nearly the same in C where you set two pointers pointing
  // to the beginning of the individual parts. This is more
  // complicated in JavaScript, so a substring() will do
  var parts0 = str_number.substring(0, decimal_point);
  var parts1 = str_number.substring(decimal_point);
  k = 0;
  i = parts1.length - 1;
  // count leading zeros, integer part
  while (parts0.charAt(k) == '0') {
    k++;
  }
  // count trailing zeros, fractional part
  while (parts1.charAt(i) == '0') {
    i--;
  }
  // strip the zeros
  // As above: substring() instead of pointer juggling
  parts0 = parts0.substring(k);
  parts1 = parts1.substring(0, i + 1);
  k = 0;
  // count leading zeros, fractional part
  while (parts1.charAt(k) == '0') {
    k++;
    if (k == parts1.length) {
      break;
    }
  }
  // Move number of trailing zeros to exponent,
  // that is: 0.0001 = 1e-4
  // but only if there is no integer part
  if (k > 0 && parts0.length == 0) {
    parts1 = parts1.substring(k);
    exponent_part -= k;
  }
  // Catch a zero here
  if (parts0.length == 0 && parts1.length - k == 0) {
    parts0 = '0';
    exponent_part = 1;
  }
  return [base, sign, parts0, parts1, decimal_point, exponent_part];
} else {
  k = 0;
  // Strip leading zeros (not necessary, just in case)
  while (str_number.charAt(k) == '0') {
    k++;
  }
  str_number = str_number.substring(k);
  return [base, sign, str_number, '', decimal_point, exponent_part];
}

We should feed this array to a function to get a binary floating point number out of it, otherwise it would be quite a waste, would it not?

This function handles base 10 only for brevity. Other bases are nearly handled the same, some of the work needed for base 10 is not even needed for bases 2, 8, or 16.

  var exceptions;
  // Handle exceptions first
  // BTW: mixing variable declarations with code
  // code in JavaScript is considered bad style.
  // I do not know why, maybe because of the rough grained
  // variable scopes in JavaScript that might irritate
  // the beginners?

  // The number parser returns either an Array or a String.
  // The latter is a sign to handle an exception
  if (typeof arrayFromParseNumber === 'string') {
    // check if it is +-Infinity or +-NaN
    // using JavaScript regular expression this time for
    // legibility
    if (arrayFromParseNumber.search(/[+-]{0,1}Infinity/) >= 0) {
      exceptions = (new Bigfloat()).setInf();
      exceptions.sign = arrayFromParseNumber.match(/{+-}/) [0];
      exceptions.sign = (exceptions.sign == '-') ? MP_NEG : MP_ZPOS;
      exceptions.mantissa.sign = exceptions.sign;
      return exceptions.sign;
    } else if (arrayFromParseNumber.search(/[+-]{0,1}NaN/) >= 0) {
      exceptions = (new Bigfloat()).setNaN();
      exceptions.sign = arrayFromParseNumber.match(/[+-]/) [0];
      exceptions.sign = (exceptions.sign == '-') ? MP_NEG : MP_ZPOS;
      exceptions.mantissa.sign = exceptions.sign;
      return exceptions.sign;
    } else {
      // raise some error and...
      return (new Bigfloat()).setNaN();
    }
  }
  // Get the content of the function argument
  var base = arrayFromParseNumber[0];
  var sign = arrayFromParseNumber[1];
  var int_part = arrayFromParseNumber[2];
  var fract_part = arrayFromParseNumber[3];
  var decimal_point = arrayFromParseNumber[4];
  var exponent = arrayFromParseNumber[5];
  // The base
  var ten = new Bigint(10);
  var k = 0;
  var slen = int_part.length + fract_part.length;
  var numerator,  denominator;
  var num_len,  den_len;
  var s;
  var fraction;
  var the_number = new Bigfloat();
  // log_2(10), the relation between base 2 and base 10 
  var log210 = parseFloat('3.321928094887362347870319429489390175865');

Zero is a special number.

// If we got send a zero we are out fast
if (int_part == "0" && fract_part == "" && exponent == 1) {
    the_number.mantissa.sign = (sign == "-") ? MP_NEG : MP_ZPOS;
    the_number.sign = the_number.mantissa.sign;
    return the_number;
}

We can treat integers and fractions differently.

// E.g. "123e12" but not "1.23e12" despite being an integer, too
// The latter gets caught later
if (decimal_point < 0  && exponent >= 0) {
    return handleInteger(exponent);
}

Branches for the individual bases. Only base 10 handled here.

  if (base == 10) {
    // Handles base 10, who would have thought.
    // Scale exponent to be able to treat other integers as integers
    // 12.34e4 = 1234e2     -> integer
    // 12.34e2 = 1234e0     -> integer
    // 12.34e1 = 1234e-1    -> fraction
    // 12.34e-10 = 1234e-12 -> fraction
    exponent -= fract_part.length;
    if (exponent >= 0) {
      return handleInteger(exponent);
    }
    // Convert number string to binary (Bigint), that
    // will be the numerator
    // Using the string concatenation of Javascript
    // for legibility
    numerator = (int_part + fract_part).toBigint();
    // Build denominator as 10 to the power of the decimal
    // places of the number.
    // Actually, we could save some money with
    //    5^x * 2^x = 10^x
    denominator = ten.pow(Math.abs(exponent));
    // Scale it. Both values are already in binary, we can
    // just take the difference of the bit length.
    num_len = numerator.highBit();
    den_len = denominator.highBit();
    // Compute the scale factor s = #(N_r) - #(D_r)
    s = (num_len - den_len);
    // Multiply the numerator by 2^(precision - s) where
    // "precision"" is the working precision of the Bigfloat
    numerator.lShiftInplace(the_number.precision - s);
    // Built the Bigfloat's fraction part and round it;
    // rounding mode is "half to nearest"
    fraction = roundFraction(numerator, denominator, s);
    exponent = - (the_number.precision - fraction[1]);
    // build the Bigfloat and return it.
    the_number.mantissa = fraction[0];
    the_number.mantissa.sign = (sign == '-') ? MP_NEG : MP_ZPOS;
    the_number.sign = the_number.mantissa.sign;
    the_number.exponent = exponent;
    return the_number;
  } else {
    // handle bases 2, 8 and 16
    // most of the base-10 code can be reused here
  }

The function roundFraction(numerator, denominator, s) does the real work here but lets take a look at the integer handling first.

  // I have put all into one function with sub- and sub-sub-functions
  // It is possible in JavaScript and sometimes even needed to handle
  // the scopes.
  // Just before you wonder where all the variables come from.
  var handleInteger = function (expo) {
    var exp;
    if (arguments.length > 0) {
      exp = expo;
    } else {
      exp = exponent;
    }
    // This is redundant code but it is easier to handle
    // it this way altough it is redundant code.
    numerator = (int_part + fract_part).toBigint();
    numerator = numerator.mul(ten.pow(expo));
    // The denominator is one and it needs to be scaled instead
    // of the numerator
    denominator = new Bigint(1);
    // The scale factor s is just the bitlength of the numerator
    // here
    s = numerator.highBit();
    // Any integers smaller than the work-precsion can be
    // taken at face value, all others need rounding
    if (s <= the_number.precision) {
      numerator.lShiftInplace(the_number.precision - s);
      exponent = - (the_number.precision - s);
      the_number.mantissa = numerator;
      the_number.mantissa.sign = (sign == '-') ? MP_NEG :
      MP_ZPOS;
      the_number.sign = the_number.mantissa.sign;
      the_number.exponent = exponent;
      return the_number;
    } else {
      denominator.lShiftInplace(s - the_number.precision);
      fraction = roundFraction(numerator, denominator, s);
      // we can build the exponent from expoent and scale
      // factor directly
      exponent = - (the_number.precision - fraction[1]);
      the_number.mantissa = fraction[0];
      the_number.mantissa.sign = (sign == '-') ? MP_NEG :
      MP_ZPOS;
      the_number.sign = the_number.mantissa.sign;
      the_number.exponent = exponent;
      return the_number;
    }
  };

The above function allows for fast treatment of small integers. Where “small” is relative. The minimum precision of the Bigfloat type is 104 that means that 2^104 is the biggest “small” integer handled by this function, a number with 32 decimal digits.

As I mentioned earlier, the rounding function is the heart of it all.

  // The function takes the numerator, denominator and scale factor
  // as arguments in that order
  var roundFraction = function (num, den, S) {
    var frac, Q, R, dhalf, cmp;
    // We might need to do the next lines multiple times
    // so either loop or use a goto. JavaScript has no goto
    // so a loop it is.
    // LOOP:
    do {
      // Divide, keep quotient and remainder
      // numerator = Q * denominator + R
      frac = num.divrem(den);
      Q = frac[0];
      R = frac[1];
      // We need one half of the denominator to compare
      // the remainder against, so dhalf = denominator/2.
      // That works only if the denominator is even. There
      // are some rare cases where that is not the case
      // It's a cheap test, so just do it.
      if (den.isEven()) {
        dhalf = den.rShift(1);
      } else {
        dhalf = den.sub(R);
      }
      // Rounding. Fixed mode: "half to even"
      // if(R > dhalf) Q' = Q +1
      // if(R == dhalf && Q.isOdd()) Q' = Q +1
      cmp = R.cmp(dhalf);
      if (cmp == MP_GT) {
        Q.incr();
      }
      if (cmp == MP_EQ) {
        if (Q.isOdd()) {
          Q.incr();
        }
      }
      /*
        Other rounding modes may need information about the
        sign, too.
       */
      // Q' might got too large and is > work-precision now
      // if that is the case set numerator = numerator/2
      // and s = s + 1, then goto LOOP
      if ((Q.highBit() + 1) > the_number.precision) {
        num.rShiftInplace(1);
        S++;
        continue;
      }
      // else break out of the loop if one was used instead of
      // a goto
      break;
    } while (true);
    // return not only the quotient but the scale factor, too
    return [Q, S];
  };

The code calling the functions above

    var parsedString = parseNumber(this);
    var ret = stringToBigfloat(parsedString);
    // The number is not necessarily normalized
    // at this point, do it now.
    ret.normalize();
    return ret;

So, all nice and dandy you might say (if you are old enough to know what that term means 😉 ) but what about the other way around? How do I get a nicely (for a wide variety of “nicely”) formatted string out of a Bigfloat?
That is astonishingly easy.

Bigfloat.prototype.toString = function(numbase) {
    var ret, quot;
    var log210 = parseFloat("3.321928094887362347870319429489390175865");
    var sign = (this.sign == MP_NEG) ? "-" : "";
    // getDecimalPrecision() is simply floor(this.precision/log210)
    var decprec = this.getDecimalPrecision();
    var exponent = this.exponent;
    var signexpo = (this.exponent < 0) ? "-" : "";
    var decexpo;
    var ten = new Bigint(10);
    var one = new Bigint(1);

    // Checks for the special value 0e1
    if (this.isZero()) {
        return sign + "0.0E0";
    }

    // We could use the sign of the mantissa directly but that might
    // not be the correct one.
    ret = this.mantissa.abs();
    // Handle integers that are for sure integers first
    // What we are doing here is dividing the mantissa--a Bigint--by the
    // value 2^-exponent. If the exponent is already positive we can
    // multiply by 2^x which is a simple shift left.
    if (exponent >= 0) {
        // lShiftInplace() does nothing if exponent is zero, no check needed.
        ret.lShiftInplace(exponent);
        // to avoiud a too complicated formatting at the end
        if (ret.isZero()) {
            ret = "00";
        } else {
            ret = ret.toString();
        }
        // The string is alread in base 10, just count the digits
        decexpo = ret.length - 1;
        signexpo = "";
    } else {
        // Multiply by 10^(toDec(this.precision)) and divide by 2^exponent
        // The decimal point is at 10^(toDec(this.precision)).
        // Don't check for a proper integer; it can't be known if it is a
        // real one.
        // What can be done is checking the remainder in the division below--
        // if it is zero it is an integer for all intent and purposes
        decexpo = Math.floor(Math.abs(this.exponent) / log210);
        // scale
        ret = ret.mul(ten.pow(decexpo));
        // build denominator and divide
        one.lShiftInplace(Math.abs(exponent));
        ret = ret.div(one);
        // TODO: needs some rounding here, too, or some guard digits in general
        // NOTE: general guard  digits are better--in general. I think.
        // Dividing by a small integer (smaller than the size of a limb)
        // gets done in O(n) which is neglible here
        // Get the last two decimal digits
        var mod100 = ret.divremInt(100)[1];
        // get the last decimal digit for the first branch of rounding
        var mod10 = mod100 % 10;
        // round to infinity
        if (mod10 > 5) {
            ret = ret.addInt(10 - mod10);
        } else if (mod10 == 5) {
            // round half to even, so we need to check
            // the second to last digit, too
            mod100 = Math.floor(mod100 / 10) % 10;
            if (mod100 & 1 == 1) {
                ret = ret.addInt(10 - mod10);
            }
        }
        // might have been rounded to zero
        // No, wait, it cannot! Uh, it is cheap, leave it. Just
        // in case when other rounding modes get implemented.
        if (ret.isZero()) {
            ret = "00";
        } else {
            ret = ret.toString();
        }
        // rounding might have added a digit (e.g.: 0.999... <> 1.000...)
        // so calculate the decimal exponent accordingly
        if (ret.length > decprec) {
            decexpo = decprec - decexpo;
        } else {
            decexpo = decprec - decexpo - 1;
        }
        if (Math.abs(this.exponent) < this.precision) {
            signexpo = "";
        }
    }

    if (decexpo == 0) {
        signexpo = "";
    }
    // I used JavaScript String methods here, but the meaning should
    // be obvious.
    ret = sign + ret.slice(0, 1) + "." + ret.slice(1, decprec) + "e" +
        signexpo + Math.abs(decexpo).toString();
    return ret;
};

The output format is quite strict, the first digit is always between 1 (one) and 9 (nine) inclusive and all digits show up. E.g.: “0.5” -> “5.000000000000000000000000000000E-1”. But it is a simple string, all information you need are given, so format it as you like.

Advertisements

One thought on “Conversion of Binary Floating Point Numbers—Second Try

  1. Pingback: Multi-Precision Floats: Radix Conversion with Large Exponents | De Amentiae Mundi

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s