Adventures of a Programmer: Parser Writing Peril XXXIII

Some Helpers

While I was sampling all the bits and pieces needed to make the whole thing not only function but useful I stumbled over some code I once wrote that I didn’t understand anymore at the first glance—where the first reaction was something in the line of WTF or worse.

One of the many examples of the above reactions was with this little gem:

Math.EPSILON = (function (){
  var tmp1    = 1.0;
  var tmp2    = 0.0;
  var EPSILON = 0.0;
  do{
    EPSILON = tmp1;
    tmp1  /= 2;
    tmp2   = 1.0 + tmp1;
  }while (tmp2 > 1.0);
  return EPSILON;
})();

The epsilon of the double precision data type used in ECMAScript as a default has a known epsilon: 5e-324, this little gem returns 2.220446049250313e-16. Why? Well it is the smallest representable value necessary to detect inequality. A well known and also well hated problem in floating point computing. Please take a look at…oh, it got an update! In 2012! Oh, my…but you need the link, I presume? It is a bit specialized for C/C++ but is is very well portable. The principle is the same everywhere.

The problem in short (I’ll use Bruce Dawson’s example here):
It is well known from the trigonometry hours in middle-school (you remember? the really long ones?) that \sin\left(\pi\right) = 0. Your ECMAScript engine has most probably a different opinion. Try it:

console.log(Math.sin(Math.PI));

Will print 1.2246063538223773e-16 (according to the standard, your implementation might be, how shall I put it politely of a different brand.) which is close but still without a smoke. Comparing that result to zero gives a wrong answer. That is the place where the calculated epsilon comes at help.

a = 0.0;
b = a + Math.in(Math.PI);
console.log("a = " + a + "  --  b = " + b);
console.log("a == b  " + (a == b) );
console.log(Math.abs(a - b) <= Math.EPSILON);

But it has one big cavet, if you have read Bruce Dawson’s articel (do it, it is really good!): this epsilon is only valid for comparing to zero! Another idea of him is to compare the integer values. This needs some kind of frexp(3).

It is now possible in ECMAScript to translate unsigned char * byte_string = (unsigned char *) & double_value; into JavaScript (that’s the union trick written out) and without further ado here is the translation of the good old SunPro frexp():

/*
 * ====================================================
 * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
 *
 * Developed at SunPro, a Sun Microsystems, Inc. business.
 * Permission to use, copy, modify, and distribute this
 * software is freely granted, provided that this notice
 * is preserved.
 * ====================================================
 */

Over twenty years already? Time’s running, eh?

function frexp(dnum)
{
    "use asm";
    var high = 0>>>0, tmp = 0>>>0, low = 0>>>0;
    var exp = 0|0;
    var num, cdnum;
    if (dnum === 0) {
	return dnum;
    }
    cdnum = dnum;
    /*  unsigned char * byte_string = (unsigned char *) & double_value; */
    num = new DataView(new ArrayBuffer(8));
    num.setFloat64(0, cdnum);
    /*
      typedef union
      {
         double cdnum;
         struct
         {
            u_int32_t high;
            u_int32_t low;
         } halves;
      } num;
    */
    high = num.getUint32(0);
    low  = num.getUint32(4);
    /* exponent = (high & 0x7ff00000) >> 20 */
    tmp = 0x7fffffff & high;
    /* errors (0, inf, nan) see IEEE-754 */
    if (tmp >= 0x7ff00000 || ((tmp | low) === 0)) {
	return cdnum;
    }
    /* See IEEE-754 for explanation of subnormal but
       the problem is easily resolved by up-sizing */
    if (tmp < 0x00100000) {	/* subnormal */
        /* One of the disadvantages of the Arraybuffer */
        var temp = num.getFloat64(0);
	temp *= 18014398509481984.0;	// 2^54
	num.setFloat64(0, temp);

	high = num.getUint32(4);
	tmp = high & 0x7fffffff;
        /* down-size to normal */
	exp = -54;
    }
    /* extract exponent and de-bias it */
    exp += (tmp >>> 20) - 1022;
    /* normalize mantissa  0.5 <= |dnum| < 1.0 */
    high = (high & 0x800fffff) | 0x3fe00000;
    num.setUint32(0, high);
    return [num.getFloat64(0), exp];
}

I did the asm.js by hand, so I don’t know if it works as advertised but it is quite fast.
I think I’ll dig out my copy of IEEE-754 and loose some words about the special terms used in the comments above if Wikipedia hasn’t it. But not today.

The details of the actual algorithm are explained elsewhere. Yes there is no link, I was not able to find some explanation online. A bit too much off theme for now, but it is interessting enough to do it in one of the future posts when discussing the Bignumber implementation. These two functions come handy for casts between Number and Big* to and fro.

But I can explain some details of how I did my port:
The "use asm" line should trigger the use of some special optimizations by the engine. It is said that node.js and Firefox (don’t know since when) support it quite well, as does Chrome. I don’t know much about the others.
The curious variable declarations belong to the asm.js optimization triggers. The 0>>>0 marks an unsigned integer and the 0|0 a signed one. It is not strictly binding, if I understand it correctly, just a proposal.

The trick that does the unsigned char * byte_string = (unsigned char *) & double_value; from C is the use of the new ArrayBuffer. We reserve one with a length of eight bytes and attach a DataView to it. The DataView allows us to manipulate the ArrayBuffer in a lot of ways albeit with some restrictions.
We put a number in it. All native numbers in ECMAScript are of the 64-bit floating point type, so eight bytes are exactly right.
We take two numbers out of it. These numbers are two unsigned integers, just as the function name suggests, and are the high word (with the most significant bit) and the low word (with the least significant bit). See also the C-sketches in the comments.

One of the disadvantages is the impossibility to manipulate the data in-place (not impossible but quite tedious and error prone because you have to do it all by hand) so we have to read the data, work with it and put it back in. Doesn’t make it faster but it is still simpler then doing all of the things manually.
The rest should be obvious. Hopefully.

The counterpart ldexp(3) works the same, it is just the inverse, you don’t even have to multiply. (ok, in case of a subnormal you have to multiply with 2^(-54) at the end, but a subnormal is not regular and does not come often).

/*
 * ====================================================
 * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
 *
 * Developed at SunPro, a Sun Microsystems, Inc. business.
 * Permission to use, copy, modify, and distribute this
 * software is freely granted, provided that this notice
 * is preserved.
 * ====================================================
 */
function ldexp( /*double */ mantissa, /*int */ exponent)
{
    "use asm";
    var k = 0 >>> 0, hx = 0 >>> 0, lx = 0 >>> 0, sign = 0 | 0;
    var num, cdnum;

    cdnum = mantissa;

    num = new DataView(new ArrayBuffer(8));
    num.setFloat64(0, cdnum);

    hx = num.getUint32(0);
    lx = num.getUint32(4);

    /* Get the exponent of the mantissa */
    k = (hx & 0x7ff00000) >>> 20;
    /* Check for zero and a subnormal mantissa  */
    if (k === 0) {
	/* Mantissa is plus or minus zero */
	if ((lx | (hx & 0x7fffffff)) === 0) {
	    return cdnum;
	}
	/* normalize */
	/* One of the disadvantages of the Arraybuffer */
	var temp = num.getFloat64(0);
	temp *= 18014398509481984.0;	// 2^54
	num.setFloat64(0, temp);

	hx = num.getUint32(0);
	k = ((hx & 0x7ff00000) >>> 20) - 54;
    }
    /* Check for IEEE errors */
    /* NaN or Inf */
    if (k === 0x7ff) {
	return (2 * num.getFloat64(0));
    }
    k = k + exponent;
    /* Overflow */
    sign = (mantissa < 0) ? -1 : 1;
    if (k > 0x7fe) {
	return 1.0e+300 * sign;
    }
    if (exponent < -50000) {
	return 1.0e-300 * sign;
    }
    /* normal, set exponent, build double and return */
    if (k > 0) {
	hx = (hx & 0x800fffff) | (k << 20);
	num.setUint32(0, hx);
	return num.getFloat64(0);
    }
    /* Rest of errorhandling */
    if (k <= -54) {
	if (exponent > 50000) {
	    return 1.0e+300 * sign;
	}
	return 1.0e-300 * sign;
    }
    /* The case of the subnormal result */
    k += 54;
    hx = (hx & 0x800fffff) | (k << 20);
    num.setUint32(0, high);
    /*   2^(-54)  */
    return (num.getFloat64(0) * 5.55111512312578270212e-17);
}

The algorithm is very simple but hidden behind all of the IEEE-754 exceptions and error codes: take the exponent of the mantissa add the exponent, put the result back into the mantissa, and return the mantissa. Calling the first argument of ldexp() mantissa is a misnomer, it is just a simple number. I’ll think about renaming it. But I won’t, I know me 😉

Note: I did the simple, non bit-pushing, test to find the sign of the number because the bit-juggling algorithm would be slower.

function sign(/* double */x)
{
    "use asm";
    var high = 0 >>> 0;
    var num = new DataView(new ArrayBuffer(8));
    num.setFloat64(0, x);
    high = num.getUint32(0);
    return (high & 0x80000000) ? -1 : 1;
}

On the other hand: if speed is not most urgent goal but legibility is (e.g.: for teaching), some abstractions might be useful:

Number.prototype.getSignBit = function(){
    "use asm";
    var high = 0 >>> 0;
    var num = new DataView(new ArrayBuffer(8));
    num.setFloat64(0, x);
    high = num.getUint32(4);
    return (high & 0x80000000);
};
Number.prototype.getLowWord = function(){
    var num = new DataView(new ArrayBuffer(8));
    num.setFloat64(0, this);
    return num.getUint32(4);
};
Number.prototype.setLowWord = function(an_uint32){
    var num = new DataView(new ArrayBuffer(8));
    num.setFloat64(0, this);
    num.setUint32(0,an_uint32);
    this = num.getFloat64(0);
};
Number.prototype.getHighWord = function(){
    var num = new DataView(new ArrayBuffer(8));
    num.setFloat64(0, this);
    return num.getUint32(4);
};
Number.prototype.setHighWord = function(an_uint32){
    var num = new DataView(new ArrayBuffer(8));
    num.setFloat64(0, this);
    num.setUint32(4,an_uint32);
    this = num.getFloat64(0);
};
Number.prototype.getExponent = function(){
    return frexp(this)[1];
};
Number.prototype.setExponent = function(a_number){
    this = ldexp(a_number, frexp(this)[1]);
};
Number.prototype.getMantissa = function(){
    return frexp(this)[0];
};
Number.prototype.setMantissa = function(an_uint32){
    this = ldexp(frexp(this)[0],an_uint32);
};

(Code is “written-only” and has no checks at all, please test before use.)
Only the last four would be useful for our Little language.

I opened the directory libs over at Little and will put all of the stuff from here to there if appropriate.

Next post: dunno.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s