Adventures of a Programmer: Parser Writing Peril XXXVIII

Endianness problems are normally about as close to a JavaScript programmer as the sun is to the earth: not a lot happens for the first billion years or so, but then, slowly…

…it comes with a bang.

I’m doing a lot of bit-pushing in the code of Little and a lot of that depends on the right byte-order, too.

Now two alternatives rear up and bare their blood-dripping fangs: check the endianness and refuse anything but little endian, or check the endianness and act accordingly.

Both variants need a check, so let’s do that first and push the rest a bit further past the deadline, ignoring the painful howls of the pointy-haired type with the red face, standing behind the French door and waving his arms.

The first idea is to do something very simple:

function getByteOrder(){
  var byteorder;
  var test = 0xaabbccdd;
  if((test&0xff) == 0xdd) byteorder = "little_endian";
  else if((test&0xff) == 0xaa) byteorder = "big_endian";
  else if((test&0xff) == 0xbb){
    console.log(  "Please send a note to czurnieden@gmx.de and tell me"
                + " which browser/javascript parser you installed on"
                + " which machine (PDP-11, old VAX in comp.mode, emulator etc.) and"
                + " how... no, wait: WHY did you do that? (as if I do not know the"
                + " answer)");
    byteorder = "PDP_endian";
  }
  // Yes, there exist byte orders that are even more exotic than
  // the PDP-11 byte order. Please send a note to the author.
  else return undefined;
  return byteorder;
}

I have an old SPARCstation in the cellar with Netscape 4.79 running on OpenBSD to test it. I will do so as soon as I remember where I put the password 😉

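Problem: “test” needs to act like an unsigned 32-bit integer sitting in memory. Does it? Always? Everywhere? No: JavaScript’s bitwise operators work on the numeric value, converted to a signed 32-bit integer, and never see the bytes in memory. “test & 0xff” is 0xdd on every machine, so the check above cannot detect anything.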
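A quick way to see what the bitwise operators actually do with “test”: per the ECMAScript specification they convert their operands with ToInt32 (signed), and the unsigned shift “>>>” uses ToUint32. They work on the number’s value, never on its bytes. The variable names below are only for illustration.

```javascript
// A JavaScript number is a 64-bit double; bitwise operators first convert
// it to a 32-bit integer (ToInt32), the unsigned shift ">>>" uses ToUint32.
var test = 0xaabbccdd;        // 2864434397 as a double

var lowByte  = test & 0xff;   // 0xdd = 221: taken from the value, so the same on every CPU
var asInt32  = test | 0;      // ToInt32:  -1430532899 (the top bit is set)
var asUint32 = test >>> 0;    // ToUint32:  2864434397

console.log(lowByte.toString(16), asInt32, asUint32);
```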

The new typed arrays are so close to the hardware that the documentation warns that endianness can be a problem with them. So how does, e.g., Emscripten handle that? Can I just steal the code and be OK?
Well, no, of course not *sigh*

assert(HEAPU8[0] === 255 && HEAPU8[3] === 0, "Typed arrays 2 must be run on a little-endian system");

(Found in the preamble at line 934 but YMMV, as always)
But at least I now know how to do the test. It is the same idea as my code at the beginning, just with a more modern flavour, and because it reads the bytes back through a byte view of the same memory, it is known to work correctly.
The check is just:

var buffer = new ArrayBuffer(TOTAL_MEMORY);
/* ... */
HEAP32 = new Int32Array(buffer);
/* ... */
HEAP32[0] = 255;
assert(HEAPU8[0] === 255 && HEAPU8[3] === 0, "Typed arrays 2 must be run on a little-endian system");

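The number 255 is 0xff: one byte, all bits set. The algorithm is: write 0x000000ff into memory as a 32-bit integer, read the bytes back, and protest if they do not come out as ff 00 00 00. A big-endian machine stores them as 00 00 00 ff.
OK, even I can do that 😉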

var buffer = new ArrayBuffer(8);
var strerror = 'Bigint was written for a little-endian system only, sorry.';
var MP_ENDIANESS_TEST_32 = new Int32Array(buffer);
var MP_ENDIANESS_TEST_8  = new Uint8Array(buffer);
MP_ENDIANESS_TEST_32[0] = 0xff;
if (MP_ENDIANESS_TEST_8[3] === 0xff && MP_ENDIANESS_TEST_8[0] === 0) {
  throw {
    name: 'FatalError',
    message: strerror
  };
}

Problems:

  • it needs a try/catch block to work properly
  • it has to run at the very beginning, so all of its variables become globals and stay there
  • it wastes precious memory

The first point is simple to resolve: just put the whole thing in a try/catch block. The last point, well, just set the variables to null and let the garbage collector do its hard and underpaid work. But the variables will still stay there and clutter the namespace.

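There is not much fine-grained scoping in JavaScript: a variable declared with “var” belongs to the whole enclosing function (or to the global scope), so block statements like loops do not hide it (ES2015’s “let” and “const” are block-scoped, but “var” is not):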

var a = [];
for(var i=0;i<10;i++){
  a[i] = 0;
}
i++; /* i is 11 now, with or without "use strict": the var is still visible */

So can we put the whole thing in a loop and be done with it, like the multi-line macros in C?

do{
  var i = 10;
}while(false);
i++; /* i is 11 now: the do/while block does not hide the var either */
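For comparison, a minimal sketch with ES2015’s “let”, which is block-scoped: with it, the do/while trick does hide the variable. This needs a reasonably modern engine, which rules out the Netscape in the cellar; the variable names are only for illustration.

```javascript
// "let" (ES2015) is block-scoped, "var" is function-scoped.
var sum = 0;
do {
  let j = 10;   // visible only inside these braces
  var k = 10;   // hoisted to the enclosing function (or global) scope
  sum = j + k;
} while (false);

k++;                               // fine: k is 11 now
console.log(sum, k, typeof j);     // j is out of scope here, so typeof yields "undefined"
```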

I might be a bit conservative (technically, not politically!), but in my opinion there should be no logic at the root of the program, only global variables, and only if absolutely necessary.

ECMAScript offers anonymous functions, and since “var” is scoped to the enclosing function, wrapping the check in one keeps all its variables out of the global namespace. They look pretty, too.

(function () {
  try {
    var buffer = new ArrayBuffer(8);
    var strerror = 'Bigint was written for a little-endian system only, sorry.';
    var MP_ENDIANESS_TEST_32 = new Int32Array(buffer);
    var MP_ENDIANESS_TEST_8 = new Uint8Array(buffer);
    MP_ENDIANESS_TEST_32[0] = 0xff;
    // positive test: accept only the little-endian layout ff 00 00 00
    if (!(MP_ENDIANESS_TEST_8[0] === 0xff && MP_ENDIANESS_TEST_8[3] === 0)) {
      throw {
        name: 'FatalError',
        message: strerror
      };
    }
  }
  catch (e) {
    // "console" itself may not exist in old browsers, so check it first
    if (typeof console !== 'undefined' && typeof console.log === 'function') {
      console.log(e.message);
    }
    if (typeof alert === 'function') {
      alert(e.message);
    }
    // rethrow the caught error (strerror is still unset if ArrayBuffer failed);
    // nothing in our program catches it, so the script stops here
    throw e;
  }
})();

I changed the logic of the test to give the whole thing a more positive attitude.

Now the whole thing sits inside an anonymous function that is called once, and only once, right where it is defined.
Inside is a try/catch block where we test the endianness and throw a fatal error if we find a big-endian machine (more correctly: anything that is not little endian).
We catch that error and print a nice, albeit cheap, excuse.

Then we throw the error again. Why? We caught the first error to print the message, but because it was caught, the program would go on without hesitation. So we need to throw it again, and this time it does not get caught, at least not by our program. That way the rest of the script does not run.

It is a neat little trick to get something like an exit() function in JavaScript. It only stops the script that is currently running, though: event handlers and timers that were already set up will still fire.
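A small variation on the same trick, as a sketch: throwing a real “Error” instead of a plain object literal. Most engines attach a stack trace to “Error” objects, and “message” is guaranteed to be there when the error ends up in someone else’s catch block. The helper name “fatal” is made up for this example.

```javascript
// Hypothetical helper: report a message, then abort the current script.
function fatal(message) {
  if (typeof console !== "undefined" && typeof console.log === "function") {
    console.log(message);
  }
  var err = new Error(message);
  err.name = "FatalError";    // keeps the name used in the code above
  throw err;                  // if nobody catches this, the script stops here
}

// Usage inside the self-calling function, e.g.:
// if (!isLittleEndian) fatal("Bigint was written for a little-endian system only, sorry.");
```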

You can’t get out of the JavaScript interpreter in a browser. You can close the window, which ends the current instance, but that’s all. In Node.js you can:

stdlinux$ node
>  process.exit(42)
stdlinux$ echo $?
42

The process object has a lot of things available:

for (var i in process) console.log(i);

I don’t have the faintest idea what half of it is without reading the documentation 😉

Now that we have the check, why not make the code independent of the endianness?
Nope, sorry, way too much work.
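For the record, modern JavaScript makes this less work than it sounds: “DataView” reads and writes multi-byte values with an explicit byte order, via the “littleEndian” flag of “getUint32”/“setUint32”, and gives the same result on every platform. A minimal sketch; the two function names are illustrative only.

```javascript
// DataView takes the byte order as a parameter, independent of the host CPU.
function readUint32LE(buffer, offset) {
  return new DataView(buffer).getUint32(offset, true);   // true = little endian
}

function writeUint32BE(buffer, offset, value) {
  new DataView(buffer).setUint32(offset, value, false);  // false = big endian (network order)
}

var buf = new ArrayBuffer(4);
writeUint32BE(buf, 0, 0x0a0b0c0d);
var bytes = Array.prototype.slice.call(new Uint8Array(buf)); // [10, 11, 12, 13] on every machine
var le = readUint32LE(buf, 0);                                // 0x0d0c0b0a on every machine
```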

But I can give you the code I wrote several years ago, based on the endianness check from the beginning:

Number.BYTEORDER = (function (){
  var byteorder;
  var test = 0xaabbccdd;
  if((test&0xff) == 0xdd) byteorder = "1234";
  else if((test&0xff) == 0xaa) byteorder = "4321";
  else if((test&0xff) == 0xbb){
    alert(  "Please send a note to czurnieden@gmx.de and tell me"
          + " which browser/javascript parser you installed on"
          + " which machine (PDP-11, old VAX in comp.mode, emulator etc.) and"
          + " how... no, wait: WHY did you do that? (as if I do not know the"
          + " answer)");
    byteorder = "3412";
  }
  // Yes, there exist byte orders that are even more exotic than
  // the PDP-11 byte order. Please send a note to the author.
  else return undefined;
  return byteorder;
})();
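The bitwise test in Number.BYTEORDER looks at the value, not at memory, so it reports “1234” on every machine. A typed-array version of the same function, using the Emscripten-style trick of writing a 32-bit integer and reading its bytes back, could look like this. The name “detectByteOrder” is made up, and the PDP-11 branch is, of course, untested.

```javascript
// Write a known 32-bit pattern and look at how its bytes are laid out in memory.
function detectByteOrder() {
  if (typeof ArrayBuffer === "undefined") {
    return undefined;                          // no typed arrays: cannot tell
  }
  var buffer = new ArrayBuffer(4);
  new Uint32Array(buffer)[0] = 0x0a0b0c0d;
  var b = new Uint8Array(buffer);
  var layout = [b[0], b[1], b[2], b[3]].join(",");
  switch (layout) {
    case "13,12,11,10": return "1234";         // little endian: 0d 0c 0b 0a
    case "10,11,12,13": return "4321";         // big endian:    0a 0b 0c 0d
    case "11,10,13,12": return "3412";         // PDP-11:        0b 0a 0d 0c
    default:            return undefined;      // something even more exotic
  }
}
```

Assigning “Number.BYTEORDER = detectByteOrder();” would leave the rest of the code below unchanged.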

Number.prototype.swap32 = function(){
  var t = this;
  return (((t & 0x000000ff) <<  24) |
          ((t & 0x0000ff00) <<   8) |
          ((t & 0x00ff0000) >>>  8) |
          ((t & 0xff000000) >>> 24)   );
};

Number.prototype.swaphw32 = function(){
  var t = this;
  return  (((t & 0x0000ffff) <<  16) | 
           ((t & 0xffff0000) >>> 16)  ); 
};

Number.prototype.swaphb32 = function(){
  var t = this;
  return (((t & 0x00ff00ff) <<  8) |
          ((t & 0xff00ff00) >>> 8)  );
};
// cpu = pdp-endian
Number.prototype.pdp_htonl = function(){
  return this.swaphb32();
};
Number.prototype.pdp_ntohl = function(){
  return this.swaphb32();
};
Number.prototype.pdp_cpuTole32 = function(){
  return this.swaphw32();
};
Number.prototype.pdp_le32Tocpu = function(){
  return this.swaphw32();
};
Number.prototype.pdp_cpuTobe32 = function(){
  return this.swaphb32();
};
Number.prototype.pdp_be32Tocpu = function(){
  return this.swaphb32();
};

// cpu = bigendian
Number.prototype.be_htonl = function(){
  return this;
};
Number.prototype.be_ntohl = function(){
  return this;
};
Number.prototype.be_cpuTole32 = function(){
  return this.swap32();
};
Number.prototype.be_le32Tocpu = function(){
  return this.swap32();
};
Number.prototype.be_cpuTobe32 = function(){
  return this;
};
Number.prototype.be_be32Tocpu = function(){
  return this;
};
// cpu = little endian
Number.prototype.le_htonl = function(){
  return this.swap32();
};
Number.prototype.le_ntohl = function(){
  return this.swap32();
};
Number.prototype.le_cpuTole32 = function(){
  return this;
};
Number.prototype.le_le32Tocpu = function(){
  return this;
};
Number.prototype.le_cpuTobe32 = function(){
  return this.swap32();
};
Number.prototype.le_be32Tocpu = function(){
  return this.swap32();
};

Number.prototype.ntohl = function(){
  if(Number.BYTEORDER == 1234)
    return this.le_ntohl();
  if(Number.BYTEORDER == 4321)
    return this;
  if(Number.BYTEORDER == 3412)
    return this.pdp_ntohl();
  return undefined;
};

Number.prototype.htonl = function(){
  if(Number.BYTEORDER == 1234)
    return this.le_htonl();
  if(Number.BYTEORDER == 4321)
    return this;
  if(Number.BYTEORDER == 3412)
    return this.pdp_htonl();
  return undefined;
};

The whole thing could be streamlined and shortened with today’s ECMAScript, but I don’t need it, so I’ll leave it as it is.
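As an illustration of what “streamlined” might mean, here is a sketch with plain functions instead of Number.prototype methods and a lookup table instead of the if chains; the names and the table are hypothetical. It also adds “>>> 0” so results stay unsigned: the swap32 above returns a negative number whenever the new top byte is 0x80 or larger, because “|” produces a signed 32-bit integer.

```javascript
// Byte swaps on unsigned 32-bit values; ">>> 0" turns the signed result of "|" back into unsigned.
function swap32(x) {      // 0x0a0b0c0d -> 0x0d0c0b0a
  return (((x & 0xff) << 24) | ((x & 0xff00) << 8) |
          ((x >>> 8) & 0xff00) | (x >>> 24)) >>> 0;
}
function swaphb32(x) {    // swap the bytes inside each 16-bit half: 0x0a0b0c0d -> 0x0b0a0d0c
  return (((x & 0x00ff00ff) << 8) | ((x >>> 8) & 0x00ff00ff)) >>> 0;
}
function identity(x) {
  return x >>> 0;
}

// Host to network order, keyed by the byte-order labels used above.
// Each conversion is its own inverse, so ntohl can share the table.
var HTONL = { "1234": swap32, "4321": identity, "3412": swaphb32 };

function htonl(x, byteorder) {
  var f = HTONL[byteorder];
  return f ? f(x) : undefined;
}
var ntohl = htonl;
```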
