Writing asm.js by Hand

Some pieces of code in the library for Little would not suffer from a bit more speed, namely some very basic bit-pushing snippets.

All of the ports from SunPro (ldexp, frexp, lgamma, probably more by the time you read this) need access to the internal guts of the double datatype (Number in JavaScript). This gets done in C with a dirty little trick:

#include <endian.h>
#include <sys/types.h>
#if (__BYTE_ORDER == __BIG_ENDIAN)
typedef union
{
  double v;
  struct
  {
    u_int32_t h;
    u_int32_t l;
  } w;
} u_d;
#else
typedef union
{
  double v;
  struct
  {
    u_int32_t l;
    u_int32_t h;
  } w;
} u_d;
#endif
#define GET_WORDS(xh,xl,x) \
do {                       \
  u_d u;                   \
  u.v = (x);               \
  (xh) = u.w.h;            \
  (xl) = u.w.l;            \
} while (0)

#define GET_HIGH(xh,x)     \
do {                       \
  u_d u;                   \
  u.v = (x);               \
  (xh) = u.w.h;            \
} while (0)

#define GET_LOW(xl,x)      \
do {                       \
  u_d u;                   \
  u.v = (x);               \
  (xl) = u.w.l;            \
} while (0)

#define SET_WORDS(x,xh,xl) \
do {                       \
  u_d u;                   \
  u.w.h = (xh);            \
  u.w.l = (xl);            \
  (x) = u.v;               \
} while (0)


#define SET_HIGH(x,xh)     \
do {                       \
  u_d u;                   \
  u.v = (x);               \
  u.w.h = (xh);            \
  (x) = u.v;               \
} while (0)


#define SET_LOW(x,xl)      \
do {                       \
  u_d u;                   \
  u.v = (x);               \
  u.w.l = (xl);            \
  (x) = u.v;               \
} while (0)

No compiler is bound to follow the intention here, but no compiler would dare not to 😉

Doing that in JavaScript is a bit more complicated and only got easier with the advent of typed arrays. Now we can “fake” such a C union together with the macros with similarly simple JavaScript (Didn’t I publish that code before? *sigh* I’m really getting old):

var double_int = new DataView(new ArrayBuffer(8));
Number.prototype.getSignBit = function(){
    var high = 0 >>> 0;
    double_int.setFloat64(0, this);
    // DataView is big-endian by default: the high word (with the sign) sits at offset 0
    high = double_int.getUint32(0);
    return (high & 0x80000000) >>> 31;
};
Number.prototype.getLowWord = function(){
    double_int.setFloat64(0, this);
    return double_int.getUint32(4);
};
Number.prototype.setLowWord = function(an_uint32){
    double_int.setFloat64(0, this);
    double_int.setUint32(4,an_uint32);
    return double_int.getFloat64(0);
};
Number.prototype.getHighWord = function(){
    double_int.setFloat64(0, this);
    return double_int.getUint32(0);
};
Number.prototype.setHighWord = function(an_uint32){
    double_int.setFloat64(0, this);
    double_int.setUint32(0,an_uint32);
    return double_int.getFloat64(0);
};
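
A quick sanity check helps with byte offsets like these. DataView reads and writes big-endian unless told otherwise, so the high word of a double sits at offset 0 and the low word at offset 4. The following is a minimal sketch with free-standing helpers (the names scratch, highWord, lowWord and signBit are mine, chosen to keep Number.prototype out of it) and a few values whose IEEE 754 bit patterns are well known:

```javascript
// Scratch buffer shared by the helpers; 8 bytes hold exactly one double.
var scratch = new DataView(new ArrayBuffer(8));

// Hypothetical helper names, not part of any library.
function highWord(x) {
  scratch.setFloat64(0, x);      // big-endian by default
  return scratch.getUint32(0);   // bytes 0..3: sign, exponent, top of the mantissa
}
function lowWord(x) {
  scratch.setFloat64(0, x);
  return scratch.getUint32(4);   // bytes 4..7: lower 32 bits of the mantissa
}
function signBit(x) {
  return highWord(x) >>> 31;     // the sign is the topmost bit of the high word
}

console.log(highWord(1).toString(16));   // 3ff00000
console.log(lowWord(1).toString(16));    // 0
console.log(signBit(-0));                // 1
console.log(lowWord(0.1).toString(16));  // 9999999a
```

If the sign bit of -0 comes out as 0, the offsets are swapped.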

DataView provides the setUint32 etc. shortcuts; in asm.js this needs to be done by hand. A function, or module, as they like to call it, in asm.js is closed in itself. It gets fed some functions and a slice of memory and returns some functions in exchange for that nice food. The main problem is that you need to know how many functions and how much memory in advance.
But one thing after the other.
The basic layout of an asm.js module is:

Reserve memory
Build a function with three arguments
  the first argument holds the global functions to be used
  the second argument holds the local functions to be used
  the third and last argument is (a pointer to) the memory
    The string "use asm" inside the function at 
        the start (or outside but earlier)
    some handlers for the memory
    some variables (beware: a bare var a; is verboten, every variable needs a literal to start with: var a = 0;)
    a long row of functions

    An object is returned with all of the public functions.

Global functions are, for example, the Math functions (e.g. Math.sin()).
Local functions are your own JavaScript functions, that is, everything the module should be able to call that is not global.
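
How the two kinds of functions arrive inside the module can be sketched like this. The module takes Math.sqrt from the first argument and reports its result through a function from the second one. The names seen, mod, hypot and report are made up for this illustration, and the type annotations (+a, 0.0) follow my reading of the asm.js spec, so treat it as a sketch and not as gospel:

```javascript
var seen = [];   // collects what the module reports back to us

var mod = (function (stdlib, foreign, heap) {
  "use asm";
  var sqrt = stdlib.Math.sqrt;    // a "global" function from the first argument
  var report = foreign.report;    // a "local" function from the second argument

  function hypot(a, b) {
    a = +a;                       // parameter annotations: both are doubles
    b = +b;
    var r = 0.0;                  // a double local, initialized with a literal
    r = +sqrt(a * a + b * b);
    report(r);                    // call out into ordinary JavaScript
    return +r;
  }
  return { hypot: hypot };
})(
  /* stdlib */ { Math: Math },
  /* env    */ { report: function (v) { seen.push(v); } },
  /* heap   */ new ArrayBuffer(0x10000)   // unused here; 64 KiB is an assumption
);

console.log(mod.hypot(3, 4));   // 5
console.log(seen.length);       // 1
```

In a browser the first argument is usually just window, which carries Math and the typed array constructors anyway.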
A very simple example:

var heap = new ArrayBuffer(8);
var intstuff = (function (stdlib,env,heap){
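  "use asm";
  // two views over the same eight bytes
  var HEAPU32 = new stdlib.Uint32Array(heap);
  var HEAPD64 = new stdlib.Float64Array(heap);
  function getSignBit(x){
    var high = 0;
    HEAPD64[0] = x;
    // typed arrays use the host byte order: index 1 is the high word on little-endian machines
    high = HEAPU32[1] >>> 0;
    return (high & 0x80000000) >>> 31;
  }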
  return {
    getSignBit: getSignBit
  };
})(
/* stdlib */ {
               "Uint32Array": Uint32Array,
               "Float64Array": Float64Array
             },
/* env    */ {},
/* heap   */ heap
);

Before we can do anything we need to allocate some memory. The smallest amount is one byte, and a double has eight of these (yes, that is not a given, but for now…), so we allocate these eight.
The arguments to the function consist only of the two typed array constructors, no local functions (they would have the very same syntax), and a pointer to the memory already allocated.
We lay both views over the given memory, so we can use it as one double or as two 32-bit unsigned integers, and because both occupy the very same memory we can do our stuff as described above.
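
As far as I read the asm.js spec, a module only gets the special treatment if it validates, and a module that does not validate simply runs as ordinary JavaScript. Validation asks for some ceremony: a coercion for every parameter, a literal initializer for every local, a signed or double return value, and a heap whose size is a power of two of at least 4096 bytes. The following is a sketch of the sign bit function written to those rules. The 64 KiB heap and the names sketchHeap, sketch and hostIsLittleEndian are my assumptions, and I only claim that it runs, not that every engine accepts it:

```javascript
// Assumption: 64 KiB heap; the spec wants a power of two, at least 4096 bytes.
var sketchHeap = new ArrayBuffer(0x10000);

var sketch = (function (stdlib, foreign, heap) {
  "use asm";
  var HEAPU32 = new stdlib.Uint32Array(heap);
  var HEAPF64 = new stdlib.Float64Array(heap);

  function getSignBit(x) {
    x = +x;                    // parameter annotation: double
    var high = 0;              // locals start with a literal
    HEAPF64[0] = x;
    high = HEAPU32[1] | 0;     // index 1 is the high word on little-endian hosts
    return (high >>> 31) | 0;  // return annotation: signed int
  }
  function getHighWord(x) {
    x = +x;
    HEAPF64[0] = x;
    return HEAPU32[1] | 0;     // signed; apply >>> 0 on the caller's side
  }
  return { getSignBit: getSignBit, getHighWord: getHighWord };
})(
  /* stdlib */ { Uint32Array: Uint32Array, Float64Array: Float64Array },
  /* env    */ {},
  /* heap   */ sketchHeap
);

// Typed arrays use the byte order of the host, unlike DataView.
var hostIsLittleEndian = new Uint8Array(new Uint32Array([1]).buffer)[0] === 1;

console.log(hostIsLittleEndian);
console.log(sketch.getSignBit(-2.5));
console.log((sketch.getHighWord(1) >>> 0).toString(16));
```

On a little-endian machine this should print true, 1 and 3ff00000. On a big-endian one the two word indices would have to be swapped, which is exactly what the #if in the C version does.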
Now let's test for speed with the following code:

//"use strict"
var heap = new ArrayBuffer(8);
var intstuff = (function (stdlib,env,heap){
  "use asm";
  var HEAPU32 = new stdlib.Uint32Array(heap);
  var HEAPD64 = new stdlib.Float64Array(heap);
  function getSignBit(x){
    var high = 0;
    HEAPD64[0] = x;
    high = HEAPU32[1];
    return (high & 0x80000000) >>> 31;
  }
  function getLowWord(x){
    HEAPD64[0] = x;
    return HEAPU32[0] >>> 0;
  }
  function setLowWord(x,an_uint32){
    HEAPD64[0] = x;
    HEAPU32[0] = an_uint32 >>> 0;
    return HEAPD64[0];
  }
  function getHighWord(x){
    HEAPD64[0] = x;
    return HEAPU32[1] >>> 0;
  }
  function setHighWord(x, an_uint32){
    HEAPD64[0] = x;
    HEAPU32[1] = an_uint32 >>> 0;
    return HEAPD64[0];
  }
  return {
    getSignBit: getSignBit,
    getLowWord: getLowWord,
    setLowWord: setLowWord,
    getHighWord: getHighWord,
    setHighWord: setHighWord
  };
})(
/* stdlib */ {
               "Uint32Array": Uint32Array,
               "Float64Array": Float64Array
             },
/* env    */ {},
/* heap   */ heap
);

var double_int = new DataView(new ArrayBuffer(8));


Number.prototype.getSignBit = function(){
    var high = 0 >>> 0;
    double_int.setFloat64(0, this);
    // DataView is big-endian by default: the high word (with the sign) sits at offset 0
    high = double_int.getUint32(0);
    return (high & 0x80000000) >>> 31;
};
Number.prototype.getLowWord = function(){
    double_int.setFloat64(0, this);
    return double_int.getUint32(4);
};
Number.prototype.setLowWord = function(an_uint32){
    double_int.setFloat64(0, this);
    double_int.setUint32(4,an_uint32);
    return double_int.getFloat64(0);
};
Number.prototype.getHighWord = function(){
    double_int.setFloat64(0, this);
    return double_int.getUint32(0);
};
Number.prototype.setHighWord = function(an_uint32){
    double_int.setFloat64(0, this);
    double_int.setUint32(0,an_uint32);
    return double_int.getFloat64(0);
};

var a = 1; // or any other number
var b = a;
var c = a;
var d = a;
var e = a;

var sbit,glow,glow2,ghigh,ghigh2,bb,cc;
var x = 0x12345678,y = 0x87654321,bbasm,ccasm,f,g;
var isbit,iglow,ighigh;

var start = performance.now();
for(var i = 0; i < 10000; i++){
  sbit = a.getSignBit();
  glow = a.getLowWord();
  ghigh = a.getHighWord();
  bb = b.setLowWord(x);
  cc = c.setHighWord(y);
  glow2 = bb.getLowWord();
  ghigh2 = bb.getHighWord();
}
var stop = performance.now();
var _native = stop-start;

start = performance.now();
for(var j = 0; j < 10000; j++){
  isbit = intstuff.getSignBit(a);
  iglow = intstuff.getLowWord(a);
  ighigh = intstuff.getHighWord(a);
  bbasm = intstuff.setLowWord(d,x);
  ccasm = intstuff.setHighWord(e,y);
  f = intstuff.getLowWord(bbasm);
  g = intstuff.getHighWord(bbasm);
}
stop = performance.now();
var asm = stop - start;
// For testing the results for correctness
/*
console.log(
 "a = " + a + "\n" +
 "sbit = " + sbit.toString(16) +"\nsbit = " + isbit.toString(16)+ "\n" +
 "glow = " + glow.toString(16) +"\nglow = " + iglow.toString(16)+ "\n" +
 "ghigh = " + ghigh.toString(16) +"\nghigh = " +ighigh.toString(16)+ "\n" +
 "bb = " + bb +"\nbb = " + bbasm+ "\n"+
 "cc = " + cc +"\ncc = " + ccasm+ "\n"+
 "glow2 = " + glow2.toString(16) +"\nglow2 = " + f.toString(16)+ "\n" +
 "ghigh2 = " + ghigh2.toString(16) +"\nghigh2 = " + g.toString(16)+ "\n"
);
*/
console.log("\nnative = " + _native + "\nasm =    " + asm + "\n");

To be able to test for correctness, here is the same in C:

#include <stdio.h>
#include <stdlib.h>
#include <endian.h>
#include <sys/types.h>
#if (__BYTE_ORDER == __BIG_ENDIAN)

typedef union
{
  double v;
  struct
  {
    u_int32_t h;
    u_int32_t l;
  } w;
} u_d;

#else

typedef union
{
  double v;
  struct
  {
    u_int32_t l;
    u_int32_t h;
  } w;
} u_d;
#endif

#define GET_WORDS(xh,xl,x) \
do {                       \
  u_d u;                   \
  u.v = (x);               \
  (xh) = u.w.h;            \
  (xl) = u.w.l;            \
} while (0)

#define GET_HIGH(xh,x)     \
do {                       \
  u_d u;                   \
  u.v = (x);               \
  (xh) = u.w.h;            \
} while (0)

#define GET_LOW(xl,x)      \
do {                       \
  u_d u;                   \
  u.v = (x);               \
  (xl) = u.w.l;            \
} while (0)

#define SET_WORDS(x,xh,xl) \
do {                       \
  u_d u;                   \
  u.w.h = (xh);            \
  u.w.l = (xl);            \
  (x) = u.v;               \
} while (0)


#define SET_HIGH(x,xh)     \
do {                       \
  u_d u;                   \
  u.v = (x);               \
  u.w.h = (xh);            \
  (x) = u.v;               \
} while (0)


#define SET_LOW(x,xl)      \
do {                       \
  u_d u;                   \
  u.v = (x);               \
  u.w.l = (xl);            \
  (x) = u.v;               \
} while (0)


int main (int argc , char **argv){
  unsigned long high,low;
  unsigned long x = 0x12345678;
  unsigned long y = 0x87654321;
  /*double a = -12345.12345;
  double bb = -12345.12345;
  double cc = -12345.12345;*/
  double a;
  double bb;
  double cc;

  char *eptr;

  unsigned sbit;

  if(argc < 2){
    fprintf(stderr, "Usage: %s double\n", argv[0]);
    exit(EXIT_FAILURE);
  }

  a = strtod(argv[1],&eptr);
  /* if(errno == ERANGE ... */
  bb = a;cc = a;

  GET_WORDS(high,low,a);

  printf("%f\n",a);
  sbit = (unsigned)(high & 0x80000000)>>31;
  printf("sbit = %u\n",sbit);
  printf("glow = %08lx\n",low);
  printf("ghigh = %08lx\n",high);

  SET_LOW(bb,x);
  printf("bb = %f\n",bb);
  SET_HIGH(cc,y);
  printf("cc = %f\n",cc);

  GET_WORDS(high,low,bb);
  printf("glow2 = %08lx\n",low);
  printf("ghigh2 = %08lx\n",high);

  exit(EXIT_SUCCESS);
}

Compile with gcc -W -Wall -pedantic -std=c99 -o get_double_parts get_double_parts.c (the current standard is C11, I know, but I let standards ripen before use, and I usually let them ripen for a long, long time.)
You might want to turn the number of loops up a little; I have an old and rusty computer, even your cell phone has more power.
The results for this box are:

native = 846.2487160000019
asm =    588.8807439999655

Really faster? Who would’ve thought!
Oh, forgot to switch on “use strict”. Again:

native = 428.11447999998927
asm =    527.9323789998889

Oooohkaaaaaay. Interesting, to say the least. No, I tried it several times (Firefox’ scratchpad caches, you have to change something in between the runs, a space is enough), of course.
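
Repeating the runs by hand gets old quickly. A small harness that takes several samples and reports the median can be sketched as follows; now, median, timeIt and sink are made-up names, and the fallback to Date.now() is only there for environments without performance.now():

```javascript
// Prefer performance.now() where it exists, fall back to Date.now().
var now = (typeof performance !== "undefined" && performance.now)
  ? function () { return performance.now(); }
  : function () { return Date.now(); };

// Median of an array of numbers; the input array is left untouched.
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Calls fn `loops` times per sample and returns the median time
// of `samples` samples in milliseconds.
function timeIt(fn, loops, samples) {
  var times = [];
  for (var s = 0; s < samples; s++) {
    var start = now();
    for (var i = 0; i < loops; i++) {
      fn(i);
    }
    times.push(now() - start);
  }
  return median(times);
}

var sink = 0;   // keeps the engine from throwing the work away
var ms = timeIt(function (i) { sink += Math.sqrt(i); }, 10000, 5);
console.log("median = " + ms + " ms");
```

The median is less impressed by a single slow sample (garbage collection, a busy box) than the average would be.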

Is that the treat, the reward, the carrot to persuade their Dear Coder to write better code? 😉

But seriously, the speed gain is low compared to native code and non-existent compared to “strict” native code (there asm.js even loses), at least in this case. It has one advantage, though: it is thread-safe. That is, if you give every web worker its own heap to work with.
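
Giving every consumer its own heap can be sketched with a small factory; in a real setup each web worker would call it once for itself. The names makeIntstuff, store and highWord are hypothetical, the 64 KiB heap is an assumption, and the word index assumes a little-endian host:

```javascript
// Hypothetical factory: each call gets a fresh heap and a fresh module instance.
function makeIntstuff() {
  var ownHeap = new ArrayBuffer(0x10000);   // assumption: 64 KiB per instance
  return (function (stdlib, foreign, heap) {
    "use asm";
    var HEAPU32 = new stdlib.Uint32Array(heap);
    var HEAPF64 = new stdlib.Float64Array(heap);
    function store(x) {
      x = +x;
      HEAPF64[0] = x;          // the double stays in this instance's heap
    }
    function highWord() {
      return HEAPU32[1] | 0;   // little-endian host assumed
    }
    return { store: store, highWord: highWord };
  })({ Uint32Array: Uint32Array, Float64Array: Float64Array }, {}, ownHeap);
}

var first = makeIntstuff();
var second = makeIntstuff();
first.store(1);
second.store(-2);
// Neither instance sees what the other one wrote.
console.log((first.highWord() >>> 0).toString(16));   // 3ff00000 on little-endian
console.log((second.highWord() >>> 0).toString(16));  // c0000000 on little-endian
```

With one shared heap the second store would have overwritten the first, which is the whole point of handing out separate ones.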

No, I don’t think I’ll use asm.js for writing directly. It is quite nice to have for the random piece of C/C++-code you need to port to JavaScript yesterday and I think I’ll use some for the output of the parser, but otherwise I am a bit disappointed.
It is now possible to play Quake in the browser. Yeah. Great.
It is now possible to use Qt in the browser. OK, that is something really useful!

One problem is left and can’t be argued away: the result is only barely more readable than the average assembler dump. You cannot read it easily, let alone change it. That is against the principles of Open Source if the “real” sources are not available.
