Adventures of a Programmer: Parser Writing Peril XVIII

[UPDATE]
The regular expression nagged me quite a bit until I got rid of it. updated this post to mirror my reflections, pardon, the changes (added at the end).

Last post I talked about using Tom Wu’s bigint library JSBN and the need to add some faster algorithms especially for multiplication. I just found out that I already did it 😉 (needs some testing, of course, I haven’t even checked if it works at all).
But serious:
One of the thing needed for the front-end operator overloading is the backend operator overloading, because there is no way to do a useful operator overloading beside some fiddling with valueOf in ECMA-script.

We have a lot of different kind of numbers, all of them are able to do some kind of arithmetic; almost all can do additions and multiplications. There are exceptions, e.g.: it is possible to add strings by concatenating but how do you multiply two of them? What is “foo” times “bar”?
Let’s assume that we have the ECMA Number and our Complex data type. It is an ideal example because all functions on complex numbers work for reals, too but not vice versa. See for example the differential calculus: the function f(x) = |x| is differentiable on the real line with the exclusion of the origin but it is not differentiable on the complex plane (described in more details at Mathworld). What? Me and digressing? Can’t be!
But “normal” arithmetic works well (sort of). Less ado more code? Ok.
Lets take addition and multiplication:

function Complex(){this.Re = 0;this.Im = 0;}
Complex.prototype.add = function(num){
  var a = this.Re, b = this.Im;
  var c, d;
  var ret = new Complex();
  /*
     If the type of the argument is a Complex
     go on. If the type of the argument is 
     Number do a Number->Complex transformation
     And so on.
   */
  c = num.Re, d = num.Im;

  // this + num = 
  // (this.Re + num.Re), (this.Im + num.Im)

  return ret;
};

(The implementation of the algorithm itself has been intentionally left out.)
Well, that is the way to add a complex number to a complex/real number. Now what if we want to add a real number to a complex/real number? Exchange the operands? That would be possible at the parser stage or later but it is quite hard earlier. It is much easier to add a function to the ECMA object Number:

Number.prototype.add = function(num){
  /*
     If num is Number go on
     If num is Complex do a this->Complex
     And so on
   */
    // this + num = this + num
};

(You can do it in more ways than this way, I just prefer prototyping and couldn’t even pinpoint it to some precise reason).
Let’s go from top to bottom. The first problem is to decide what the type the argument given is. To find out if it is of type Number the typeof operator will do. But if you ask the typeof operator what type a Complex is it will return “object”; as it will for Array, which is a more common problem. The solution for the latter is in most cases something like

var result = obj instanceof Array;

Or something more elaborated like

var toString = Object.prototype.toString;
var result = toString.call(obj).match("Array");

This would cause a long chain of if-elses in our case, because we want to include bignumbers, matrices, quaternions, and whatever-I-have-forgotten, too. This long chain is what I have now in the original implementation. It works but it is ugly. Ugliness is no rational reason to change something that works but since when does this kind of rationality hinder a programmer from changing something ugly into something more elegant? Where elegance lies in the eye of the beholder, of course 😉
At first some test-objects to play with:

function someobject(){this.a = 0;}
function someobjecttwo(){this.b = 0;}
var a = new someobject();
var b = new someobjecttwo();
var c = [1,2,3]; 
var d = 123.32e-5;
var e = "A string";
var f = new String("Another string");
var g = {foo:"bar",dead:"beef"};

Some of the objects can be checked with the typeof operator because they return something different from “object”, we should check that first. Some other things, like e.g.: Array, can be checked with the toString.call() construct. Only the rest needs some extra massage with the instanceof operator.
The format for the names of the objects is [object Name] according to ECMA 5.1 15.2.4.2 but this format is in the standard right on from the beginning, as far as I know.
With all of that stuff we can write the following little function:

function xtypeof(obj){
  // try it the traditional way
  var tmp = typeof obj;
  if( tmp !== "object"){
    return tmp;
  }
  else{
    // try the toString prototype
    var toString = Object.prototype.toString;
    tmp = toString.call(obj);
    if(tmp !== "[object Object]"){
      /* Regular expressions are quite expensive
      return tmp.replace(/(\[object )([A-Z][a-z]+)(\])/,"$2")
                .toLowerCase(); 
      */
      /* UPDATE */
      return tmp.slice(8, -1).toLowerCase();
    }
    else{
      // see comment in the article
      var list = {
                  "someobject":someobject,
                  "someobjecttwo":someobjecttwo
                 };
      for(var p in list){
        if(obj instanceof list[p]){
          return p;
        }
      }
      return "object";
    }
  }
}

Test it with

result += "a = " + (typeof a) + ", " + xtypeof(a) + ", " + (toString.call(a)) + "\n";
result += "b = " + (typeof b) + ", " + xtypeof(b) + ", " + (toString.call(b)) + "\n";
result += "c = " + (typeof c) + ", " + xtypeof(c) + ", " + (toString.call(c)) + "\n";
result += "d = " + (typeof d) + ", " + xtypeof(d) + ", " + (toString.call(d)) + "\n";
result += "e = " + (typeof e) + ", " + xtypeof(e) + ", " + (toString.call(e)) + "\n";
result += "f = " + (typeof f) + ", " + xtypeof(f) + ", " + (toString.call(f)) + "\n";
result += "g = " + (typeof g) + ", " + xtypeof(g) + ", " + (toString.call(g)) + "\n";
if(console.log){
  // node.js
  console.log(result);
}
else{
  // browser
  alert(result);
}

The last branch loops over a list and uses the instanceof operator on each. The, admittedly quiet cheap trick her is to use a key-value list instead of an array and use the key as the return value. This function should work in any browser. I said: it should, I don’t have the faintest idea if it actually does.
You might have detected one curious thing if you run the code: the typeof operator returns two different things for a String. That is correct. I hope.

A neat little function, indeed, but slow. No, really: slow as hell! [UPDATE] not that slow anymore[UPDATE] The overhead is negotiable if we handle large numbers with several hundred digits of precision or more but for normal, day-to-day work it is just too slow.
The regular expression is expensive, but if we don’t do it there we would have to add all of the common objects into the list, too.
Looping over the list is slow, too. Any alternatives?
An alternative to the above function would be to analyse the probability of the individual objects to show up, sort them and do a if-else chain in that order.
Nah, nope, waaay to much work for such a little effect, if there is a effect at all and has no outliers.
The objects do have another thing: a constructor, also in from day one.

result += "a = " + (typeof a) + ", " + xtypeof(a) + ", " + (toString.call(a))", " + a.constructor +  "\n";
result += "b = " + (typeof b) + ", " + xtypeof(b) + ", " + (toString.call(b))", " + b.constructor +  "\n";
result += "c = " + (typeof c) + ", " + xtypeof(c) + ", " + (toString.call(c))", " + c.constructor +  "\n";
result += "d = " + (typeof d) + ", " + xtypeof(d) + ", " + (toString.call(d))", " + d.constructor +  "\n";
result += "e = " + (typeof e) + ", " + xtypeof(e) + ", " + (toString.call(e))", " + e.constructor +  "\n";
result += "f = " + (typeof f) + ", " + xtypeof(f) + ", " + (toString.call(f))", " + f.constructor +  "\n";
result += "g = " + (typeof g) + ", " + xtypeof(g) + ", " + (toString.call(g))", " + g.constructor +  "\n";
if(console.log){
  // node.js
  console.log(result);
}
else{
  // browser
  alert(result);
}

But this gives different results at different machines. In node.js it gives a short, easily parsable answer. To show just two:

> a.constructor
[Function: someobject]
> c.constructor
[Function: Array]

In Firefox v. 32 it shows the source (which is correct, by the way):

// assuming a kind of shell for simplicity
$ a.constructor
function someobject(){this.a = 0;}
$ c.constructor
function Array() {
    [native code]
}

But the Function object has a name property:

result += "a = " + a.constructor.name +  "\n";
result += "b = " + b.constructor.name + "\n";
result += "c = " + c.constructor.name + "\n";
result += "d = " + d.constructor.name + "\n";
result += "e = " + e.constructor.name + "\n";
result += "f = " + f.constructor.name + "\n";
result += "g = " + g.constructor.name + "\n";
if(console.log){
  // node.js
  console.log(result);
}
else{
  // browser
  alert(result);
}

Yep, that’s nice and usable. The question that comes up now: is it portable, that is, is it only in the newest version of the ECMA-standard or is it old? Well, bad news: it is so new, it isn’t even in the standard but in the draft for ECMA-6. It is implemented in Firefox ver. 32 and node.js ver. 0.10.30, but I didn’t check anything else but the compatibility table at the end of the MDN page of Function.name put a big red “Not supported” in the column for the Internet Explorer.
Do I care? Yes.
Do I have to? No, it is easily changed with a single line of perl if necessary:

 echo "foo bar fuzz breed.constructor.name bar foo"| perl -pe 's/([A-Za-z_]+[A-Za-z0-9]*)(\.constructor\.name)/xtypeof\(\1\)/g'

Assumes pure ASCII of course and no spaces/newlines in between breed.constructor.name.

So the example from the beginning would look like

Complex.prototype.add = function(num){
  var a = this.Re, b = this.Im;
  var c, d cn;
  var ret = new Complex();
  /*
     If the type of the argument is a Complex
     go on. If the type of the argument is 
     Number do a Number->Complex transformation
     And so on.
   */
  cn = num.constructor.name;
  if(cn === "Number"){
    c = num, d = 0;
    ret.Re = this.Re + num.Re;
    ret.Im = this.Im + num.Im;
    return ret;
  }
  /*
     and so on
  */
  else{
    c = num.Re, d = num.Im;
  }
  /*
   this + num for other data types
   like e.g.: Bigint
  */
  ret.Re = this.Re.add(num.Re);
  ret.Im = this.Im.add(num.Im);

  return ret;
};

It is also possible to replace the if-else chain with a switch, especially if all objects start with a different letter. This condition would mean to rename things like Bigint, Bigfloat, and Bignumber to something like Floatbig, Intbig, and Bignumber (Ha! 😉 ).

Yes, that’s the way I spend my free time and I’m not even ashamed about it!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s