CtoJS: Handling Pointers

The most complicated thing to port to JavaScript is the pointer juggling involved in parsing a string.
You can translate it using the String object and handle it natively, that is, native to JavaScript. With some extra effort, you can instead rebuild it with the new typed arrays.

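To get an ECMAScript string into such an array we need to allocate memory with an ArrayBuffer. How much? As much as the string has characters, of course, but that is not as simple as it looks. ECMAScript strings consist of 16-bit code units, so every character occupies two bytes, except when it doesn't. Characters outside the Basic Multilingual Plane need a surrogate pair of two code units (four bytes), while plain ASCII would fit into a single byte. Conveniently, string.length counts code units, not characters.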

So if you are really sure that the string contains only one-byte characters, use string.length; otherwise use string.length * 2. That doubles the memory needed, so be careful if you want to send the results over the network or store them locally, for example in localStorage.
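A minimal sketch of the two-bytes-per-code-unit case, assuming the string is copied as raw UTF-16 code units into a Uint16Array. The helper name strToUint16 is made up for this illustration. It also shows that a single character outside the Basic Multilingual Plane takes two code units:

```javascript
"use strict";
// Hypothetical helper: copy a string into an ArrayBuffer as 16-bit code units.
// s.length counts code units, so s.length * 2 bytes is always enough.
function strToUint16(s) {
  var buffer = new ArrayBuffer(s.length * 2);
  var units = new Uint16Array(buffer);
  for (var i = 0; i < s.length; i++) {
    units[i] = s.charCodeAt(i); // charCodeAt() returns one 16-bit code unit
  }
  return units;
}

var ascii = strToUint16('abc');
console.log(ascii.byteLength); // 6

// U+1F600 lies outside the Basic Multilingual Plane: one character,
// stored as a surrogate pair of two code units
var emoji = strToUint16('\uD83D\uDE00');
console.log(emoji.length);     // 2
console.log(emoji.byteLength); // 4

// and back again
console.log(String.fromCharCode.apply(null, ascii)); // abc
```

If the string is known to hold only one-byte characters, a Uint8Array of s.length bytes does the same job with half the memory. That is the variant used in the example that follows.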
Example:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

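int main(void){
  /* string literals are read-only, hence const */
  const char *s = "ajshrbfgtd123.234e-2numberend";
  const char *string;
  char *endptr;
  double result;

  printf("string before  = \"%s\"\n",s);
  string = s;
  /* move the pointer to the first digit, but never past the terminating '\0';
     isdigit() expects a value representable as unsigned char */
  while(*string != '\0' && !isdigit((unsigned char)*string)){
    string++;
  }
  printf("string after = \"%s\"\n",string);
  /* strtod() converts the number and points endptr at the first unused character */
  result = strtod(string, &endptr);
  printf("number  = \"%f\"\n",result);
  printf("rest of string  = \"%s\"\n",endptr);

  exit(EXIT_SUCCESS);
}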

In JavaScript this becomes (one of many ways, of course, and almost all of them with fewer unnecessary complications), for one-byte characters:

"use strict";
// true for the ASCII digits '0' (48) to '9' (57) only
function isdigit(c) {
  return c >= 48 && c <= 57;
}

// turn a typed array of character codes back into a string
function buftostr(mem) {
  return String.fromCharCode.apply(null, mem);
}

var s = 'ajshrbfgtd123.234e-2numberend';
var slen = s.length;
var buffer = new ArrayBuffer(slen);
var string = new Uint8Array(buffer);

// truncate anything to one byte, just in case
for (var i = 0; i < slen; i++) {
  string[i] = s.charCodeAt(i) & 0xff;
}

// an index into the typed array plays the role of the C pointer
var stringp = 0;

console.log('string before  = "' + buftostr(string.subarray(stringp)) + '"');
// skip to the first digit without running past the end of the array;
// this ignores a leading sign
while (stringp < string.length && !isdigit(string[stringp])) {
  stringp++;
}
// repair the leading sign: a '+' (0x2B) or '-' (0x2D) directly before
// the first digit belongs to the number. Very inelegant
if (stringp > 0 && (string[stringp - 1] === 0x2B || string[stringp - 1] === 0x2D)) {
  stringp--;
}
console.log('string after  = "' + buftostr(string.subarray(stringp)) + '"');

var result = parseFloat(buftostr(string.subarray(stringp)));

console.log('number  = "' + result + '"');

// finding endptr is more difficult, parseFloat doesn't set one;
// one would need to parse manually. Here we cheat and hard-code the
// length of "123.234e-2", which is 10 characters
console.log('rest of string  = "' + buftostr(string.subarray(stringp + 10)) + '"');

// But we can show how to snip a part of the string out
console.log('number as a string = "' + buftostr(string.subarray(stringp, stringp + 10)) + '"');

Here TypedArray.subarray(begin[, end]) returns a new TypedArray, but no data is copied: the new array is just another view on the same ArrayBuffer. You can also create such a view directly: snippet = new TypedArray(buffer, byteOffset[, length]), where byteOffset is counted in bytes (and must be a multiple of the element size) and length is counted in elements. The offset is kept in snippet.byteOffset, and the length of the snippet in bytes is kept in snippet.byteLength.
Typed arrays come with different element sizes, but the number of elements is always kept in typedarray.length.
The exact size (in bytes) of a single element can be read from TypedArray.BYTES_PER_ELEMENT.
The underlying raw buffer is available through the read-only property snippet.buffer.
Juggling with the elements inside a typed array is possible: TypedArray.copyWithin(destination, start[, end]) copies the elements from start up to, but not including, end (or up to the end of the typed array if end is omitted) to the position starting at index destination.
A shortcut is also available: TypedArray.set(array[, offset]) places an array (typed or not) into the typed array, starting at index offset (or zero if omitted). This method overwrites any values that were there before.
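A short sketch exercising these properties and methods on a 16-byte buffer. The buffer size and the values are arbitrary choices for illustration:

```javascript
"use strict";
var buffer = new ArrayBuffer(16);
var bytes = new Uint8Array(buffer);          // 16 one-byte elements
// a second view on the same memory: starts at byte 4 (a multiple of 4)
// and holds 2 elements of 4 bytes each
var words = new Uint32Array(buffer, 4, 2);

console.log(words.length);                   // 2
console.log(words.byteOffset);               // 4
console.log(words.byteLength);               // 8
console.log(Uint32Array.BYTES_PER_ELEMENT);  // 4
console.log(words.buffer === buffer);        // true

// set(): write [1, 2, 3] into bytes, starting at index 3
bytes.set([1, 2, 3], 3);
// copyWithin(): copy indices 3, 4, 5 (the end index 6 is exclusive) to index 10
bytes.copyWithin(10, 3, 6);
console.log(Array.from(bytes).join(','));
// 0,0,0,1,2,3,0,0,0,0,1,2,3,0,0,0
```

Because words and bytes share one buffer, the writes through bytes also change words[0] and words[1]. The resulting numbers depend on the platform's byte order, so they are not printed here.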

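The method subarray takes the same arguments as Array.prototype.slice (a start index and an exclusive end index, with negative values counting from the end), but unlike slice it copies nothing: the result shares the buffer with the original. ES6 also gives typed arrays a slice method of their own, and that one does copy. I don't know who decided on the different name. A committee, perhaps?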
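A quick sketch of the difference on a small Uint8Array, assuming an ES6 engine that provides TypedArray.prototype.slice:

```javascript
"use strict";
var bytes = new Uint8Array([10, 20, 30, 40, 50]);

var view = bytes.subarray(1, 3); // elements 20, 30; shares the buffer of bytes
var copy = bytes.slice(1, 3);    // elements 20, 30; a fresh copy

bytes[1] = 99;
console.log(view[0]);                      // 99, the view sees the change
console.log(copy[0]);                      // 20, the copy does not
console.log(view.byteOffset);              // 1
console.log(view.buffer === bytes.buffer); // true
console.log(copy.buffer === bytes.buffer); // false

// negative indices count from the end, as with Array.prototype.slice
console.log(Array.from(bytes.subarray(-2)).join(',')); // 40,50
```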

Please be aware that all of this is defined in the upcoming ECMAScript 6 standard, which is still a draft at the time of writing, although almost all current browsers and other ECMAScript engines offer at least basic support. I intentionally restricted myself to those basic methods in this post.

Let's use this knowledge to build our own parseFloat, or rather our own strtod(): a function that takes a typed array together with the index at which a number begins, returns the number and sets endptr accordingly.
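JavaScript cannot pass a pointer to a variable, so an out-parameter like C's char **endptr has to be emulated. One common trick, and the one used for endptr in the following code, is a one-element array whose first element the callee overwrites. A minimal sketch with a made-up helper, skipToDigit(), which is not part of any library:

```javascript
"use strict";
// Hypothetical helper: skip everything before the first ASCII digit in buf,
// starting at index start, and report the new position through endptr[0],
// the way a C function writes through a char ** argument.
function skipToDigit(buf, start, endptr) {
  var i = start;
  while (i < buf.length && (buf[i] < 48 || buf[i] > 57)) {
    i++;
  }
  endptr[0] = i;    // the caller holds the same array object and sees this
  return i - start; // how many characters were skipped
}

var buf = new Uint8Array([0x61, 0x62, 0x31, 0x32]); // "ab12"
var endptr = [0];   // one-element array standing in for a pointer variable
var skipped = skipToDigit(buf, 0, endptr);
console.log(skipped, endptr[0]); // 2 2
```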

An IEEE 754 floating-point number written in base 10 has an optional sign at the start, an optional integer part, an optional period (radix point, decimal point), an optional fractional part and an optional exponent part; at least one digit must appear in the integer or the fractional part.
The exponent part starts with the letter “E” or “e”, has an optional sign and has at least one digit.

"use strict";
// str:    a Uint8Array holding one-byte characters
// strptr: index of the first character of the number in str
// endptr: a one-element array; endptr[0] receives the index of the first
//         character after the number (JavaScript has no pointers to pass)
// Uses buftostr() from the first example.
function strtod(str, strptr, endptr) {
    // these are all flags and should be done with a bit-mask instead
    var decimal_point = 0;
    var sign_mantissa = 0;
    var mantissa = 0;
    var sign_exponent = 0;
    var exponent_flag = 0;
    var exponent = 0;
    var goto_out_of_for_flag = 0;
    // set endptr to start value
    endptr[0] = strptr;
    // we restrict to base 10, so no prefix
    for (var i = 0; i < (str.length - strptr); i++) {
        // get the next character
        var c = str[strptr + i];
        // compare characters instead of bare character codes, for readability
        switch (String.fromCharCode(c)) {
        case '+':
        case '-':
            // a sign is allowed only at the very start of the number
            // or directly after the "e" of the exponent
            if (i === 0) {
                sign_mantissa = 1;
            } else if (exponent_flag !== 0 && sign_exponent === 0 && exponent === 0) {
                sign_exponent = 1;
            } else {
                return Number.NaN;
            }
            break;
        case '.':
            // only one decimal point allowed, and none in the exponent
            if (decimal_point !== 0 || exponent_flag !== 0) {
                return Number.NaN;
            }
            decimal_point = 1;
            break;
        case 'e':
        case 'E':
            // only one exponent allowed, and only after a digit of the mantissa
            if (exponent_flag !== 0 || mantissa === 0) {
                return Number.NaN;
            }
            exponent_flag = 1;
            break;
        case '0':
        case '1':
        case '2':
        case '3':
        case '4':
        case '5':
        case '6':
        case '7':
        case '8':
        case '9':
            if (exponent_flag !== 0) {
                exponent = 1;
            } else {
                mantissa = 1;
            }
            break;
        default:
            // the first non-number character. For lack of goto set a flag
            goto_out_of_for_flag = 1;
            break;
        }
        if (goto_out_of_for_flag !== 0) {
            break;
        }
        // keep pace
        endptr[0]++;
    }
    // we need at least one digit in the mantissa; this also rejects an
    // empty input, a lone sign and a decimal point without digits
    if (mantissa === 0) {
        return Number.NaN;
    }
    // an exponent needs at least one digit
    if (exponent_flag !== 0 && exponent === 0) {
        return Number.NaN;
    }
    // the number can be declared clean by now. If the parsing failed at any
    // point causing the return of NaN, endptr[0] is set to the index of the
    // mishap. Hand only the validated characters to parseFloat()
    return parseFloat(buftostr(str.subarray(strptr, endptr[0])));
}

It is a bit of a silly example, because all of that can be done in native JavaScript with little more than a regular expression. A regular expression is quite expensive, admittedly, but here we do the same work twice anyway, because parseFloat() performs the very same checks, only better and faster.
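For comparison, a sketch of the regular-expression route working directly on a JavaScript string, assuming an ES6 engine for the sticky flag y. strtodRegex is a made-up name, and the pattern is one reasonable reading of the grammar described earlier, not the only one:

```javascript
"use strict";
// Hypothetical regex-based strtod(). The sticky flag "y" makes exec()
// match only at re.lastIndex, i.e. exactly at the start position.
function strtodRegex(s, start, endptr) {
  // optional sign, then digits with an optional fraction or a fraction alone,
  // then an optional exponent that must contain at least one digit
  var re = /[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/y;
  re.lastIndex = start;
  var m = re.exec(s);
  if (m === null) {
    endptr[0] = start; // nothing parsed
    return Number.NaN;
  }
  endptr[0] = start + m[0].length;
  return parseFloat(m[0]);
}

var s = 'ajshrbfgtd123.234e-2numberend';
var endptr = [0];
var start = s.search(/[0-9]/);              // 10, the first digit
var result = strtodRegex(s, start, endptr);
console.log(result);                        // 1.23234
console.log(s.slice(endptr[0]));            // numberend
```

This sketch behaves like C's strtod() on input such as "1.5end": it returns 1.5 and leaves endptr[0] at the "e", because an "e" without exponent digits simply ends the number.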
