Adventures of a Programmer: Parser Writing Peril XXIII

Types and Modifiers

ECMAScript does not know much about types and modifiers, but Little should. There are not many, but enough to justify the small hassle the implementation costs.

Types

The names are all case sensitive, but I’m not sure about the typographical details. All lower case? CamelCase?

Default Type

The default type for variables and functions when nothing else is given is the Complex data type. This type is slower than the Real (native double precision floating point number), so some explicit types should be implemented.

Real

The Real data type, when prefixed to the definition of a variable restricts the domain of this variable to the real line and when prefixed to a function restricts the range of the function.
Examples:

Real rinvsqrt(Real x){
  if(x < 0){
    return DOMAIN_ERROR;
  }
  ...
  return invsqrt_of_x;
}
cinvsqrt(z){
  // exponentiation is defined for nearly all types
  ...
  return cinvsqrt_of_z;
}

Rational

There is no real initialization for rationals but the following might work

rational = (decimal-literal / imaginary-literal) "/"
           (decimal-literal / imaginary-literal)

But I am not very convinced it will work out very well.

Matrix

The Matrix does not restrict anything, just the opposite: a matrix with only one element is undistinguishable from a simple number. It is just to allow for some shortcuts to avoid too many checks internally.

The actual way to initialize a Matrix is quite similar to what Octave uses:

matrix         = "[" 1*matrix-row "]"
matrix-row      = matrix-col *( ";" matrix-col)
matrix-col      = matrix-element *("," matrix-element)
matrix-element = decimal-literal / imaginary-literal 

Vector

A Vector is syntactically either a Matrix with one row or with one column. It is simpler to restrict the initializing to a row vector and turn it around later. That means that there has to be something internally to keep the orientation.

vector         = "[" matrix-col "]"

Quaternion

A Quaternion can be written as an ordered set, so a vector is used here, too.

quaternion       = "[" 4matrix-element "]"

ContinuedFraction

Even simple continued fractions are too close to an irregular matrix initialization. If we restrict the initialization of matrices to regular (rectangular) matrices we can use the following:

continued-fraction       = "[" matrix-element ";" matrix-col "]"

Example:
The simple continued fraction
x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{a_4} } } }
can also be noted as
x = [a_0; a_1, a_2, a_3]
And that is what we do here.

Surds

Quadratic numbers, or Surds are numbers of the form a + b*\sqrt{d} with d \in \mathbb{Z}. It is a kind of generalization of the complex numbers and with d = -1 we indeed get the complex numbers.

Polynomial

For polynomials with a single variable, a vector is useful, for polynomials with several variables, a matrix is probably better suited (with one variable per row).

Bitset

A bitset is also a vector.

No, not the variable-length-arguments from C but a real ellipsis, the kind of typographical shorthand for “e.t.c.”.

With so many different types listed above, some of it are more related than the other, it might be a good idea to offer something more generall. Not as generall as in ECMAScript but something more comfortable. Landon Noll’s Calc has something in this direction to build custom types. The syntax is quite like the straightforward implementation of prototypes your’s truly prefers to do in JavaScript and has a long list of predefined operations, too.
Let’s reduce it to something simple, syntactically and formally and define the complex numbers in that way:

Object complex = {a,b};
complex(a,b){
  Object complex x;
  x.re = a;
  x.im = b;
  return x;
}
complex.print(a){
  print a.re " + " a.im "i";
}
complex.add(a,b){
  Object complex x;
  x.re = a.re + b.re;
  x.im = a.im + b.im;
  return x;
}

By using a predefined basic set of operators we are able to overload the operators and do e.g.: a + b no matter what types these two variables are. To enable such comfortable function we would need to make the whole overloading system completely type independent. Not that large of a problem but making it fast is.

I do not know if I am willing to implement that and stay with the ones I already have instead: Real, Complex, Matrix, Polynomial, and Vector. The rest has to be called/initialized with their respective functions.

Type Casting

Complicated but doable if we restrict ourselves to the types listed above which can be mapped in one, the other or even both ways. For example with Real and Complex we can do:

  • RealComplex iff \Im = 0
  • RealComplex
  • Real|Complex|

Modifiers

The scope of variables should be variable. The default is to keep the scope of the variables inside blocks and change explicitly, e.g.:

var_a;             scope 0 (global)  
static var_b;      scope 1 (aka. 0 in this file only)
foo(arg_x,arg_y){  scope 2 (arguments belong to function)
  var_c = 0;       scope 2
  for(var_i=0;;){  scope 3
    var_d = 1;     scope 3
    global var_e;  scope 0
    static var_f;  scope 1 (aka. 0 in this file only)
  }
}

The implementation may differ wildly from the above example but the basics stay.

unsigned

ECMAScript defaults to signed, an explicit unsigned might come up as useful.

static

Scope modifier, used as in C/C++ for global variables being global in the included file only because it is not always possible to avoid global variables without a lot of hassle.

global

Scope modifier, used as in C/C++ for global variables if ambiguity occurs. See example at the start of the modifier section.

threadsafe

We have Web Workers and we have CPUs with multiple cores as a common now. We shall make use of both.

synchronized

The same as in Java but for variables only. It seems to be possible for functions, too, but I’m not much of an expert in web workers now. Kind of a Posix semaphore but only kind of.

ABNF grammar

To produce the nasty looking hexadecimal mess use Perl:

while (<>) {
  $s = $_;
  chomp $s;
  foreach $char (split //, $s){
    printf "%%x%02x ", ord($char);
  }
  print " / ; ", $s, "\n";
}

The reason for the tag “oneliner”:

shell$: echo "Bool" | perl -e  'while (<>) {$s = $_;chomp $s;foreach $char (split //, $s){printf "%%x%02x ", ord($char);} print " / ; ", $s, "\n";}'
%x73 %x74 %x61 %x74 %x69 %x63  / ; Bool

Ok, admitted, it’s a long line đŸ˜‰

Finally, the resulting grammar:

; expression not yet defined, put in next posting
casting_expression  =  "(" type_specifier  ")" expression
; Types. May be implemented or not
type_specifier =  %x42.6f.6f.6c / ; Bool
                  %x52.65.61.6c / ; Real
                  %x43.6f.6d.70.6c.65.78 / ; Complex
                  %x4d.61.74.72.69.78 / ; Matrix
                  %x50.6f.6c.79 / ; Poly
                  %x56.65.63.74.6f.72 ; Vector

; Modifiers
modifier_static       =  %x73.74.61.74.69.63  / ; static
modifier_global       =  %x67.6c.6f.62.61.6c / ; global
; for web-workers.
modifier_thread_safe  =  %x74.68.72.65.61.64.73.61.66.65 / ; threadsafe
modifier_synchronized =  %x73.79.6e.63.68.72.6f.6e.69.7a.65.64 / ; synchronized

The lexer will handle all strings in the implementation, the parser is for logic only. The ABNF is just simpler to proof correct because everything in it has to be closed, hence no negations, set subtraction, and all that stuff. Which means, too, that it will get quite ugly for large, complicated grammars.

Next post: expressions

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s