Adventures of a Programmer: Parser Writing Peril XXXX

Here is the final[1] grammar together with the lexer: the whole file, with a lot of comments, is on GitHub. It currently lives in one file for convenience but needs to be split into a lexer (Flex) and a parser (Bison) later.

The file has a lot of comments already, so there is not much to say here, but nevertheless…

The lexer is relatively straightforward. There are no complicated start conditions (they can get nested very deeply if you do not watch them very carefully) or anything else of that kind.
There is one exception, already explained in the comments: the numbers you want to parse need to be unsigned. The sign gets added later by a unary function, so the input does not get parsed as (-123e-32) (nodes are enclosed in parentheses) but as (-)(123e-32).
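A minimal sketch of the idea in plain JavaScript, not the JISON rules from the actual file: the rule table, the token names NUMBER and OP, and the function tokenizeUnsigned are all made up for illustration. The number pattern has no leading sign, so the minus always comes out as its own token.

```javascript
// Hypothetical mini-lexer: number literals are always unsigned,
// so "-" is emitted as a separate operator token.
const UNSIGNED_RULES = [
  // digits with optional fraction, or a bare fraction, plus optional exponent
  ["NUMBER", /^(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/],
  ["OP", /^[-+*\/()]/],
  ["WS", /^\s+/],
];

function tokenizeUnsigned(input) {
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const [type, re] of UNSIGNED_RULES) {
      const m = re.exec(rest);
      if (m) {
        // whitespace is consumed but not reported
        if (type !== "WS") tokens.push({ type, text: m[0] });
        rest = rest.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error("Unexpected character: " + rest[0]);
  }
  return tokens;
}

console.log(tokenizeUnsigned("-123e-32"));
// [ { type: 'OP', text: '-' }, { type: 'NUMBER', text: '123e-32' } ]
```

With this table, an input like 5-3 comes out as three tokens (number, operator, number), which is what a parser with a binary minus would expect.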

The construction of the number literal is overly complicated, too, but I might want to add other bases later (binary and hexadecimal at least), and this way they can simply be added without an error-prone rewrite.

The JISON lexer is slightly different from Flex, but it offers the option %options flex, which picks the longest match instead of JISON’s default first-rule match. Still, as a comment in the source of angular-dragdrop suggests: “The safest thing to do is have more important rules before less important rules, which is why . is last”. There are more differences, described in JISON’s “documentation”.
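The difference between the two strategies can be sketched in a few lines of plain JavaScript. This is not how JISON or Flex are implemented; the rule table and the functions firstRuleMatch and longestRuleMatch are hypothetical and only mimic the two selection policies.

```javascript
// Hypothetical rule table; the order matters for the first-rule policy.
const DEMO_RULES = [
  ["ASSIGN", /^=/],
  ["EQ", /^==/],
  ["ANY", /^./], // catch-all, deliberately last
];

// First-rule policy: the first rule that matches at all wins.
function firstRuleMatch(input) {
  for (const [type, re] of DEMO_RULES) {
    const m = re.exec(input);
    if (m) return { type, text: m[0] };
  }
  return null;
}

// Longest-match policy: try every rule, keep the longest lexeme;
// on a tie the earlier rule stays (strict ">" comparison).
function longestRuleMatch(input) {
  let best = null;
  for (const [type, re] of DEMO_RULES) {
    const m = re.exec(input);
    if (m && (best === null || m[0].length > best.text.length)) {
      best = { type, text: m[0] };
    }
  }
  return best;
}

console.log(firstRuleMatch("=="));   // { type: 'ASSIGN', text: '=' }
console.log(longestRuleMatch("==")); // { type: 'EQ', text: '==' }
```

Putting EQ before ASSIGN in the table makes both policies agree, which is the point of the quoted advice about rule order.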

The strings are very simple. No difference between single and double quotes and only the basic escape characters. The high complexity of the construction has the same reason as for the numbers: I want to be able to easily make additions (e.g.: Unicode escapes) later.
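To illustrate what "no difference between single and double quotes and only the basic escape characters" could look like once the lexer has found a string literal, here is a small sketch. The set of escapes in BASIC_ESCAPES is my assumption about what "basic" means, and the function name unquote is hypothetical; the real lexer file may handle this differently.

```javascript
// Assumed set of "basic" escapes; a Unicode escape would be one more entry
// (plus a slightly larger pattern) later on.
const BASIC_ESCAPES = new Map([
  ["n", "\n"], ["t", "\t"], ["r", "\r"],
  ["\\", "\\"], ["'", "'"], ['"', '"'],
]);

// Turn the raw literal (including its quotes) into the string value.
// Single and double quotes are treated exactly the same.
function unquote(literal) {
  const quote = literal[0];
  const closed = literal.length >= 2 && literal[literal.length - 1] === quote;
  if ((quote !== '"' && quote !== "'") || !closed) {
    throw new Error("Not a string literal: " + literal);
  }
  const body = literal.slice(1, -1);
  return body.replace(/\\(.)/g, (whole, ch) => {
    if (!BASIC_ESCAPES.has(ch)) throw new Error("Unknown escape: \\" + ch);
    return BASIC_ESCAPES.get(ch);
  });
}

console.log(unquote('"a\\tb"') === unquote("'a\\tb'")); // true
```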

The whole language is a stripped down version of ECMAScript 5.1 with some differences:

  1. Variable declaration has the keyword let instead of var
  2. Function declaration has the keyword define instead of function
  3. It has the ability to include files (highly restricted in the JavaScript version, of course)
  4. The scope of variables is blockwise (everything between “{” and “}” has its own scope) not functionwise
  5. A bit of syntactic sugar, e.g.: a matrix array type such as let identity_3 = [1,0,0;0,1,0;0,0,1]
  6. It may get types (e.g.: int, float, matrix…) if I can be… what’s the word?
  7. It may get storage classifiers (e.g.: static, extern, global, local) if I can be…still can’t remember the word, but I had it on the tip of my tongue.
  8. I am still unsure about objects (the things with the dots, y’know?)
  9. Everything is stricter now: semicolons at the end of statements are mandatory, as are curly brackets around every block.
  10. Function definitions in the lowest scope only and on their own. No function definitions inside functions, for example. This has some disadvantages but not many for a highly numerically oriented language.
  11. There is an additional operator // (together with //=) for explicit integer division
  12. The power symbol is the double asterisk ** instead of ^, which is used for boolean operations (XOR)
  13. The hash character # is used as the length operator (cardinality), e.g.: print(#123); will print either 1 (number of numbers, the default), 3 (decimal digits) or 3 (number of units), depending on the configuration; a = [1, 2, 3];print(#a); will print 3; a = [1, 2, 3; 4 , 5, 6];print(#a); will print 3 or 2,3 depending on the configuration. For the last case the form of the matrix is relevant: a = [1, 2, 3; 4 , 5; 6, 7, 8];print(#a); will print 3,[3, 2, 3] (nested array). The exact behaviour of this operator might change; I am still unsure.
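The matrix sugar from item 5 and the matrix cases of the # operator from item 13 can be sketched in JavaScript as follows. Both helpers, parseMatrixLiteral and matrixShape, are hypothetical and only handle flat numeric matrices written with ";" as the row separator. The plain-array and number cases of # are left out, and since the behaviour of # is explicitly not settled, this only reproduces the "2,3" and "3,[3, 2, 3]" outputs quoted above.

```javascript
// Hypothetical helper: split "[1,0,0;0,1,0]" into rows at ";" and cells at ",".
function parseMatrixLiteral(src) {
  const trimmed = src.trim();
  if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
    throw new Error("Not a matrix literal: " + src);
  }
  return trimmed.slice(1, -1).split(";").map((row) =>
    row.split(",").map((cell) => {
      const text = cell.trim();
      const value = Number(text);
      if (text === "" || Number.isNaN(value)) throw new Error("Bad cell: " + cell);
      return value;
    })
  );
}

// Hypothetical reading of "#" for matrices: [rows, columns] when every row
// has the same length, otherwise [rows, [length of each row]].
function matrixShape(rows) {
  const lengths = rows.map((r) => r.length);
  const rectangular = lengths.every((n) => n === lengths[0]);
  return rectangular ? [rows.length, lengths[0]] : [rows.length, lengths];
}

console.log(parseMatrixLiteral("[1,0,0;0,1,0;0,0,1]"));
// [ [ 1, 0, 0 ], [ 0, 1, 0 ], [ 0, 0, 1 ] ]
console.log(matrixShape(parseMatrixLiteral("[1, 2, 3; 4 , 5, 6]")));
// [ 2, 3 ]
console.log(matrixShape(parseMatrixLiteral("[1, 2, 3; 4 , 5; 6, 7, 8]")));
// [ 3, [ 3, 2, 3 ] ]
```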

The C-version will most probably have a goto added (too hard to implement one in JavaScript, although not impossible).

Next in this series: printing the AST in a way that allows the result to run as a JavaScript program. (Only real and complex numbers, to keep it from getting too complicated for a start.)

[1] Honestly: I doubt it as much as you do 😉

