## Number

The program Little is all about numerical mathematics and, as the name suggests, deals with numbers. That means, with all its consequences, that we must be able to handle numbers. The ability to parse a number has to rest on a mathematically rigorous description of the term *Number*; an "I know it when I see it!" approach, as Justice Potter Stewart so succinctly described his own ability to detect obscenity in Jacobellis v. Ohio (1964), is *not* sufficient.

Such a need for an exact description of *Number* is quite common, hence a committee…oh no! Not again! A committee! But all hand-wringing is moot and of no help; we have to go through all of it. Into hell we go.

The standard for numbers, in the case of floating-point numbers (how do I know that we need floating-point numbers and nothing else? I don't know, call it intuition), is the international standard ISO/IEC/IEEE 60559:2011, largely based on, if not identical to, IEEE 754-2008. There is no need to buy a copy (the ISO standard comes to CHF 178 for its 58 pages); I have a copy of IEEE 754-2008.

To blame for the content of IEEE 754-2008 are: Robert M. Grow, Chair; Thomas Prevost, Vice Chair; Steve M. Mills, Past Chair; Judith Gorman, Secretary; and 23 others, not including the *non-voting IEEE-SA Standards Board liaisons*, as they liked to call them. Only three women on the committee (but three of four among the *liaisons*). Who would have guessed *that*?

Back to the main theme.

The format IEEE 754 defines is *a finite subset of real numbers* (3.1.1). The basic formats have a fixed bit-length (e.g.: 64 bits), which we cannot use. But this inability is only theoretical; in practice we use a fixed bit-length too, it is just representable in a variable way, and the bit-length itself is only limited by the hardware. The general format, described in section 3.2, table 3.1 as level 3, is applicable for any base and any nonzero precision:

That means that there exists a signed zero (+0 and -0 are distinct), two different kinds of infinity (which contradicts the *finite* part of the standard, but I already mentioned that it is the work of a committee, did I not?) and two different kinds of errors: `qNaN`, a quiet `NaN` (may include information), and `sNaN`, a signaling `NaN` (must cry for help).

Ok, not *that* bad.

The real meat comes in section 3.3. Or so it seems.

We need the integer parameters

- *b* = the radix
- *p* = the number of digits in the significand (precision)
- *emax* = the maximum exponent *e*
- *emin* = the minimum exponent *e*, where *emin* shall be 1 − *emax* for all formats

The full format is the well known (−1)^s · b^e · m where s ∈ {0, 1}, the exponent e lies in emin ≤ e ≤ emax, and the mantissa is m = d_0.d_1d_2…d_(p−1), that is, the sum m = Σ_(k=0…p−1) d_k · b^(−k) with digits 0 ≤ d_k < b. The floating-point numbers the standard describes here have the mantissa as a normalized real number with 0 ≤ m < b.

It is more useful for our program to use an integer as the mantissa. That changes the rule to (−1)^s · b^q · c with emin ≤ q + (p − 1) ≤ emax and an integer c with 0 ≤ c < b^p, where p is the precision. The mantissa is now defined as c = m · b^(p−1) = Σ_(k=0…p−1) d_k · b^(p−1−k).
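As a small sketch of what such an integral-mantissa number could look like (the names are mine, not Little's actual code):

```javascript
// Toy shape for the representation above:
// value = (-1)^sign * mantissa * base^exponent, with a BigInt mantissa.
function makeNum(sign, mantissa, exponent, base = 10n) {
  return { sign, mantissa: BigInt(mantissa), exponent, base };
}

// Collapse to a native double for display (loses precision, demo only).
function toNumber(n) {
  const m = Number(n.mantissa) * Number(n.base) ** n.exponent;
  return n.sign ? -m : m;
}

const pi5 = makeNum(0, 314159n, -5); // 314159 * 10^-5 ≈ 3.14159
console.log(toNumber(pi5));
```

The mantissa grows without bound thanks to `BigInt`; the exponent stays a native number, which is exactly the trade-off discussed below.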

The always finite precision of the numbers means that any nonzero number of magnitude smaller than b^emin has less precision than wanted and is hence called a *subnormal number*.

We are able to use arbitrarily long mantissas in our program thanks to the `bigint` code, but we cannot, with good conscience, use a *bigint* as an exponent^{[1]}. That reduces the size of the exponent to the maximum integer the native number allows. That is still 2^{53} in JavaScript, which is a very large number, no doubt about *that*, but it might be too small for some calculations. I don't have the faintest idea what those could be (approximating Graham's number? No, that is too large, even for `bigint`-loaded exponents), but feel free to tell me if you have one.
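For the record, a quick check of what "maximum integer the native number allows" means in JavaScript:

```javascript
// A JS number is an IEEE 754 double: integers are exact up to 2^53,
// beyond that adjacent integers collide.
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true
console.log(2 ** 53 + 1 === 2 ** 53);                 // true: precision lost
```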

So much for the mathematics; I hope I got it right.

Section 5.12.3 describes the format for an external hexadecimal sequence representing the number. The grammar (translated to ABNF without the shortcuts) looks promising.

```
hex-sequence           = *1sign hex-indicator hex-significand decimal-exponent
decimal-exponent       = hex-exponent-indicator *1sign 1*digit
hex-significand        = (*hex-digit "." 1*hex-digit) / (1*hex-digit ".") / (1*hex-digit)
hex-indicator          = "0" ("x" / %x58)   ; 0[xX]
hex-exponent-indicator = "p" / %x50         ; [pP]
hex-digit              = digit / hex-lowercase / hex-uppercase   ; [0-9a-fA-F]
hex-uppercase          = %x41-46            ; [A-F] ABNF strings are case insensitive
hex-lowercase          = "a" / "b" / "c" / "d" / "e" / "f"
digit                  = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
sign                   = "+" / "-"
```
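The grammar translates almost mechanically into a regular expression (my translation, not from the standard; note that the exponent is mandatory, exactly as in the ABNF):

```javascript
// hex-sequence as a regex: optional sign, 0x indicator, significand with
// optional radix point, then a mandatory decimal exponent after [pP].
const hexSequence =
  /^[+-]?0[xX](?:[0-9a-fA-F]*\.[0-9a-fA-F]+|[0-9a-fA-F]+\.?)[pP][+-]?[0-9]+$/;

console.log(hexSequence.test("0x1.8p+1"));  // true (= 3.0)
console.log(hexSequence.test("-0xA.p-2"));  // true
console.log(hexSequence.test("0x1.8"));     // false: exponent is mandatory
```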

This can be used without much ado for purely decimal sequences, too.

```
dec-sequence           = *1sign dec-significand dec-exponent
dec-exponent           = dec-exponent-indicator *1sign 1*digit
dec-significand        = (*digit "." 1*digit) / (1*digit ".") / (1*digit)
dec-exponent-indicator = "e" / %x45   ; [eE]
digit                  = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
sign                   = "+" / "-"
```

To make the input of binary numbers easier, a sequence can be derived from the hexadecimal sequence by exchanging the indicator. We keep the decimal exponent.

```
bin-sequence           = *1sign bin-indicator bin-significand decimal-exponent
decimal-exponent       = bin-exponent-indicator *1sign 1*digit
bin-significand        = (*bin-digit "." 1*bin-digit) / (1*bin-digit ".") / (1*bin-digit)
bin-indicator          = "0" ("b" / %x42)   ; 0[bB]
bin-exponent-indicator = "e" / %x45         ; [eE]
bin-digit              = "0" / "1"
digit                  = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
sign                   = "+" / "-"
```

The standard identifier for octal numbers is a leading zero. That means that any number starting with a zero gets taken as an octal number. But what about, e.g.: 0.01e1?

One way to resolve this ambiguity is to insist on at least one digit between the octal indicator and the radix point:

```
oct-sequence           = *1sign oct-indicator oct-significand decimal-exponent
decimal-exponent       = oct-exponent-indicator *1sign 1*digit
oct-significand        = (1*oct-digit "." 1*oct-digit) / (1*oct-digit ".") / (1*oct-digit)
oct-indicator          = "0"          ; leading zero
oct-exponent-indicator = "e" / %x45   ; [eE]
oct-digit              = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7"
digit                  = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
sign                   = "+" / "-"
```
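A sketch of the disambiguation with my own regexes (following the grammars above, so the exponent stays mandatory): since the octal rule demands a digit right after the leading zero, "0.01e1" fails the octal test and falls through to the decimal one.

```javascript
// Octal: leading-zero indicator, then at least one oct-digit before any point.
const octSequence = /^[+-]?0(?:[0-7]+\.[0-7]+|[0-7]+\.?)[eE][+-]?[0-9]+$/;
// Decimal: digits may start directly with the radix point.
const decSequence = /^[+-]?(?:[0-9]*\.[0-9]+|[0-9]+\.?)[eE][+-]?[0-9]+$/;

console.log(octSequence.test("0.01e1")); // false: no oct-digit before the point
console.log(decSequence.test("0.01e1")); // true: taken as decimal instead
console.log(octSequence.test("017e2"));  // true: a proper octal number
```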

The largest base representable in ASCII is not the mere 64 characters of `[0-9a-zA-Z#$]` but all visible characters. The latter set includes many characters used elsewhere, and such a number would be hard to parse amidst other code. Even the simple `[0-9a-zA-Z#$]` has the problem of how to indicate the exact base: all letters, and all the other usable signs as well, are already in use! The biggest usable base is therefore 58, with the digits `[0-9a-xA-X]`.

The letter "z" is pronounced "zed" in Great Britain and "zee" in the USA and, in honour of Prof. Dr. P. Z. "Peezee" Myers, I take "zee" and call that base the zee-base^{[2]}.

```
zee-sequence           = *1sign zee-indicator zee-significand decimal-exponent
decimal-exponent       = zee-exponent-indicator *1sign 1*digit
zee-significand        = (*zee-digit "." 1*zee-digit) / (1*zee-digit ".") / (1*zee-digit)
zee-indicator          = "0" ("z" / %x5A)   ; 0[zZ]
zee-exponent-indicator = "y" / %x59         ; [yY]
zee-digit              = digit / zee-lowercase / zee-uppercase   ; [0-9a-xA-X]
zee-uppercase          = %x41-58            ; [A-X] case matters here, hence numeric ranges
zee-lowercase          = %x61-78            ; [a-x]
digit                  = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
sign                   = "0" ; placeholder removed
sign                   = "+" / "-"
```
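Digit values in this base have to be case sensitive, otherwise 58 distinct digits would not fit into `[0-9a-xA-X]`. One possible assignment (my own sketch; the ordering is not fixed anywhere above):

```javascript
// Case-sensitive digit values for base 58:
// '0'-'9' -> 0..9, 'a'-'x' -> 10..33, 'A'-'X' -> 34..57.
function zeeDigitValue(ch) {
  const c = ch.codePointAt(0);
  if (c >= 0x30 && c <= 0x39) return c - 0x30;       // '0'..'9'
  if (c >= 0x61 && c <= 0x78) return c - 0x61 + 10;  // 'a'..'x'
  if (c >= 0x41 && c <= 0x58) return c - 0x41 + 34;  // 'A'..'X'
  throw new RangeError(`not a zee digit: ${ch}`);
}

// Integer part of a zee significand as a BigInt.
function zeeToBigInt(digits) {
  let acc = 0n;
  for (const ch of digits) acc = acc * 58n + BigInt(zeeDigitValue(ch));
  return acc;
}

console.log(zeeToBigInt("10")); // 58n
console.log(zeeToBigInt("X"));  // 57n
```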

In the case that two more characters are safe to use (e.g.: the characters "$" and "#" from the base-64 example above), these can be used for the indicators, but I refuse unwaveringly to do any of the cheap shots related to it! Especially not the one where the numerical representation of the national debt of the USA shall start with "0$". Nope, that's too cheap a shot, even for me.

Some programming languages, e.g.: `Perl`, allow a separator between groups of digits as a thousands separator (Perl takes an underscore; a space would be the correct choice according to ISO 80000-1:2009). A space is no problem if the programming language is not dependent on whitespace as others are; that is, if you strip the code of all whitespace (with the exception of the content of strings, of course) it still runs without error.

Another problem can be localization, even if we restrict the whole thing to ASCII. The radix point, decimal point, or whatever you want to call it, is not necessarily a period (ASCII 0x2E) but sometimes a comma (ASCII 0x2C). The thousands delimiters are sometimes periods if the radix point is a comma, and vice versa.
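If the input locale is known up front, a defensive first step is to normalize the separators before parsing (a sketch of mine; real localization is much hairier than this, and guessing the locale from the string alone is ambiguous, "1.234" being a valid number either way):

```javascript
// Normalize e.g. German-style "1.234.567,89" to plain "1234567.89".
// The caller must know the locale's radix and grouping characters.
function normalizeDecimal(s, { radix = ",", grouping = "." } = {}) {
  return s.split(grouping).join("").replace(radix, ".");
}

console.log(normalizeDecimal("1.234.567,89")); // "1234567.89"
console.log(normalizeDecimal("1,234,567.89", { radix: ".", grouping: "," })); // "1234567.89"
```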

Next in this loose series: strings. Don't worry, it will be quite short if I ignore Unicode.

I think I won’t.

[1] You can turn it around and use a native number (a `double` in JavaScript) as the mantissa and a `bigint` as the exponent. Normalizing would then mean moving the exponent of the native float into the `bigint` exponent. That is good for handling extremely large numbers with only small precision. But that's what logarithms are for, so such a number is only good for pedagogical reasons.

[2] but never again.
