Multi-Precision Floats: Radix Conversion

[UPDATE]: bugfix.
The Little programming language is in need of arbitrary precision floating point numbers, too. The necessary basics at least: addition; multiplication; division; roots; logarithm; an exponential function; and the three basic trigonometric functions sin, cos, and tan; not to forget the constant \pi computed to the current precision. More to come, of course, but these are the minimum for a comfortable environment.

Well, that’s all fine and dandy, but how do we get the numbers in and out? No, that is not as simple as it sounds.

Let’s start at the beginning:

Arbitrary precision floating point numbers are floating point numbers, but they are not of truly arbitrary precision. That is not possible, because some numbers are representable in a finite number of digits while others are not:

(1)\displaystyle{  \frac{1}{3} = 0.\bar{3}_{10} = 0.1_3   }

That means the number of digits of the fractional part needs a limit, which is commonly called the precision of the floating point number and has to be set in advance. Together with all of the other necessities of computerization, the definition of a floating point real number is as follows:

  • Number of digits allowed: the precision
  • Integer part
  • Fractional part
  • A sign

An optimization you won’t even recognise as one is missing: the exponent. It is a scaling factor that compresses the possibly long run of leading or trailing zeros into a single number, the count of those zeros. A negative exponent counts leading zeros:

(2)\displaystyle{ 0.002 = 2 \times 10^{-3}   }

And a positive exponent counts trailing zeros.

(3)\displaystyle{ 2000.0 = 2 \times 10^{3}   }

We can see that the rule is to move the decimal point to the right[1] of the first non-zero digit and count the steps. This allows for moving the radix point quite freely, which makes the division between an integer and a fractional part not obsolete but easy to handle. We can, for example, fix the radix point and move the number around.
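To make that concrete, a number with these parts could be held in a structure like the following. This is just an illustrative Python sketch with names of my own choosing, not Little’s actual representation:

    from dataclasses import dataclass

    @dataclass
    class BigFloat:
        """Sketch of an arbitrary- (but fixed-) precision float:
        value = sign * mantissa * radix**exponent, with at most `precision`
        radix digits in the integer mantissa."""
        sign: int          # +1 or -1
        mantissa: int      # integer and fractional part merged into one big integer
        exponent: int      # the scaling factor: where the radix point really sits
        precision: int     # maximum number of radix digits of the mantissa
        radix: int = 2

    # 0.002 = 2 * 10**-3 and 2000.0 = 2 * 10**3 (equations 2 and 3)
    small = BigFloat(sign=+1, mantissa=2, exponent=-3, precision=100, radix=10)
    large = BigFloat(sign=+1, mantissa=2, exponent=+3, precision=100, radix=10)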

The conversion of floats between different radices is done in the following way:

  1. get a number of radix r1
  2. convert it to radix r2
  3. round it to precision p

There are a couple of steps in between step one and step two; we will come to them later in this post, but point three is the most interesting one for now.

IEEE-754 knows and defines four different rounding modes:

  • round towards nearest, the default
  • round towards zero, which is basically truncating
  • round towards negative infinity (or “down”), the floor function
  • round towards positive infinity (or “up”), the ceil function

The default function, round to nearest, has in IEEE-754 the rule “half to even” if the number is an exact half integer:

(4)\displaystyle{ \mathop{\mathrm{round}}\bigg(n+\frac{1}{2} \bigg) = \begin{cases}n & \text{if } n \text{ is even } \\ n + 1 & \text{otherwise}\end{cases} }

Although this method is the most complicated, it is also the default option in IEEE-754, and as I try to follow this standard as much as possible (but not even the smallest step further!), it is the method of choice for my own implementation.
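As a quick illustration of the tie-breaking rule: Python 3’s built-in round happens to use the same half-to-even behaviour, so exact halves go to the even neighbour.

    # round half to even ("banker's rounding"): exact halves go to the even neighbour
    print(round(0.5), round(1.5), round(2.5), round(3.5))   # 0 2 2 4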

There is also one extra condition: the number x_{b_{y}}, once converted to x_{b_{z}} (but only then), must never change if converted back and forth. That is

(5)\displaystyle{ x_{b_{y}} = x_{b_{z}} }

Meaning: if you, say, convert the decimal number x =  0.1_{10} to a binary number with the precision and behaviour of an IEEE-754 double, you get x' = 0.10000\,00000\,00000\,00555\,1_{10}; but when you take x' and feed it back to the conversion routine, you must get x' back, provided you use the very same rounding mode: x' = f(x').
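This behaviour of the double can be observed directly in Python, where Decimal(x) prints the exact value stored in the double x; a small illustration only, not Little code:

    from decimal import Decimal

    x = float("0.1")             # the nearest double to 0.1
    print(Decimal(x))            # 0.1000000000000000055511151231257827021181583404541015625
    x2 = float(str(Decimal(x)))  # feed the exact decimal expansion back in
    assert x2 == x               # x' = f(x'): the round trip must not change it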

Now, how to do radix conversion?
The most used radix conversion is probably the conversion between 2 and 10, between binary and decimal numbers. We can assume that we have a method to work with arbitrarily long integers (that’s what I hope, at least), do arithmetic with them, and have the radix conversion already built in. Nevertheless, a short reminder of radix conversion for integers:

  1. get the most significant digit of the number a1 of radix r1
  2. multiply the number a2 (of radix r2, starting at zero) by r1
  3. add the digit from step one to a2
  4. rinse and repeat with the remaining digits of a1
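As a sketch in Python, with the built-in integers standing in for the big integers (this is not my library’s actual code), the loop above looks like this:

    def digits_to_int(digits, r1):
        """Digit-by-digit radix conversion (Horner's scheme): digits of radix r1,
        most significant first, into one Python integer (our stand-in bigint)."""
        a2 = 0
        for d in digits:
            a2 = a2 * r1 + d   # shift a2 one radix-r1 place, then add the next digit
        return a2

    print(digits_to_int([3, 1, 4, 1, 5], 10))   # 31415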

That is how it is done in my implementation of a big integer library for now. It is very slow. It would be faster if we did it in a slightly different way: observing that we repeatedly multiply by the same r1, we could do that all at once or at least in larger chunks.
We also have bigint digits that are most probably much larger than the digits of radix r1. So, if we take n digits of r1 at a time, with n the largest number such that r_1^n < r_2, we can speed it up a little.
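A sketch of that chunked variant, again with Python integers playing the big integers; seven decimal digits per chunk, because 10^7 < 2^{26}:

    def decimal_string_to_int(s, chunk=7):
        """Chunked conversion of a decimal digit string: 7 decimal digits at a time,
        because 10**7 < 2**26 still fits into one 26-bit big-digit."""
        a2 = 0
        for i in range(0, len(s), chunk):
            part = s[i:i + chunk]
            a2 = a2 * 10**len(part) + int(part)   # one big multiply per chunk, not per digit
        return a2

    print(decimal_string_to_int("3141592653589793"))   # 3141592653589793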

This works well for, e.g., converting decimal strings to the 26-bit big-digits of my big integer implementation, but not vice versa. We would need a big integer with a decimal radix for that.
It is some time ago, but I have written a big integer library with a decimal base and tried it instead of the built-in conversion routine of Tom Wu’s big integer library. It was faster, as expected. Not by much, but enough that I plan to do the same for my radix-2^{26} big integers, too. In the future.

Now back to floating point radix conversions.

To get integers, we need to make the floating point number x (we assume finite precision here and everywhere else) of radix r a rational number, just by moving the radix point while keeping the count n, such that the resulting rational has the form

(6)\displaystyle{ \frac{xr^n}{r^{n}} \text{ with }xr^n\in\mathbb{N} }

with n the smallest possible exponent of r such that equation (6) holds.
Example:

(7)\displaystyle{ 3.14159\,26535\,89793 = \frac{3.14159\,26535\,89793\times 10^{15}}{10^{15}} = \frac{3\,141\,592\,653\,589\,793}{10^{15}} }

We can stop here and do rational arithmetic with it to stay exact. But rational arithmetic can result in very large integers, and we wanted to know how to convert radices in the first place, so we need to go on from here[2].
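That first step, from a decimal string to the rational \frac{N}{D} of equation (6), is short enough to show as a Python sketch (the helper name is made up):

    def to_rational(s):
        """Turn a decimal literal into an (unreduced) pair N, D with N/D = s,
        D = 10**n and n the number of fractional digits, as in equations (6)/(7)."""
        int_part, _, frac_part = s.partition(".")
        return int(int_part + frac_part), 10 ** len(frac_part)

    N, D = to_rational("3.141592653589793")
    print(N, D)   # 3141592653589793 1000000000000000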

One piece of notation before we go on: let \#\left(x_r\right) be the number of digits of the representation of the number x in radix r, and let p_r \in \mathbb{N}, p_r > 0, be the fixed number of digits of radix r of the target, the precision.

We need to scale the rational \frac{N}{D} (constructed as described above) to fill up the target number exactly to the brim, so we can do the rounding. The target is filled up exactly when (digit counts play the role of logarithms \log_r here, hence the simple subtraction)

(8)\displaystyle{ p_r \leq  \#(N_r) - \#(D_r) < p_r + 1 }

This difference can also be negative, and with

(9)\displaystyle{ s = p_r - \left( \#\left(N_r\right) - \#\left(D_r\right) \right) }

we get (with r the radix of the target)

(10)\displaystyle{  \frac{{N'}_r}{{D'}_r}   = \begin{cases} \dfrac{N_rr^s}{D_r} &\text{if } s > 0 \\[2ex]  \dfrac{N_r}{D_rr^{-s}} &\text{if } s < 0  \\[2ex]  \dfrac{N_r}{D_r} &\text{if } s = 0 \end{cases}   }

With our example…oh, wait, we need a precision and a radix! Let’s take p = 100 and r = 2. Then \#\left(N_r\right) =  52, \#\left(D_r\right) =  50, and s = p_r - \left( \#\left(N_r\right) - \#\left(D_r\right) \right) = 100 - \left(52-50\right) = 98. With r = 2 this gives the scale factor r^s = 2^{98}. Since s is positive, we have to multiply the numerator:

(11)\displaystyle{ \frac{Nr^s}{D} =  \frac{3141592653589793\times 2^{98}}{10^{15}}   }

This results in a numerator N' of 995,610,453,248,924,265,350,259,524,281,934,194,034,081,792.
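In code, the digit counts and the scaling come down to a few lines; int.bit_length() plays the role of \#(\cdot) for radix 2 (again a Python sketch with names of my own choosing):

    N, D = 3141592653589793, 10**15    # pi to 16 digits, as numerator and denominator
    p = 100                            # target precision: 100 binary digits

    # int.bit_length() is the digit count #(.) in radix 2
    s = p - (N.bit_length() - D.bit_length())   # 100 - (52 - 50) = 98

    if s > 0:            # too few digits: scale the numerator up by r**s
        N <<= s
    elif s < 0:          # too many digits: scale the denominator up by r**(-s)
        D <<= -s

    print(N)   # 995610453248924265350259524281934194034081792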

Now we are prepared to do the rounding by doing an integer division with remainder.

(12)\displaystyle{ N' = Q\,D' + R \quad\text{with } 0 \leq R < D' }

The result still doesn’t fit into the margin of this blog. Maybe I should have chosen a smaller p? It is Q = 995\,610\,453\,248\,924\,265\,350\,259\,524\,281 and R = 934\,194\,034\,081\,792.

Thus prepared, the round-to-nearest with half-to-even of the quotient Q goes

(13)\displaystyle{ \mathop{\mathrm{round}}\left(Q\right) =\begin{cases} Q&\text{if } R <  \frac{D'}{2} \\ Q + 1&\text{if } R > \frac{D'}{2} \\ Q + 1&\text{if } R =  \frac{D'}{2} \land Q \text{ is odd} \\  Q &\text{if } R =  \frac{D'}{2} \land Q \text{ is even} \end{cases}   }

If it happens that \mathop{\mathrm{round}}\left(Q\right) = Q' = r^p then set Q' = r^{p-1} and s = s - 1. If you thought that we don’t need s anymore and dumped it already…well…

In our example the remainder is bigger than half of the denominator (934\,194\,034\,081\,792 > 500\,000\,000\,000\,000), so we have to round up: Q' = Q + 1 = 995\,610\,453\,248\,924\,265\,350\,259\,524\,282.
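Continuing the sketch with the numbers from above, the division and the rounding look like this:

    N = 995610453248924265350259524281934194034081792   # the scaled numerator from above
    D = 10**15

    Q, R = divmod(N, D)                            # equation (12): N = Q*D + R
    # equation (13): round to nearest, ties to even; compare 2*R with D to stay in integers
    if 2 * R > D or (2 * R == D and Q % 2 == 1):
        Q += 1

    print(Q)   # 995610453248924265350259524282, a 100-bit integer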

The raw, but correctly rounded result F_r ({}_r for “raw” not for “radix”) is now

(14)\displaystyle{ F_r = Q'  r^{-s} }

With our example 995\,610\,453\,248\,924\,265\,350\,259\,524\,282 \times 2^{-98}

Good enough for many but not for IEEE-754. They want the result F_r normalized to F_n, too

(15)\displaystyle{ F_n = Q'  r^{-s}\frac{r^{p}}{r} =  Q'  r^{-s} r^{p-1}  = Q'  r^{(p-1)-s} }

With our example \left(Q' \times 2^{-98}\right) \times 2^{+99} makes 3.14159265358979300000 \times 2^{+99}, respectively Q' \times 2^{1}, because you don’t scale the mantissa (Q'), only the exponent.

The sign of the original number has no significance for the calculations; we can add it at the end.
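Putting it all together, here is a compact sketch of the whole conversion from a decimal string to a p-bit binary float, with the sign split off at the start and re-attached at the end. It is a sketch of the procedure described above, not Little’s actual code, and all names are mine; it also skips the corner cases (a zero input, exponents in the input notation, and so on) that a real implementation has to handle.

    def decimal_to_binary(dec: str, p: int):
        """Convert a decimal literal to (sign, mantissa, exponent) with a p-bit
        mantissa, value = sign * mantissa * 2**exponent, round-to-nearest, ties to even."""
        sign = -1 if dec.startswith("-") else 1
        dec = dec.lstrip("+-")

        int_part, _, frac_part = dec.partition(".")
        N = int(int_part + frac_part)               # equation (6): move the radix point ...
        D = 10 ** len(frac_part)                    # ... and keep the count as denominator

        s = p - (N.bit_length() - D.bit_length())   # equation (9), digit counts via bit_length
        if s > 0:                                   # equation (10): scale numerator or denominator
            N <<= s
        elif s < 0:
            D <<= -s
        if (N // D).bit_length() > p:               # the digit counts over-estimated the quotient
            D <<= 1
            s -= 1

        Q, R = divmod(N, D)                         # equation (12)
        if 2 * R > D or (2 * R == D and Q % 2 == 1):   # equation (13): round half to even
            Q += 1
        if Q.bit_length() > p:                      # rounding spilled over into p+1 digits
            Q >>= 1                                 # Q was exactly 2**p, so nothing is lost
            s -= 1

        return sign, Q, -s                          # equation (14): sign * Q * 2**(-s)

    sign, m, e = decimal_to_binary("3.141592653589793", 100)
    print(sign, m, e)      # 1 995610453248924265350259524282 -98
    print(e + 100 - 1)     # normalized exponent (p - 1) - s = 1, cf. equation (15)

Comparing 2R with the denominator instead of R with half of it keeps everything in integer arithmetic, and the extra division check before the rounding catches the case where the digit-count estimate of equation (8) is off by one.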

[1] This is correct even in scripts that are written from right to left, like Arabic. They took the decimal system from Indian mathematicians, who wrote numbers left to right, although in some of the earliest documents found in India they were written right to left.
[2] I will do it this way in Little, up to a certain size and/or configurable.
