# On the Value of the Initial Value in Root Finding

While playing around with Libtommath, a nearly public domain(1) library for big integers, to see if I can change (parts of) the backend of the planned calculator to use this library instead of a different one with a different license, I did not only find out that I really love to build long-winded sentences but that the implementation of the Newton-Raphson nth-root algorithm is very slow.
One of first programs to port is a nearly straight forward implementation of the quadratic sieve with only some small but necessary optimizations to make ist useful for up to about 40 decimal digit long semiprimes build from two 20 decimal digits long primes. One of the necessary helper function sis a wy to determine if the factor the sieve found is indeed prime with a high probability or composite. State of the art to do it seems to be a combination of several primality testing algorithms called Baillie-Pomerance-Selfridge-Wagstaff primality test or BPSW for short. There are no BPSW-pseudoprimes known below 10^15 so any prime fond inside that limit can be called a quasi certified prime. That means it is exact for factoring composite numbers up to 30 decimal digits. Probable primes with more than 15 decimal digits would need a second run through the sieve or an exact test.

I digress? Yepp, that’s what I do for a living.

One of the disadvantages of all of the primality tests above is the inability to stand perfection—perfect powers get them in trouble. Testing for perfect powers can be done in essentially linear time if you do not need the radicants and exponents if any. They are needed for factorizing, though, but no large amount of root-finding-runs is necessary.

• only roots with a prime exponent need to be checked because: $4^x = 2^{2x}$ and $x^4 = x^{2+2} = x^2 \cdot x^2$
• only exponents up to $\lceil\log_2 n\rceil$ need to be checked: $x^\frac{1}{n} = exp\left(\frac{\log_e x}{n} \right)$
Together with the the fact that the smallest possible integer radicant larger than 1 (one) is, well, 2 (two) the largest integer exponent $x$ such that $2^x \le n$ is $\lceil\log_2 n\rceil$ which is just the index of the highest bit starting from 1 (one). Simple to implement. Also simple to test, no need for a fancy root finding function here.

I already wrote it with the help of another big integer library (not GMP), so porting was not a large problem.

• finding the highest bit set is implemented as mp_count_bits()
• finding the lowest bit set is not implemented, had to be done
• finding the n-th root is implemented as mp_n_root() which returns the root but no remainder.
• a small sieve delivers the primes necessary. Not shown here, it is a very simple Eratosthenes sieve.

The listing without all of the later-to-be-added code-cluttering error checks. Some of the tricks used here were needed for the other big integer library, it is in dire need of a bit of clean-up, I’m afraid.
The low/high-bit functions:

static const int nibbles =
{ -1, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0 };
/* One can do it with some simple bitjuggling w/o a loop but the author
cannot remember it and is too lazy to ask Google.
PS: the ffs() function in string.h is not in C-99, so cannot be used here.
*/
int lowbit(mp_digit d)
{
int ret = 0;
int j = 0;

while ((d != 0) && (ret == 0)) {
ret = nibbles[d & 0x0f];
if (ret > 0) {
break;
}
d >>= (mp_digit) (4);
j += 4;
d &= DIGIT_BIT;
}

if (ret != 0) {
ret += j;
}

return ret;
}

int mp_lowbit(mp_int * a)
{
int r, i;
mp_digit q;

if (a->used == 0) {
return 0;
}

r = 0;

for (i = 0; i < a->used; i++) {
q = a->dp[i];
if (q == (mp_digit) (0)) {
r += DIGIT_BIT;
continue;
} else {
r += lowbit(q);
break;
}
}
return r;
}

int mp_highbit(mp_int * a)
{
return mp_count_bits(a);
}


And the isperfpower() code (mp_issquare has been changed to return the actual root if any. Code not given here, see comment in code for the reason, but if you insist: just ask for it):

int mp_isperfpower(mp_int * z, mp_int * rootout, mp_int * exponent)
{
mp_int root, power,tempr;
bitset_t *bst;
unsigned long p, max, highbit, lowbit;
int ret, r;
p = 2;
/* Place some sanity checks here */
/* No, really! Do it! */

/* ceil(log_2(n)) */
max = (unsigned long) mp_highbit(z);
highbit = max;
lowbit = (unsigned long) mp_lowbit(z);
/* Is it N=2^p ?*/
if (highbit == lowbit) {
mp_set_int(exponent, (mp_digit) highbit);
mp_set_int(rootout, (mp_digit) (2));
return true;
}
/* mp_issquare(n) is not much faster*/
r = mp_issquare(z, &ret, rootout);
if (ret != 0) {
mp_set_int(exponent, (mp_digit) (2));
return true;
}

mp_init_multi(&root, &power, &tempr, NULL);

/* find perfection in higher powers */
bst = malloc(sizeof(bitset_t));
bitset_alloc(bst, max);
eratosthenes(bst);

while ((p = bitset_nextset(bst, p + 1)) < max) {
mp_n_root(z, p, &root);
mp_copy(&root, &tempr);
mp_expt_d(&root, p, &power);
if (mp_cmp(z, &power) == MP_EQ) {
mp_copy(&tempr, rootout);
mp_set_int(exponent, (mp_digit) (p));
mp_clear_multi(&tempr, &root, &power);
bitset_free(bst);
return true;
}
}
mp_set_int(rootout, (mp_digit) (0));
mp_set_int(exponent, (mp_digit) (0));
mp_clear_multi(&root, &power);
bitset_free(bst);

return false;
}


Testing, aaaaaand: slower than a snail heading to its tax office.
As the title of this post suggests: the culprit was the initial value of the Newton-Raphson algorithm in the root finding function. It was set to 2 (two), fix. The N-R algorithm is forgiving but not that forgiving, so somebody added an error correction method. The problem is that the error is most of the time so much off the error correction is doing more or less the whole work to calculate the root in a way that is probably the slowest possible: exponentiate the found root r with the given exponent e, compare the result with the radicant and subtract one from the root if the result is too big; rinse and repeat.
The new algorithm is based on the rule ( with $\{n,x\}\in\mathbb{N^+}$) $\displaystyle{ n^\frac{1}{x} = \exp\left( \frac{\log_e n}{x} \right) }$
Changing the base to 2 (two) gives the same equation $n^\frac{1}{x} = 2^{\left( \frac{\log_2 n}{x} \right)}$
which is simpler to compute, especially with integer logarithms. Doing it with integer logarithms gives the inequation $\displaystyle{ n^\frac{1}{x} \ge 2^{\left( \frac{\log_2 n}{x} \right)} }$
which is close enough for the Newton-Raphson trick to work fast and correct.

int mp_n_root(mp_int * a, mp_digit b, mp_int * c)
{
mp_int  t1, t2, t3;
int     res, neg;
int ilog2;

/* input must be positive if b is even */
if ((b & 1) == 0 && a->sign == MP_NEG) {
return MP_VAL;
}

if ((res = mp_init (&t1)) != MP_OKAY) {
return res;
}

if ((res = mp_init (&t2)) != MP_OKAY) {
goto LBL_T1;
}

if ((res = mp_init (&t3)) != MP_OKAY) {
goto LBL_T2;
}

/* if a is negative fudge the sign but keep track */
neg     = a->sign;
a->sign = MP_ZPOS;

ilog2 = mp_highbit(a);
ilog2 = ilog2/ b;

/* CZ: TODO: if(ilog2 == 0) return 1; ???*/
if ((  res = mp_2expt(&t2,ilog2+1)) != MP_OKAY) {
goto LBL_T2;
}
/* t2 = 2 */
/*mp_set (&t2, 2);*/

do {
/* t1 = t2 */
if ((res = mp_copy (&t2, &t1)) != MP_OKAY) {
goto LBL_T3;
}

/* t2 = t1 - ((t1**b - a) / (b * t1**(b-1))) */

/* t3 = t1**(b-1) */
if ((res = mp_expt_d (&t1, b - 1, &t3)) != MP_OKAY) {
goto LBL_T3;
}

/* numerator */
/* t2 = t1**b */
if ((res = mp_mul (&t3, &t1, &t2)) != MP_OKAY) {
goto LBL_T3;
}

/* t2 = t1**b - a */
if ((res = mp_sub (&t2, a, &t2)) != MP_OKAY) {
goto LBL_T3;
}

/* denominator */
/* t3 = t1**(b-1) * b  */
if ((res = mp_mul_d (&t3, b, &t3)) != MP_OKAY) {
goto LBL_T3;
}

/* t3 = (t1**b - a)/(b * t1**(b-1)) */
if ((res = mp_div (&t2, &t3, &t3, NULL)) != MP_OKAY) {
goto LBL_T3;
}

if ((res = mp_sub (&t1, &t3, &t2)) != MP_OKAY) {
goto LBL_T3;
}
}  while (mp_cmp (&t1, &t2) != MP_EQ);
/*
printf("Value before:");
mp_print(&t1);
puts("");
*/
/* result can be off by a few so check */
/*
CZ:
The value is most probably 1 (one) digit too large, so to save one
exponentiation we subtract 1 (one) in advance.
TODO: check if that is always the case, as the author conjectures
*/
if ((res = mp_sub_d (&t1, 1, &t1)) != MP_OKAY) {
goto LBL_T3;
}
for (;;) {
if ((res = mp_expt_d (&t1, b, &t2)) != MP_OKAY) {
goto LBL_T3;
}

if (mp_cmp (&t2, a) == MP_GT) {
if ((res = mp_sub_d (&t1, 1, &t1)) != MP_OKAY) {
goto LBL_T3;
}
} else {
break;
}
}

/* reset the sign of a first */
a->sign = neg;

/* set the result */
mp_exch (&t1, c);

/* set the sign of the result */
c->sign = neg;

res = MP_OKAY;

LBL_T3:mp_clear (&t3);
LBL_T2:mp_clear (&t2);
LBL_T1:mp_clear (&t1);
return res;
}


Testing (on a slow 1Ghz AMD w. 512 MB RAM)

./nthroot 12345678910111213141516171819202212223242526272829303112345678910\
111213141516171819202212223242526272829303112345678910111213141516171819202\
212223242526272829303112345678910112345678910111213141516171819202212223242\
5262728293031123456789101 200
15
New version: 0.020000 seconds
15
Builtin    : 492.010000 seconds


Ah, yes, that seems to work better now.
It is even more impressing if I tell you that the 0.02 seconds is for 1,000 (one thousand) rounds and the 492.01 seconds is for just one.
But that is not always the case, the original version is sometimes faster. Lets take a very large number, 1015 digits if I counted correctly.

./nthroot 123456789101112131415161718192022122232425262728293031123456789101112\
1314151617181920221222324252627282930311234567891011121314151617181920221222324\
2526272829303112345678910111213141516171819202212223242526272829303112345678910\
1112131415161718192022122232425262728293031123456789101112131415161718192022122\
2324252627282930311234567891011121314151617181920221222324252627282930311234567\
8910111213141516171819202212223242526272829303123456789101112131415161718192022\
1222324252627282930311234567891011121314151617181920221222324252627282930311234\
5678910111213141516171819202212223242526272829303112345678910111213141516171819\
2022122232425262728293031123456789101112131415161718192022122232425262728293031\
1234567891011121314151617181920221222324252627282930311234567891011121314151617\
1819202212223242526272829303112345678910111213141516171819202212223242526272829\
3031123456789101112131415161718192022122232425262728293031123456789101112131415\
1617181920221222324252627282930311234567891011121314151617181920221222324252627\
2829303112345678910111213141516171819202212223242526272829303111234567891011121\
3141516171819202212223242526272829303112345678910111213141516171819202212223242\
5262728293031123456789101112131415161718192022122232425262728293031123456789101\
1121314151617181920221222324252627282930311234567891011121314151617181920221222\
3242526272829303112345678910111213141516171819202212223242526272829303112345678\
9101112131415161718192022122232425262728293031123456789101112131415161718192022\
1222324252627282930312345678910111213141516171819202212223242526272829303112345\
6789101112131415161718192022122232425262728293031123456789101112131415161718192\
0221222324252627282930311234567891011121314151617181920221222324252627282930311\
2345678910111213141516171819202212223242526272829303112345678910111213141516171\
8192022122232425262728293031123456789101112131415161718192022122232425262728293\
0311234567891011121314151617181920221222324252627282930311234567891011121314151\
6171819202212223242526272829303112345678910111213141516171819202212223242526272\
8293031123456789101112131415161718192022122232425262728293031123456789101112131\
4151617181920221222324252627282930311 20
9006908459572274742850091692732473786309400354936204993716414314994690397129088\
10814335572796526904903015291
New version: 3.380000 seconds
## didn't wait for the builtin version


Doing square roots of this number 1,000 times and compare the timing with the builtin square root function (also run 1,000 times) gives

New version: 6.590000 seconds
Builtin    : 4.580000 seconds


So no extra branch for square roots needed almost all of the time.
But I digress again.
The 7173rd root of the above number is the first one rounding to 1:

1
New version: 0.870000 seconds
1
Builtin    : 0.220000 seconds


And we have a case when the builtin function is indeed faster.
Doesn’t hold for long, though, with the 7159th root it starts to get slower again

2
New version: 0.860000 seconds
2
Builtin    : 1.000000 seconds


And with the 7158th root:

2
New version: 0.840000 seconds
2
Builtin    : 2.990000 seconds


Root #7157

2
New version: 0.840000 seconds
2
Builtin    : 8.140000 seconds


root #7156

2
New version: 0.820000 seconds
2
Builtin    : 21.740000 seconds


So the builtin function is faster if the actual root is very near 1 and gets
slower very rapidly when the actual root gets >2. The new version is more dependent on the size of the radicant but slows down, too, when the exponent gets smaller which was expected.
Root #71

2568855334376777830014137923182
New version: 9.560000 seconds


Was too impatient to wait for the builtin version and switched off the loop but even one round stressed my patience too much.

2568855334376777830014137923182
New version: 0.010000 seconds


The extra code to save one round of exponentiation in the result check was added later that is at this point but I had only time left for one test so I checked the square root function another time. Again with 1,000 rounds and the large number from above:

New version: 7.000000 seconds
Builtin    : 6.680000 seconds


Looks negligible. Thought the difference would be larger.

(1) only nearly public domain because legislation had changed, so they had to give that beast a license. The license they chose is as near to public domain as possible. Some older versions are still public domain, that didn’t change retroactively but look in the code to be sure.

# Adventures of a Programmer: Parser Writing Peril I

As menacedpromised in the last post: a description of the long path
to a well determined end.

Goal: Writing a small parser for a simple but complete language as a front-end for a for a calculator back-end. With emphasis on “complete”. A simple task one might think, a very simple one. One that could have been described in full in some book I mentioned in another post.

So in medias res, as Caesar wouldn’t have said (he spoke Greek) and of to the very beginning: The Plan. Not that I ever saw it but it is said that The Plan cometh at every beginning.
What is needed for the already briefly described…uhm…purpose, to avoid the repetition of the infamous two words, that are unbeknownst to many programmers, what does the parser need to parse? What do we need to do to enable the poor thing to do so? Continue reading

# On Technical Books and Tutorials, Another Unfair Rant

It can get quite frustrating when you want to do something you do know nothing about but are willing to learn it. Not necessarily able but willing. In the times of fast and cheap internet connections the first idea is to search for a small tutorial to get the mere basics. Just enough to know if the tool fits the task and more so: if the tool fits your hand.
Now, what I want to do is to write a different frontend to some well established libraries to juggle numbers. The existing frontends are good, that is not the problem, but they just don’t fit my hands very well. A matter of personal taste, so to say. It is just a frontend, so nothing more than a parser needs to be written and some code to connect the new front end to the backend.
Some faint memories pointed to Flex/Bison and some short glimpses at a couple of basic online tutorials assured me: it is a better choice than writing my own.
The O’Reilly book about it set me short of 30 bucks.
As good as it is but that is too much money for it. It wants too much with an insufficient number of pages. I bought a new copy of Don Knuths’s TAoCP second volume for 29.99 just two years ago because the old just, let me put it blunt, faded away. Original price is AFAIK about 70 bucks but it definitely has more than twice the amount of information in it than the parser-book.

What nags me most in the online-tutorials and also, to some extent, the tech. books is the attitude of handwaving. When it starts to get interesting and useful it is every
f-ing time something in the line of “the rest follows”, carefully camouflaged as “Exercises”. No, folks, the rest does not follow at all! And it is good for students to do let them find it out on their own and if they get stuck they can ask their teachers because that’s what they get payed for.
But a tutorial and even less so a tech. book is a textbook! No pedagogical fine-tuning is needed, just present the stuff, complete and correct, such that you can go on with the whole bunch of other things waiting in your inbox.

It is not more than a bit enervating if some well-meaning people write a webpage, high in the Google ranks but quite useless beyond the most basic needs. It’s ok.
These are mostly people who just learned that stuff and are proud that they
finally grasped it (I know that feeling very well 😉 and want to let the world participate. It’s ok!

What is not ok are books that do the same. When I pay money for a book about
a tool I want to know everything about the tool after I read the book and
not—let me repeat NOT “the rest follows”! Or even better: those dreadful exercises!
Well, to be fair for a change: exercises are not necessarily of the same quality in every book or tutorial. Some are quite good, like the ones in the aforementioned TAoCP, some are not so good and some are plain boring. The worst exercises are those that do not offer the solutions somewhere. No chance to control your work. The single exception should be those where no solutions are known (there are some in TAoCP) and you might have the chance to get rich and famous if you solve them.

So, exercises are not bad per se but more apt for a textbook for students who have the time to learn the stuff the old fashioned way. It is not ideal in a book that is the equivalent of a user manual, especially when no solutions to the exercises are given somewhere. Anywhere.

Another nagging detail is the incompleteness of the examples, they start simple, which is the way it should be, of course, but they stay so and ignore a large amount of the capabilities of the tool they try to describe. Or the examples get very special and are of very limited general use. Which is not bad in and of itself but with a very restricted amount of available pages the paper and ink used for it is wasted.

A book about a lexer and parser generator should describe the way to write a parser for a general language. One that has numbers and strings and functions and variables and loops and jumps and scopes and all that and is a good basis for your own project, a foundation you can build skyscrapers upon. You paid for the book and it was not just some small change you paid for it, so you might think you get the complete picture if a complete picture was advertised.

[Formulating the text for this paragraph is left as an exercise for the reader]

This is the end of this rant and this rant only and the start of a loose series describing my adventures in parser-building land.

Next: constructing a grammar of a language that fits my needs. Or something completely different, who knows. I don’t.

# Make an Alarmclock with the at Command and a shell

The program at is quite useful. On Unix at least—I don’t know how much of this usefulness expands to other operating systems with a program of a similar name.
The main disadvantages of at are its restrictions, build in for security reasons but not needed for every usage, especially if you know what you do. Because nobody does know what they are doing within the set of every possible consequence and their side-effects the aforementioned restrictions are hardcoded, no simple way around.

Long foreword, short problem: I wanted to avoid typing sleep 12m&& sm date every time I need something like a teatimer. It is also not very flexible if you want to start something at a certain time.
The program at on the other side can all that, is quite flexible and works even at high loads. The problem is, that sm is an X-program and getting it started by at is a bit tricky, although not that tricky.

The restrictions of at makes it quite dumb, it does not know every environment variable especially not the DISPLAY variable, which needs to be given explicitly. That alone is not sufficient because the X-server must allow programs started by other users (which at is) access to the screen. Access to the screen gives e.g.: xhost +localhost which is too general but the security< risk is low on a single user system. The correct way wold be xhost +username where usernam is the name of the user, of course, but the username under which at resides.

So let’s replace all of the problems with a little shell script:

#!/bin/sh
if [ $# -eq 0 ];then echo "Usage:$0 [in|at] time"
exit
fi
xhost +localhost
if [ $1 = "in" ];then shift echo "env DISPLAY=$DISPLAY /usr/games/sm date" | at now + $* elif [$1 = "at" ];then
shift
echo "env DISPLAY=$DISPLAY /usr/games/sm date" | at$*
else
echo "Usage: $0 [in|at] time" exit fi exit  Put it in a file and make it executable and put it somewhere in $PATH.
Or make a function out of it and put in your .bashrc.

Usage is simple. Let’s say the name of the file is ring.

ring in 5 min


will pop the output of date fullscreen five minutes after now.

ring at 20:14


will do the same as above but 14 minutes after eight in the evening. See the manpage of at for more information about its time format.

This script may not work at all, some reasons might be:

• The existence of the file /etc/at.allow which does not have your username in it
• The existence of the file /etc/at.deny which has your username in it
• The program at is not installed
• The program sm is not installed or at a different place, try whereis sm to find out
• The operating system is not Unix/Unix-based. There are ports of most Unix-tools for other operating systems, try Google for a start