Monthly Archives: October 2013

Things that are falling apart

Bergschaden (surface damage caused by mining)

Bergschaden (sufrace damage caused by mining)


Living in a region which saw a lot of mining over the last centuries has some advantages but also some disadvantages. One of the disadvantages is the fact that the similarities of the underground with Swiss cheese are quite astonishing. The holes close after some time but the movement caused by the closing of the underground gaps goes up and breaks houses. It is not much and repairable and the mining company pays for it but you still have all the dirt and the hassle and all the things nobody really wants.
The damage is hardly visible on the picture but you can still see at the top left of the window and the lower right a long, already plastered crack (there were more, off the picture). The walls get a rough-cast and an insulation, all done by me, so don’t expect too much work the the parser for now.
The insulation (no room for it outside, so it has to be done inside, which makes it brutally more expensive) was what it started in the first place. The kitchen has three outside walls and after insulating one (outside) we got problems with condensing water which later caused some mould to settle. Ripping off the wallpapers showed some cracks, not expected, we know where we live. One call and a day later, the cracks were inspected, okay’d and another day later repaired. Not only plastered but glued together by injecting, with high pressure a synthetic resin. That at the beginning of october and the picture is from today. What happened? The first inspection was done by a woman, well known to sit on the money but also fast to decide if the client is okay with the cheapest solution. This client was me, because the only thing I wanted was to get the cracks glued shut to be able to go on with the insulation. So it went fast and after some drying time, I just cut open the first bag of concrete when the phone rang. A colleague of the aforementioned woman wanted to look at it at the next available date. Available for him, not for me and the next date was one week later.
Ohhhhkaaay…
He finally came, looked around, and after waving some instruments at the walls and a short chat he left. I do not know exactly why he did it but I have my suspicions and I don’t think it is something nice.
One week lost.
Ohhhhkaaay…
On with the work and on with the rough cast, or so I thought.
I made the cast from some leftover concrete-mix (the only-add-water type) which was a bad idea: it was probably too old and didn’t harden.
Another week lost.
Ohhhhkaaay…
And it is not the case that this wall is the single thing I have to do, and so:
Another week lost.
And we write the 31st of October now, Halloween.
On a side-note: the final rendering for the insulation is some expensive stuff which I ordered and payed in advance and is still not available. But at least:
No week lost.
Yet.
It is unknown if we’ll have a renovated kitchen before christmas.
And how was your October?

Adventures of a Programmer: Parser Writing Peril VII

The complete program should work on any operating system it can run on. It will be written in ISO/IEE C-99 without using any of the more exotic parts of it. The main problem is the build. There will be more than a handful of source file and dependencies to do it manually, some automation is needed. Tools for automated building that run cross-platform are rare, only two are large enough to have been shortlisted: qmake, the autotool delivered with QT (Who has bought it now, still Nokia?) which runs on several operating systems including some phones. It is very simple. Too simple, sadly. I tried it with a small subproject and put it up at Github under the name libczrational for all to see and writhe with disgust. Seems as if it has to be cmake.

The library has, for now, basic exact arithmetic implemented (+,-,*,/), basic inexact arithmetic (sqrt) and two integer functions (Bernoulli and Euler numbers). The Bernoulli and Euler number calculations are meant as examples but are already fast enough for general use (uses Brent-Harvey’s algorithm which is based on tangent and secant numbers).

The documentation looks good, but that is all that it does, for now πŸ˜‰

The makefile in the packet has been generated on Debian Squeeze.

Adventures of a Programmer: Parser Writing Peril VI

Yep, you guessed right, still no grammar πŸ˜‰
But I implemented a square root for the big rationals from the last post. It is not very elegant, especially too ignorant about the exact value of the precision which it puts too high, but it works.

#ifndef MPI_LOG2
#   define MPI_LOG2 3.321928094887362347870319429489390175864831393024580612054756
#endif
#include <math.h>
static int mpq_epsilon;
void mpq_set_epsilon(int eps){
   if(eps <= 0){
     mpq_epsilon = 0;
   }
   else{
     /* eps assumed to be precision in number of decimal digits */
     mpq_epsilon = (int)((double)eps*MPI_LOG2) + (int)log2((double)eps);
   }
}
int mpq_get_epsilon(void){
   return mpq_epsilon;
}
/* sqrt() meant as a test for epsilon */
int mpq_sqrt(mp_rat *a, mp_rat *c){
  if(mpq_isneg(a)){
    return MP_VAL;
  }
  mpq_copy(a,c);
  printf("epsilon = %d\n",mpq_get_epsilon());
  /*
    This is too brutal.
    There are better ways to do it.
  */
  int eps  =  mpq_get_epsilon();
  mp_mul_2d(&c->numerator,  eps,&c->numerator);
  mp_mul_2d(&c->denominator,eps,&c->denominator);

  mp_sqrt(&c->numerator,&c->numerator);
  mp_sqrt(&c->denominator,&c->denominator);
  mpq_reduce(c);
  return MP_OKAY;
}

The angstzuschlag of log2(eps) (the logarithm base two of eps which holds the number of decimal digits of precision) is purely arbitrary.
The problem with this approach is that any perfect square will result in unnecessary inexact results when eps is not zero.
Changing the bit-shifting to multiplying with exponents of ten will get rid of that problem, lets…seeeeee…
Segmentation Fault!
Oops!
What the…
Ah, the function mpq_reduce() uses mpq_set_int() which itself uses mpq_reduce() which resulted in a nice crash. The first problem is easy to solve:

#define mp_isone(a) (((a)->used == 1 && (a)->dp[0] == 1) ? MP_YES : MP_NO)
#define MPQ_DIVISION_BY_ZERO -997
int mpq_reduce(mp_rat *a){
  mp_int gcd;
  int e;
  /* check if num is zero -> return 0 if it is*/
  if(mp_iszero(&a->numerator)){
    mp_set_int(&a->numerator,0);
    mp_set_int(&a->denominator,1);
    if( (e = mpq_set_sign(a,MP_ZPOS) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
  }
  /* check if den is 1 -> return if it is*/
  if(mp_isone(&a->denominator)){
    mpq_normalize_sign(a);
    return MP_OKAY;
  }
  /* check if den is zero -> return undefined if it is*/
  if(mp_iszero(&a->denominator)){
    mp_set_int(&a->numerator,0);
    mp_set_int(&a->denominator,0);

    return MPQ_DIVISION_BY_ZERO;
  }
  /* check if den = num -> return 1 */
  if(mpq_isone(a)){
    mpq_normalize_sign(a);
    mp_set_int(&a->numerator,1);
    mp_set_int(&a->denominator,1);

    return MP_OKAY; 
  }
  mp_init(&gcd);
  /* compute gcd(num,den) */
  if( (e = mp_gcd(&a->numerator,&a->denominator,&gcd) ) != MP_OKAY){
    return e;
  }
  /* set num to num/gcd */
  if( (e = mp_div(&a->numerator,&gcd,&a->numerator,NULL) ) != MP_OKAY){
    return e;
  }
  /* set den to den/gcd */
  if( (e = mp_div(&a->denominator,&gcd,&a->denominator,NULL) ) != MP_OKAY){
    return e;
  }
  /* not to forget: */
  mpq_normalize_sign(a);
  return MP_OKAY;
}

The second problem is not so easy to solve because it shouldn’t have segfaulted before it hit the stack’s recursion limit. I leave that for later.
Where have I been…ah, yes, the problem with perfect squares. Using decimals as the multiplier doesn’t help at all, just give it a try.

  //mp_mul_2d(&c->numerator,  eps,&c->numerator);
  //mp_mul_2d(&c->denominator,eps,&c->denominator);
  mp_int tenpow;
  mp_init(&tenpow);
  mp_set_int(&tenpow,10);
  mp_expt_d(&tenpow,eps,&tenpow);
  mp_mul(&c->numerator, &tenpow ,&c->numerator);
  mp_mul(&c->denominator,&tenpow,&c->denominator);
  ...
  mp_clear(&tenpow);

It makes no difference if you expand the fraction with a power of ten or a power of two if you expand both sides of the fraction.
The actual precision gained with this method is also too low. The number of digits n of a square root of a number x with N is about N/2 depending on the number system, exactly half with base two. So we need twice the precision to get a square root of normal precision. We are using shifts, so simply double eps.
Note to self for further optimization: subtract number of digits of min(nbits(num),nbits(den)) from eps.

  /* check if eps<<1 is still smaller than the datatype used for eps!*/
  mp_mul_2d(&c->numerator,  eps<<1,&c->numerator);
  mp_mul_2d(&c->denominator,eps<<1,&c->denominator);

And a funny thing happens: it gives the exact result now.
Why?
Well, I do have a printf() giving me the actual value. I set epsilon to 50 which gives a result of 171 here. Which is odd, because doubling it is even curioser. Ok, ok, I’ll stop here πŸ˜‰
The reason is that the number of digits of the square root of a number in base two is exactly half of the number of digits of the radicand. If the number of digits of the radicand is odd it cannot work it out exactly.
A hypothesis easily tested πŸ˜‰
Two fractions given: 529/1849 ((23/43)^2), 3481/11664 ((59/108)^2), 1764/1849 ((42/43)^2), and 44521/46656 ((211/216)^2). Uh, no, that’s illegible. Let’s make a table.
With eps = 50

a/b (a/b)^2 odd eps even eps
23/43 529/1849 1258…143/235…312 23/43
59/108 3481/11664 322…847/590…109 59/108
42/43 1764/1849 229…653/235…312 42/43
211/216 44521/46656 115…927/118…219 211/216

Well, that’s what most scientist call a “failed chance to publish”. Because not many journals accept experiments with a negative outcome. That causes many already failed experiments to get duplicated because the experimentalist had no chance to know that it doesn’t work in the first playe. That heap of a male cow’s excrements wastes a lot of money and hours.
So, if you have such a failed experiment you can publish it at e.g.: arxive.org. But check if you will not get punished for it. Yes, that might happen!
But I digress.
So it is the even multiplier that counts?
Why, you ask?
Yes, that’s an interesting question.

int mpq_sqrt(mp_rat *a, mp_rat *c){
  int eps;
  if(mpq_isneg(a)){
    return MP_VAL;
  }

  /* place shortcut for zero and one  here */
  /* place shortcut for integers here */

  mpq_copy(a,c);
  eps  =  mpq_get_epsilon();
  if((eps & 0x1)){
    eps++;
  }
  eps <<= 1;

  /* reduce size of eps r.t. to min(digits(num),digits(den)) here */

  mp_mul_2d(&c->numerator,  eps,&c->numerator);
  mp_mul_2d(&c->denominator,eps,&c->denominator);
  mp_sqrt(&c->numerator,&c->numerator);
  mp_sqrt(&c->denominator,&c->denominator);
  mpq_reduce(c);
  return MP_OKAY;
}

Adventures of a Programmer: Parser Writing Peril V

Part five and still no grammar? Oh, my…

Before I blow the dust off of libtomfloat I decided to try it with rationals first and armed myself with my good ol’ math-textbook from primary school and D. Knuth’s second volume of TAoCP.
I also try to get a bit more structure into it this time πŸ˜‰

Apropos structure: the main data structure shall hold a fraction and a fraction consists of a numerator and a denominator, as every pupil knows. What not every pupil knows (well, some do!) is that there is a sign, too. So the structure is a C struct with some aptly named inhabitants:

typedef struct  {
    int sign;
    mp_int numerator;
    mp_int denominator;
} mp_rat;

From the data type mp_int it is a reasonable assumption that I still use Libtommath as the big-number library. The name of this structure has been chosen very carefully—the ‘r’ sounds better after a ‘p’ than after a ‘f’.

The mp_int numbers need to get initialized before and freed after use, and so do the mp_rat numbers. The implementation makes use of the respective functions of libtommath to keep it simple.
Initialisation of a single mp_rat number is the simplest, we use libtommath’s mp_init_multi() directly.

int mpq_init(mp_rat *a){
    int e;
    if( (e = mp_init_multi(&a->numerator,&a->denominator,NULL) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
}

initializing several rationals needs a bit more work, and the inclusion of stdarg.h, too. But that was all of the unpaid overtime, we can take the rest more or less verbatim from libtommath. Including the comments.

#include <stdarg.h>
int mpq_init_multi(mp_rat *mp, ...){
    mp_err res = MP_OKAY;      /* Assume ok until proven otherwise */
    int n = 0;                 /* Number of ok inits */
    mp_rat* cur_arg = mp;
    va_list args;

    va_start(args, mp);        /* init args to next argument from caller */
    while (cur_arg != NULL) {
        if (mpq_init(cur_arg) != MP_OKAY) {
            /* Oops - error! Back-track and mp_clear what we already
               succeeded in init-ing, then return error.
            */
            va_list clean_args;
            
            /* end the current list */
            va_end(args);
            
            /* now start cleaning up */            
            cur_arg = mp;
            va_start(clean_args, mp);
            while (n--) {
                mpq_clear(cur_arg);
                cur_arg = va_arg(clean_args, mp_rat*);
            }
            va_end(clean_args);
            res = MP_MEM;
            break;
        }
        n++;
        cur_arg = va_arg(args, mp_rat*);
    }
    va_end(args);
    return res;                /* Assumed ok, if error flagged above. */
}

Freeing the numbers works similar. We take the liberty to not only steal code from libtommath but the names, too.

void mpq_clear(mp_rat *a){
    mp_clear_multi(&a->numerator,&a->denominator,NULL);
    a->sign = MP_ZPOS;
}

void mpq_clear_multi(mp_rat *mp, ...){
    mp_rat* next_mp = mp;
    va_list args;
    va_start(args, mp);
    while (next_mp != NULL) {
        mpq_clear(next_mp);
        next_mp = va_arg(args, mp_rat*);
    }
    va_end(args);
}

One of the most useful helper-functions is the printing function, of course, so let’s implement one.

void mpq_print(mp_rat * a){
    /* If ypu wonder what the zero does, try it without */
    printf("%c",(a->sign == MP_ZPOS)?0:'-');
    mp_fwrite(&a->numerator,10,stdout);
    putchar('/');
    mp_fwrite(&a->denominator,10,stdout);
}

A reading function is not implemented, yet, at least not directly but it is easy to build one from the following functions. Let’s go from left to right because that’s how fractions get written in most of the western alphabets: ±x/y
So, at first, there need to be some functions to handle the sign.

/* sets sign of mp_rat a to sign given in "sign", returns old sign */
int mpq_set_sign(mp_rat *a, int sign){
  int sgn;
  sgn = a->sign;
  a->sign = (sign < 0 || sign == MP_NEG)?MP_NEG:MP_ZPOS;
  a->numerator.sign = MP_ZPOS;
  a->denominator.sign = MP_ZPOS;
  return sgn;
}
/* just for symmetry: returns the sign of mp_rat a */
int mpq_get_sign(mp_rat *a){
  return a->sign;
}

We keep the numerator and the denominator strictly positive but that can can change in some of the calculations, so we need something to set the sign of mp_ratfrom the signs of the numerator and denominator.

void mpq_normalize_sign(mp_rat *a){
  int sign_n, sign_d;
  sign_n = a->numerator.sign;
  sign_d = a->denominator.sign;
  a->sign = (sign_n != sign_d)?MP_NEG:MP_ZPOS;
  a->numerator.sign = MP_ZPOS;
  a->denominator.sign = MP_ZPOS;
}

And while we are at it, we should implement negation, too:

int mpq_neg(mp_rat *a){
  int sgn;
  sgn =  mpq_get_sign(a);
  sgn = (sgn == MP_NEG)?MP_ZPOS:MP_NEG;
  a->numerator.sign = MP_ZPOS;
  a->denominator.sign = MP_ZPOS;
  return mpq_set_sign(a, sgn);
}

The next one to read is the numerator and after that the denominator. Reading the slash or whatever else denotes a fraction is not part of this library. But might be in the future, who knows?

/* sets given mp_int b as the numerator in mp_rat a */
int mpq_set_num(mp_rat *a, mp_int *b){
    int e;
    if( (e = mp_copy(b,&a->numerator) ) != MP_OKAY){
      return e;
    }
    if( (e = mpq_reduce(a) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
}

/* sets given mp_int b as the denominator in mp_rat a */
int mpq_set_den(mp_rat *a, mp_int *b){
    int e;
    if( (e = mp_copy(b,&a->denominator) ) != MP_OKAY){
      return e;
    }
    if( (e = mpq_reduce(a) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
}

And now all together because everybody just loooooves syntactic sugar!

/* sets given mp_int p,q as the numerator,denominator in mp_rat a */
int mpq_set(mp_rat *a, mp_int *p, mp_int *q){
    int e;
 
    if( (e = mp_copy(p,&a->numerator) ) != MP_OKAY){
      return e;
    }
    if( (e = mp_copy(q,&a->denominator) ) != MP_OKAY){
      return e;
    }
    if( (e = mpq_reduce(a) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
}

More sweets? Ok, Halloween is near, so here are the methods and functions to handle normal C-99 data types, unsigned long as an example:

/* sets given unsigned long b as the denominator in mp_rat a */
int mpq_set_den_int(mp_rat *a, unsigned long b){
    int e;
 
    if( (e = mp_set_int(&a->denominator,b) ) != MP_OKAY){
      return e;
    }
    if( (e = mpq_reduce(a) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
}
/* sets given unsigned longs p,q as the numerator,denominator in mp_rat a */
int mpq_set_int(mp_rat *a, unsigned long p,unsigned long q){
    int e;

    if( (e = mp_set_int(&a->numerator,p) ) != MP_OKAY){
      return e;
    }
    if( (e = mp_set_int(&a->denominator,q) ) != MP_OKAY){
      return e;
    }
    if( (e = mpq_reduce(a) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
}

You might have found out that one of the functions used mpq_reduce() which has not been handled, yet. That is correct. keeping the fractions in their reduced forms is essential to working with rationals, especially if done numerically. The numbers can get very large, very fast otherwise.

But before we come it we need two more functions to handle copying.
Deep copy:

int mpq_copy(mp_rat *a, mp_rat *b){
    int e;
    if( (e =    mp_copy(&a->numerator,&b->numerator) ) != MP_OKAY){
      return e;
    }
    if( (e =    mp_copy(&a->denominator,&b->denominator) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
}

Pointer exchange, only:

void mpq_exch(mp_rat *a, mp_rat *b ){
   mp_exch(&a->numerator,&b->numerator);
   mp_exch(&a->denominator,&b->denominator);
}

Reducing a fraction is simple but computationally expensive, so we not only need some checks and balances but some shortcuts, too. With the following equations in mind, we can write some of these short cuts.
\displaystyle{  \frac{0}{x}= 0  }

\displaystyle{  \frac{x}{x}= 1  }

But \displaystyle{  \frac{0}{0}= \text{undefined}   }  because of \displaystyle{  \frac{x}{0}= \text{undefined}  }

\displaystyle{  \frac{x}{1}= 1  }

All of the above can be used more often and are worth their own functions, we take three of them.

#define mp_isone(a) (((a)->used == 1 && (a)->dp[0] == 1) ? MP_YES : MP_NO)
int mpq_iszero(mp_rat *<span class="hiddenGrammarError" pre="returns "><span class="hiddenGrammarError" pre="returns ">a){
   if</span></span>(     mp_iszero(&a->numerator)
       &&  mp_isone(&a->denominator)){
      return MP_YES;
     }
   return MP_NO;
}
int mpq_isone(mp_rat *a){
   if(mp_cmp(&a->numerator,&a->denominator) == MP_EQ){
      return MP_YES;
     }
   return MP_NO;
}
int mpq_isinteger(mp_rat *a){
  if(mp_isone(&a->denominator)){
    return MP_YES;
  }
   return MP_NO;
}

The actual reducing is simply
\dfrac{p/\gcd\left(p,q\right)}{q/\gcd\left(p,q\right)}

#define MPQ_DIVISION_BY_ZERO -997
int mpq_reduce(mp_rat *a){
  mp_int gcd;
  int e;
  /* check if num is zero -> return 0 if it is*/
  if(mp_iszero(&a->numerator)){
    if( (e = mpq_set_int(a,0,1) ) != MP_OKAY){
      return e;
    }
    if( (e = mpq_set_sign(a,MP_ZPOS) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
  }
  /* check if den is 1 -> return if it is*/
  if(mp_isone(&a->denominator)){
    mpq_normalize_sign(a);
    return MP_OKAY;
  }
  /* check if den is zero -> return undefined if it is*/
  if(mp_iszero(&a->denominator)){
    if( (e = mpq_set_int(a,0,0) ) != MP_OKAY){
      return e;
    }
    return MPQ_DIVISION_BY_ZERO;
  }
  /* check if den = num -> return 1 */
  if(mpq_isone(a)){
    mpq_normalize_sign(a);
    if( (e = mpq_set_int(a,1,1) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY; 
  }
  mp_init(&gcd);
  /* compute gcd(num,den) */
  if( (e = mp_gcd(&a->numerator,&a->denominator,&gcd) ) != MP_OKAY){
    return e;
  }
  /* set num to num/gcd */
  if( (e = mp_div(&a->numerator,&gcd,&a->numerator,NULL) ) != MP_OKAY){
    return e;
  }
  /* set den to den/gcd */
  if( (e = mp_div(&a->denominator,&gcd,&a->denominator,NULL) ) != MP_OKAY){
    return e;
  }
  /* not to forget: */
  mpq_normalize_sign(a);
  return MP_OKAY;
}

It might not be the wish of the programmers using this library to do a full stop every time they approach a division by zero, it might be more useful sometimes (I have to admit, I couldn’t come up with a single example) to keep it and take it along the calculations so it might be a good idea (I doubt it, but who am I?) to write a test for it.

int mpq_isundefined(mp_rat *a){
   if(     mp_iszero(&a->numerator)
       &&  mp_iszero(&a->denominator)){
      return MP_YES;
     }
   return MP_NO;
}

Ok, that were the functions for the household chores. Now for the real stuff.
The basic computations one should be able to do are:

  • Addition
  • Subtraction
  • Multiplication
  • Division

Funnily, the simplest thing to do in integer calculation is the most complicated when handling fractions and vice versa.
Addition first.

int mpq_add(mp_rat *a, mp_rat *b, mp_rat *c){
  int e;
  mp_int gcd,tmp;
  if(mp_iszero(&a->numerator)){
    if( (e = mpq_copy(b,c) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
  }
  if(mp_iszero(&b->numerator)){
    if( (e = mpq_copy(a,c) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
  }
  /* let libtommath take care of the signs */
  (&a->numerator)->sign = a->sign;
  (&b->numerator)->sign = b->sign;
  if(mpq_isinteger(a) && mpq_isinteger(a) ){
    if( (e = mp_add(&a->numerator,&b->numerator,&c->numerator) ) != MP_OKAY){
      return e;
    }
    c->sign = (&c->numerator)->sign;
    (&a->numerator)->sign = MP_ZPOS; 
    (&b->numerator)->sign = MP_ZPOS; 
    (&c->numerator)->sign = MP_ZPOS;
    (&c->denominator)->sign = MP_ZPOS;
    if( (e = mp_set_int(&c->denominator,1) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;   
  }
  mp_init_multi(&gcd,&tmp,NULL);
  /*gcd = gcd(b,d)*/
  if( (e = mp_gcd(&a->denominator,&b->denominator,&gcd) ) != MP_OKAY){
    return e;
  }
  /*c->numerator = ((A*D)/gcd) + ((C*B)/gcd)*/
  /*c->numerator = ((A*D)/gcd) */
  if( (e = mp_mul(&a->numerator,&b->denominator,&tmp) ) != MP_OKAY){
    return e;
  }
  if( (e = mp_div(&tmp,&gcd,&c->numerator,NULL) ) != MP_OKAY){
    return e;
  }

  /* ((C*B)/gcd)*/
  if( (e = mp_mul(&b->numerator,&a->denominator,&tmp) ) != MP_OKAY){
    return e;
  }
  if( (e = mp_div(&tmp,&gcd,&tmp,NULL) ) != MP_OKAY){
    return e;
  }
  if( (e = mp_add(&tmp,&c->numerator,&c->numerator) ) != MP_OKAY){
    return e;
  }
  if(mp_iszero(&c->numerator)){
    if( (e = mp_set_int(&c->denominator,1) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
  }
  /* c->denominator = (max(B,D)/gcd) * (min(B,D))*/
  if( (e = mp_div(mp_max(&a->denominator, &b->denominator),&gcd,&tmp,NULL) )
                                                                   != MP_OKAY){
    return e;
  } 
  if( (e = mp_mul(&tmp, mp_min(&a->denominator, &b->denominator),
                                                &c->denominator) ) != MP_OKAY){
    return e;
  }
  c->sign = (&c->numerator)->sign;
  if( (e = mpq_reduce(c) ) != MP_OKAY){
    return e;
  }
  (&a->numerator)->sign = MP_ZPOS; 
  (&b->numerator)->sign = MP_ZPOS; 
  mp_clear_multi(&gcd,&tmp,NULL);
  return MP_OKAY;
}

You remember it now? πŸ˜‰
The two functions min() and max() respectively are additions of me to libtommath and are merged into my fork but they are short, so here they are.

/* returns a if equal, min value otherwise */
mp_int *mp_min(mp_int *a, mp_int *b){
  int r;
  r = mp_cmp(a,b);
  if(r == MP_EQ ||  r == MP_LT){
    return a;
  }
  return b;
}
/* returns a if equal, max value otherwise */
mp_int *mp_max(mp_int *a, mp_int *b){
  int r;
  r = mp_cmp(a,b);
  if(r == MP_EQ || r == MP_GT ){
    return a;
  }
  return b;
}

Subtraction, on the other side, is much simpler if we use addition instead, relying on a- b = -a +b = a + (-b) . We do not use deep copies of the arguments of the functions so we need not to forget to reverse every modifications we do to them as we have done so, although not commented, above.

int mpq_sub(mp_rat *a, mp_rat *b, mp_rat *c){
  int e;
  if(mp_iszero(&a->numerator)){
    if( (e = mpq_neg(b) ) != MP_OKAY){
      return e;
    }
    if( (e = mpq_copy(b,c) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
  }
  if(mp_iszero(&b->numerator)){
    if( (e = mpq_copy(a,c) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
  }

  if(mpq_isinteger(a) && mpq_isinteger(a) ){
    (&a->numerator)->sign = a->sign;
    (&b->numerator)->sign = b->sign;
    if( (e = mp_sub(&a->numerator,&b->numerator,&c->numerator) ) != MP_OKAY){
      return e;
    }
    c->sign = (&c->numerator)->sign;
    (&c->numerator)->sign = MP_ZPOS;
    if( (e = mp_set_int(&c->denominator,1) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;   
  }

  /* negate(b) */
  if( (e = mpq_neg(b) ) != MP_OKAY){
    return e;
  }
   /* c = a+ (-b) */
  if( (e = mpq_add(a,b,c) ) != MP_OKAY){
    return e;
  }
  /* reverse negation */
  if( (e = mpq_neg(b) ) != MP_OKAY){
    return e;
  }
  /* No reducing here, mpq_add() did it already */
  return MP_OKAY;
}

Multiplication is even simpler

int mpq_mul(mp_rat *a, mp_rat *b, mp_rat *c){
  int e;
  if(mp_iszero(&a->numerator) || mp_iszero(&b->numerator)){
    if( (e = mpq_set_int(c,0,1) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
  }
  if( (e = mp_mul(&a->numerator,&b->numerator,&c->numerator) ) != MP_OKAY){
    return e;
  }
  if( (e = mp_mul(&a->denominator,&b->denominator,&c->denominator) ) != MP_OKAY){
    return e;
  }
  if( (e = mpq_reduce(c) ) != MP_OKAY){
    return e;
  }
  c->sign = (a->sign != b->sign)?MP_NEG:MP_ZPOS;
  return MP_OKAY;
}

And the simplest is, believe it or not, division.

int mpq_div(mp_rat *a, mp_rat *b, mp_rat *c){
  int e;
  if(mp_iszero(&a->numerator)){
    if( (e = mpq_set_int(c,0,1) ) != MP_OKAY){
      return e;
    }
    return MP_OKAY;
  }
  if(mp_iszero(&b->numerator)){
    return MPQ_DIVISION_BY_ZERO;
  }
  if( (e = mpq_inverse(b) ) != MP_OKAY){
    return e;
  }
  if( (e = mpq_mul(a,b,c) ) != MP_OKAY){
    return e;
  }
  if( (e = mpq_inverse(b) ) != MP_OKAY){
    return e;
  }
  if( (e = mpq_reduce(c) ) != MP_OKAY){
    return e;
  }
  c->sign = (a->sign != b->sign)?MP_NEG:MP_ZPOS;
  return MP_OKAY;
}

What I seem to have forgotten in the Tools section (don’t look for it, there is no explicitly named Tools section) is a method to compare two rationals. If comparing two big numbers is already a computationally expensive method if none of the shortcuts hit, comparing two rationals is even more so. The only way to do it for two fully reduced fractions is by way of the lowest common denominator, although the full algorithm is not needed, only the multiplication part.

Given two rationals Q_1 = \frac{p_1}{q_1} and Q_2 = \frac{p_2}{q_2} with p_1 \nmid q_1 and p_2 \nmid q_2 with a difference in magnitude Q_1 - Q_2 = Q_m the sign of the difference {\mathop{\mathrm{sgn}}}\; Q_m can be evaluated by multiplying out P_1 = p_1q_2 and P_2 = p_2q_1 and comparing P_1 and P_2.

It is should be obvious from the above that the multiplications can get quite costly if the involved parts of the fractions are very large!

Besides of the normal shortcuts by looking at the individual signs we can also use the fact that we are only interested in the magnitude of the results and not the exact values. If the magnitudes of the intermediary results are large enough in difference we don’t need to do the expensive multiplications.
I am using the number of digits here but one can go more fine-grained with the indices of the highest bits set.

int mpq_cmp(mp_rat *a, mp_rat *b){
  mp_int p1,p2;
  int e,wnuma,wdena,wnumb,wdenb,s1,s2;

  /* a == b, actually the very same number */
  if(a == b){
    return MP_EQ;
  }
  /* a < b */
  if(a->sign == MP_NEG && b->sign == MP_ZPOS){
    return MP_LT;
  }
  /* a > b */
  if(a->sign == MP_ZPOS && b->sign == MP_NEG){
    return MP_GT;
  }
  /* some more shortcuts */
  if(mpq_iszero(a)){
    /* a == b (both zero) */
    if(mpq_iszero(b)){
      return MP_EQ;
    }
    /* a < b */
    if(b->sign == MP_ZPOS){
      return MP_LT;
    }
    /* a > b */
    return MP_GT;
  }
  if(mpq_iszero(b)){
    /* a > b */
    if(a->sign == MP_ZPOS){
      return MP_GT;
    }
    /* a < b */
    return MP_LT;
  }

  if(mpq_isinteger(a) && mpq_isinteger(b)){
    return mp_cmp(&a->numerator,&b->numerator);
  }

  /*
     We can use a rough approximation of the result because we are only
     interested in the magnitudes.
   */
  wnuma = (&a->numerator)->used;
  wdena = (&a->denominator)->used;
  wnumb = (&b->numerator)->used;
  wdenb = (&b->denominator)->used;
  s1 = wnuma + wdenb;
  s2 = wnumb + wdena;
  if(s1 < s2 - 1){
    return ( (a->sign == MP_NEG)?MP_ZPOS: MP_NEG);
  }
  if(s2 < s1 - 1){
    return ( (a->sign == MP_NEG)?MP_NEG: MP_ZPOS);
  }
  /* This is gonna be brutal, so don't use cmp() frivolously! */
  mp_init_multi(&p1,&p2,NULL);
  if( (e = mp_mul(&a->numerator,&b->denominator,&p1)) != MP_OKAY){
    return e;
  }
  if( (e = mp_mul(&b->numerator,&a->denominator,&p2)) != MP_OKAY){
    mp_clear(&p2);
    return e;
  }

  e = mp_cmp(&p1,&p2);
  mp_clear_multi(&p1,&p2,NULL);
  if(a->sign == MP_NEG){
    return ( (e == MP_ZPOS)?MP_NEG: MP_ZPOS );
  }
  return e;
}

To test what we did above and if I didn’t any of my infamous C&P errors:

int main (int argc, char **argv){

    mp_int num1,den1,num2,den2;
    mp_rat q1,q2,q3,q4;

    if (argc < 5) {
	fprintf(stderr, "usage: %s integer integer integer integer \n", argv[0]);
	exit(EXIT_FAILURE);
    }
  
    mp_init_multi(&num1,&den1,&num2,&den2,NULL);

    mp_get_str(argv[1], &num1, 10);
    mp_get_str(argv[2], &den1, 10);
    mp_get_str(argv[3], &num2, 10);
    mp_get_str(argv[4], &den2, 10);

    printf("Numerator   1: ");mp_print(&num1);puts("");
    printf("Denominator 1: ");mp_print(&den1);puts("");
    printf("Numerator   2: ");mp_print(&num2);puts("");
    printf("Denominator 2: ");mp_print(&den2);puts("");   

    mpq_init_multi(&q1,&q2,&q3,&q4,NULL);puts("000");

    mpq_set(&q1,&num1,&den1);puts("111");
    printf("Rational1: ");mpq_print(&q1);puts("");
    mpq_set(&q2,&num2,&den2);puts("222");
    printf("Rational2: ");mpq_print(&q2);puts("");

    mpq_add(&q1,&q2,&q3);;
    printf("R1 + R2 = ");mpq_print(&q3);puts("");
    mpq_sub(&q1,&q2,&q3);
    printf("R1 - R2 = ");mpq_print(&q3);puts("");
    mpq_mul(&q1,&q2,&q3);
    printf("R1 * R2 = ");mpq_print(&q3);puts("");
    mpq_div(&q1,&q2,&q3);
    printf("R1 / R2 = ");mpq_print(&q3);puts("");
    printf("cmp(R1, R2) = %d\n",mpq_cmp(&q1,&q2));
    printf("cmp(R2, R1) = %d\n",mpq_cmp(&q2,&q1));
    printf("cmp(R1, R1) = %d\n",mpq_cmp(&q1,&q1));
    exit(EXIT_SUCCESS);
}

The function mp_get_str() is a simple wrapper:

int mp_get_str(const char *str, mp_int * a, int radix)
{
    return mp_read_radix(a, str, radix);
}

Seems to work here. Makes me wonder the most πŸ˜‰
Some of the more interesting things like e.g.: exponentiation are left as an an exercise…no, just kidding, they are work in progress and not ready yet.

Notes to self: GIT

Mnemonics are called mnemonics because they are hard to memorize.

The following is more or less just a reminder for myself. A good online reference is Git Reference for the basics.

To add one file (assuming /bin/sh and common GIT naming):

git add a_file.c
git commit -m "Added a_file.c with the miraculous function one_file()"

To add more than one file, commit them all and give all of them the same comment. I used this to commit the three files necessary for the most basic build.

git add tommath.h tommath_class.h makefile
git commit -m "Addition of a_file.c with function one_file()"

Push all of them to remote GIT repository

git fetch upstream
git push -v  origin master

the line git fetch upstream checks the original for intermediate changes that might ruin your daymake what you did obsoleteintersperse with what you have done. If something happened (not in libtommath, last change is a long time ago) you need to do git merge upstream/master where master is the main branch of upstream which is the remote original one. The full command merges that branch with your local branch. Uhm…you can find out the aliases (they are not fixed, the above are just the commonly used ones!) with git remote -v . Here are my results:

czurnieden@home:~/GITHUB/libtommath$ git remote -v
origin	https://github.com/czurnieden/libtommath.git (fetch)
origin	https://github.com/czurnieden/libtommath.git (push)
upstream	https://github.com/libtom/libtommath.git (fetch)
upstream	https://github.com/libtom/libtommath.git (push)

To find out what happened in the meantime before fetching

git log upstream/master ^master

If unsure which branch you are hacking right now

git branch

Say:”Ooops!” and change to branch testing which you have made with git branch testing

git checkout testing

Check if you haven’t forgoten any file to commit:

git diff-tree --no-commit-id --name-only HEAD~20..HEAD

The number 20 is the maximum number git checks, so if you commited more, increment that number.

On the numerical evaluation of factorials IX

Subfactorial

The family of the factorials is larger than the mere number of permutations. On of the many offsprings is the subfactorial or derangement.
The normal implementation, using only integers can be done with a recursion or, to save some memory and leave the stack where it is, in an iterative way.
The recursion is quite complicated with libtommath (more probable is the reason that I am just too stupid to get it right πŸ˜‰ )

static mp_int kill_the_stack;
static int subfactorial_error = MP_OKAY; 
static mp_int * __subfactorialrecursive(unsigned long n){
  mp_digit sign;
  int e;
  if(n==0){
    if ( ( e =  mp_set_int(&kill_the_stack,1) ) != MP_OKAY){
      subfactorial_error = e;
    }
    return &kill_the_stack;
  }
  if(n==1){
    if ( ( e =  mp_set_int(&kill_the_stack,0) ) != MP_OKAY){
      subfactorial_error = e;
    }
    return &kill_the_stack;
  }
  if(n==2){
    if ( ( e =  mp_set_int(&kill_the_stack,1) ) != MP_OKAY){
      subfactorial_error = e;
    }
    return &kill_the_stack;
  }
  if ( ( e =  mp_mul_d(__subfactorialrecursive(n-1),n,&kill_the_stack) ) != MP_OKAY){
      subfactorial_error = e;
  }
  sign = (n&01)?-1:1;
  if ( ( e =  mp_add_d(&kill_the_stack,sign,&kill_the_stack) ) != MP_OKAY){
      subfactorial_error = e;
  }
  return &kill_the_stack;
}
static int subfactorial_recursive(unsigned long n, mp_int *result){
  mp_init(&kill_the_stack);
  subfactorial_error = MP_OKAY;
  __subfactorialrecursive(n);
  mp_copy(&kill_the_stack,result);
  mp_clear(&kill_the_stack);
  return subfactorial_error;
}

I left all o the error checks in this time to show one of the problems with recursion: making clear that an error occured. The deep copy in subfactorial_recursive() is needed to be able to clear the globali-ish variable kill_the_stack. Keeping would cause a small but significant memory-leak.

The iterative version is more…uhm…straight forward:

static int subfactorial_iterative(unsigned long n, mp_int *result){
  mp_int temp1, temp2;
  mp_digit k;
  int e;

  if(n==0){
    return mp_set_int(result,1);
  }
  if(n==1){
    return mp_set_int(result,0);
  }
  if(n==2){
    return mp_set_int(result,1);
  }
  if ( ( e = mp_init_multi(&temp1, &temp2,NULL) ) != MP_OKAY){
    return e;
  }
  if ( ( e = mp_set_int(&temp1, 0) ) != MP_OKAY){
    mp_clear_multi(&temp1, &temp2,NULL);
    return e;
  }
  if ( ( e = mp_set_int(result,1) ) != MP_OKAY){
    mp_clear_multi(&temp1, &temp2,NULL);
    return e;
  }
  for(k=3;k<=n;k++){
    mp_exch ( &temp1,&temp2 );
    mp_exch ( result, &temp1);
   if ( ( e = mp_add(&temp1, &temp2,result))  != MP_OKAY){
     mp_clear_multi(&temp1, &temp2,NULL);
     return e;
   }
   if ( ( e = mp_mul_d(result,(k-1),result))  != MP_OKAY){
     mp_clear_multi(&temp1, &temp2,NULL);
     return e;
   }     
  }
  mp_clear_multi(&temp1, &temp2,NULL);
  return MP_OKAY;
}

The use of libtommath’s mp_set_int() needs a restriction to the size of n. The type allowed is an unsigned long; I restrict further to the actual bit-size of mp_digit which is probably way too conservative, especially when those digits are smaller than the default of 28.
The subfactorial of 2^28 is about 1.7862e2,146,019,442, that’s a number with over two billion decimal digits. One needs a different method to calculate it exactly. This one is good up to about 100,000 which already lasts a minute with this old computer I have here, more with the iterative version.

#ifndef MP_DIGIT_SIZE
#define MP_DIGIT_SIZE (1<<DIGIT_BIT)
#endif
int mp_subfactorial(unsigned long n, mp_int *result){
  if(n > MP_DIGIT_SIZE ){
    return MP_RANGE;
  }
#ifdef SAVE_SOME_MEMORY
   return subfactorial_iterative(n, result);
#else
  return  subfactorial_recursive(n, result);
#endif
}

The code above has been merged to my fork of libtommath over at github.

On the numerical evaluation of factorials VIII

The code for the FFT multiplication is working now and had been uploaded to my fork of libtommath. It is still limited to the 28 bit digits, The SchΓΆnberg-Strassen algorithm should be able to let us get rid of that ancient limit bt it is still in the works.
BTW: for some reasons unknown to me, not all commits of the original libtommath are in my fork of it. One of those commits fixed a bug that prevented GCC from compiling libtommath, I put it in there manually wich is not the very best method to do it, but as mentioned above: I still don’t know what I made wrong when I forked libtommath.

The cut-offs and limits may need some adjusting for other processors and architectures then my poor little 1GHz AMD Duron.

Now that FFT multiplication is working we can get back to the factorial. The last state was about 12 minutes for the factorial of one million if I remember it correctly. That is definitely way too much even for such an old processor like the one in my box. To avoid linking to the last post…oops, never mind. However, the last code was:

int factorial(unsigned long n, mp_int *result){
    unsigned long *p_list;
    unsigned long *exp_list;
    unsigned long p, i,cut;
    long bit;
    int shift;
    mp_int temp;

    unsigned long  r=0;
    p_list = fill_prime_list(3, n+1, &r);
    exp_list = malloc(sizeof(unsigned long)* (r+1));
    if(exp_list == NULL){ return MP_MEM;}

    mp_set_int(result,1);
    shift = (int)prime_divisors(n,2LU);
    cut = n / 2 ;
    for(p = 0; p < r; p++) {
        if(p_list[p] > cut) break;
        exp_list[p] = prime_divisors(n,p_list[p]);
    }
    bit = highbit(exp_list[0]);
    for( ; bit>=0; bit--) {
        mp_sqr(result, result);
        for(i=0; i<p; i++) {
            if ((exp_list[i] & (1 << bit)) != 0) {
                mp_mul_d(result,(mp_digit)p_list[i], result);
            }
        }
    }
    for(; p < r; p++){
       mp_mul_d(result,(mp_digit)p_list[p], result);
    }
    mp_mul_2d (result, shift, result);
    return MP_OKAY;
}

Anything open for optimization? What’s about the last loop over the primes with an exponent of 1 (one)? The loop adds all of the last primes to the result. What’s about summing them in a temporary variable and multiply the result with the temporary variable instead?

#define MP_FACTORIAL_BORW_PRIMORIAL_CUTOFF 200000
int factorial(unsigned long n, mp_int *result){
    unsigned long *p_list;
    unsigned long *exp_list;
    unsigned long p, i,cut;
    long bit;
    int shift;
    mp_int temp;

    unsigned long  r=0;
    p_list = fill_prime_list(3, n+1, &r);
    exp_list = malloc(sizeof(unsigned long)* (r+1));
    if(exp_list == NULL){ return MP_MEM;}

    mp_set_int(result,1);

    shift = (int)prime_divisors(n,2LU);

    cut = n / 2 ;

    for(p = 0; p < r; p++) {
        if(p_list[p] > cut) break;
        exp_list[p] = prime_divisors(n,p_list[p]);
    }
    mp_init(&temp);
    bit = (int)highbit(exp_list[0]);
    for( ; bit>=0; bit--) {
        mp_sqr(result, result);
        for(i=0; i<p; i++) {
            if ((exp_list[i] & (1 << bit)) != 0) {
                mp_mul_d(result,(mp_digit)p_list[i], result);
            }
        }
    }
    if(n<MP_FACTORIAL_BORW_PRIMORIAL_CUTOFF){
      for(; p < r; p++){
         mp_mul_d(result,(mp_digit)p_list[p], result);
      }
    }
    else {
      mp_init(&temp);
      mp_set_int(&temp,1);
      for(; p < r; p++){
         mp_mul_d(&temp,(mp_digit)p_list[p], &temp);
      }
      mp_mul(result,&temp,result);
    }
    mp_mul_2d (result, shift, result);

    return MP_OKAY;
}

After some tests I found a cut-off point at 200,000. The factorial of one million is now available in 6 (six) minutes. Half of the time than before!
Can we do more? Well, let’s see: the primes are consecutive, increasing and already in an array. That should enable us to use the binary splitting method. It is a bit more complicated here because we get into big-number territory in parts of it but that is good, because we can use some of the optimizations for the multiplication of large numbers. The size, or better: magnitude, number of digits, of the multiplicands is also the same, ideal for the implemented Toom-Cook 2 and 3 and FFT.

static unsigned long *primelist;
static   mp_int p_temp;

mp_int *primorial__lowlevel(unsigned long a, unsigned long b ,
                          unsigned long p)
{
  unsigned long  c;
  unsigned long long prod;

  mp_init(&p_temp);

  mp_int ta,tb;
  
  if( b == a){
     mp_set_int(&p_temp,p);
     return  &(p_temp);
  }
  if( b-a > 1){
    c= (b + a) >> 1;
    mp_init_multi(&ta,&tb,NULL);
    mp_copy(primorial__lowlevel( a   , c , primelist[a] ),&ta);
    mp_copy(primorial__lowlevel( c+1 , b , primelist[b] ),&tb);
    if(ta.used == 1 && tb.used == 1){
      if(ta.dp[0] <= MP_DIGIT_HALF || tb.dp[0] <= MP_DIGIT_HALF ){
        prod = (mp_word)ta.dp[0] * tb.dp[0];
        if( prod< MP_DIGIT_SIZE){ 
          mp_set_int(&p_temp,(mp_digit)prod);
          return &p_temp;
        }
      }
    }
    mp_mul(&ta,&tb,&p_temp) ;
    mp_clear_multi(&ta,&tb,NULL);
    return &p_temp;
  }
  if(   (primelist[a]) >=  (MP_DIGIT_HALF ) 
      || (primelist[b]) >=  (MP_DIGIT_HALF )   ){
    mp_set_int( &p_temp,primelist[a]);
    mp_mul_d(&p_temp,primelist[b],&p_temp);
  }
  else{
    c = primelist[a]*primelist[b];
    mp_set_int(&p_temp,c);
  }
  return &p_temp;
}
void primorial(unsigned long a, unsigned long b, mp_int *result){
    unsigned long  r=0;
    primelist = fill_prime_list(a+1, b+1, &r);
    *result = *primorial__lowlevel(0,r-1,1);
}

It even needs some quasi global (static) variables which is rarely a good idea but avoiding them wold clobber the code even more. However, let’s put it in to see if it helps.

#define MP_FACTORIAL_BORW_PRIMORIAL_CUTOFF 200000
int factorial(unsigned long n, mp_int *result){
    unsigned long *p_list;
    unsigned long *exp_list;
    unsigned long p, i,cut;
    long bit;
    int shift;
    mp_int temp;

    unsigned long  r=0;
    p_list = fill_prime_list(3, n+1, &r);
    exp_list = malloc(sizeof(unsigned long)* (r+1));
    if(exp_list == NULL){ return MP_MEM;}

    mp_set_int(result,1);

    shift = (int)prime_divisors(n,2LU);

    cut = n / 2 ;

    for(p = 0; p < r; p++) {
        if(p_list[p] > cut) break;
        exp_list[p] = prime_divisors(n,p_list[p]);
    }
    mp_init(&temp);
    bit = (int)highbit(exp_list[0]);
    for( ; bit>=0; bit--) {
        mp_sqr(result, result);
        for(i=0; i<p; i++) {
            if ((exp_list[i] & (1 << bit)) != 0) {
                mp_mul_d(result,(mp_digit)p_list[i], result);
            }
        }
    }
    if(n<MP_FACTORIAL_BORW_PRIMORIAL_CUTOFF){
      for(; p < r; p++){
         mp_mul_d(result,(mp_digit)p_list[p], result);
      }
    }
    else {
      mp_init(&temp);
      primorial(cut,n,&temp);
      mp_mul(result,&temp,result);
    }
    mp_mul_2d (result, shift, result);

    return MP_OKAY;
}

Result: roughly the same. The theoretical complexity is sometimes different from the one of the implementation. The main culprit here is most probably the deep copy nexessary in our implementation, but it will win for much larger numbers.
But there is another one of the same in the main loop. Let’s replace that with our first algorithm:

#define MP_FACTORIAL_BORW_LOOP_CUTOFF 500000
#define MP_FACTORIAL_BORW_PRIMORIAL_CUTOFF 200000

int factorial(unsigned long n, mp_int *result){
    unsigned long *p_list;
    unsigned long *exp_list;
    unsigned long p, i,cut;
    long bit;
    int shift;
    mp_int temp;

    unsigned long  r=0;
    p_list = fill_prime_list(3, n+1, &r);
    exp_list = malloc(sizeof(unsigned long)* (r+1));
    if(exp_list == NULL){ return MP_MEM;}

    mp_set_int(result,1);

    shift = (int)prime_divisors(n,2LU);

    cut = n / 2 ;

    for(p = 0; p < r; p++) {
        if(p_list[p] > cut) break;
        exp_list[p] = prime_divisors(n,p_list[p]);
    }
    mp_init(&temp);
    bit = (int)highbit(exp_list[0]);
    if(n < MP_FACTORIAL_BORW_LOOP_CUTOFF){
      for( ; bit>=0; bit--) {
          mp_sqr(result, result);
          for(i=0; i<p; i++) {
              if ((exp_list[i] & (1 << bit)) != 0) {
                  mp_mul_d(result,(mp_digit)p_list[i], result);
              }
          }
      }
    }
    else{
      for( ; bit>=0; bit--) {
          mp_sqr(result, result);
          mp_set_int(&temp,1);
          for(i=0; i<p; i++) {
              if ((exp_list[i] & (1 << bit)) != 0) {
                  mp_mul_d(&temp,(mp_digit)p_list[i], &temp);
              }
          }
          mp_mul(&temp,result,result);
      }
    }
    if(n<MP_FACTORIAL_BORW_PRIMORIAL_CUTOFF){
      for(; p < r; p++){
         mp_mul_d(result,(mp_digit)p_list[p], result);
      }
    }
    else {
      primorial(cut,n,&temp);
      mp_mul(result,&temp,result);
    }
    mp_mul_2d (result, shift, result);
    return MP_OKAY;
}

After some test runs the cut-off found is something around half a million.
Run time for the factorial of one million: a mere 25 (twenty-five) seconds! Oh, and with the linear multiplication instead of the binary splitting of the final primorial it is about 30 seconds. Now we have a significant difference. It is the same difference as before but five seconds hidden in six minutes are hard to detect.

I tried ten million but with only half a gig of RAM and some memory hungry apps like the Firefox I use to write this post, it starts to swap like hell quite fast and needs about just a bit less than 13 minutes. Sounds a lot but the factorial of ten million is a large number, about 1.2024e65657052. Yes, a number with 65,657,053 decimal digits.
I tried the ten million factorial without a running Firefox and got about 10 seconds saved. Not much, even if the sound the harddrive made gave the impression of ten minutes instead πŸ˜‰
The factorial of ten million is large enough to allow us a better test if the binary splitting is indeed faster. And it is: with the linear multiplication instead of the binary splitting method the run time nearly doubles and passed 24 minutes. So the theory is correct at least πŸ˜‰

BTW: I just saw that libtommath shares its birthday with me πŸ˜‰