# On the numerical evaluation of factorials IX

## Subfactorial

The family of the factorials is larger than the mere number of permutations. On of the many offsprings is the subfactorial or derangement.
The normal implementation, using only integers can be done with a recursion or, to save some memory and leave the stack where it is, in an iterative way.
The recursion is quite complicated with libtommath (more probable is the reason that I am just too stupid to get it right ðŸ˜‰ )

```static mp_int kill_the_stack;
static int subfactorial_error = MP_OKAY;
static mp_int * __subfactorialrecursive(unsigned long n){
mp_digit sign;
int e;
if(n==0){
if ( ( e =  mp_set_int(&kill_the_stack,1) ) != MP_OKAY){
subfactorial_error = e;
}
return &kill_the_stack;
}
if(n==1){
if ( ( e =  mp_set_int(&kill_the_stack,0) ) != MP_OKAY){
subfactorial_error = e;
}
return &kill_the_stack;
}
if(n==2){
if ( ( e =  mp_set_int(&kill_the_stack,1) ) != MP_OKAY){
subfactorial_error = e;
}
return &kill_the_stack;
}
if ( ( e =  mp_mul_d(__subfactorialrecursive(n-1),n,&kill_the_stack) ) != MP_OKAY){
subfactorial_error = e;
}
sign = (n&01)?-1:1;
if ( ( e =  mp_add_d(&kill_the_stack,sign,&kill_the_stack) ) != MP_OKAY){
subfactorial_error = e;
}
return &kill_the_stack;
}
static int subfactorial_recursive(unsigned long n, mp_int *result){
mp_init(&kill_the_stack);
subfactorial_error = MP_OKAY;
__subfactorialrecursive(n);
mp_copy(&kill_the_stack,result);
mp_clear(&kill_the_stack);
return subfactorial_error;
}
```

I left all o the error checks in this time to show one of the problems with recursion: making clear that an error occured. The deep copy in `subfactorial_recursive()` is needed to be able to clear the globali-ish variable `kill_the_stack`. Keeping would cause a small but significant memory-leak.

The iterative version is more…uhm…straight forward:

```static int subfactorial_iterative(unsigned long n, mp_int *result){
mp_int temp1, temp2;
mp_digit k;
int e;

if(n==0){
return mp_set_int(result,1);
}
if(n==1){
return mp_set_int(result,0);
}
if(n==2){
return mp_set_int(result,1);
}
if ( ( e = mp_init_multi(&temp1, &temp2,NULL) ) != MP_OKAY){
return e;
}
if ( ( e = mp_set_int(&temp1, 0) ) != MP_OKAY){
mp_clear_multi(&temp1, &temp2,NULL);
return e;
}
if ( ( e = mp_set_int(result,1) ) != MP_OKAY){
mp_clear_multi(&temp1, &temp2,NULL);
return e;
}
for(k=3;k<=n;k++){
mp_exch ( &temp1,&temp2 );
mp_exch ( result, &temp1);
if ( ( e = mp_add(&temp1, &temp2,result))  != MP_OKAY){
mp_clear_multi(&temp1, &temp2,NULL);
return e;
}
if ( ( e = mp_mul_d(result,(k-1),result))  != MP_OKAY){
mp_clear_multi(&temp1, &temp2,NULL);
return e;
}
}
mp_clear_multi(&temp1, &temp2,NULL);
return MP_OKAY;
}
```

The use of libtommath’s `mp_set_int()` needs a restriction to the size of `n`. The type allowed is an `unsigned long`; I restrict further to the actual bit-size of `mp_digit` which is probably way too conservative, especially when those digits are smaller than the default of 28.
The subfactorial of 2^28 is about 1.7862e2,146,019,442, that’s a number with over two billion decimal digits. One needs a different method to calculate it exactly. This one is good up to about 100,000 which already lasts a minute with this old computer I have here, more with the iterative version.

```#ifndef MP_DIGIT_SIZE
#define MP_DIGIT_SIZE (1<<DIGIT_BIT)
#endif
int mp_subfactorial(unsigned long n, mp_int *result){
if(n > MP_DIGIT_SIZE ){
return MP_RANGE;
}
#ifdef SAVE_SOME_MEMORY
return subfactorial_iterative(n, result);
#else
return  subfactorial_recursive(n, result);
#endif
}
```

The code above has been merged to my fork of libtommath over at github.