# On Binary Splitting

The method of binary splitting to evaluate sums of rational numbers is well known. A slightly more detailed overview (with some code examples) is given in [2] (preprint in [1]) for the sums of rational numbers.

The binary splitting algorithm uses the divide&conquer method to keep the single operations as small as possible for as long as possible. Another advantage of this method is that all operands are of roughly the same size which is favored by all of the fast multiplication algorithms like e.g.: FFT. Asymmetric Toom-Cook algorithms exist but only for a handful of different relations.

The basic idea for divide&conquer is to evaluate $a \odot b \odot c \odot d$ (with $a \odot b = b \odot a$) as $(a \odot b) \odot (c \odot d)$. That keeps the sizes equal if the sizes of $\{a,b,c,d\}$ are equal, too. In times of multi-core processors the ability to process as many of the operations in parallel as there are cores available is also not a thing one should deem as insignificant too easily.

But there is also something named CPU-cache, a still very finite but very fast memory where the CPU keeps its bricks and mortar it works with, so it might be a good idea too keep as many things in this memory as possible. Here divide&conquer is not the most ideal algorithm because it has to grab its material from different parts of the memory a lot of times. Although the access to memory can be assumed to be $\mathcal{O}(1)$ but the hidden constant can get very large for the different types of memory. So large, that it might be better to use an asymptotically worse algorithm if it is able to keep all data in the CPU-cache. In the case of multiplication it is the school book algorithm here for the first level if $a\odot b$. fits into a single CPU register, e.g. the product of two 32-bit values fits into one 64-bit CPU-register.

An example with libtommath (checks and balances omitted)

#define BIN_SPLIT_TUNABLE_CONSTANT 64
void mp_bin_split(mp_digit *array, int n, mp_int * result)
{
mp_digit first_half, second_half;
int i;
mp_int tmp;

if (n == 0) {
mp_set(result, 1);
return;
}
if (n <= BIN_SPLIT_TUNABLE_CONSTANT) {
mp_set(result, array[0]);
for (i = 1; i < n; i++)
mp_mul_d(result, array[i], result);
return;
}

first_half = n / 2;
second_half = n - first_half;
mp_bin_split(array, second_half, result);
mp_init(&tmp);
mp_bin_split(array + second_half, first_half, &tmp);
mp_mul(result, &tmp, result);
mp_clear(&tmp);
}


If BIN_SPLIT_TUNABLE_CONSTANT is set very small it could be a fruitful idea to do the splitting by hand (mp_word is defined to be large enough to hold the product of two mp_digit).

/* ---- snip --- */
if (n <= 8 ) {
mp_set(result, array[0]);
for (i = 1; i < n; i++)
mp_mul_d(result, array[i], result);
return;
}
if (n == 8 ) {
mp_word s1,s2,s3,s4;
mp_digit carry, a[8],b[8],
c1[8]={0,0,0,0,0,0,0,0},
c2[8]={0,0,0,0,0,0,0,0},*temp;
double C1[16],C2[16];
s1 = array[0] * array[1];
array[0] = (unsigned long) (s1  & ((mp_word) MP_MASK));
array[1] = (unsigned long) (s1 >> ((mp_word) DIGIT_BIT));
s2 = array[2] * array[3];
array[2] = (unsigned long) (s2  & ((mp_word) MP_MASK));
array[3] = (unsigned long) (s2 >> ((mp_word) DIGIT_BIT));
s3 = array[4] * array[5];
array[4] = (unsigned long) (s3  & ((mp_word) MP_MASK));
array[5] = (unsigned long) (s3 >> ((mp_word) DIGIT_BIT));
s4 = array[6] * array[7];
array[6] = (unsigned long) (s4  & ((mp_word) MP_MASK));
array[7] = (unsigned long) (s4 >> ((mp_word) DIGIT_BIT));
if(array[1] == 0 && array[3] == 0 && array[5] == 0 && array[7] == 0){
s1 = s1 * s2;
a[0] = (unsigned long) (s1  & ((mp_word) MP_MASK));
a[1] = (unsigned long) (s1 >> ((mp_word) DIGIT_BIT));
s2 = s3 * s4;
b[0] = (unsigned long) (s2  & ((mp_word) MP_MASK));
b[1] = (unsigned long) (s2 >> ((mp_word) DIGIT_BIT));
if(a[1] == 0 && b[1] == 0){
s1 = s1 * s2;
a[0] = (unsigned long) (s1  & ((mp_word) MP_MASK));
a[1] = (unsigned long) (s1 >> ((mp_word) DIGIT_BIT));
if(a[1] == 0){
result->dp[0] = a[0];
result->used = 1;
return;
}
result->dp[0] = a[0];
result->dp[1] = a[1];
result->used = 2;
return;
}
k = (a[1] == 0)?1:2;
l = (b[1] == 0)?1:2;
temp = result->dp;
for(i=0;i<k;i++){
carry = 0;
temp = result->dp + i;
for(j=0;j<MIN(l,4-i);j++){
s1 =  ((mp_word)*temp) + (mp_word)a[i] * (mp_word)b[j] + (mp_word)carry;
*temp++ = (unsigned long) (s1  & ((mp_word) MP_MASK));
carry  = (unsigned long) (s1 >> ((mp_word) DIGIT_BIT));
}
if(i+j<4){
*temp = carry;
}
}
result->used = 4;
mp_clamp(result);
return;
}
else{
a[0] = array[0];
a[1] = array[1];
b[0] = array[2];
b[1] = array[3];
k = (a[1] == 0)?1:2;
l = (b[1] == 0)?1:2;
temp = c1;
for(i=0;i<k;i++){
carry = 0;
temp = c1 + i;
for(j=0;j<MIN(l,4-i);j++){
s1 =  ((mp_word)*temp) + (mp_word)a[i] * (mp_word)b[j] + (mp_word)carry;
*temp++ = (unsigned long) (s1  & ((mp_word) MP_MASK));
carry  = (unsigned long) (s1 >> ((mp_word) DIGIT_BIT));
}
if(i+j<4){
*temp = carry;
}
}
a[0] = array[4];
a[1] = array[5];
b[0] = array[6];
b[1] = array[7];

k = (a[1] == 0)?1:2;
l = (b[1] == 0)?1:2;
temp = c2;
for(i=0;i<k;i++){
carry = 0;
temp = c2 + i;
for(j=0;j<MIN(l,4-i);j++){
s1 =  ((mp_word)*temp) + (mp_word)a[i] * (mp_word)b[j] + (mp_word)carry;
*temp++ = (unsigned long) (s1  & ((mp_word) MP_MASK));
carry  = (unsigned long) (s1 >> ((mp_word) DIGIT_BIT));
}
if(i+j<4){
*temp = carry;
}
}
for(i = 0,j=0;i<8;i++,j+=2){
if(i < 8){
C1[j]   = (double) (c1[i] & MP_DIGIT_MASK);
C1[j+1] = (double)((c1[i] >> MP_DIGIT_BIT_HALF ) & MP_DIGIT_MASK);
C2[j]   = (double) (c2[i] & MP_DIGIT_MASK);
C2[j+1] = (double)((c2[i] >> MP_DIGIT_BIT_HALF ) & MP_DIGIT_MASK);
}
if(i >= 8){
C1[j]   = 0.0;
C1[j+1] = 0.0;
C2[j]   = 0.0;
C2[j+1] = 0.0;
}
}

mp_fft(C1,C2,16);

carry = 0;
for(i=0;i<16;i++){
s1 = carry;
carry = 0;
s1  += (mp_word)(round(C2[i]));
if(s1 >= MP_DIGIT_HALF){
carry = s1 / (mp_word)MP_DIGIT_HALF;
s1  = s1 % (mp_word)MP_DIGIT_HALF;
}
C2[i] = (double)s1;
}
mp_grow(result,17);
for(i=0,j=0;j<16;i++,j+=2){
result->dp[i] <<= MP_DIGIT_BIT_HALF;
result->used++;
}
if(carry){
result->dp[i] = carry;
result->used++;
}

mp_clamp(result);

return;
}
}
/* ---- snap --- */


And if you think you have seen the worst waste of blog-space you’ve never met the kind of programmers who’s products are described in detail at thedailywtf.com.

[1] Haible, Bruno, and Thomas Papanikolaou. “Fast multiprecision evaluation of series of rational numbers.” (1997). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.3698&rep=rep1&type=pdf
[2] Haible, Bruno, and Thomas Papanikolaou. “Fast multiprecision evaluation of series of rational numbers”. Springer Berlin Heidelberg, 1998.