Long Division

Integer types with at least 64 bits have been a part of the C standard for a while now (they were added in C99, and were a standard extension in many 32-bit compilers before that). But have you ever wondered what exactly happens when you use them?

Consider the following function (substitute long long with __int64 if you are using an older version of Visual C++):

long long div64(long long x, long long y)
{
    return x / y;
}

let’s first have a look at what the VC 64-bit compiler gives us:

div64:
    mov   r8, rdx
    mov   rax, rcx
    cdq
    idiv  r8
    ret

Pretty much what you would expect, a little setup and an idiv instruction to perform the division. Now let’s try the VC 32-bit compiler:

div64:
_x$ = 8
_y$ = 16
    mov   eax, _y$[esp]
    mov   ecx, _y$[esp-4]
    mov   edx, _x$[esp]
    push  eax
    mov   eax, _x$[esp]
    push  ecx
    push  edx
    push  eax
    call  __alldiv
    ret

A little setup and .. a call?

The thing is, 32-bit x86 only has instructions for mixed mode 32/64 bit operations — with one assembly instruction you can multiply two 32-bit integers to get a 64-bit result, or you can divide a 64-bit integer by a 32-bit as long as the quotient fits in 32 bits. But to divide two 64-bit integers, you have to emulate the operation in software.

That is where the _alldiv function comes in. It is a support function in the C runtime library that performs a 64-bit divide. In fact there is a whole family of such functions to do multiply, divide and shifts on 64-bit numbers (addition and subtraction are inlined since they only require two assembly instructions).

Other compilers do likewise, for instance GCC calls __divdi3 to perform 64-bit division.

It is important to be aware of this, because using long long instead of int means that arithmetic operations you probably expect to be single instructions, all of a sudden become library calls who’s execution time can depend on the arguments.

As an example, I timed a simple implementation of the extended Euclidean algorithm with 32- and 64-bit integers. Compiling with the VC 32-bit compiler, the 64-bit version was roughly 3 times slower (depending on the CPU).

2 thoughts on “Long Division

  1. Jibz Post author

    Thank you, I know the updates here are rather infrequent, but don’t worry I will post whenever I find something I think is interesting enough.

    Btw, I found a quote by Jan Gray that nicely underlines my comment about the programmer expecting arithmetic operations to map to single instructions:

    It was easier in the good old days. Good C programmers knew. Each operator and operation in C, be it assignment, integer or floating-point math, dereference, or function call, mapped more or less one-to-one to a single primitive machine operation. True, sometimes several machine instructions were required to put the right operands in the right registers, and sometimes a single instruction could capture several C operations (famously *dest++ = *src++;), but you could usually write (or read) a line of C code and know where the time was going. For both code and data, the C compiler was WYWIWYG—”what you write is what you get”. (The exception was, and is, function calls. If you don’t know what the function costs, you don’t know diddly.)

Leave a Reply