Category Archives: Bugs

Padding Trouble

When Intel extended the 8086 architecture to 32 bits in 1985, it widened the existing 16-bit registers to 32 bits. ax became eax, but the low 16 bits of eax could still be used as ax, just like before. Intel chose to have operations on the low 16 bits leave the high 16 bits of the register unchanged.

AMD extended the 32-bit architecture to 64 bits in 2003. This was again a superset of the original, which kept it backward compatible. The 32-bit registers were widened to 64 bits, and eax became rax. Again it was possible to perform operations on the low 32 bits, but this time doing so clears the high 32 bits of the register.

“Operations that output to a 32-bit subregister are automatically zero-extended to the entire 64-bit register. Operations that output to 8-bit or 16-bit subregisters are not zero-extended (this is compatible x86 behavior).”
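To make the two rules concrete, here is a small C model that treats a 64-bit register as a uint64_t. The helper names write8, write16 and write32 are made up for illustration: they mimic what the hardware does in 64-bit mode when an instruction writes al, ax or eax, and are not part of any real API.

```c
#include <stdint.h>

/* Hypothetical helpers modelling writes to subregisters of a 64-bit
   register (e.g. rax) in 64-bit mode. */

/* Write to the 8-bit subregister (al): the upper 56 bits are preserved. */
uint64_t write8(uint64_t reg, uint8_t value)
{
    return (reg & ~(uint64_t)0xFF) | value;
}

/* Write to the 16-bit subregister (ax): the upper 48 bits are preserved,
   just as the upper 16 bits were in 32-bit code. */
uint64_t write16(uint64_t reg, uint16_t value)
{
    return (reg & ~(uint64_t)0xFFFF) | value;
}

/* Write to the 32-bit subregister (eax): the result is zero-extended,
   so the old upper 32 bits are discarded. */
uint64_t write32(uint64_t reg, uint32_t value)
{
    (void)reg; /* nothing from the old contents survives */
    return (uint64_t)value;
}
```

For example, with rax holding 0x1122334455667788, writing 0xAAAA to ax gives 0x112233445566AAAA, while writing 0xAAAAAAAA to eax gives 0x00000000AAAAAAAA.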

Both choices work as far as backward compatibility goes, and as long as we programmers are aware of what happens, neither is a problem.

When building the aPLib compression library, I use Visual C++ to generate assembly listings, modify them with a Perl script, and then assemble them into object files. While working on the recently released 64-bit version, I ran into a problem: the debug build of the library worked fine, but the release build did not.

Bugs like this are often caused by improper memory use, so I spent a day trying to track down the problem without much luck. Somehow the contents of a register were being corrupted.

Looking through the code in HIEW, I finally found the cause: a seemingly random instruction that wrote to the 32-bit part of a register, thereby clearing its high 32 bits. Then it dawned on me.

Visual C++ emits padding macros into assembly listings to align code and improve performance. These macros, npad, are defined in a file called listing.inc which resides in the Visual C++ include folder. But there is no 64-bit version of this file!

Let’s have a look then:

And there we have it. An instruction like mov edi, edi is safe to use as padding in 32-bit code, because moving a register onto itself has no effect. But insert it into 64-bit code and it suddenly does have an effect: the high 32 bits of rdi are cleared.
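The same kind of model shows why this padding is harmless in one mode and destructive in the other. In this sketch (function names hypothetical), a 32-bit register is a uint32_t and a 64-bit register is a uint64_t; the 64-bit version of mov edi, edi keeps only the low 32 bits of rdi.

```c
#include <stdint.h>

/* Hypothetical model of "mov edi, edi" used as padding. */

/* 32-bit mode: edi is the whole register, so copying it onto itself
   leaves it unchanged. */
uint32_t pad_mov_edi_edi_32(uint32_t edi)
{
    return edi;
}

/* 64-bit mode: the instruction writes the 32-bit subregister of rdi,
   and the write is zero-extended, clearing the high 32 bits. */
uint64_t pad_mov_edi_edi_64(uint64_t rdi)
{
    return (uint64_t)(uint32_t)rdi;
}
```

The damage only shows when the high half of rdi holds something, such as a pointer above 4 GB: pad_mov_edi_edi_64(0x00007FF612345678) yields 0x12345678, while a value that already fits in 32 bits passes through untouched.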

I have reported the problem to Microsoft and they say it will be addressed in a future release.

Loophole in Visual C++, Part 2

Here is a slightly more elaborate example:

This program goes through the entire range of the unsigned int type, performing some action for each value. It shows its progress by calling a function that computes the ratio of count to the maximum possible value. Again, count is incremented in each step, and hence will wrap around to zero at some point.
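The original listing is not reproduced here, but a program of the kind described might look like the following sketch. It is a hypothetical reconstruction, not the code from the post: the names progress and run_with_progress are made up, the start value is a parameter (the post used 0x3fffffff), the loop returns the last ratio instead of displaying it, and a 32-bit unsigned int is assumed, as with Visual C++.

```c
#include <limits.h>

/* Hypothetical progress function: ratio of count to the largest
   unsigned int, from 0.0 to 1.0. */
double progress(unsigned int count)
{
    return (double)count / (double)UINT_MAX;
}

/* Hypothetical loop: visits every value from start up to UINT_MAX and
   stops when count wraps around to zero. Returns the last ratio computed. */
double run_with_progress(unsigned int start)
{
    double last = 0.0;
    unsigned int count = start;
    do {
        last = progress(count); /* a real program would display this */
        count++;                /* UINT_MAX + 1 wraps to 0 */
    } while (count != 0);
    return last;
}
```

A correct build ends with a ratio of exactly 1.0, and progress(0x3fffffff) is just under 0.25, which is where the test in the post starts.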

The program works as expected on the compilers I tried, except for cl.exe from VC7 and VC71 with the /O2 switch, whose builds stop at 25%. That is also the reason for the starting point of 0x3fffffff: there is no need to watch your machine chew its way through all the integers up to 25%.

Looking at the code generated for the loop:

We see that it fails because the two instructions before the conditional jump have been swapped. Again, it looks like the optimizer fails to recognize that the increment matters to the loop.

Loophole in Visual C++, Part 1

Let’s start this post by recalling what the gosp^H^H^H^Hstandard has to say about unsigned arithmetic:

“A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.”

This is from the C89 draft (3.1.2.5p5); statements to the same effect are present in the C99 standard (6.2.5p9) and the C++ standards.
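The rule is easy to check. In this sketch, add_unsigned relies on the built-in wrap-around, and add_unsigned_explicit computes the same value the way the standard phrases it, reducing the exact sum modulo UINT_MAX + 1 in a wider type. Both names are hypothetical, and the explicit version assumes unsigned long long is wider than unsigned int (true for Visual C++, where unsigned int is 32 bits).

```c
#include <limits.h>

/* Unsigned addition: never overflows, the result simply wraps. */
unsigned int add_unsigned(unsigned int a, unsigned int b)
{
    return a + b;
}

/* Same result computed the way the standard describes it: take the exact
   sum and reduce it modulo UINT_MAX + 1. Assumes unsigned long long is
   wider than unsigned int, so neither the sum nor the modulus overflows. */
unsigned int add_unsigned_explicit(unsigned int a, unsigned int b)
{
    unsigned long long modulus = (unsigned long long)UINT_MAX + 1;
    unsigned long long exact = (unsigned long long)a + b;
    return (unsigned int)(exact % modulus);
}
```

With a 32-bit unsigned int, add_unsigned(UINT_MAX, 5) is 4 under both definitions.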

Now consider the following program:

Since count starts at zero and is incremented each time through the loop, the standard tells us it will wrap around to zero once the increment produces a result that cannot be represented by an unsigned int, which makes the program terminate. Compiling the program with various compilers gives the expected stream of increasing numbers.
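The program itself is not reproduced here. A loop with the behaviour described, counting up from a start value until count wraps around to zero, might look like this hypothetical sketch; visit_until_wrap is a made-up name, and it counts iterations instead of printing, so the wrap-around can be checked quickly from a start value near the top of the range.

```c
/* Hypothetical sketch of a loop that relies on unsigned wrap-around to end.
   Starting from 0 it would visit all UINT_MAX + 1 values of unsigned int. */
unsigned long long visit_until_wrap(unsigned int start)
{
    unsigned long long visited = 0;
    unsigned int count = start;
    do {
        visited++;  /* the program described in the post prints count here */
        count++;    /* wraps from UINT_MAX to 0 */
    } while (count != 0);
    return visited;
}
```

Per the standard, visit_until_wrap(UINT_MAX - 9) visits exactly 10 values, and visit_until_wrap(0) visits every value of the type before returning. The /O2 builds described below behave as if the loop body ran only once.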

However, if you compile it with cl.exe from Visual C++ using the /O2 switch (maximize speed), you get a somewhat surprising result: a single zero, after which the program exits. This goes for VC6, VC7 and VC71.

If you initialize count to one instead, the program works fine. So it looks like the optimizer fails to recognize the addition as changing the value of count, and thus optimizes away the loop.

I have not tested the various VC8 betas, so if you have any of them installed, feel free to try it out and post your results (just remember to compile from the command-line using cl.exe and /O2).