While adding a few header files to WCRT (a small C runtime library for Visual C++), I stumbled upon something that caught my interest.
INT_MIN in <limits.h> is a macro that expands to the minimum value for an object of type int. In the 32-bit C compilers I have installed at the moment, it is defined as:
#define INT_MIN (-2147483647 - 1)
So what exactly is wrong with the integer constant -2147483648 ?
Well, firstly it is not an integer constant. Let’s see what the standard says:
“An integer constant begins with a digit, but has no period or exponent part. It may have a prefix that specifies its base and a suffix that specifies its type.”
You will notice there is no mention of a sign. So -2147483648 is in fact a constant expression, consisting of the unary minus operator, and the integer constant 2147483648.
This still does not explain why that expression is not used directly in the macro. To see that, we have to revisit the rules for the type of integer constants.
The type of an unsuffixed decimal integer constant is the first of these in which its value can be represented:
| Standard | Candidate types, in order |
|---|---|
| C89 | int, long int, unsigned long int |
| C99 | int, long int, long long int |
| C++11 | int, long int, long long int |
The problem is that 2147483648 cannot be represented in a signed 32-bit integer, so it becomes either an unsigned long int or a long long int.
So we have to resort to a little trickery and compute -2147483648 as (-2147483647 - 1). Both constants fit nicely into a 32-bit signed integer, so INT_MIN gets the right type and value.
If you happen to look up INT_MIN in the standard you will see:
minimum value for an object of type int
INT_MIN -32767
Which brings up the question: why isn’t it (-32767 - 1)?
Pretty much any computer available today uses two’s complement to represent signed numbers, but this hasn’t always been the case.
Since C was designed to work efficiently on a variety of architectures, the standard’s limits allow for using other representations as well.
I will end this post with a little (not quite standard conformant) example. Try compiling it with your favorite C compiler, and let us know if something puzzles you.
    #include <stdio.h>
    #include <limits.h>
    #include <float.h>

    int main(void)
    {
        if (-2147483648 > 0) printf("positive\n");
        if (-2147483647 - 1 < 0) printf("negative\n");
        if (INT_MIN == -INT_MIN) printf("equal\n");
        if (FLT_MIN > 0) printf("floating\n");

        return 0;
    }
nice!
it took me a minute to realize you were saying that the root of the problem was that the POSITIVE value 2147483648 could not be stored as a (signed) int, and that this little bit of asymmetry is the problem.
My favourite compiler does a good job of this:
So, the surprising standard behaviour gets a warning, and the unsurprising standard behaviour doesn’t, even with no warnings explicitly enabled.
I was briefly surprised by FLT_MIN, but only because I never use it, so it wasn’t until I saw the answer that I remembered what it means.
gcc has some very good warnings on this example yes.
Thanks for the reply.
Tcc doesn’t complain at all. It apparently simply lets the value overflow, even though it claims to support most of C99:
positive
negative
equal
floating
That is interesting, in C99 it should not print ‘positive’.
Might be worth letting the developers know.
this was kind of funny to me. i ran this in windows on VC++ Express and on minGW. the assembly for each is what is interesting.
minGW just decides to print them
%%BEGIN minGW
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
movl $LC0, (%esp)
call _printf
movl $LC1, (%esp)
call _printf
movl $LC2, (%esp)
call _printf
movl $LC3, (%esp)
call _printf
movl $0, %eax
leave
ret
%%END minGW
CL decides to check but… it checks if the value in the register is equal to what is in the register???
%%BEGIN CL.exe
; Line 7
mov eax, 1 ; Sets the register to 1 then…
test eax, eax ; Makes sure 1 is equal to 1
je SHORT $LN4@main
push OFFSET $SG2536
call _printf
add esp, 4
$LN4@main:
%%END CL.exe
Nice trick.
The test instruction performs a bit-wise and operation and sets the flags without modifying the operands.
Testing a register against itself is a common idiom to check if the register is zero, probably mostly because it is shorter than a cmp.
So Visual C++ is being very very literal about translating the if statement directly to assembly instructions.
Thanks for the reply, it is interesting to see what different compilers do.
The assembly output from the MSVC example is likely the result of a non-optimized (debug) build. In such builds the compiler generally translates the source very literally, so as not to confuse things when debugging and to let you set breakpoints at any point corresponding to the original source code.
It’s never a great idea to look at the assembly output for a debug build and draw conclusions about the quality of code generation of the compiler.
Running on a Mac only prints negative, equal and floating.
melker:Programming peter$ gcc -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2335.15~25/src/configure --disable-checking --enable-werror --prefix=/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2335.15~25/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)
melker:Programming peter$ ./a.out
negative
equal
floating
melker:Programming peter$
This bit of asymmetry also leads to this interesting security implication: http://my.opera.com/taviso/blog/show.dml/639454
(In short, (INT_MIN / -1) can crash your program with an exception, since the result can’t be represented in an integer. It’s an easy way to crash a program that does integer math on untrusted input.)
print("%d\n", -2147483648 < 0);
print("%d\n", (unsigned int)-2147483648 < 0);
print("%d\n", (int)-2147483648 < 0);
print("%d\n", (long)-2147483648 < 0);
print("%d\n", (unsigned long)-2147483648 < 0);
print("%d\n", -2147483647 - 1 < 0);
When compiled with Ken Thompson's C compiler for the 386, the above program prints the following output:
0
0
1
1
0
1
Great! I have seen this in CSAPP before, and the C99 output surprised me :)
Clang in OS X Lion 10.7.2:
$ clang signed.c -o signed -Wall
$ ./signed
negative
equal
floating
WTF; NM. This was on HN so I thought it was new; didn’t notice the 2009 dates on the comments or the post itself.
Thank you for the comments (even if the post is a few years old, I believe the issue remains).
Peter, normally I would expect GCC to default to C89 (with extensions) which should show ‘positive’. I wonder if it being a LLVM version makes a difference?
Josh, that is an interesting observation, thanks for the link!
lowell, it looks like clang follows the C99 standard mostly, so the output is expected, but it is a bit surprising it does not produce any warnings at all.
@ Peter Hamilton
Because you’re compiling it as a 64-bit application on your 64-bit OS. The article is pointing out a 32-bit problem. If you compile the source code with "-arch i386", forcing a 32-bit executable, the test will behave as described in the article.
Typo: (-32766 - 1)
Oops. I need to read harder.