Long Division, Part 3

I ended part two of this series with an open question:

And of course I can’t help but wonder if the CLR is compiled with Visual C++, so doing arithmetic on 64-bit numbers in C# and other .NET languages ends up at the same runtime functions?

I don’t have a lot of experience in debugging the CLR myself, so I asked Brian Rasmussen if he might be interested in taking a look at it. He was kind enough to take the time to point me in the right direction.

A little digging showed that the CLR does in fact call some of these functions from the C runtime, but with a twist.

The C# function we looked at was this:

The JIT compiler produced the following 32-bit code:

We note the call to JIT_LDiv, which is a helper function in clr.dll (the CLR itself, formerly mscorwks.dll). Let’s have a look at it:

This excerpt is from the Shared Source CLI, but the same code is present in clr.dll from .NET Framework 4.0.

The division in the return statement at line 28 generates a call to _alldiv from the C runtime library.

Now, the twist is that this helper function, besides checking for division by zero and overflow, also optimizes division of two values that both fit in 32 bits (the division with casts at line 24).
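To make the shape of such a helper concrete, here is a minimal sketch in C++ that follows the description above: check for division by zero and overflow, take a 32-bit fast path when both operands fit, and otherwise fall back to a full 64-bit division. It is not the CLR source. The names checked_div64 and fits_in_int32 are hypothetical, and standard C++ exceptions stand in for the managed exceptions the real helper raises.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// True when v is representable as a signed 32-bit integer.
static bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical stand-in for a JIT division helper (assumed name, not a CLR API).
std::int64_t checked_div64(std::int64_t dividend, std::int64_t divisor) {
    if (fits_in_int32(divisor)) {
        if (divisor == 0)
            throw std::domain_error("division by zero");

        if (divisor == -1) {
            // INT64_MIN / -1 is the only quotient that does not fit in 64 bits.
            if (dividend == std::numeric_limits<std::int64_t>::min())
                throw std::overflow_error("quotient does not fit in 64 bits");
            return -dividend;
        }

        // Fast path: both operands fit in 32 bits, so a single 32-bit
        // division suffices. Divisor -1 was handled above, so the
        // INT32_MIN / -1 case cannot reach this point.
        if (fits_in_int32(dividend))
            return static_cast<std::int32_t>(dividend) /
                   static_cast<std::int32_t>(divisor);
    }

    // Everything else: full 64-bit division. In a 32-bit Visual C++ build
    // this is where the compiler emits a call to _alldiv.
    return dividend / divisor;
}
```

With this sketch, checked_div64(100, 7) takes the 32-bit path and returns 14, while checked_div64(10000000000, 3) goes through the 64-bit fallback.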

This is one of the optimizations I used in WCRT as well, so I was interested in seeing how the CLR code performed. I removed the exception handling along with associated checks and ended up with this function, which I tested using the timing application from part 2:

Here is a summary of the results from my Athlon 64:

In the WCRT column, JIT_LDiv is compiled with _alldiv from WCRT, and in the VC column with _alldiv from the VC CRT. So the first three rows compare the speed of the CLR helper function with each of those as its fallback.
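One way to picture that setup is a stripped-down helper whose 64-bit fallback can be swapped. The sketch below is an illustration only: in the actual test the fallback is chosen by which _alldiv the helper is built against, whereas here a function pointer is used so the idea fits in one file. The names Div64, plain_div64 and unchecked_div64 are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Signature of a 64-bit signed division routine (stand-in for _alldiv).
using Div64 = std::int64_t (*)(std::int64_t, std::int64_t);

// Default fallback: whatever 64-bit division the compiler provides.
static std::int64_t plain_div64(std::int64_t a, std::int64_t b) {
    return a / b;
}

// True when v is representable as a signed 32-bit integer.
static bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}

// Helper without the exception checks.
// Preconditions (the caller's responsibility in this sketch):
// divisor != 0, and not INT64_MIN / -1.
std::int64_t unchecked_div64(std::int64_t dividend, std::int64_t divisor,
                             Div64 fallback = plain_div64) {
    // 32-bit fast path. Divisor -1 is excluded so that INT32_MIN / -1
    // is never attempted as a 32-bit division.
    if (fits_in_int32(dividend) && fits_in_int32(divisor) && divisor != -1)
        return static_cast<std::int32_t>(dividend) /
               static_cast<std::int32_t>(divisor);

    // All other operand combinations go to the pluggable 64-bit routine.
    return fallback(dividend, divisor);
}
```

Timing unchecked_div64 once per fallback then isolates how much of the cost comes from the helper's own check and how much from the 64-bit routine behind it.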

We can see that the WCRT fallback gives a noticeable improvement on HH and HL, even though the check for two values that both fit in 32 bits is done twice (once in JIT_LDiv, and once in the WCRT version of _alldiv).

The next three rows compare the C runtime function _alldiv from WCRT and the VC CRT. These results are the same as in part 2.

The special handling of two values that both fit in 32 bits is a big improvement. On LL, JIT_LDiv is almost as fast as the WCRT version of _alldiv. For HL and HH, JIT_LDiv is slightly slower than the regular VC _alldiv due to the extra checks.

So, in conclusion, the CLR is compiled with Visual C++, and when code runs as 32-bit, the CLR falls back to the 64-bit arithmetic functions from the C runtime library.

The CLR contains optimizations for certain values (usually when both operands fit in 32 bits), but for other cases, optimizing the C runtime functions appears to provide a direct improvement for the CLR as well.
