Evaluation of the Microsoft CLR

No, I’m not an expert. Also, I didn’t come up with this stuff; I’m usually just taking sides on some ongoing debate about the “right way” to do things.

What is the CLR?

From what I can decipher, the CLR is a VM-type execution platform. It is the only interesting part of the big ball of marketing slime called “.Net”. I don’t have an exact definition of it and when I use the term, I might be including the standard runtime libraries (I probably won’t, though, because I’m not very familiar with them).

I could be including C# as well. While you can’t technically call C# the canonical CLR language (some CLR features are only available in Visual Basic), I think C# is the language that most closely matches up with the CLR. The C++-specific features of the CLR are legacy support (surprisingly comprehensive legacy support!).

To sum things up, I don’t know what I mean when I say “CLR” and so you’re going to have to figure it out from the context.

"What’s your problem?"

First of all, lemme state that I do like the CLR. There are a lot of things the CLR does better than the Java platform. On the other hand, big deal.

When the Microsoft guys designed their system, they had a fully implemented and deployed example to learn from. Given that, the CLR is depressingly similar to Java. All they had to do was evaluate the complaints against Java and deal with them one by one, but they didn’t. The CLR has some brand-new bad ideas, too.

Interestingly, many of the features that deal with practical deployment details (such as versioning, assemblies and app domains) are pretty good.

Good: Competition

Sun avoided adding new language-level features to Java for a long time. The lack of autoboxing, generics, and enumerations has resulted in many, many hours of lost productivity. Luckily, Microsoft lit a fire under their asses.

Though some of those features were already in the works before C#, the new competition has really forced Sun to get its act together. I don’t know if Java would have all the language-level features it has now if C# hadn’t come along.

Good: Virtual Method Annotations

All methods are non-virtual by default (final in Java-speak). If you want to create a new virtual method, you have to say so explicitly. If you want to override a virtual method, you have to say so explicitly.

Why creating virtual methods should be explicit

It takes extra care to design a method so that it can be safely overridden. The explicit annotation is a good way to force the programmer to think about it (this argument was taken directly from the man himself).

Why overriding should be explicit

This one is the real big win. In Java, when you write a new method that matches the signature of one of your parent class’ methods, the new method automatically overrides the old one. This can easily happen accidentally. What’s worse is that this can also happen when your parent class gets “upgraded” and now has methods that weren’t previously there. After you recompile the base class, weird things will start happening at runtime.

In C#, if a parent class is upgraded like that, you’ll get a compiler warning when you recompile the derived class, which lets you know that something has changed. (To fix it, you can add the new modifier.)
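Here is a minimal sketch of how that plays out. The class and method names are made up for illustration; the assumption is that Describe() was added to Base after Derived had already been written.

```csharp
using System;

// Hypothetical library class, "version 2": Describe() did not exist in version 1.
class Base
{
    public virtual string Describe() { return "Base.Describe"; }
}

class Derived : Base
{
    // Written against version 1 of Base. Without "new", the compiler warns
    // that this method hides Base.Describe(). It never silently overrides it.
    public new string Describe() { return "Derived.Describe"; }
}

class Intentional : Base
{
    // Overriding has to be spelled out.
    public override string Describe() { return "Intentional.Describe"; }
}

static class OverrideDemo
{
    // Calls Describe() through a Base reference, the way library code would.
    public static string ViaBase(Base b) { return b.Describe(); }

    static void Main()
    {
        Console.WriteLine(ViaBase(new Derived()));     // Base.Describe
        Console.WriteLine(ViaBase(new Intentional())); // Intentional.Describe
    }
}
```

The library keeps calling its own method for Derived, so the accidental name clash can’t change the base class’s behavior behind your back.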

Good: Value Types

The addition of value types allowed the semantics of numeric and boolean primitives to be described within the CLR itself. This is good.

They’re good for interoperability with C code. In Java, laying out message structures is painful (to write) and inefficient (to run). It’s good that C# has the ability to manipulate complex structures in place.

Value types also unified the type system. Just joking. It didn’t unify jack.

Bad: Value Types

When I first read about value types, I was impressed. Had I written this article back then, I would have hailed value types as the CLR’s most goodest piece of goodness. Finally, a unified type system. Except not really. They behave differently from other types. But in C#, they still look exactly the same when you use them.

Color originalColor = new Color();
originalColor.Red = 111;

Color x = originalColor;
x.Red = 222;

Console.WriteLine("originalColor.Red = " + originalColor.Red);
Console.WriteLine("x.Red             = " + x.Red);

The programmer has to know whether Color is a value type or a class type to know how the above code will behave.
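To make the difference concrete, here is a small sketch with two hypothetical types (ColorValue and ColorRef are my names, not library types) that differ only in the struct/class keyword. The code that uses them is character-for-character the same, apart from the type name.

```csharp
using System;

// Two hypothetical definitions that differ only in "struct" versus "class".
struct ColorValue { public int Red; }
class ColorRef { public int Red; }

static class CopyDemo
{
    public static int OriginalRedWithStruct()
    {
        ColorValue original = new ColorValue();
        original.Red = 111;
        ColorValue x = original; // copies the whole value
        x.Red = 222;             // changes only the copy
        return original.Red;
    }

    public static int OriginalRedWithClass()
    {
        ColorRef original = new ColorRef();
        original.Red = 111;
        ColorRef x = original;   // copies only the reference
        x.Red = 222;             // changes the one shared object
        return original.Red;
    }

    static void Main()
    {
        Console.WriteLine(OriginalRedWithStruct()); // 111
        Console.WriteLine(OriginalRedWithClass());  // 222
    }
}
```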

I hear you saying “No big deal. A programmer is expected to know something about the classes he uses. What’s wrong with making the programmer know whether it’s a value type or a reference type?”. That’s a hard question and I don’t know if I can answer it to your satisfaction, but I’m going to have to try:

The value/class status is one more piece of information you have to track for every object. The worst part is that the value/class decision is essentially an optimization hint. The effects of optimization hints shouldn’t leak into the semantics.

Why doesn’t Java have this problem? After all, aren’t ints the same as value types? Yes they are, but they’re also immutable. You can share the object with anyone you want and you can be assured that the sharees won’t mess around with the object behind your back. This is also true in C# for the immutable value types, but now you can have mutable value types (like Color), which don’t play well with others.

One of the main reasons for this feature was performance. Unfortunately, much of the performance gain could have been realized with a smarter compiler (without mucking with the semantics).

Good: Runtime Generic Instantiation

The CLR’s design for generics was OK. Nothing spectacular. They scored a point by allowing constructor constraints on type parameters, but other than that it was pretty straightforward. They did miss out on adding variance syntax to C#, but they can bolt that on later.

The CLR’s generics implementation is great. Normal reference types are all handled the same way by the same compiled code. But when you use a type like int or bool, the runtime system will compile new instances of the generic methods to handle the primitive types efficiently. That was a good call.

The alternative is to box each primitive inside an object, which is how Java does it (ints get converted to java.lang.Integer objects). It’s a lot slower. But I suppose that you can forgive Java for that. The truly sucky aspect of the Java implementation is that they don’t save any runtime information about type parameters (“type erasure”). Though type erasure itself is not necessarily a bad thing, it wasn’t a good fit for Java.

In a well-designed programming language, you wouldn’t ever need that information. But since both C# and Java programmers have developed an unhealthy dependence on unsafe type casts, the lack of runtime type information causes things to not work as expected.
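A short sketch of what “runtime information about type parameters” buys you on the CLR. The helper names are mine; typeof and List<T> are the real thing. Neither of these checks can be written against a type parameter in Java, because the parameter is gone by the time the code runs.

```csharp
using System;
using System.Collections.Generic;

static class GenericsDemo
{
    // typeof(T) works at run time because type arguments are not erased.
    public static string NameOf<T>() { return typeof(T).Name; }

    // A checked test against a constructed generic type.
    public static bool IsListOfInt(object o) { return o is List<int>; }

    static void Main()
    {
        Console.WriteLine(NameOf<int>());                   // Int32
        Console.WriteLine(IsListOfInt(new List<int>()));    // True
        Console.WriteLine(IsListOfInt(new List<string>())); // False
    }
}
```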

Bad: Covariant Arrays

This is unfortunate. Array references are not covariant. Pretending they are (like Java and C# do) doesn’t make it so.
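In case the problem isn’t obvious, here is a minimal sketch (the method name is made up). The assignment to objects[0] type-checks at compile time, so the hole has to be plugged with a runtime check on every store into a reference-type array.

```csharp
using System;

static class ArrayDemo
{
    public static bool StoreFails()
    {
        string[] strings = new string[1];
        object[] objects = strings;    // allowed: arrays are treated as covariant
        try
        {
            objects[0] = new object(); // compiles, but the array really holds strings
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;               // the runtime check rejects the store
        }
    }

    static void Main()
    {
        Console.WriteLine(StoreFails()); // True
    }
}
```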

Good: Assemblies

Finally, a real way of packaging your code. Java’s JARs are pathetic. Though versioning/linking is still something you have to think hard about, assemblies and “strong names” are a step forward.

On the more technical side, assemblies give the JIT compiler some flexibility. By making certain classes only visible within an assembly, the compiler can do some inter-class optimization without worrying about dynamically-loaded code coming in and screwing everything up. Though I’ve heard that JIT compilers are getting better at dealing with this problem, assembly-level encapsulation makes things easier.

Assemblies are a good example of a feature that makes things easier for the programmer and for the compiler.

Good: Application Domains

On my old machine, the startup time for Java applications was quite high. So I implemented a wrapper Java application that always keeps the Java compiler loaded in memory and invokes it upon request. It would be nice if you could do this for any application in general and I tried to do just that. Unfortunately, it’s not possible. Wait. I take that back. I think that some people have done this by messing around with the bytecode of external programs before loading them (replacing references to java.io.File and java.lang.System on the fly). So while they may have beaten the Java environment into submission, it still isn’t an ideal solution (see Echidna (apparently abandoned) and JNode).

With the CLR, it’s a lot easier. You can launch multiple CLR applications in the same process, each in its own “application domain”. This cuts down on resource usage, startup time and IPC overhead.

Application domains also let you define lightweight isolation boundaries between different programs. If you just let pointers run all over the place, you can’t cleanly shut down or reload one component without affecting all the others. I think the development of the ASP server forced Microsoft to deal with these issues properly (since the server has to continuously load and shut down user programs).

(There’s an active JSR to add a similar feature to Java).

Bad: out parameters

I’m sure a lot of people think they love this feature. Every C/C++/Java programmer has run into the problem of wanting to return more than one value from a function. But “out” parameters are the bad way of doing it. What they should have done is allowed for tuples (and multiple return values would have naturally followed). There are some legitimate uses for “ref” pointers (though I’m sure people end up using them where tuples would be more appropriate), but “out” parameters are almost always the wrong solution.

Currently, you can’t pass properties as “ref” or “out” parameters because those parameters are essentially C-style pointers and a property has no address to point at. Implementation details are leaking out again. Tuple return values would use copying semantics instead of direct pointers and so you could, once again, treat properties like regular fields.
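For comparison, here is a sketch of both styles. It assumes a compiler with tuple syntax, which C# only gained in version 7.0, well after this was first written; the Result class and the DivMod names are made up for illustration.

```csharp
using System;

class Result
{
    public int Quotient { get; set; }
    public int Remainder { get; set; }
}

static class TupleDemo
{
    // "out" style: the caller has to supply storage locations.
    public static void DivModOut(int a, int b, out int q, out int r)
    {
        q = a / b;
        r = a % b;
    }

    // Tuple style: the results are copied back as ordinary values.
    public static (int Quotient, int Remainder) DivMod(int a, int b)
    {
        return (a / b, a % b);
    }

    public static Result Fill(int a, int b)
    {
        Result res = new Result();
        // DivModOut(a, b, out res.Quotient, out res.Remainder); // rejected: properties are not variables
        (res.Quotient, res.Remainder) = DivMod(a, b);            // fine: plain assignments
        return res;
    }

    static void Main()
    {
        Result res = Fill(7, 2);
        Console.WriteLine(res.Quotient + " " + res.Remainder); // 3 1
    }
}
```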

Good: "unsafe" code

The JVM and CLR provide two things: type safety and platform independence.

With the JVM, those two are tied together. The whole system provides both type safety and platform independence. So if you want to do type-unsafe things in Java, you need to write native code and compile it ahead-of-time for each target platform.

With the CLR’s unsafe subset of instructions, you lose type-safety (just like you do with JNI), but you still have platform independence. C#’s support for unsafe code makes everything more convenient too (have you used JNI?).

Of course, you can’t do everything in CLR bytecode that you can do in native code but it covers a good percentage.

Bad: Redundancy

These are mostly lessons that most of us (obviously not all of us) have learned from Java.

What do you have to do when you rename a Java class? Rename the file, rename the “class Blah” declaration and then rename all the constructors. Those are all redundant pieces of information. You also have to rename all the references to the class, but this is unavoidable with a plain-text storage format.

In C#, I think you can name a file whatever you want, but that doesn’t really help in the common case. You’re going to end up making sure the class name matches the file name anyway. They also tacked the name of the class onto a class’ static initializer, so there’s an additional rename (the D Programming Language people did the smart thing and called their constructors "this").

Also, Java package names were always absolute paths, making reorganization a pain. The same holds for C#. They could have fixed this, but they didn’t.

You might say that editors with language-sensitive renaming features can eliminate these problems. Yes they can (though I won’t be able to use them until they add decent support for Vim-style editing; the pile of crap that comprises the CodeWright Vi bindings doesn’t count). I don’t like that kind of redundancy in the source file. It’s probably just a personal preference.

Good: Nullable value types

A recent addition was a simple syntax to make value types “nullable”. This means that you can declare a variable to have type int? and store either an integer value or null in it. Now value types and class types have become a little more uniform (not really, see below).

I think they forgive you if you perform arithmetic on nullable values. So if you do:

int? x = 5;
int? y = null;
int? z = x + y;

Instead of bombing out with a NullReferenceException, I think they’ll just set z to null. This “silent failure” behavior is dangerous, but that’s not even the worst part.

Reference types do throw a NullReferenceException when you try to do things to them. So, again, the split type system is causing problems.
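A quick sketch of the asymmetry (the helper names are mine). The “+” on int? is the lifted operator, which yields null when either operand is null; the string case goes through an ordinary member access and throws.

```csharp
using System;

static class NullableDemo
{
    // Lifted "+": the result is null if either side is null.
    public static int? Add(int? x, int? y) { return x + y; }

    // Touching a member of a null reference throws instead.
    public static bool LengthThrows(string s)
    {
        try
        {
            return s.Length < 0; // always false when s is a real string
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(Add(5, null).HasValue); // False
        Console.WriteLine(LengthThrows(null));    // True
    }
}
```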

Bad: Nullable by default

What’s wrong with this picture:

When you have a type String, you’re really using the type String? because it could either be a pointer to a string object or it could be null. The fact that null is considered to be a valid pointer value is a hack from C that should have been fixed by now. All reference types should be non-nullable by default. Unfortunately, there’s no way to express a non-nullable reference type. The CLR is too widely deployed to make non-nullable the default, but they might be able to salvage things by allowing explicit annotations to indicate that a reference is non-nullable. This, sadly, makes the common case more tedious. I think it’ll also suffer from the same problem C++’s const does in that it’ll be painful to add in the non-nullable annotations if you forget them in the beginning (and since the default case is less restrictive and involves more typing, it’ll be very easy to forget them in the beginning).

It might seem unfair to blame them for not fixing this; the only reason I’m even aware of the problem is that I randomly stumbled upon the Nice programming language and read about the elegant way it’s handled in that language. But I think we can hold a “legendary language designer” to a higher standard (I think that’s what the Microsoft PR machine has been calling him). This is the single biggest screw-up in the CLR.

Good: Function Pointers (and Anonymous Functions)

Actually, they’re called delegates, but they’re essentially the same thing. Great feature. But this was a pretty obvious one. It would have been stupid not to have added this feature. Then again, Java still doesn’t have it…

Java’s anonymous classes can fake it a little bit, but it’s way too inconvenient. (Heck, I think the word “delegate” is too long a prefix for anonymous functions; something like “#” would have been better). [C# 3.0 Update: The “delegate” keyword is no longer necessary!]

“Multicast delegates” are also a huge convenience. Not having to implement these yourself for all of your events is nice.

But, unfortunately, the implementation isn’t clean.

Bad: Function Pointer Hacks

While multicast delegates are useful things, they’re shoved onto the same type space as regular delegates. So you can’t statically tell whether a delegate points to a single target or if it points to multiple targets (and you often need to know this, because return values behave differently). There’s nothing in C# to protect you from this. There are separate Delegate and MulticastDelegate classes in the library, but that doesn’t help: every delegate type you declare derives from MulticastDelegate.

The actual implementation of delegates in the CLR is a behind-the-scenes hack. They just let the VM mess around with the parameters and hack up a new class on the fly to make up for the lack of a powerful-enough type system. With generics and tuples (and a little syntactic sugar), delegates could have been implemented cleanly. The C# language does a pretty good job of shielding you from the mess, though, so a C# programmer doesn’t have to be aware of the back-end nonsense. On the other hand, there are a lot of other delegate-like things that you can’t do without tuples.

I realize that this is kind of unfair. The CLR didn’t have generics at first, and so they didn’t really have the necessary mechanisms to implement things cleanly. First of all, maybe they should have planned on adding generics (after all, the Java people had generics in the works for a long time and so it was inevitable that C# would need them). But, ignoring the past, they really, really should do this for future releases. I have a pretty strong feeling that they won’t, but maybe that’s just because we’ve all become used to Sun adamantly refusing to change Java, no matter how awesome the feature request.

BTW, the multicast delegate API should let you pass in a reduction function to handle multiple return values. Or, at least, return an array of all the return values so you can run the reduction yourself. Actually, because of the silly way delegates are implemented, you can’t add this to the static delegate API because the actual invocation routine is generated at runtime. Fixing this requires a VM change.
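Until then, the reduction has to be done by hand. Here is a rough sketch using Delegate.GetInvocationList(), which is part of the library; the Source delegate type and the method names are made up for illustration.

```csharp
using System;

delegate int Source();

static class MulticastDemo
{
    static int One() { return 1; }
    static int Two() { return 2; }

    // Builds a delegate with two targets.
    public static Source Combined()
    {
        Source s = One;
        s += Two;
        return s;
    }

    // Invoking directly runs both targets but keeps only the last return value.
    public static int InvokeDirectly() { return Combined()(); }

    // Hand-written reduction: call each target separately and sum the results.
    public static int InvokeAndSum()
    {
        int sum = 0;
        foreach (Delegate d in Combined().GetInvocationList())
        {
            sum += ((Source)d)();
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(InvokeDirectly()); // 2
        Console.WriteLine(InvokeAndSum());   // 3
    }
}
```

Note that nothing in the type Source tells the caller which of the two behaviors to expect, which is the complaint above.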

Bad: The CLR is not language agnostic

This is not really a complaint against the CLR as it has better inter-language support than Java does. This is a complaint against the people who are convinced that compiling to CLR bytecode takes you to interoperability heaven.

The CLR is highly geared towards a Java-like language. There are additions to support C++, but that’s about it (and in the grand scheme of things, C++ is not very different from C#). “But what about Visual Basic?”

VB.Net is useless

From what I’ve read, it seems like the new Visual Basic is very different from the old one. Old VB programmers are complaining that too much has changed. To me, it looks exactly the same because I see similar syntax. The reason for the complaints is that the type system (which is probably the most important aspect of a programming language) has changed. VB.Net is just a C# core dressed up in different syntax. Most of the features unique to VB.Net are decidedly stupid and left over from attempts at pushing a language past its limits.

They shouldn’t have even bothered with VB.Net. They should have created a variant of C# that looked just like C# except had VB-style “duck-typed” variables and a VB-style development environment (and, of course, called it something stupid like IntelliC#). There are a couple of reasons (that I can think of) they didn’t do this:

Your guess is as good as mine. Unlike C++, VB doesn’t really add any value.

Narrow-minded type system

If they wanted to be language neutral, they should have come up with a solid type system. Instead, they did just enough to accommodate both C++ and Java (and many of the differences are superficial and due to historic reasons). Too many real features are missing. The biggest is option types (yes, I mentioned this already, but it’s really important).

The lack of a CLR-level “const” restriction on parameters and return values is also kind of sad because it seems like an obvious feature. I have a feeling that this can be bolted on to the CLR later but making the libraries take advantage of this will be seriously disruptive (like trying to make an existing C++ program const-correct). They could phase it in little by little by letting “const” mistakes pass as warnings, but then the optimizer can’t take full advantage of this extra information. It would still help with program correctness, though.

Your core libraries can’t be language-agnostic

Well, maybe someone, someday, will come up with a way to write language-independent libraries (and that would be an impressive feat), but that definitely hasn’t happened yet. Sure, you can probably get away with using the same libraries for Java and C#. But try and translate that a little further to C++ and see what happens. Imagine trying to implement (or even use) the STL from within C#. It doesn’t make sense. And C++ isn’t even the biggest challenge; there is no way you’re going to get Haskell programmers to ditch the Prelude. Of course, Microsoft knows this and will create separate core libraries for each language.

The silver lining

While I think that the CLR isn’t even close to being a unified language-agnostic runtime, it does appear to be a decent candidate for replacing the C calling convention as the standard foreign function/object interface (becoming, as originally intended, a COM replacement). The key here is JIT compilation to avoid the virtual method issues statically compiled C++ libraries have.

Yes, Java did this before the CLR, but the CLR’s comprehensive support for unmanaged code means that you can totally ignore the not-so-language-neutral CLR type system most of the time, playing by the rules only when you want to use C# libraries. So while Haskell programmers will not be able to ditch the Haskell Prelude, they’ll be able to take advantage of C# libraries when there’s no Haskell equivalent, even though it might be a little inconvenient.

"You made a mistake"

I realize that I’m probably wrong about some of the things written here (hopefully not too many). I’d really appreciate it if you’d point out my mistakes.

data/microsoft_clr.txt Last modified: 01.10.2008 00:37