Recent changes
Table of Contents

What is type inference?

Type inference is the act of determining the type of an expression that isn’t explicitly typed. What does that mean? It means that I can write:

var result = 1 + 2

The compiler should know that even though I used the generic type var, it can be inferred that the variable result is an integer.

Most mainstream languages will not do that for you. But there’s no reason they shouldn’t.

The advantages of type inference

A lot of the time, type specifiers are useless. You don’t need the type annotation because it’s already obvious from the context. The compiler doesn’t need the annotation because it can figure it out. So it’s a redundant piece of information. And it isn’t harmless either; the explicit type specifiers get in the way.

It’s nice to have longVariableAndTypeIdentifiers. While you can go overboard with long identifiers, I find myself often using less-descriptive-than-ideal names to avoid having to type so much. Even if I didn’t mind typing them in, long names can still make the code less readable. Take this example from Java:

Set<Map.Entry<ComponentName,ComplexComponentImplementation>> entries = components.entrySet();

I might decide to use the identifiers CName and CCImpl instead of ComponentName and CCImpl to keep it readable:

Set<Map.Entry<CName,CCImpl>> entries = components.entrySet();

With the shortened names, this particular line of code is more readable, but now you’re going to have to use CName and CCImpl everywhere, making the entire program harder to understand.

What it comes down to is that, in this situation, the variable name (entries) is all the information you need. The surrounding context usually provides enough information to the reader. With type inference, I could just write:

var entries = components.entrySet();

The longer the type specifier, the less willing you’d be to type it in redundantly. So, naturally, you’ll either make your identifiers shorter than they should be or you’ll try to avoid using intermediate variables. Both are bad.

Why it’s stupid that many languages don’t support it already

Every statically-typed language I can think of (this includes Java, C# and C++) already does some type inference. It’s just that they make it inconvenient to use. If Java compilers didn’t have any sort of ability to infer types, you couldn’t do this:

String filename = pictureAlbum.getStuff().get("Big").getInfo().getFileName();

Instead, you’d have to add an annotation to every single term:

Map<String,List<Picture>> album    = pictureAlbum.getStuff();
List<Picture>             pictures = album.get("Big");
PictureInformation        info     = pictures.getInfo();
String                    filename = info.getFileName();

Also, if the compiler really needed you to specify all the types, then why does it complain when you do this:

int album = pictureAlbum.getStuff();

It complains because it knows the exact type of the expression pictureAlbum.getStuff() but still wants you to type it in. Why won’t it just accept the “var” specifier and stop complaining?

var album    = pictureAlbum.getStuff();
var pictures = map.get("Big");
var info     = pictures.getInfo();
var filename = info.getFileName();

It’s really too bad that local variable inference was left out of so many recently-designed languages.

Type inference can get complicated, but it doesn’t have to

As far as I can tell, there seem to be two basic types of inference: inference based on variable assignment and inference based on variable use. (I’m probably not using the right terms).

Inference based on variable assignment

In all the examples above, I’ve only used inference based on variable assignment. This is the easy one. This is the one all Java/C#/C++ compilers can do already. This is where you determine the type of the variable based on how it’s value is assigned. Since compilers already know the type of almost every expression, they can just use that type for the target variable.

This can also be used for function return types (and I believe C# allows this on closures):

GetFirst(List<Integer> list)
{
   return list.get(0);
}

The return type is automatically inferred to be of type Integer.

Tricky error messages

Some strange things can happen with even the most basic type inference:

1 var map = new Map();
2 map.put("Hello", "Guy");
3 var value = map.get("Hello");
4 int i = value.length();

Can you see the error?

Error on line 4: Could not find instance method 'length()' in class 'Object'.

The reason for the error is that map.get() returns an Object. This caused the compiler to make value a variable of type Object. You can’t call the length() method on Object. If you had to explicitly type the value variable, the compiler would have flagged the error at that point. With type inference, the error message has commuted a couple lines down.

First of all, I’m aware that real programs could trigger more complex error messages. On the other hand, it wont take long before we programmers learn to recognize that type of mistake and are able to deal with it just like any other seemingly-strange-at-first error message (like GCC’s “implicit declaration of function blah()”).

Multiple assignment and subtyping

It can get even tricker, though, when you allow for multiple assignments and subtyping. Let’s say you had the following class heirarchy:

Thing
+-Human
| +-Doctor
| +-Clown
+-Lawyer

The type inference policy may try to find a common supertype to handle multiple assignments:

var x;
x = new Lawyer();
x = new Clown();

var y;
y = new Doctor();
y = x;

Here, the type of x will be inferred to be Thing, because that’s the greatest common denominator. This adds complexity because, unlike before, we couldn’t determine the type of a variable based on a variable’s very first assignment statement. Remove the second assignment to x and its type changes to Lawyer.

The type of y is also interesting. After the first assignment, the “running” type is Doctor. After the second assignment, what happens? Since the type of x was determined to be Thing, y’s type might also be pulled down to Thing. However, the compiler might do some control flow analysis and see that when the assignment occurs, x cannot be a Lawyer (since the original assignment has been replaced by Clown). It might then decide to make it a Human.

This could get complicated. However, notice that the smarter compiler would give us a more specific type (i.e. Human as opposed to Thing). I’m guessing that even though this gets complicated, a good compiler will usually allow us to do more of what we mean to do without complaining about type errors.

Inference based on variable use

This is the hard stuff. This is also the stuff that lets you write entire Haskell programs with barely any type annotations at all. (However, I think that most Haskell programmers suggest that you use type annotations for publicly exported functions so that type errors are localized and easier to debug).

String GetName(key)
{
   return map.get(key);
}

What’s the type of key? Well it all depends on the type of map.get. If map were of type Map<Integer,String>, then key would be inferred to be of type Integer (because that’s the only value that fits).

I’ve done a (very) little amount of programming in ML and Haskell and have come across weird type error messages. This kind of inference can get out of control. It’s usually accompanied by parametric polymorphism (where the genericity is also automatically inferred) so maybe some of the complexity comes from the combination and not from the inference itself. Either way, I can get by fine without it.

Just give me the simple stuff

All I want is the simple form of type inference (in a mainstream language). It’s not hard to implement and it’s not hard to use. I don’t even need the compiler to handle multiple assignments. I can’t guarantee that I wont start complaining again, but this should shut me up for a while.

The Nemerle Programming Language and the Nice Programming Language both have this level of type inference.

data/simple_type_inference.txt Last modified: 01.10.2008 00:37
Driven by DokuWiki