LINQ, Lambda expressions, functional programming, F#, C# 3.0, VB 9.0 and how the heck are these guys related? (part two)
Update:
May 14th, 2008: revisited local type inference and added more clarification on the VB and C# feature comparison
Close view of Lambda Expressions and the journey to LINQ
Now, what's really a lambda?
According to the MSDN Library that comes with VS 2008, a lambda expression is "an anonymous function that can contain expressions and statements, and can be used to create delegates or expression tree types." That's quite a mouthful.
First, an anonymous function. A function, in the C# and VB.NET sense, is simply a method that returns a value. VB is more verbose here, since it uses the Function keyword. A lambda can be seen as a further development of the anonymous method; it is mostly syntactic sugar, but it is more expressive.
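To make the "delegates or expression tree types" part of that definition concrete, here is a minimal sketch in C# 3.0. The names LambdaDemo, Square and SquareTree are made up for illustration; Func and Expression are the real .NET 3.5 types.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaDemo
{
    // The lambda converted to a delegate: compiled code that can be called.
    public static readonly Func<int, int> Square = x => x * x;

    // The same lambda converted to an expression tree: data that describes the code.
    public static readonly Expression<Func<int, int>> SquareTree = x => x * x;

    public static void Main()
    {
        Console.WriteLine(Square(7));                    // prints 49

        // The tree can be inspected instead of executed.
        Console.WriteLine(SquareTree.Body.NodeType);     // prints Multiply

        // Compile() turns the tree back into a callable delegate.
        Console.WriteLine(SquareTree.Compile()(7));      // prints 49
    }
}
```

The source text is identical in both cases; only the target type decides whether the compiler emits code or a data structure.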
Before we start the journey: LINQ is not the only new thing in C# 3.0 and VB 9.0. Rather, many new features of C# 3.0 and VB 9.0 together form (or compose) LINQ.
These are:
- Query expression (or query comprehension)
- Local variable type inference
- Lambda expression
- Anonymous types
- Object initializers
- Expression tree
Now grab your popcorn, relax and stay seated. :)
Before lambda expressions and the others, there was the impedance mismatch between data and objects
Now, let's see the roadmap to the query expressions of LINQ. Yes, the journey behind them. You can also watch a talk about it in the Talks section of the LANGNET 2008 website. But for those who prefer another way to crunch this concept, let me enlighten you. :)
It all started with this problem: the "impedance mismatch of objects and relational data".
Or, easily put by Anders Hejlsberg and others:
relationaldata != object
Remember the XBase family of software such as dBASE and FoxBase (then FoxPro, then Visual FoxPro, and then "gone")? Yes, they were neat: if you needed to access data, it was so simple. You got the feeling that the data was right there, integrated with the language you were coding in. They also gave you compile-time errors on queries (although there was no such thing as IntelliSense). They were fun to use too, since you only had to think about "what" data you wanted to display instead of "how" to get it. Meanwhile, the database world grew beyond simple queries into standards-based SQL, at least ANSI SQL-92.
The separation grew larger as the differences became more obvious:
- Databases are relational, objects are hierarchical.
- Databases have nullable types, objects didn't.
- Databases have indexes and primary keys, objects didn't. The major ORMs solved this, along with the first difference above, but not the second.
The long road to marry data and object
Luckily, in CLR 2.0 (and thus C# 2.0 and VB 8.0) we have the Nullable value type, which can be written simply as int? in C#, or as Nullable(Of Integer) in VB 8.0 (the shorter Integer? form arrived with VB 9.0). So, at least as of .NET 2.0, our types match those of the database domain.
What about queries in our programming language? It simply can't match the power of a domain-specific language such as SQL. Yet we still need to work at the heart of our general-purpose programming language first: the object itself.
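As a small sketch of how a nullable value type mirrors a NULL-able database column, consider the following. NullableDemo and Describe are hypothetical names used only for this illustration; int?, HasValue, Value and the ?? operator are standard C# 2.0.

```csharp
using System;

public static class NullableDemo
{
    // Hypothetical helper: describes a value that may be "NULL", like a database column.
    public static string Describe(int? quantity)
    {
        // HasValue is false when the value is null.
        return quantity.HasValue
            ? "quantity = " + quantity.Value
            : "quantity is NULL";
    }

    public static void Main()
    {
        int? known = 5;
        int? missing = null;

        Console.WriteLine(Describe(known));     // prints quantity = 5
        Console.WriteLine(Describe(missing));   // prints quantity is NULL

        // The ?? operator supplies a default when the value is null.
        int total = (missing ?? 0) + known.Value;
        Console.WriteLine(total);               // prints 5
    }
}
```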
The journey of those thinkers at Microsoft is illustrated below.
Suppose I want to query an arbitrary collection in .NET, and later I want to make it "fluent" and look like the queries I used to know. Why? Because basically a table is a collection of rows, and this collection can be a CLR IEnumerable<T>.
Now, let's do a simple query of the available processes on my machine, and then print them on the console.
In C# 2.0:
IEnumerable<Process> processess = Process.GetProcesses();
foreach (Process proc in processess)
{
    Console.WriteLine(proc.ProcessName);
}
Console.ReadLine();
In VB 8.0:
Dim processes As IEnumerable(Of Process) = Process.GetProcesses()
For Each proc As Process In processes
    Console.WriteLine(proc.ProcessName)
Next
Console.ReadLine()
Oh, what if I want to know which processes on my machine have a working set larger than 50 MB? Sure, I could add an if statement that filters each proc in processes before I display it, like this:
In C# 2.0:
IEnumerable<Process> processess = Process.GetProcesses();
foreach (Process proc in processess)
{
    if (proc.WorkingSet64 > (50 * 1024 * 1024))
        Console.WriteLine(proc.ProcessName);
}
Console.ReadLine();
In VB 8.0:
Dim processes As IEnumerable(Of Process) = Process.GetProcesses()
For Each proc As Process In processes
    If proc.WorkingSet64 > (50 * 1024 * 1024) Then
        Console.WriteLine(proc.ProcessName)
    End If
Next
Console.ReadLine()
But this is not the query that I want. A query doesn't look like this. I decided to implement Where as a method. But how do I pass in an expression to be evaluated later, with the results produced by an iterator? Fortunately, C# 2.0 has the new "yield" keyword, which makes the compiler generate iterators. Sadly, VB 8.0 doesn't have it. But you can code the iterator in C# and use it from VB anyway.
Note: This is where C# seems to have an unfair advantage over VB: the yield keyword is only available in C#. Why didn't MS include it in VB? I can't find the reason anywhere, even on MSDN.
After thinking for a while, I decided to put the filtering into a generic method that accepts a delegate. Ah, this is a suitable job for .NET 2.0 generic delegates and anonymous methods. I want a class whose methods can be called from anywhere, and this is definitely a static class.
So I modified my code to add this class; let's just call it Query. And the method that filters, let's just call it Where. It's perfectly suited to our intent: to filter elements/items and produce a new collection containing only the filtered items.
But first, I have to declare the delegate:
public delegate TResult Func<T,TResult>(T arg);
And now, here's the resulting Query class with Where method in C# 2.0:
namespace TrialLINQinNET20
{
    public delegate TResult Func<T,TResult>(T arg);

    public static class Query
    {
        public static IEnumerable<T> Where<T>(IEnumerable<T> items, Func<T,bool> predicate)
        {
            foreach (T item in items)
            {
                if (predicate(item))
                    yield return item;
            }
        }
    }
}
Now, my query somehow looks nicer:
In C# 2.0:
IEnumerable<Process> processess = Process.GetProcesses();
IEnumerable<Process> listproc = Query.Where(processess,
    delegate(Process p) { return p.WorkingSet64 > (50 * 1024 * 1024); });
// Iterate the filtered sequence (listproc), not the original one.
foreach (Process proc in listproc)
{
    Console.WriteLine(proc.ProcessName);
}
Console.ReadLine();
In VB 8.0, I can't. It simply doesn't support anonymous functions, and that includes anonymous delegates.
Note: This is another place where VB 8.0 is at an unfair disadvantage: anonymous delegates/functions are only available in C# 2.0. Another "why didn't MS include this in VB". In VB 9.0, though, it can be done with a lambda expression, in a more verbose way.
Now I have created what functional programming calls a "filter". I passed the predicate in as a delegate.
To tell you the truth, I copied the declaration of the Func<T,TResult> delegate from the MSDN Library.
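One consequence of writing Where with yield is that the predicate is not run when Where is called, but only when the result is enumerated. The sketch below repeats the Query.Where shown earlier and adds a counter to make that visible; the LazyWhereSketch namespace, the LazyDemo class, the Calls counter and the IsBig predicate are all made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

namespace LazyWhereSketch
{
    // Same shape as the delegate declared in the article.
    public delegate TResult Func<T, TResult>(T arg);

    public static class Query
    {
        public static IEnumerable<T> Where<T>(IEnumerable<T> items, Func<T, bool> predicate)
        {
            foreach (T item in items)
            {
                if (predicate(item))
                    yield return item;
            }
        }
    }

    public static class LazyDemo
    {
        // Counts how many times the predicate has actually been invoked.
        public static int Calls;

        public static bool IsBig(int n)
        {
            Calls++;
            return n > 50;
        }

        public static void Main()
        {
            int[] sizes = { 10, 60, 30, 80 };

            // Calling Where only builds the iterator; nothing is filtered yet.
            IEnumerable<int> big = Query.Where<int>(sizes, IsBig);
            Console.WriteLine(Calls);       // prints 0

            // Enumerating runs the loop body and the predicate.
            foreach (int n in big)
                Console.WriteLine(n);       // prints 60, then 80

            Console.WriteLine(Calls);       // prints 4
        }
    }
}
```

This deferred behavior is what lets several such methods be chained without building an intermediate collection at each step.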
What about Select? In the base theory of SQL, it's called a projection. As a matter of fact, projection, filter, and the others are essentially part of relational algebra. And it's fun to know that relational algebra builds on set theory, which we had in elementary school. :)
Now, let's create a Select method. Basically, it projects a collection of one type into a collection of another type (a "map" in functional programming terms). The two types can be the same, but they can also differ. It looks almost like Where, but I have to change the signature so that it can map IEnumerable<T> to another type.
This is the Select method:
public static IEnumerable<U> Select<T, U>(IEnumerable<T> source, Func<T, U> selector)
{
    foreach (T item in source)
    {
        yield return selector(item);
    }
}
Yes. Now, if I want to display (and select) only the ProcessName of the running process that has taken more than 50MB of my precious RAM, the code will be:
IEnumerable<Process> processess = Process.GetProcesses();
IEnumerable<Process> listproc = Query.Where(processess,
    delegate(Process p) { return p.WorkingSet64 > (50 * 1024 * 1024); });
IEnumerable<String> procnames = Query.Select<Process, String>(listproc,
    delegate(Process p) { return p.ProcessName; });
foreach (String procname in procnames)
{
    Console.WriteLine(procname);
}
Console.ReadLine();
Now the class and the code are simpler to read. I can map a collection of Process into a collection of String, since I only need to dump the ProcessName.
Please note the sequence here: I need to filter first, then map. If I map first and then filter, I'll lose information about the data (type metadata) in the collection: after mapping to String, WorkingSet64 is no longer available to filter on. So where comes before select! This differs from SQL, where SELECT is written first, then FROM names the pool of data, and then WHERE filters it. In OOP, we have to know which data we are working with, filter it, and then map it.
SQL seems to cook up a structural type before "from" has even taken place.
Local type inference revisited...
Hmm... This is somewhat satisfactory, but it's full of syntactic noise: I have to type out full type declarations, which is especially painful with long generic type declarations.
Yet the type of the variable on the left side of a declaration is already known from the value it is initialized with.
Now this is where I begin to use C# 3.0, .NET 3.5, and Visual Studio 2008, since the compiler can infer the type of a local variable. It is only available for local variables in method scope, which is why it's simply named "local type inference".
Instead of writing this:
in C# 2.0:
Dictionary<String, List<Int32>> integerdict = new Dictionary<string, List<int>>();
in VB 8.0 it's somehow shorter:
Dim integerdict As New Dictionary(Of String, List(Of Integer))()
And C# 3.0 brings you:
var integerdict = new Dictionary<string, List<int>>();
Again, it means: give integerdict the type that the right-hand side of the assignment has. It's neat, and it's simpler.
It's still statically typed, since from then on the type of integerdict is always Dictionary<string, List<int>>. Yes, it looks like a dynamic language, but it's not dynamically typed, and it's not the same as Variant in VB or object in JavaScript.
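A quick way to convince yourself that var is static typing in disguise is the sketch below. VarDemo and HasInferredType are hypothetical names for this illustration; the commented-out line shows the kind of assignment the compiler rejects.

```csharp
using System;
using System.Collections.Generic;

public static class VarDemo
{
    // Returns true when the variable declared with var has exactly the
    // type of its initializer.
    public static bool HasInferredType()
    {
        var integerdict = new Dictionary<string, List<int>>();

        // Members of the inferred type are available at compile time.
        integerdict["primes"] = new List<int>();
        integerdict["primes"].Add(2);

        // integerdict = "some text";   // would not compile: a string is not a Dictionary

        return integerdict.GetType() == typeof(Dictionary<string, List<int>>);
    }

    public static void Main()
    {
        Console.WriteLine(HasInferredType());   // prints True
    }
}
```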
Now, in C# 3.0, my little query class and its implementation is:
var processess = Process.GetProcesses();
var listproc = Query.Where(processess,
    delegate(Process p) { return p.WorkingSet64 > (50 * 1024 * 1024); });
var procnames = Query.Select<Process, String>(listproc,
    delegate(Process p) { return p.ProcessName; });
foreach (String procname in procnames)
{
    Console.WriteLine(procname);
}
Console.ReadLine();
Pretty neat, isn't it? But this is not a query yet. In a real query, I can simply combine Where and Select, so that procnames is a select over the processes filtered by WorkingSet64 > 50 MB. How do I do this?
Then I decided to just pass the Where into my projection:
var processess = Process.GetProcesses();
var procnames = Query.Select<Process, String>(
    Query.Where(processess,
        delegate(Process p) { return p.WorkingSet64 > (50 * 1024 * 1024); }),
    delegate(Process p) { return p.ProcessName; });
foreach (String procname in procnames)
{
    Console.WriteLine(procname);
}
Console.ReadLine();
But now it looks confusing. In reality, I should filter first and then map: do where first, and then select. The execution order is correct, but the code doesn't read in that sequence.
I was thinking about extending IEnumerable so it could be queried, something like an IQuery, but it would have to be a concrete class, not an interface. Then again, that's not so practical.
But I think it would be nice to have this notation: processess.Where(...).Select(...), instead of having to nest Select and Where as above. In programming buzzword terms, this way of thinking is called "fluent", since the data/value being passed flows like a fluid, although it is still arranged in a sequence.
Simple sample of fluent thinking:
String name="Eriawan ";
String upperstring = name.ToUpper().Trim();
See? The string is passed. Now, it's more fluid.
Ahhh, it's "fluent" now
Luckily, there's this feature: extension methods in C# 3.0 and VB 9.0. I can "extend" an existing class, even a "locked" class, that is, one with the sealed modifier in C# or NotInheritable in VB.
How to do this?
In C# 3.0:
- Create a static class with static methods whose first parameter is of the type you want to add the method to.
- Put the "this" keyword in front of the first parameter of the static method.
In VB 9.0:
- Create a new module
- Create a method (either Sub or Function) with the Public modifier and give it the <Extension()> attribute
And you're on!
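As a minimal C# sketch of those steps applied to a sealed class, here is an extension method on String. StringExtensions and Shout are invented for this illustration; they are not part of the framework.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical extension method: the "this" modifier on the first
    // parameter makes it callable as text.Shout(), even though String is sealed.
    public static string Shout(this string text)
    {
        return text.Trim().ToUpperInvariant() + "!";
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        string name = "Eriawan ";

        // Extension-call syntax.
        Console.WriteLine(name.Shout());                    // prints ERIAWAN!

        // The same call written as the plain static method it really is.
        Console.WriteLine(StringExtensions.Shout(name));    // prints ERIAWAN!
    }
}
```

The compiler rewrites the first form into the second, so nothing is actually added to the String class itself.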
So, my Query class in C# 3.0 would be:
public static class Query
{
    public static IEnumerable<T> Where<T>(this IEnumerable<T> items, Func<T, bool> predicate)
    {
        foreach (T item in items)
        {
            if (predicate(item))
                yield return item;
        }
    }

    public static IEnumerable<U> Select<T, U>(this IEnumerable<T> source, Func<T, U> selector)
    {
        foreach (T item in source)
        {
            yield return selector(item);
        }
    }
}
Now, the implementations will be:
var processess = Process.GetProcesses();
var procnames = processess
    .Where(delegate(Process p) { return p.WorkingSet64 > (50 * 1024 * 1024); })
    .Select(delegate(Process p) { return p.ProcessName; });
foreach (String procname in procnames)
{
    Console.WriteLine(procname);
}
Console.ReadLine();
Yayy! It's more fluent! But... I think it's still noisy. Just look at the delegate syntax in Where and Select. Each one is simply a function with a parameter and an expression. I want it to be more expressive.
Now for the close view of Lambda
Now, this is where lambda expressions come to help. A lambda expression, as Norman said in a comment on a blog entry :), is basically just a way of writing an inline function. The name comes from lambda calculus, which is one of the foundations of functional programming.
Let's transform our delegate parameters above into these:
var procnames = processess
    .Where((Process p) => { return p.WorkingSet64 > (50 * 1024 * 1024); })
    .Select((Process p) => { return p.ProcessName; });
The fat arrow "=>" is read as "goes to". The lambda can be thought of as an inline function that takes a parameter p, written as "p -> p.ProcessName".
Thanks to type inference, the lambda parameter type can be omitted; the compiler infers it from the delegate type that Where and Select expect:
var procnames = processess
    .Where((p) => { return p.WorkingSet64 > (50 * 1024 * 1024); })
    .Select((p) => { return p.ProcessName; });
If there is just one parameter, the parentheses () can be omitted too. Also, since the body just returns a simple expression, I can write the expression directly instead of an imperative return statement, which is more expressive and looks more functional.
Now the code above looks like this:
var procnames = processess
.Where(p => p.WorkingSet64 > (50 * 1024 * 1024))
.Select(p => p.ProcessName);
Now, the noise is gone. My code looks functional!
But it's still not the query I'm used to, although I know it's closer. It would be nice to have select and where as keywords, integrated into our code.
Now, C# 3.0 and VB 9.0 include this new feature: query comprehension.
Query that is so integrated into our code: Query comprehension
Using query comprehension, the code above is transformed into:
var procnames = from p in processess
                where p.WorkingSet64 > (50 * 1024 * 1024)
                select p.ProcessName;
and in VB 9.0:
Dim procnames = From p As Process In Process.GetProcesses() _
                Where p.WorkingSet64 > (50 * 1024 * 1024) _
                Select p.ProcessName
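To see that query comprehension is really just another spelling of the Where(...).Select(...) chain, here is a small C# sketch. Note the assumptions: it uses the built-in operators from System.Linq rather than the hand-written Query class, and it filters a fixed array of made-up sizes instead of live processes so the result is predictable. ComprehensionDemo and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComprehensionDemo
{
    // Made-up sample data standing in for process sizes in MB.
    public static readonly int[] SizesInMb = { 12, 75, 48, 130 };

    // Query comprehension form.
    public static IEnumerable<string> WithQuerySyntax()
    {
        return from s in SizesInMb
               where s > 50
               select s + " MB";
    }

    // The method-call form the compiler translates the query into.
    public static IEnumerable<string> WithMethodSyntax()
    {
        return SizesInMb
            .Where(s => s > 50)
            .Select(s => s + " MB");
    }

    public static void Main()
    {
        foreach (string line in WithQuerySyntax())
            Console.WriteLine(line);    // prints 75 MB, then 130 MB

        // Both forms produce the same sequence.
        Console.WriteLine(WithQuerySyntax().SequenceEqual(WithMethodSyntax()));   // prints True
    }
}
```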
But the LINQ wave doesn't stop here! There are many other cool individual features in C# 3.0 and VB 9.0 that form LINQ.
(to be continued...)