Type Level Merge Sort (Haskell)
The recently presented Haskell library superrecord is still under heavy development and making great progress. While working on it we noticed that application code using the library would become very slow to compile when the record size exceeded 10 fields. Without any deeper thought, I guessed that the current type level insertion sort (which is O(n^2)
in worse case complexity) was at fault. This turned out to be wrong, but I still implemented a more efficient merge sort at type level and would like to share the journey as it was quite fun.
Motivation
Just for as a quick recall, records in superrecord are represented as lists of type level pairs, one holding the key and one the type of the value. For example:
(For more information see the original post)
Our goal is now to write a type family that given a type level list like '["int" := Int, "foo" := String]
, it returns '[ "foo" := String, "int" := Int]
("int" > "foo"
). Thus we want to implement:
The reason we need this is that the user does not need to provide the fields in order of the type when construction a record.
Basics
If we look at a Wikipedia implementation of a “Top-down implementation using lists”, we read:
Pseudocode for top down merge sort algorithm which recursively divides the input list into smaller sublists until the sublists are trivially sorted, and then merges the sublists while returning up the call chain.
Thus, we need a mechanism for breaking type level list into “smaller sublists”, in our case we’ll break all lists into halves. There are multiple ways to do this, here’s one:
First, we describe a type level take
and a type level drop
function:
These are very similar and straight forward to implement. For take we take one element at a time from the list until the desired length is reached, for drop we drop elements and then return the rest. We can now write a small unit test for these functions:
If we implemented everything correctly, this should just compile. Otherwise, a “overlapping patterns” warning will be produced, and turned into an error using -Werror
.
As you recall, we want to break lists into halves, so we need to know the length:
Division
Now here comes an interesting part: To actually break the list in half using the combinators above, we need to divide a type level number by two. This is currently not provided by the GHC.TypeLits
module, so we have to roll our own. Before starting, I figured that a high-level combinator like
could be very useful. But the problem with a combinator like this at the type level - especially when using it with recursion - is that it’s not “lazy”. This means, both branches will get fully reduced, so you can not use this to check for a terminating condition as it will just recurse forever. In a sense, writing type families is just stating reduction rules. So you have to implement this checking inline in your type family:
Note that due to the reduction limitation of the type checker (201
by default), with this idea we can only divide numbers by two up to n = 793
. This will be fine for our use case.
Merging
With this in place, the last combinator missing is the list merge combinator:
Again, we used a helper to remove the need of an if-then-else
and depending on the key of our (key := type)
tuple we merge the head of either the left or the right list.
Putting it all together
Now that we can merge and split lists as required for the merge sort, we can implement the full sorting algorithm:
The helper is used to prevent the duplicate reduction of HalfOf (LengthOf xs)
. And that’s really all that there is to it:
It works! One could probably generalize the FieldListSort
to a MergeSortBy
that allows sorting lists of [k]
provided a comparator function f -> f -> Ordering
and a mapper k -> f
to extract the sorting criterion.
Alternative approaches
- Instead of using the
drop
/take
approach with the length, one could also implement at type family that takes a list taking the two first elements at a time and putting them into different components of a tuple. - The
singletons
package is probably a great fit for this type of problem