If you’ve been programming in a dynamic language, you’ve probably heard that type systems can catch more errors before your application even gets run. The more powerful the type system is, the more you can express in it. And because we’re talking about Haskell, we have a great number of tools at our disposal when trying to express things in terms of the types.
Why is this important? Sometimes a function has an expectation about the value that it’s receiving. In most imperative languages those expectations are implicit and it is up to the programmer to uphold them, as in the following example:
def foo(bar)
  bar.baz
end
In this example the function foo implicitly expects an object which is not nil. If you call foo(nil), you’ll get an exception at runtime. To combat this we usually write unit tests to verify that our system will never get into a state in which the function is passed nil. This is a very simple example, so let’s take a look at a more complicated one.
Imagine you’re writing a service which receives messages from users, encrypts them, and sends them on through an unsecured channel. The messages are both being sent and received as base64 encoded strings, so you can’t easily tell if a message has been encrypted by just inspecting it.
Here’s how we could represent the message in Haskell and in Ruby, just so that we can compare the code.
data Message = Message String
class Message
  attr_accessor :text

  def initialize(text)
    @text = text
  end
end
Now this is all well and good, but we also want to keep track of whether the message has been encrypted or is still in plain text. To do this in Haskell we’ll use a simple Algebraic Data Type, while in Ruby we’ll add an additional attribute called encrypted, which will default to false.
data Message = PlainText String | Encrypted String
class Message
  attr_accessor :text, :encrypted

  def initialize(text)
    @text = text
    @encrypted = false
  end
end
While the Haskell version is less verbose, it doesn’t give us many more safety guarantees at this point. Let’s say we want to define a function which sends a message. We want it to accept only a message that has been encrypted, since sending a plain text message is unsafe and should not be allowed.
def send_message(message, recipient)
  if message.encrypted
    # send logic
  else
    raise ArgumentError, "Can’t send a plain text message"
  end
end
send :: Message -> Recipient -> IO ()
send (Encrypted m) recipient = undefined -- placeholder: the logic that sends m to recipient goes here
send (PlainText _) _ = undefined
It doesn’t really matter how we choose to represent this in Haskell. Even if we used a Maybe or Either to handle the failure, we would still have to handle this at runtime. This means only one thing: the function needs to be tested for the edge case where we pass in a message in an invalid state, and we would also need to test the error handling. This is as far as we can go with Ruby, since there’s no way to enforce more structure in the program.
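To make the point about Either concrete, here is a minimal sketch of that variant. The Recipient alias and the putStrLn delivery step are placeholders invented for illustration, not part of any real service.

```haskell
-- Sketch only: Recipient and the delivery step are placeholders.
type Recipient = String

data Message = PlainText String | Encrypted String

-- The failure case is now visible in the type, but every caller still has
-- to deal with the Left branch at runtime, and that branch needs a test.
send :: Message -> Recipient -> Either String (IO ())
send (Encrypted m) recipient = Right (putStrLn ("to " ++ recipient ++ ": " ++ m))
send (PlainText _) _         = Left "Can't send a plain text message"
```

The compiler accepts a call with a PlainText message just as happily as one with an Encrypted message; the mistake only shows up as a Left value once the program runs.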
But wouldn’t it be much nicer if a program that’s trying to call send with a PlainText message got rejected by the type checker? Such a program is not valid in our business domain and it shouldn’t compile. If we manage to do that, we can save ourselves the error handling, and also the tests for the error handling.
To be able to do this we need to express the relationship between the Encrypted message and the send function at the type level. The trick that allows us to do this is called Phantom Types, but to understand those, first let’s take a look at simple parametric data types in Haskell. They are very similar to templates or generics in C++/C#/Java and many other languages.
Here’s a simple parametric type:
data Maybe a = Just a | Nothing
The a on the left side is simply a type parameter. If we choose to create a value such as Just (3 :: Int), it has the type Maybe Int.
A type is called a Phantom Type if it has a type parameter which appears only on the left hand side and is not used by any of the value constructors. Here’s how we could modify our Message type to make it into a Phantom Type.
data Message a = Message String
This allows us to have things like Message Int, Message String, Message (Maybe Char), and so on. In itself it might not look appealing, since no matter what type we use it will still have a single value constructor which works with Strings. But let’s expand this further by adding two empty data types, one for each type of the message.
data Encrypted
data PlainText
This gives us an option to create both Message Encrypted and Message PlainText types. Remember that even if we’re not using the type parameter in any of the constructors, it is still verified by the type system, which means we can change our send function to have the following signature, and add encrypt and decrypt functions that convert between the two states.
send :: Message Encrypted -> Recipient -> IO ()
encrypt :: Message PlainText -> Message Encrypted
decrypt :: Message Encrypted -> Message PlainText
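As a rough illustration of how these signatures fit together, the sketch below fills in the bodies. The Recipient alias, the reverse-based "cipher", the putStrLn delivery step and the payload helper are all assumptions made up for this example; a real service would use actual encryption and networking.

```haskell
-- Sketch only: Recipient, the reverse-based "cipher" and the body of send
-- are placeholders, not real encryption or networking.
type Recipient = String

-- Empty types used purely as type-level tags; they have no values.
data Encrypted
data PlainText

-- The parameter a never appears on the right-hand side.
data Message a = Message String

-- Unwrap, transform the payload, and rewrap under the other tag.
encrypt :: Message PlainText -> Message Encrypted
encrypt (Message s) = Message (reverse s)

decrypt :: Message Encrypted -> Message PlainText
decrypt (Message s) = Message (reverse s)

-- Only accepts messages tagged as Encrypted.
send :: Message Encrypted -> Recipient -> IO ()
send (Message s) recipient = putStrLn ("to " ++ recipient ++ ": " ++ s)

-- Hypothetical helper: read the payload back out, whatever the tag is.
payload :: Message a -> String
payload (Message s) = s
```

Note that encrypt and decrypt have the same shape at the value level. Only the phantom parameter in their signatures distinguishes them, and that is exactly what the type checker uses to keep the two states apart.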
The last thing we would need to do to make this completely safe is to make the constructor for Message private and only export a function for creating a new instance of the type. This makes it impossible to change the state of the Message type in any way other than by using our encrypt and decrypt functions, because you wouldn’t be able to use pattern matching to extract the inner value. The function for creating a new Message could look something like this:
newMessage :: String -> Message PlainText
newMessage s = Message s
Now armed with the power of Phantom Types, the following would be rejected by the type system, making it impossible to send plain-text messages.
send (newMessage "hello!") "john@example.com"
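For contrast, here is a sketch of a module that hides the constructor and of a call that does type-check. The module name, the Recipient alias, the reverse-based "cipher", the putStrLn delivery step and the sendGreeting helper are assumptions for illustration only.

```haskell
-- Sketch only: the module name, Recipient, the reverse-based "cipher" and
-- the body of send are placeholders.
module Message
  ( Message      -- the type is exported without its constructor
  , PlainText
  , Encrypted
  , Recipient
  , newMessage
  , encrypt
  , send
  , sendGreeting
  ) where

type Recipient = String

data Encrypted
data PlainText

data Message a = Message String

-- The only way for code outside this module to obtain a Message.
newMessage :: String -> Message PlainText
newMessage s = Message s

encrypt :: Message PlainText -> Message Encrypted
encrypt (Message s) = Message (reverse s)

send :: Message Encrypted -> Recipient -> IO ()
send (Message s) recipient = putStrLn ("to " ++ recipient ++ ": " ++ s)

-- Type-checks: the message goes through encrypt before it reaches send.
sendGreeting :: Recipient -> IO ()
sendGreeting = send (encrypt (newMessage "hello!"))
```

Because the export list names Message without its constructor, code that imports this module can neither build a Message Encrypted directly nor pattern match on one, so going through encrypt is the only route to send.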
A similar thing could also be implemented using Generalised Algebraic Data Types (GADTs), but that’s outside the scope of this article. If you’re interested in learning more, I recommend checking out the Haskell Wiki article about Phantom Types, which has some great examples, or the WikiBooks entry.
Update: As was pointed out in the comments on Lobste.rs, it’s worth noting that all of these safety guarantees come for free. The types are stripped once the program type checks and compiles, so there is no runtime overhead. This might not be obvious to people used to programming in dynamic languages.