protoship

An invitation to ReasonML

Peter Deutsch once quipped that if you get the data structures and their invariants right, most of the code will just kind of write itself. If I had to pick just one nugget of learning from my career in programming so far, that would be it.

The effects of a poorly designed model cascade across the codebase, forcing us to litter it with special cases. That is probably why Linus Torvalds once claimed that the difference between a bad programmer and a good one is whether they consider their code or their data structures more important. Terry Crowley's article Education of a Programmer makes a similar observation about the importance of data flow in the design of a system:

Jon Devaan used to say “design the data, not the code”. This also generally means when looking at the structure of a system, I’m less interested in seeing how the code interacts — I want to see how the data interacts and flows. If someone tries to explain a system by describing the code structure and does not understand the rate and volume of data flow, they do not understand the system.

I decided to learn ReasonML because I wanted to work in a language that revolved around data. I had become dissatisfied with programming in dynamic languages like Ruby and Javascript, where all my code felt fragile and rickety as soon as it was written. To build large systems, I needed to have a crisp, confident knowledge of the shape and flow of data across the codebase at all times. But the unpredictability inherent in dynamic languages made this impossible.

For example, when working on a large React component, you have to guess at what attributes are available in props and state, what their keys are, and what shape their values take. We're often passing these large bags of values around, and to know what is inside one, we have to reach into the function from which it came, and often trace it multiple levels up for a complete picture. This becomes harder as applications grow, and we’re often left to work with a foggy understanding of what’s going on.

I wanted to be able to grok the shape and flow of data across the system, write pure functions to operate on it, and organize everything through flexible namespaces without rigid class hierarchies. And ReasonML, which is just OCaml with a more accessible syntax and first-class support for compiling to Javascript, was the perfect and pragmatic choice.

What is ReasonML?

ReasonML is OCaml with a simplified syntax. It uses OCaml’s own compiler, except for a Reason-specific frontend that lexes and parses the syntax. Think of it as a thin coat of paint on a powerful and battle-tested language.

And it runs in the browser! This is made possible by BuckleScript, which produces clean, performant Javascript from the optimized IR emitted by the OCaml compiler. This means we can write OCaml for the browser today, and that is a wonderful choice if you already know the language. But for programmers coming from other languages, the large surface area of OCaml's syntax can take some time to get used to. Reason fixes this with a minimalist syntax that often reads like ES6 while giving us the full power of OCaml. And since BuckleScript works on OCaml’s IR, it works without a hitch for Reason as well.

I chose to use Reason over vanilla OCaml syntax because I found it easier to read and write. Its authors are the original folks behind React, and they treat programming on the browser as an important use-case of the language. BuckleScript has a powerful foreign function interface to both native OCaml and Javascript. This allows us to easily reuse our existing Javascript code with some simple FFI annotations. Reason uses all this to good effect and makes it a breeze to use libraries from the npm ecosystem.

If we can look beyond the difference in syntax, which amounts to stylistic improvements rather than semantic changes, then any discussion about ReasonML is in truth a discussion about OCaml itself. The concepts apply equally well even to Microsoft’s F#, which originated as an OCaml implementation for .NET. The syntax, type system, immutability, and functional nature of OCaml are also quite similar to those of Haskell and the languages it inspired, like PureScript and Elm. So by learning OCaml, we're actually getting our feet wet in the varied and wonderful world of statically typed functional programming.

In the following sections I'll use the simplified Reason syntax for code snippets, but under the hood it is all OCaml.

The big deal: types around data instead of classes

When we talk about static typing, we're usually thinking of object-oriented type systems as found in Java, C++, and similar languages. But their type systems are rigid and hard to use compared to the ones found in functional languages like OCaml and Haskell.

This is because in object-oriented type systems, types are hopelessly intertwined with the notion of a class. Types in OCaml however are purely about the shape and structure of data. Here’s an example:

type user = {name: string, email: string};
let show_details u => {print_string (u.name ^ " "  ^ u.email);};

Now every time we create a record with the fields name and email, OCaml will automatically infer that it belongs to the type user. It also figures out that the parameter u in the function show_details must be of type user. This deduction is done using Hindley–Milner type inference, which forms the backbone of static typing in most statically typed functional languages, including Haskell.

All this means the following code will work fine without any explicit type annotation:

show_details {name: "OCaml", email: "o@ocaml.org"};

But what would happen with this code?

show_details {name: "OCaml"};

It won’t even compile. Here is the error message:

Error: Some record fields are undefined: email.

But if this were a dynamically typed language, show_details would have run without complaint, substituting a null or an undefined for email. That is the ricketiness I spoke about in the opening section: we never know what data is coming in, except when the system fails at runtime. This nagging uncertainty is ever-present in even the most well-tested codebases.

Because OCaml types the shape and structure of data, we can write code knowing that if it compiles, the data we're passing around has exactly the shape we declared. Compilation fails not just when required keys are absent, but even when extra keys are present. Let’s see what happens when I call the function with a new key user_type.

show_details {name: "OCaml", email: "o@ocaml.org", user_type: "company"};

The compile time error reads like this:

Error: This record expression is expected to have type user (but) The field user_type does not belong to type user

This stuff is iron-clad!
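The snippets in this post use Reason syntax, and reason-tools can convert them to vanilla OCaml. For comparison, here is a minimal sketch of the same record example written directly in OCaml. The helper details is a hypothetical addition that returns the string instead of printing it, which makes the behaviour easy to check; the two rejected calls are left as comments because they would stop the file from compiling.

```ocaml
(* The record type: a user has exactly these two fields. *)
type user = { name : string; email : string }

(* No annotation needed: the field accesses let the compiler infer u : user. *)
let details u = u.name ^ " " ^ u.email

let show_details u = print_endline (details u)

let () = show_details { name = "OCaml"; email = "o@ocaml.org" }

(* Both of these are rejected at compile time:
   show_details { name = "OCaml" }
     (the email field is missing)
   show_details { name = "OCaml"; email = "o@ocaml.org"; user_type = "company" }
     (user_type is not a field of user) *)
```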

Sum types

The next big thing in the type system is the sum type, called a variant in OCaml. It lets us specify that a piece of data is exactly one of several alternatives. This lets us write code that never forgets an edge case, without even trying! Any function that pattern matches on a variant must handle all its possible cases, or the compiler will call it out.

Here is an example inspired by Real World OCaml. We'll define color_name as a variant and write a function to map it to a hex value.

type color_name = Blue | Black | Green;
let color = Black;

Let's now write the function to return its hex value. This is where pattern matching comes into play. Here's our first go:

let toHex colorName =>
  switch colorName {
  | Blue => "#0000ff"
  };

In OCaml, a function returns the value of the expression that makes up its body, which here is the switch. So if we invoke toHex Blue we should get #0000ff. But what about the other colors? The compiler tells us that we forgot to handle them with this warning:

Warning 8: this pattern-matching is not exhaustive. Here is an example of a value that is not matched: (Black|Green)

If we configure the compiler to treat this warning as an error (and we should), our code wouldn't even compile. We now have to write correct code that handles every possible variation to appease the compiler.

let toHex colorName =>
  switch colorName {
  | Blue => "#0000ff"
  | Black => "#000000"
  | Green => "#00ff00"
  };

The function toHex now handles all the possible colors and OCaml will compile it without a fuss. Please note that this is a simplistic example. The power of variants becomes apparent as it permeates a growing codebase, allowing you to fearlessly refactor it.
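Here is a minimal sketch of the finished function in vanilla OCaml syntax, where Reason's switch becomes match and, following OCaml naming convention, toHex becomes to_hex. Deleting any one of the three branches makes the compiler report warning 8 exactly as described above.

```ocaml
(* A variant: a color_name is exactly one of these three constructors. *)
type color_name = Blue | Black | Green

(* The match is the whole body, so its value is the function's return value.
   Every constructor has a branch, so the match is exhaustive. *)
let to_hex color_name =
  match color_name with
  | Blue -> "#0000ff"
  | Black -> "#000000"
  | Green -> "#00ff00"

let () = print_endline (to_hex Black)
```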

The option type

Pattern matching was one of the unusual things I had to get used to when learning OCaml, and it took me some time to come to grips with it. However, it is pervasive in OCaml codebases, in large part because of option types, which are called Maybe in languages like Haskell and Elm.

Option types eliminate nulls from the language, and instead force us to deal with every possibility of absent data explicitly through pattern matching. This is enforced at compile time, so we'll never run into null errors at runtime.

Take a look at this code:

type user_name = option string;
type user = { name: user_name, email: string };

Here we've defined user_name to be an optional string. The option type is internally defined by OCaml as type option 'a = None | Some 'a. The 'a in the definition is a type variable that can stand for any type; this is called parametric polymorphism, and is similar to generics in Java. In our case 'a is string. Let's see what that means:

let mort = {name: "Mort", email: "mort@hogfather.com"};

This statement will fail with the error:

Error: This expression has type string but an expression was expected of type user_name = option string

Let us unpack that a bit. We told OCaml that user_name is an optional value, but we just passed in a bare string. It has to be written instead as:

let mort = {name: Some "Mort", email: "mort@hogfather.com"};

Now we're explicitly acknowledging the optional nature of user_name, and confirming that there is a string value inside it - the string "Mort". But if Mort wanted to be secretive about his name, we could have instead written:

let mort = {name: None, email: "mort@hogfather.com"};

Let's now write a function that prints the name of the user. First attempt:

let print_details u => print_string u.name;

We get the reverse of our previous error:

Error: This expression has type user_name = option string but an expression was expected of type string

What OCaml is telling us is that the print_string function expects a plain old string, but we gave it an option string, which it doesn't know what to do with. Let's pattern match!

let print_details u => {
  let name =
    switch u.name {
    | Some name => name
    | None => "Nobody home!"
    };
  print_string name;
};

The pattern matching extracted a bare string from the option type and bound it to name. Now it is a simple string that print_string can print and we're fine!
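Because option is polymorphic, this pattern only has to be written once for every type. Here is a minimal OCaml sketch of a hypothetical helper, value_or, that is not part of the original code. Its type is inferred as 'a -> 'a option -> 'a, so the same function unwraps a string option and an int option alike. OCaml 4.08 and later ship a similar Option.value in the standard library.

```ocaml
(* Inferred type: 'a -> 'a option -> 'a, one function for any 'a. *)
let value_or default opt =
  match opt with
  | Some value -> value
  | None -> default

let () =
  (* Here 'a is string ... *)
  print_endline (value_or "Nobody home!" (Some "Mort"));
  (* ... and here 'a is int. *)
  print_endline (string_of_int (value_or 0 None))
```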

By using the option type, we made it clear to the compiler that user_name could be empty or have a value inside it. Now every time we do something with it, OCaml will ensure that we deal with both possibilities, and tell us at compile time if we don't.
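Options also replace null as the return value of functions that may find nothing. Here is a minimal OCaml sketch with a hypothetical lookup, find_by_email, over a made-up list of users. A missing user and a user without a name are both explicit cases, and the nested pattern Some { name = Some name; _ } handles them in a single match.

```ocaml
type user = { name : string option; email : string }

(* Where a dynamic language might return null for "not found",
   this returns None, and every caller has to handle it. *)
let rec find_by_email email users =
  match users with
  | [] -> None
  | u :: rest -> if u.email = email then Some u else find_by_email email rest

(* Hypothetical sample data. *)
let users =
  [ { name = Some "Mort"; email = "mort@hogfather.com" };
    { name = None; email = "anon@hogfather.com" } ]

(* Three cases: no such user, a named user, an anonymous user. *)
let greeting email =
  match find_by_email email users with
  | None -> "No such user"
  | Some { name = Some name; _ } -> "Hello, " ^ name
  | Some { name = None; _ } -> "Hello, stranger"

let () = print_endline (greeting "mort@hogfather.com")
```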

Imagine if we had this level of protection across our codebase. Every possibility handled, no edge case left behind. It would make adding new variations a fearless activity. Suppose we discover a new shade of grey and add it to our color_name variant. In a dynamically typed language we would have to find every part of the codebase this change affects before fixing them, a manual and error-prone activity, and we might discover the cases we missed only at runtime. But with static typing and variants, the compiler can find all the places that deal with color_name and tell us where the exhaustiveness check fails, so we can simply follow its lead and fill in the gaps.
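Here is a minimal OCaml sketch of that scenario. After Grey is added to the variant, every match on color_name without a Grey branch is flagged with warning 8; the function below is what remains after following the compiler's lead. The hex value chosen for Grey is an arbitrary assumption, and all_colors is a hypothetical helper list.

```ocaml
(* Grey is the newly added constructor. *)
type color_name = Blue | Black | Green | Grey

(* Before the Grey branch was added, this match triggered warning 8,
   naming Grey as the unmatched case. *)
let to_hex color_name =
  match color_name with
  | Blue -> "#0000ff"
  | Black -> "#000000"
  | Green -> "#00ff00"
  | Grey -> "#808080"

(* Every constructor, so all of them can be checked at once. *)
let all_colors = [ Blue; Black; Green; Grey ]

let () = List.iter (fun c -> print_endline (to_hex c)) all_colors
```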

Bearing Testimony

Jane Street is one of the largest production users of OCaml, and they rely on its powerful type system to write correct code even under the immense pressures of a trading desk. Yaron Minsky gives a beginner's overview in this introductory talk:


In the "Effective ML" talk he describes among other things how they use variants to help prevent illegal states in the system.
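The talk's own examples are not reproduced here, but the idea can be sketched in OCaml with a hypothetical connection type. Rather than one record holding a state flag plus fields that are only meaningful in some states, each constructor carries exactly the data that is valid in that state, so a disconnected connection with a session id simply cannot be built.

```ocaml
(* Each state carries only the data that makes sense in that state. *)
type connection_state =
  | Disconnected
  | Connecting of int   (* retry attempt number *)
  | Connected of string (* session id *)

let describe state =
  match state with
  | Disconnected -> "disconnected"
  | Connecting attempt -> "connecting, attempt " ^ string_of_int attempt
  | Connected session_id -> "connected as " ^ session_id

(* A transition builds a new state value; no stale field can linger. *)
let retry state =
  match state with
  | Disconnected -> Connecting 1
  | Connecting attempt -> Connecting (attempt + 1)
  | Connected _ -> state

let () = print_endline (describe (retry (retry Disconnected)))
```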



Sean Grove's talk about OCaml and its ecosystem shows the versatility and ubiquity of the language. TL;DR: OCaml can run on phones, can be built as a unikernel that runs without an operating system, can be compiled to binaries for multiple architectures, and can run in the browser as well!



I recently asked about the ease of refactoring Haskell code on r/haskell, and the answers apply equally well to OCaml because of the similarities in the type system. Here are a few edited snippets:

One of the things I love about haskell is the security of having types around enables me to brutally refactor my code. OTOH, you just never see someone rip out a core piece of functionality and replace it with a cleaner API in python, because all of the subsequent failures it'd induce would be caught at runtime, in production by its users.

That response came from Edward Kmett who might have a bit of a bias towards the language :)

I've found Haskell is the only language where I consistently leave code cleaner than I found it. I can just do so much refactoring without worrying about breaking anything, just following some simple syntactic rules, I can move stuff around knowing a priori it won't change the semantics.

That is the kind of experience I'm also hoping to get from programming in OCaml.

I'm not going to say our codebase at work is pretty, it's far from it. Good engineering practices still apply - code reviews and consistent style are important, and the compiler will not enforce them upon you. However, we all acknowledge our code could be cleaner but ultimately - it hasn't been a huge issue, and here's where Haskell is different. We have three different ways to access data from our database, and all sorts of old approaches to doing things - but it all just works. Not "just works" in the sense that we have rigorously checked all behavior with tests - we don't have any tests. The types have been sufficient, and when combined with a terrific compiler that does a good job of removing the cost of abstractions, we don't mind stacking two approaches on top of each other.

The flip side of things - and perhaps the reason we are fairly "careless" when it comes to code cleanliness - is that it's so easy to tidy up. It might be time consuming, but it's rarely a difficult job: rip something out, follow the type errors until it compiles, and with a little bit of testing you're probably done!

If all of this whets your appetite, let's get started with Reason today.

Getting Started

The Reason team has put in a lot of effort to help us get started with the platform easily. You can follow their instructions from the Javascript Workflow section in the Reason website to start with a simple command-line based Reason application.

But my preferred way is to simply clone the reason-react-example repo and start tinkering with the examples there. It even has a TodoMVC written in reason-react that lets us get started with front-end development with Reason right away. The repo is self-contained and npm install also brings in the Reason compiler toolchain. All you need to do is to follow the simple steps in its README, and you're set!

The Tools page explains how to set up editor integration for Reason, and covers most mainstream editors. I use VSCode and its Reason plugin works very well. With the editor integration you get autocomplete, jump to definition, real-time errors, and code formatting. The tooling support is all-around excellent!

Reason has a REPL in which you can try out the code snippets from this post. It is called rtop and to get it on your machine, just install reason-cli by following instructions from the Workflow Setup page.

You should read through the Reason homepage to get a feel for the language, but the docs are incomplete and don't have a guided tutorial yet. However, almost any documentation that applies to OCaml applies to Reason as well. There are many such resources out there and the one you'll most often encounter would be the free online version of Real World OCaml by Minsky, Madhavapeddy, and Hickey. I bought the book and dutifully read through most of it to build a mental map of the language, and it helped a ton! You can even learn concepts from F Sharp for Fun and Profit and a lot of it will be applicable to OCaml as well.

Reason comes bundled with the OCaml standard library, which contains common data structures like List, Map, and Array. OCaml's authors have also written a user manual and documentation covering the language, its ecosystem, and the standard library. Reading it can help you build a foundational understanding of the language.
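As a small taste of those modules, here is a minimal sketch using List, Map, and Array from the OCaml standard library; the palette data is made up for illustration. Map.Make is a functor that builds an immutable map module for a given key type, here String.

```ocaml
(* Map.Make builds an immutable map module keyed by strings. *)
module StringMap = Map.Make (String)

(* Hypothetical data: an association list of color names to hex values. *)
let colors = [ ("blue", "#0000ff"); ("black", "#000000"); ("green", "#00ff00") ]

(* List.fold_left threads the map through the list, adding one binding per pair. *)
let palette =
  List.fold_left
    (fun acc (name, hex) -> StringMap.add name hex acc)
    StringMap.empty colors

(* Arrays are fixed-size and mutable; Array.of_list copies a list into one. *)
let names = Array.of_list (List.map fst colors)

let () =
  Array.iter
    (fun name -> print_endline (name ^ " " ^ StringMap.find name palette))
    names
```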

You should also check out reason-tools, a syntax converter that can convert between vanilla OCaml and Reason syntaxes. It can be very useful when you're learning from OCaml documentation and want to apply the code samples to Reason.

Another useful resource is GitHub's code search: search for OCaml codebases and for files with the .re extension to find existing Reason code and read through it. Also check out the Awesome ReasonML collection maintained by Ramana Venkata, which curates Reason-related resources on the internet. In his talk Taming the Meta Language, Cheng Lou, one of the people behind ReasonML, describes how ReasonML moves meaning that is often implicit in code, along with assumptions about how code should be written well, into the language itself.

Support and Salutations

Don't forget to join the Discord Reason channel. It is very active and the support I got there helped me move forward every time I got stuck.

The Reason community is nascent but very vibrant. It has attracted serious programmers who want a pragmatic programming language that allows mutation but encourages immutable functional programming and static types. In Jordan Walke's own words:

I'd like to use a safe, statically typed, high performance language on the job, so I can spend my weekends with friends and family instead of getting my "fix" of a decent language on the weekends.

What we want to achieve, is that we can get the most valuable parts of FP, without a lot of the FP religion, and ship stuff to production that our mainstream peers can make sense of. People should be able to just read Reason and figure out what's going on, how to make a change quickly, and contribute to our projects.

I think the language is heading right there, and I'm having a lot of fun programming with it. I'm glad you've read this far and I hope you'll give Reason a spin! If you have any comments or questions, please tweet to me at @jasim_ab, or email in at jasim@protoship.io.

Protoship is a collaboration between Sherin and Jasim. We make web applications with Rails and React, and enjoy automating design and development through custom-built tools. Email us at hello@protoship.io. We're available for hire.