Real World OCaml, by Yaron Minsky, Anil Madhavapeddy and Jason Hickey

Buy in print and eBook.

Chapter 17. Data Serialization with S-Expressions

S-expressions are nested parenthetical expressions whose atomic values are strings. They were first popularized by the Lisp programming language in the 1960s. They have remained one of the simplest and most effective ways to encode structured data in a human-readable and editable form.

There's a full definition of s-expressions available online. An example s-expression might look like this:

(this (is an) (s expression))
Scheme ∗ sexpr/basic.scmall code

S-expressions play a major role in Core, effectively acting as the default serialization format. Indeed, we've encountered s-expressions multiple times already, including in Chapter 7, Error Handling, Chapter 9, Functors, and Chapter 10, First-Class Modules.

This chapter will go into s-expressions in more depth. In particular, we'll discuss:

We'll tie this together at the end of the chapter with a simple s-expression formatted configuration file for a web server

Basic Usage

The type used to represent an s-expression is quite simple:

module Sexp : sig
  type t =
  | Atom of string
  | List of t list
end
OCaml ∗ sexpr/sexp.mliall code

An s-expression can be thought of as a tree where each node contains a list of its children, and where the leaves of the tree are strings. Core provides good support for s-expressions in its Sexp module, including functions for converting s-expressions to and from strings. Let's rewrite our example s-expression in terms of this type:

# Sexp.List [
    Sexp.Atom "this";
    Sexp.List [ Sexp.Atom "is"; Sexp.Atom "an"];
    Sexp.List [ Sexp.Atom "s"; Sexp.Atom "expression" ];
  ];;
- : Sexp.t = (this (is an) (s expression))

This prints out nicely because Core registers a pretty printer with the toplevel. This pretty printer is based on the functions in Sexp for converting s-expressions to and from strings:

# Sexp.to_string (Sexp.List [Sexp.Atom "1"; Sexp.Atom "2"]) ;;
- : string = "(1 2)"
# Sexp.of_string ("(1 2 (3 4))") ;;
- : Sexp.t = (1 2 (3 4))

In addition to providing the Sexp module, most of the base types in Core support conversion to and from s-expressions. For example, we can use the conversion functions defined in the respective modules for integers, strings, and exceptions:

# Int.sexp_of_t 3;;
- : Sexp.t = 3
# String.sexp_of_t "hello";;
- : Sexp.t = hello
# Exn.sexp_of_t (Invalid_argument "foo");;
- : Sexp.t = (Invalid_argument foo)

It's also possible to convert more complex types such as lists or arrays that are polymorphic across the types that they can contain:

# List.sexp_of_t;;
- : ('a -> Sexp.t) -> 'a list -> Sexp.t = <fun>
# List.sexp_of_t Int.sexp_of_t [1; 2; 3];;
- : Sexp.t = (1 2 3)
OCaml Utop ∗ sexpr/to_from_sexp.topscript , continued (part 1) ∗ all code

Notice that List.sexp_of_t is polymorphic and takes as its first argument another conversion function to handle the elements of the list to be converted. Core uses this scheme more generally for defining sexp converters for polymorphic types.

The functions that go in the other direction, i.e., reconstruct an OCaml value from an s-expression, use essentially the same trick for handling polymorphic types, as shown in the following example. Note that these functions will fail with an exception when presented with an s-expression that doesn't match the structure of the OCaml type in question.

# List.t_of_sexp;;
- : (Sexp.t -> 'a) -> Sexp.t -> 'a list = <fun>
# List.t_of_sexp Int.t_of_sexp (Sexp.of_string "(1 2 3)");;
- : int list = [1; 2; 3]
# List.t_of_sexp Int.t_of_sexp (Sexp.of_string "(1 2 three)");;
Exception: (Sexplib.Conv.Of_sexp_error (Failure "int_of_sexp: (Failure int_of_string)") three).
OCaml Utop ∗ sexpr/to_from_sexp.topscript , continued (part 2) ∗ all code

Generating S-Expressions from OCaml Types

But what if you want a function to convert a brand new type to an s-expression? You can of course write it yourself manually. Here's an example:

# type t = { foo: int; bar: float } ;;
type t = { foo : int; bar : float; }
# let sexp_of_t t = let a x = Sexp.Atom x and l x = Sexp.List x in l [ l [a "foo"; Int.sexp_of_t t.foo ]; l [a "bar"; Float.sexp_of_t t.bar]; ] ;;
val sexp_of_t : t -> Sexp.t = <fun>
# sexp_of_t { foo = 3; bar = -5.5 } ;;
- : Sexp.t = ((foo 3) (bar -5.5))

This is somewhat tiresome to write, and it gets more so when you consider the parser, i.e., t_of_sexp, which is considerably more complex. Writing this kind of parsing and printing code by hand is mechanical and error prone, not to mention a drag.

Given how mechanical the code is, you could imagine writing a program that inspected the type definition and autogenerated the conversion code for you. As it turns out, Sexplib does just that. Sexplib, which is included with Core, provides both a library for manipulating s-expressions and a syntax extension for generating such conversion functions. With that syntax extension enabled, any type that has with sexp as an annotation will trigger the generation of the functions we want:

# type t = { foo: int; bar: float } with sexp ;;
type t = { foo : int; bar : float; } val t_of_sexp : Sexp.t -> t = <fun> val sexp_of_t : t -> Sexp.t = <fun>
# t_of_sexp (Sexp.of_string "((bar 35) (foo 3))") ;;
- : t = {foo = 3; bar = 35.}

The syntax extension can be used outside of type declarations as well. As discussed in Chapter 7, Error Handling, with sexp can be attached to the declaration of an exception, which will improve the ability of Core to generate a useful string representation:

# exception Bad_message of string list ;;
exception Bad_message of string list
# Exn.to_string (Bad_message ["1";"2";"3"]) ;;
- : string = "(\"Bad_message(_)\")"
# exception Good_message of string list with sexp;;
exception Good_message of string list
# Exn.to_string (Good_message ["1";"2";"3"]) ;;
- : string = "(//toplevel//.Good_message (1 2 3))"
OCaml Utop ∗ sexpr/auto_making_sexp.topscript , continued (part 1) ∗ all code

You don't always have to declare a named type to create an s-expression converter. The following syntax lets you create one inline, as part of a larger expression:

# let l = [(1,"one"); (2,"two")] ;;
val l : (int * string) list = [(1, "one"); (2, "two")]
# List.iter l ~f:(fun x -> <:sexp_of<int * string>> x |> Sexp.to_string |> print_endline) ;;


(1 one) (2 two) - : unit = ()

The declaration <:sexp_of<int * string>> simply gets expanded to the sexp converter for the type int * string. This is useful whenever you need a sexp converter for an anonymous type.

The syntax extensions bundled with Core almost all have the same basic structure: they autogenerate code based on type definitions, implementing functionality that you could in theory have implemented by hand, but with far less programmer effort.

The Sexp Format

The textual representation of s-expressions is pretty straightforward. An s-expression is written down as a nested parenthetical expression, with whitespace-separated strings as the atoms. Quotes are used for atoms that contain parentheses or spaces themselves; backslash is the escape character; and semicolons are used to introduce single-line comments. Thus, the following file, example.scm:

;; example.scm ((foo 3.3) ;; This is a comment (bar "this is () an \" atom"))

can be loaded using Sexplib. As you can see, the commented data is not part of the resulting s-expression:

# Sexp.load_sexp "example.scm" ;;
- : Sexp.t = ((foo 3.3) (bar "this is () an \" atom"))

All in, the s-expression format supports three comment syntaxes:

;

Comments out everything to the end of line

#|, |#

Delimiters for commenting out a block

#;

Comments out the first complete s-expression that follows

The following example shows all of these in action:

;; comment_heavy_example.scm ((this is included) ; (this is commented out (this stays) #; (all of this is commented out (even though it crosses lines.)) (and #| block delimiters #| which can be nested |# will comment out an arbitrary multi-line block))) |# now we're done ))

Again, loading the file as an s-expression drops the comments:

# Sexp.load_sexp "comment_heavy.scm" ;;
- : Sexp.t = ((this is included) (this stays) (and now we're done))
OCaml Utop ∗ sexpr/example_load.topscript , continued (part 1) ∗ all code

If we introduce an error into our s-expression, by, say, creating a file broken_example.scm which is example.scm, without open-paren in front of bar, we'll get a parse error:

# Exn.handle_uncaught ~exit:false (fun () ->
    ignore (Sexp.load_sexp "example_broken.scm")) ;;


Uncaught exception: (Sexplib.Sexp.Parse_error ((location parse) (err_msg "unexpected character: ')'") (text_line 4) (text_char 29) (global_offset 78) (buf_pos 78))) - : unit = ()
OCaml Utop ∗ sexpr/example_load.topscript , continued (part 2) ∗ all code

In the preceding example, we use Exn.handle_uncaught to make sure that the exception gets printed out in full detail. You should generally wrap every Core program in this handler to get good error messages for any unexpected exceptions.

Preserving Invariants

The most important functionality provided by Sexplib is the autogeneration of converters for new types. We've seen a bit of how this works already, but let's walk through a complete example. Here's the source for a simple library for representing integer intervals, very similar to the one described in Chapter 9, Functors:

(* Module for representing closed integer intervals *)
open Core.Std

(* Invariant: For any Range (x,y), y >= x *)
type t =
  | Range of int * int
  | Empty
with sexp

let is_empty =
  function 
  | Empty -> true 
  | Range _ -> false

let create x y =
  if x > y then
    Empty 
  else 
    Range (x,y)

let contains i x =
  match i with
  | Empty -> false
  | Range (low,high) -> x >= low && x <= high

We can now use this module as follows:

open Core.Std

let intervals =
  let module I = Int_interval in
  [ I.create 3 4;
    I.create 5 4; (* should be empty *)
    I.create 2 3;
    I.create 1 6;
  ]

let () =
  intervals
  |> List.sexp_of_t Int_interval.sexp_of_t
  |> Sexp.to_string_hum
  |> print_endline

But we're still missing something: we haven't created an mli signature for Int_interval yet. Note that we need to explicitly export the s-expression converters that were created within the ml file. For example, here's an interface that doesn't export the s-expression functions:

type t

val is_empty : t -> bool
val create : int -> int -> t
val contains : t -> int -> bool

Building this will give us the following error:

$ corebuild test_interval_nosexp.native
File "test_interval_nosexp.ml", line 14, characters 20-42:
Error: Unbound value Int_interval.sexp_of_t
Command exited with code 2.

We could export the types by hand in the signature, by writing the signatures for the extra functions generated by Sexplib:

open Core.Std

type t
val t_of_sexp : Sexp.t -> t
val sexp_of_t : t -> Sexp.t

val is_empty : t -> bool
val create : int -> int -> t
val contains : t -> int -> bool

This isn't an ideal solution, as it makes you repeatedly expose these extra functions in every signature you create where you want to serialize values. Sexplib solves this by exposing the same syntax extension in signature definitions so that we can just use the same with shorthand in the mli file. Here's the final version of the signature that does just this:

type t with sexp

val is_empty : t -> bool
val create : int -> int -> t
val contains : t -> int -> bool

At this point, test_interval.ml will compile again, and if we run it, we'll get the following output:

$ corebuild test_interval.native
$ ./test_interval.native
((Range 3 4) Empty (Range 2 3) (Range 1 6))

One easy mistake to make when dealing with sexp converters is to ignore the fact that those converters can violate the invariants of your code. For example, the Int_interval module depends for the correctness of the is_empty check on the fact that for any value Range (x,y), y is greater than or equal to x. The create function preserves this invariant, but the t_of_sexp function does not.

We can fix this problem by overriding the autogenerated function and writing a custom sexp converter that wraps the autogenerated converter with whatever invariant checks are necessary:

type t =
  | Range of int * int
  | Empty
with sexp

let create x y =
  if x > y then Empty else Range (x,y)

let t_of_sexp sexp =
  let t = t_of_sexp sexp in
  begin match t with
    | Empty -> ()
    | Range (x,y) ->
      if y < x then of_sexp_error "Upper and lower bound of Range swapped" sexp
  end;
  t

This trick of overriding an existing function definition with a new one is perfectly acceptable in OCaml. Since t_of_sexp is defined with an ordinary let rather than a let rec, the call to the t_of_sexp goes to the Sexplib-generated version of the function, rather than being a recursive call.

Another important aspect of our definition is that we call the function of_sexp_error to raise an exception when the parsing process fails. This improves the error reporting that Sexplib can provide when a conversion fails, as we'll see in the next section.

Getting Good Error Messages

There are two steps to deserializing a type from an s-expression: first, converting the bytes in a file to an s-expression; and the second, converting that s-expression into the type in question. One problem with this is that it can be hard to localize errors to the right place using this scheme. Consider the following example:

open Core.Std

type t = { 
  a: string;
  b: int;
  c: float option 
} with sexp

let run () =
  let t =
    Sexp.load_sexp "foo_broken_example.scm"
    |> t_of_sexp
  in
  printf "b is: %d\n%!" t.b

let () =
  Exn.handle_uncaught ~exit:true run

If you were to run this on a malformatted file, say, this one:

((a "not-an-integer") (b "not-an-integer") (c 1.0))

you'll get the following error:

$ corebuild read_foo.native
$ ./read_foo.native foo_example_broken.scm
Uncaught exception:
(Sexplib.Conv.Of_sexp_error
(Failure "int_of_sexp: (Failure int_of_string)") not-an-integer)
Raised at file "lib/conv.ml", line 281, characters 36-72
Called from file "lib/core_int.ml", line 6, characters 7-14
Called from file "lib/std_internal.ml", line 115, characters 7-33
Called from file "read_foo.ml", line 5, characters 2-8
Called from file "read_foo.ml", line 4, characters 2-40
Called from file "read_foo.ml", line 11, characters 4-60
Called from file "lib/exn.ml", line 69, characters 6-10

If all you have is the error message and the string, it's not terribly informative. In particular, you know that the parsing errored out on the atom "not-an-integer," but you don't know which one! In a large file, this kind of bad error message can be pure misery.

But there's hope! We can make a small change to the code to improve the error message greatly:

open Core.Std

type t = { 
  a: string;
  b: int;
  c: float option 
} with sexp

let run () =
  let t = Sexp.load_sexp_conv_exn "foo_broken_example.scm" t_of_sexp in
  printf "b is: %d\n%!" t.b

let () =
  Exn.handle_uncaught ~exit:true run

If we run it again, we'll see a much more specific error:

$ corebuild read_foo_better_errors.native
$ ./read_foo_better_errors.native foo_example_broken.scm
Uncaught exception:
(Sexplib.Conv.Of_sexp_error
(Sexplib.Sexp.Annotated.Conv_exn foo_broken_example.scm:2:5
(Failure "int_of_sexp: (Failure int_of_string)"))
not-an-integer)
Raised at file "lib/pre_sexp.ml", line 1145, characters 12-58
Called from file "read_foo_better_errors.ml", line 10, characters 10-68
Called from file "lib/exn.ml", line 69, characters 6-10

In the preceding error, foo_broken_example.scm:2:5 tells us that the error occurred in the file "foo_broken_example.scm" on line 2, character 5. This is a much better start for figuring out what went wrong. The ability to find the precise location of the error depends on the sexp converter reporting errors using the function of_sexp_error. This is already done by converters generated by Sexplib, but you should make sure to do the same when you write custom converters.

Sexp-Conversion Directives

Sexplib supports a collection of directives for modifying the default behavior of the autogenerated sexp converters. These directives allow you to customize the way in which types are represented as s-expressions without having to write a custom converter.

Note that the extra directives aren't part of the standard OCaml syntax, but are added via the Sexplib syntax extension. However, since Sexplib is used throughout Core and is part of the standard bundle activated by corebuild, you can use these in your own Core code without any special effort.

sexp_opaque

The most commonly used directive is sexp_opaque, whose purpose is to mark a given component of a type as being unconvertible. Anything marked with sexp_opaque will be presented as the atom <opaque> by the to-sexp converter, and will trigger an exception from the from-sexp converter.

Note that the type of a component marked as opaque doesn't need to have a sexp converter defined. Here, if we define a type without a sexp converter and then try to use another type with a sexp converter, we'll error out:

# type no_converter = int * int ;;
type no_converter = int * int
# type t = { a: no_converter; b: string } with sexp ;;
Characters 14-26: Error: Unbound value no_converter_of_sexp

But with sexp_opaque, we can embed our opaque no_converter type within the other data structure without an error.

# type t = { a: no_converter sexp_opaque; b: string } with sexp ;;
type t = { a : no_converter; b : string; } val t_of_sexp : Sexp.t -> t = <fun> val sexp_of_t : t -> Sexp.t = <fun>
OCaml Utop ∗ sexpr/sexp_opaque.topscript , continued (part 1) ∗ all code

And if we now convert a value of this type to an s-expression, we'll see the contents of field a marked as opaque:

# sexp_of_t { a = (3,4); b = "foo" } ;;
- : Sexp.t = ((a <opaque>) (b foo))
OCaml Utop ∗ sexpr/sexp_opaque.topscript , continued (part 2) ∗ all code

Note that the t_of_sexp function for an opaque type is generated, but will fail at runtime if it is used:

# t_of_sexp (Sexp.of_string "((a whatever) (b foo))") ;;
Exception: (Sexplib.Conv.Of_sexp_error (Failure "opaque_of_sexp: cannot convert opaque values") whatever).
OCaml Utop ∗ sexpr/sexp_opaque.topscript , continued (part 3) ∗ all code

This is there to allow for s-expression converters to be created for types containing sexp_opaque values. This is useful because the resulting converters won't necessarily fail on all inputs. For example, if you have a record containing a no_converter list, the t_of_sexp function would still succeed when the list is empty:

# type t = { a: no_converter sexp_opaque list; b: string } with sexp ;;
type t = { a : no_converter list; b : string; } val t_of_sexp : Sexp.t -> t = <fun> val sexp_of_t : t -> Sexp.t = <fun>
# t_of_sexp (Sexp.of_string "((a ()) (b foo))") ;;
- : t = {a = []; b = "foo"}
OCaml Utop ∗ sexpr/sexp_opaque.topscript , continued (part 4) ∗ all code

If you really only want to generate one direction of converter, one can do this by annotating the type with with sexp_of or with of_sexp instead of with sexp:

# type t = { a: no_converter sexp_opaque; b: string } with sexp_of ;;
type t = { a : no_converter; b : string; } val sexp_of_t : t -> Sexp.t = <fun>
# type t = { a: no_converter sexp_opaque; b: string } with of_sexp ;;
type t = { a : no_converter; b : string; } val t_of_sexp : Sexp.t -> t = <fun>
OCaml Utop ∗ sexpr/sexp_opaque.topscript , continued (part 5) ∗ all code

sexp_list

Sometimes, sexp converters have more parentheses than one would ideally like. Consider, for example, the following variant type:

# type compatible_versions =
  | Specific of string list
  | All with sexp ;;
type compatible_versions = Specific of string list | All val compatible_versions_of_sexp : Sexp.t -> compatible_versions = <fun> val sexp_of_compatible_versions : compatible_versions -> Sexp.t = <fun>
# sexp_of_compatible_versions (Specific ["3.12.0"; "3.12.1"; "3.13.0"]) ;;
- : Sexp.t = (Specific (3.12.0 3.12.1 3.13.0))

You might prefer to make the syntax a bit less parenthesis-laden by dropping the parentheses around the list. We can replace the string list in the type declaration with string sexp_list to give us this alternate syntax:

# type compatible_versions =
  | Specific of string sexp_list
  | All with sexp ;;
type compatible_versions = Specific of string list | All val compatible_versions_of_sexp : Sexp.t -> compatible_versions = <fun> val sexp_of_compatible_versions : compatible_versions -> Sexp.t = <fun>
# sexp_of_compatible_versions (Specific ["3.12.0"; "3.12.1"; "3.13.0"]) ;;
- : Sexp.t = (Specific 3.12.0 3.12.1 3.13.0)
OCaml Utop ∗ sexpr/sexp_list.topscript , continued (part 1) ∗ all code

sexp_option

Another common directive is sexp_option, which is used to make a record field optional in the s-expression. Normally, optional values are represented either as () for None, or as (x) for Some x, and a record field containing an option would be rendered accordingly. For example:

# type t = { a: int option; b: string } with sexp ;;
type t = { a : int option; b : string; } val t_of_sexp : Sexp.t -> t = <fun> val sexp_of_t : t -> Sexp.t = <fun>
# sexp_of_t { a = None; b = "hello" } ;;
- : Sexp.t = ((a ()) (b hello))
# sexp_of_t { a = Some 3; b = "hello" } ;;
- : Sexp.t = ((a (3)) (b hello))

But what if we want a field to be optional, i.e., we want to allow it to be omitted from the record entirely? In that case, we can mark it with sexp_option:

# type t = { a: int sexp_option; b: string } with sexp ;;
type t = { a : int option; b : string; } val t_of_sexp : Sexp.t -> t = <fun> val sexp_of_t : t -> Sexp.t = <fun>
# sexp_of_t { a = Some 3; b = "hello" } ;;
- : Sexp.t = ((a 3) (b hello))
# sexp_of_t { a = None; b = "hello" } ;;
- : Sexp.t = ((b hello))
OCaml Utop ∗ sexpr/sexp_option.topscript , continued (part 1) ∗ all code

Specifying Defaults

The sexp_option declaration is really just an example of specifying a default behavior for dealing with an unspecified field. In particular, sexp_option fills in absent fields with None. But you might want to allow other ways of filling in default values.

Consider the following type, which represents the configuration of a very simple web server:

# type http_server_config = {
     web_root: string;
     port: int;
     addr: string;
  } with sexp ;;
type http_server_config = { web_root : string; port : int; addr : string; } val http_server_config_of_sexp : Sexp.t -> http_server_config = <fun> val sexp_of_http_server_config : http_server_config -> Sexp.t = <fun>

One could imagine making some of these parameters optional; in particular, by default, we might want the web server to bind to port 80, and to listen as localhost. We can do this as follows:

# type http_server_config = {
     web_root: string;
     port: int with default(80);
     addr: string with default("localhost");
  } with sexp ;;
type http_server_config = { web_root : string; port : int; addr : string; } val http_server_config_of_sexp : Sexp.t -> http_server_config = <fun> val sexp_of_http_server_config : http_server_config -> Sexp.t = <fun>
OCaml Utop ∗ sexpr/sexp_default.topscript , continued (part 1) ∗ all code

Now, if we try to convert an s-expression that specifies only the web_root, we'll see that the other values are filled in with the desired defaults:

# let cfg = http_server_config_of_sexp
   (Sexp.of_string "((web_root /var/www/html))") ;;
val cfg : http_server_config = {web_root = "/var/www/html"; port = 80; addr = "localhost"}
OCaml Utop ∗ sexpr/sexp_default.topscript , continued (part 2) ∗ all code

If we convert the configuration back out to an s-expression, you'll notice that all of the fields are present, even though they're not strictly necessary:

# sexp_of_http_server_config cfg ;;
- : Sexp.t = ((web_root /var/www/html) (port 80) (addr localhost))
OCaml Utop ∗ sexpr/sexp_default.topscript , continued (part 3) ∗ all code

We could make the generated s-expression also drop exported values, by using the sexp_drop_default directive:

# type http_server_config = {
     web_root: string;
     port: int with default(80), sexp_drop_default;
     addr: string with default("localhost"), sexp_drop_default;
  } with sexp ;;
type http_server_config = { web_root : string; port : int; addr : string; } val http_server_config_of_sexp : Sexp.t -> http_server_config = <fun> val sexp_of_http_server_config : http_server_config -> Sexp.t = <fun>
# let cfg = http_server_config_of_sexp (Sexp.of_string "((web_root /var/www/html))") ;;
val cfg : http_server_config = {web_root = "/var/www/html"; port = 80; addr = "localhost"}
# sexp_of_http_server_config cfg ;;
- : Sexp.t = ((web_root /var/www/html))
OCaml Utop ∗ sexpr/sexp_default.topscript , continued (part 4) ∗ all code

As you can see, the fields that are at their default values are simply omitted from the s-expression. On the other hand, if we convert a config with other values, then those values will be included in the s-expression:

# sexp_of_http_server_config { cfg with port = 8080 } ;;
- : Sexp.t = ((web_root /var/www/html) (port 8080))
# sexp_of_http_server_config { cfg with port = 8080; addr = "192.168.0.1" } ;;