Real World OCaml, by Yaron Minsky, Anil Madhavapeddy and Jason Hickey

Buy in print and eBook.

Chapter 3. Lists and Patterns

This chapter will focus on two common elements of programming in OCaml: lists and pattern matching. Both of these were discussed in Chapter 1, A Guided Tour, but we'll go into more depth here, presenting the two topics together and using one to help illustrate the other.

List Basics

An OCaml list is an immutable, finite sequence of elements of the same type. As we've seen, OCaml lists can be generated using a bracket-and-semicolon notation:

# [1;2;3];;
- : int list = [1; 2; 3]

And they can also be generated using the equivalent :: notation:

# 1 :: (2 :: (3 :: [])) ;;
- : int list = [1; 2; 3]
# 1 :: 2 :: 3 :: [] ;;
- : int list = [1; 2; 3]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 1) ∗ all code

As you can see, the :: operator is right-associative, which means that we can build up lists without parentheses. The empty list [] is used to terminate a list. Note that the empty list is polymorphic, meaning it can be used with elements of any type, as you can see here:

# let empty = [];;
val empty : 'a list = []
# 3 :: empty;;
- : int list = [3]
# "three" :: empty;;
- : string list = ["three"]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 2) ∗ all code

The way in which the :: operator attaches elements to the front of a list reflects the fact that OCaml's lists are in fact singly linked lists. The figure below is a rough graphical representation of how the list 1 :: 2 :: 3 :: [] is laid out as a data structure. The final arrow (from the box containing 3) points to the empty list.

+---+---+   +---+---+   +---+---+
| 1 | *---->| 2 | *---->| 3 | *---->||
+---+---+   +---+---+   +---+---+

Each :: essentially adds a new block to the proceding picture. Such a block contains two things: a reference to the data in that list element, and a reference to the remainder of the list. This is why :: can extend a list without modifying it; extension allocates a new list element but change any of the existing ones, as you can see:

# let l = 1 :: 2 :: 3 :: [];;
val l : int list = [1; 2; 3]
# let m = 0 :: l;;
val m : int list = [0; 1; 2; 3]
# l;;
- : int list = [1; 2; 3]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 3) ∗ all code

Using Patterns to Extract Data from a List

We can read data out of a list using a match statement. Here's a simple example of a recursive function that computes the sum of all elements of a list:

# let rec sum l =
    match l with
    | [] -> 0
    | hd :: tl -> hd + sum tl
  ;;
val sum : int list -> int = <fun>
# sum [1;2;3];;
- : int = 6
# sum [];;
- : int = 0
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 4) ∗ all code

This code follows the convention of using hd to represent the first element (or head) of the list, and tl to represent the remainder (or tail).

The match statement in sum is really doing two things: first, it's acting as a case-analysis tool, breaking down the possibilities into a pattern-indexed list of cases. Second, it lets you name substructures within the data structure being matched. In this case, the variables hd and tl are bound by the pattern that defines the second case of the match statement. Variables that are bound in this way can be used in the expression to the right of the arrow for the pattern in question.

The fact that match statements can be used to bind new variables can be a source of confusion. To see how, imagine we wanted to write a function that filtered out from a list all elements equal to a particular value. You might be tempted to write that code as follows, but when you do, the compiler will immediately warn you that something is wrong:

# let rec drop_value l to_drop =
    match l with
    | [] -> []
    | to_drop :: tl -> drop_value tl to_drop
    | hd :: tl -> hd :: drop_value tl to_drop
  ;;


Characters 114-122: Warning 11: this match case is unused.val drop_value : 'a list -> 'a -> 'a list = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 5) ∗ all code

Moreover, the function clearly does the wrong thing, filtering out all elements of the list rather than just those equal to the provided value, as you can see here:

# drop_value [1;2;3] 2;;
- : int list = []
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 6) ∗ all code

So, what's going on?

The key observation is that the appearance of to_drop in the second case doesn't imply a check that the first element is equal to the value to_drop passed in as an argument to drop_value. Instead, it just causes a new variable to_drop to be bound to whatever happens to be in the first element of the list, shadowing the earlier definition of to_drop. The third case is unused because it is essentially the same pattern as we had in the second case.

A better way to write this code is not to use pattern matching for determining whether the first element is equal to to_drop, but to instead use an ordinary if statement:

# let rec drop_value l to_drop =
    match l with
    | [] -> []
    | hd :: tl ->
      let new_tl = drop_value tl to_drop in
      if hd = to_drop then new_tl else hd :: new_tl
  ;;
val drop_value : 'a list -> 'a -> 'a list = <fun>
# drop_value [1;2;3] 2;;
- : int list = [1; 3]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 7) ∗ all code

Note that if we wanted to drop a particular literal value (rather than a value that was passed in), we could do this using something like our original implementation of drop_value:

# let rec drop_zero l =
    match l with
    | [] -> []
    | 0  :: tl -> drop_zero tl
    | hd :: tl -> hd :: drop_zero tl
  ;;
val drop_zero : int list -> int list = <fun>
# drop_zero [1;2;0;3];;
- : int list = [1; 2; 3]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 8) ∗ all code

Limitations (and Blessings) of Pattern Matching

The preceding example highlights an important fact about patterns, which is that they can't be used to express arbitrary conditions. Patterns can characterize the layout of a data structure and can even include literals, as in the drop_zero example, but that's where they stop. A pattern can check if a list has two elements, but it can't check if the first two elements are equal to each other.

You can think of patterns as a specialized sublanguage that can express a limited (though still quite rich) set of conditions. The fact that the pattern language is limited turns out to be a very good thing, making it possible to build better support for patterns in the compiler. In particular, both the efficiency of match statements and the ability of the compiler to detect errors in matches depend on the constrained nature of patterns.

Performance

Naively, you might think that it would be necessary to check each case in a match in sequence to figure out which one fires. If the cases of a match were guarded by arbitrary code, that would be the case. But OCaml is often able to generate machine code that jumps directly to the matched case based on an efficiently chosen set of runtime checks.

As an example, consider the following rather silly functions for incrementing an integer by one. The first is implemented with a match statement, and the second with a sequence of if statements:

# let plus_one_match x =
    match x with
    | 0 -> 1
    | 1 -> 2
    | 2 -> 3
    | _ -> x + 1

  let plus_one_if x =
    if      x = 0 then 1
    else if x = 1 then 2
    else if x = 2 then 3
    else x + 1
  ;;
val plus_one_match : int -> int = <fun> val plus_one_if : int -> int = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 9) ∗ all code

Note the use of _ in the above match. This is a wildcard pattern that matches any value, but without binding a variable name to the value in question.

If you benchmark these functions, you'll see that plus_one_if is considerably slower than plus_one_match, and the advantage gets larger as the number of cases increases. Here, we'll benchmark these functions using the core_bench library, which can be installed by running opam install core_bench from the command line:

# #require "core_bench";;
# open Core_bench.Std;;
# let run_bench tests = Bench.bench ~ascii_table:true ~display:Textutils.Ascii_table.Display.column_titles tests ;;
val run_bench : Bench.Test.t list -> unit = <fun>
# [ Bench.Test.create ~name:"plus_one_match" (fun () -> ignore (plus_one_match 10)) ; Bench.Test.create ~name:"plus_one_if" (fun () -> ignore (plus_one_if 10)) ] |> run_bench ;;


Estimated testing time 20s (change using -quota SECS). Name Time/Run % of max ---------------- ---------- ---------- plus_one_match 23.68 76.44 plus_one_if 30.98 100.00 - : unit = ()
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 10) ∗ all code

Here's another, less artificial example. We can rewrite the sum function we described earlier in the chapter using an if statement rather than a match. We can then use the functions is_empty, hd_exn, and tl_exn from the List module to deconstruct the list, allowing us to implement the entire function without pattern matching:

# let rec sum_if l =
    if List.is_empty l then 0
    else List.hd_exn l + sum_if (List.tl_exn l)
  ;;
val sum_if : int list -> int = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 11) ∗ all code

Again, we can benchmark these to see the difference:

# let numbers = List.range 0 1000 in
  [ Bench.Test.create ~name:"sum_if" (fun () -> ignore (sum_if numbers))
  ; Bench.Test.create ~name:"sum"    (fun () -> ignore (sum numbers)) ]
  |> run_bench
  ;;


Estimated testing time 20s (change using -quota SECS). Name Time/Run % of max -------- ---------- ---------- sum_if 71_005 100.00 sum 11_788 16.60 - : unit = ()
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 12) ∗ all code

In this case, the match-based implementation is many times faster than the if-based implementation. The difference comes because we need to effectively do the same work multiple times, since each function we call has to reexamine the first element of the list to determine whether or not it's the empty cell. With a match statement, this work happens exactly once per list element.

Generally, pattern matching is more efficient than the alternatives you might code by hand. One notable exception is matches over strings, which are in fact tested sequentially, so matches containing a long sequence of strings can be outperformed by a hash table. But most of the time, pattern matching is a clear performance win.

Detecting Errors

The error-detecting capabilities of match statements are if anything more important than their performance. We've already seen one example of OCaml's ability to find problems in a pattern match: in our broken implementation of drop_value, OCaml warned us that the final case was redundant. There are no algorithms for determining if a predicate written in a general-purpose language is redundant, but it can be solved reliably in the context of patterns.

OCaml also checks match statements for exhaustiveness. Consider what happens if we modify drop_zero by deleting the handler for one of the cases. As you can see, the compiler will produce a warning that we've missed a case, along with an example of an unmatched pattern:

# let rec drop_zero l =
    match l with
    | [] -> []
    | 0  :: tl -> drop_zero tl
  ;;


Characters 26-84: Warning 8: this pattern-matching is not exhaustive. Here is an example of a value that is not matched: 1::_val drop_zero : int list -> 'a list = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 13) ∗ all code

Even for simple examples like this, exhaustiveness checks are pretty useful. But as we'll see in Chapter 6, Variants, they become yet more valuable as you get to more complicated examples, especially those involving user-defined types. In addition to catching outright errors, they act as a sort of refactoring tool, guiding you to the locations where you need to adapt your code to deal with changing types.

Using the List Module Effectively

We've so far written a fair amount of list-munging code using pattern matching and recursive functions. But in real life, you're usually better off using the List module, which is full of reusable functions that abstract out common patterns for computing with lists.

Let's work through a concrete example to see this in action. We'll write a function render_table that, given a list of column headers and a list of rows, prints them out in a well-formatted text table, as follows:

# printf "%s\n"
   (render_table
     ["language";"architect";"first release"]
     [ ["Lisp" ;"John McCarthy" ;"1958"] ;
       ["C"    ;"Dennis Ritchie";"1969"] ;
       ["ML"   ;"Robin Milner"  ;"1973"] ;
       ["OCaml";"Xavier Leroy"  ;"1996"] ;
     ]);;


| language | architect | first release | |----------+----------------+---------------| | Lisp | John McCarthy | 1958 | | C | Dennis Ritchie | 1969 | | ML | Robin Milner | 1973 | | OCaml | Xavier Leroy | 1996 | - : unit = ()
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 69) ∗ all code

The first step is to write a function to compute the maximum width of each column of data. We can do this by converting the header and each row into a list of integer lengths, and then taking the element-wise max of those lists of lengths. Writing the code for all of this directly would be a bit of a chore, but we can do it quite concisely by making use of three functions from the List module: map, map2_exn, and fold.

List.map is the simplest to explain. It takes a list and a function for transforming elements of that list, and returns a new list with the transformed elements. Thus, we can write:

# List.map ~f:String.length ["Hello"; "World!"];;
- : int list = [5; 6]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 14) ∗ all code

List.map2_exn is similar to List.map, except that it takes two lists and a function for combining them. Thus, we might write:

# List.map2_exn ~f:Int.max [1;2;3] [3;2;1];;
- : int list = [3; 2; 3]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 15) ∗ all code

The _exn is there because the function throws an exception if the lists are of mismatched length:

# List.map2_exn ~f:Int.max [1;2;3] [3;2;1;0];;
Exception: (Invalid_argument "length mismatch in rev_map2_exn: 3 <> 4 ").
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 16) ∗ all code

List.fold is the most complicated of the three, taking three arguments: a list to process, an initial accumulator value, and a function for updating the accumulator. List.fold walks over the list from left to right, updating the accumulator at each step and returning the final value of the accumulator when it's done. You can see some of this by looking at the type-signature for fold:

# List.fold;;
- : 'a list -> init:'accum -> f:('accum -> 'a -> 'accum) -> 'accum = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 17) ∗ all code

We can use List.fold for something as simple as summing up a list:

# List.fold ~init:0 ~f:(+) [1;2;3;4];;
- : int = 10
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 18) ∗ all code

This example is particularly simple because the accumulator and the list elements are of the same type. But fold is not limited to such cases. We can for example use fold to reverse a list, in which case the accumulator is itself a list:

# List.fold ~init:[] ~f:(fun list x -> x :: list) [1;2;3;4];;
- : int list = [4; 3; 2; 1]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 19) ∗ all code

Let's bring our three functions together to compute the maximum column widths:

# let max_widths header rows =
    let lengths l = List.map ~f:String.length l in
    List.fold rows
      ~init:(lengths header)
      ~f:(fun acc row ->
        List.map2_exn ~f:Int.max acc (lengths row))
  ;;
val max_widths : string list -> string list list -> int list = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 20) ∗ all code

Using List.map we define the function lengths, which converts a list of strings to a list of integer lengths. List.fold is then used to iterate over the rows, using map2_exn to take the max of the accumulator with the lengths of the strings in each row of the table, with the accumulator initialized to the lengths of the header row.

Now that we know how to compute column widths, we can write the code to generate the line that separates the header from the rest of the text table. We'll do this in part by mapping String.make over the lengths of the columns to generate a string of dashes of the appropriate length. We'll then join these sequences of dashes together using String.concat, which concatenates a list of strings with an optional separator string, and ^, which is a pairwise string concatenation function, to add the delimiters on the outside:

# let render_separator widths =
    let pieces = List.map widths
      ~f:(fun w -> String.make (w + 2) '-')
    in
    "|" ^ String.concat ~sep:"+" pieces ^ "|"
  ;;
val render_separator : int list -> string = <fun>
# render_separator [3;6;2];;
- : string = "|-----+--------+----|"
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 21) ∗ all code

Note that we make the line of dashes two larger than the provided width to provide some whitespace around each entry in the table.

Now we need code for rendering a row with data in it. We'll first write a function called pad, for padding out a string to a specified length plus one blank space on both sides:

# let pad s length =
    " " ^ s ^ String.make (length - String.length s + 1) ' '
  ;;
val pad : string -> int -> string = <fun>
# pad "hello" 10;;
- : string = " hello "
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 24) ∗ all code

We can render a row of data by merging together the padded strings. Again, we'll use List.map2_exn for combining the list of data in the row with the list of widths:

# let render_row row widths =
    let padded = List.map2_exn row widths ~f:pad in
    "|" ^ String.concat ~sep:"|" padded ^ "|"
  ;;
val render_row : string list -> int list -> string = <fun>
# render_row ["Hello";"World"] [10;15];;
- : string = "| Hello | World |"
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 25) ∗ all code

Now we can bring this all together in a single function that renders the table:

# let render_table header rows =
    let widths = max_widths header rows in
    String.concat ~sep:"\n"
      (render_row header widths
       :: render_separator widths
       :: List.map rows ~f:(fun row -> render_row row widths)
      )
  ;;
val render_table : string list -> string list list -> string = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 26) ∗ all code

More Useful List Functions

The previous example we worked through touched on only three of the functions in List. We won't cover the entire interface (for that you should look at the online docs), but a few more functions are useful enough to mention here.

Combining list elements with List.reduce

List.fold, which we described earlier, is a very general and powerful function. Sometimes, however, you want something simpler and easier to use. One such function is List.reduce, which is essentially a specialized version of List.fold that doesn't require an explicit starting value, and whose accumulator has to consume and produce values of the same type as the elements of the list it applies to.

Here's the type signature:

# List.reduce;;
- : 'a list -> f:('a -> 'a -> 'a) -> 'a option = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 27) ∗ all code

reduce returns an optional result, returning None when the input list is empty.

Now we can see reduce in action:

# List.reduce ~f:(+) [1;2;3;4;5];;
- : int option = Some 15
# List.reduce ~f:(+) [];;
- : int option = None
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 28) ∗ all code

Filtering with List.filter and List.filter_map

Very often when processing lists, you wants to restrict your attention to a subset of the values on your list. The List.filter function is one way of doing that:

# List.filter ~f:(fun x -> x mod 2 = 0) [1;2;3;4;5];;
- : int list = [2; 4]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 29) ∗ all code

Note that the mod used above is an infix operator, as described in Chapter 2, Variables and Functions.

Sometimes, you want to both transform and filter as part of the same computation. In that case, List.filter_map is what you need. The function passed to List.filter_map returns an optional value, and List.filter_map drops all elements for which None is returned.

Here's an example. The following expression computes the list of file extensions in the current directory, piping the results through List.dedup to remove duplicates. Note that this example also uses some functions from other modules, including Sys.ls_dir to get a directory listing, and String.rsplit2 to split a string on the rightmost appearance of a given character:

# List.filter_map (Sys.ls_dir ".") ~f:(fun fname ->
    match String.rsplit2 ~on:'.' fname with
    | None  | Some ("",_) -> None
    | Some (_,ext) ->
      Some ext)
  |> List.dedup
  ;;
- : string list = ["ascii"; "ml"; "mli"; "topscript"]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 30) ∗ all code

The preceding code is also an example of an Or pattern, which allows you to have multiple subpatterns within a larger pattern. In this case, None | Some ("",_) is an Or pattern. As we'll see later, Or patterns can be nested anywhere within larger patterns.

Partitioning with List.partition_tf

Another useful operation that's closely related to filtering is partitioning. The function List.partition_tf takes a list and a function for computing a Boolean condition on the list elements, and returns two lists. The tf in the name is a mnemonic to remind the user that true elements go to the first list and false ones go to the second. Here's an example:

# let is_ocaml_source s =
    match String.rsplit2 s ~on:'.' with
    | Some (_,("ml"|"mli")) -> true
    | _ -> false
  ;;
val is_ocaml_source : string -> bool = <fun>
# let (ml_files,other_files) = List.partition_tf (Sys.ls_dir ".") ~f:is_ocaml_source;;
val ml_files : string list = ["example.ml"; "example.mli"] val other_files : string list = ["lists_layout.ascii"; "main.topscript"]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 31) ∗ all code

Combining lists

Another very common operation on lists is concatenation. The list module actually comes with a few different ways of doing this. First, there's List.append, for concatenating a pair of lists:

# List.append [1;2;3] [4;5;6];;
- : int list = [1; 2; 3; 4; 5; 6]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 32) ∗ all code

There's also @, an operator equivalent of List.append:

# [1;2;3] @ [4;5;6];;
- : int list = [1; 2; 3; 4; 5; 6]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 33) ∗ all code

In addition, there is List.concat, for concatenating a list of lists:

# List.concat [[1;2];[3;4;5];[6];[]];;
- : int list = [1; 2; 3; 4; 5; 6]
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 34) ∗ all code

Here's an example of using List.concat along with List.map to compute a recursive listing of a directory tree:

# let rec ls_rec s =
    if Sys.is_file_exn ~follow_symlinks:true s
    then [s]
    else
      Sys.ls_dir s
      |> List.map ~f:(fun sub -> ls_rec (s ^/ sub))
      |> List.concat
  ;;
val ls_rec : string -> string list = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 35) ∗ all code

Note that ^/ is an infix operator provided by Core for adding a new element to a string representing a file path. It is equivalent to Core's Filename.concat.

The preceding combination of List.map and List.concat is common enough that there is a function List.concat_map that combines these into one, more efficient operation:

# let rec ls_rec s =
    if Sys.is_file_exn ~follow_symlinks:true s
    then [s]
    else
      Sys.ls_dir s
      |> List.concat_map ~f:(fun sub -> ls_rec (s ^/ sub))
  ;;
val ls_rec : string -> string list = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 36) ∗ all code

Tail Recursion

The only way to compute the length of an OCaml list is to walk the list from beginning to end. As a result, computing the length of a list takes time linear in the size of the list. Here's a simple function for doing so:

# let rec length = function
    | [] -> 0
    | _ :: tl -> 1 + length tl
  ;;
val length : 'a list -> int = <fun>
# length [1;2;3];;
- : int = 3
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 37) ∗ all code

This looks simple enough, but you'll discover that this implementation runs into problems on very large lists, as we'll show in the following code:

# let make_list n = List.init n ~f:(fun x -> x);;
val make_list : int -> int list = <fun>
# length (make_list 10);;
- : int = 10
# length (make_list 10_000_000);;
Stack overflow during evaluation (looping recursion?).
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 38) ∗ all code

The preceding example creates lists using List.init, which takes an integer n and a function f and creates a list of length n, where the data for each element is created by calling f on the index of that element.

To understand where the error in the above example comes from, you need to learn a bit more about how function calls work. Typically, a function call needs some space to keep track of information associated with the call, such as the arguments passed to the function, or the location of the code that needs to start executing when the function call is complete. To allow for nested function calls, this information is typically organized in a stack, where a new stack frame is allocated for each nested function call, and then deallocated when the function call is complete.

And that's the problem with our call to length: it tried to allocate 10 million stack frames, which exhausted the available stack space. Happily, there's a way around this problem. Consider the following alternative implementation:

# let rec length_plus_n l n =
    match l with
    | [] -> n
    | _ :: tl -> length_plus_n tl (n + 1)
  ;;
val length_plus_n : 'a list -> int -> int = <fun>
# let length l = length_plus_n l 0 ;;
val length : 'a list -> int = <fun>
# length [1;2;3;4];;
- : int = 4
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 39) ∗ all code

This implementation depends on a helper function, length_plus_n, that computes the length of a given list plus a given n. In practice, n acts as an accumulator in which the answer is built up, step by step. As a result, we can do the additions along the way rather than doing them as we unwind the nested sequence of function calls, as we did in our first implementation of length.

The advantage of this approach is that the recursive call in length_plus_n is a tail call. We'll explain more precisely what it means to be a tail call shortly, but the reason it's important is that tail calls don't require the allocation of a new stack frame, due to what is called the tail-call optimization. A recursive function is said to be tail recursive if all of its recursive calls are tail calls. length_plus_n is indeed tail recursive, and as a result, length can take a long list as input without blowing the stack:

# length (make_list 10_000_000);;
- : int = 10000000
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 40) ∗ all code

So when is a call a tail call? Let's think about the situation where one function (the caller) invokes another (the callee). The invocation is considered a tail call when the caller doesn't do anything with the value returned by the callee except to return it. The tail-call optimization makes sense because, when a caller makes a tail call, the caller's stack frame need never be used again, and so you don't need to keep it around. Thus, instead of allocating a new stack frame for the callee, the compiler is free to reuse the caller's stack frame.

Tail recursion is important for more than just lists. Ordinary nontail recursive calls are reasonable when dealing with data structures like binary trees, where the depth of the tree is logarithmic in the size of your data. But when dealing with situations where the depth of the sequence of nested calls is on the order of the size of your data, tail recursion is usually the right approach.

Terser and Faster Patterns

Now that we know more about how lists and patterns work, let's consider how we can improve on an example from the section called “Recursive list functions”: the function destutter, which removes sequential duplicates from a list. Here's the implementation that was described earlier:

# let rec destutter list =
    match list with
    | [] -> []
    | [hd] -> [hd]
    | hd :: hd' :: tl ->
      if hd = hd' then destutter (hd' :: tl)
      else hd :: destutter (hd' :: tl)
  ;;
val destutter : 'a list -> 'a list = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 41) ∗ all code

We'll consider some ways of making this code more concise and more efficient.

First, let's consider efficiency. One problem with the destutter code above is that it in some cases re-creates on the righthand side of the arrow a value that already existed on the lefthand side. Thus, the pattern [hd] -> [hd] actually allocates a new list element, when really, it should be able to just return the list being matched. We can reduce allocation here by using an as pattern, which allows us to declare a name for the thing matched by a pattern or subpattern. While we're at it, we'll use the function keyword to eliminate the need for an explicit match:

# let rec destutter = function
    | [] as l -> l
    | [_] as l -> l
    | hd :: (hd' :: _ as tl) ->
      if hd = hd' then destutter tl
      else hd :: destutter tl
  ;;
val destutter : 'a list -> 'a list = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 42) ∗ all code

We can further collapse this by combining the first two cases into one, using an Or pattern:

# let rec destutter = function
    | [] | [_] as l -> l
    | hd :: (hd' :: _ as tl) ->
      if hd = hd' then destutter tl
      else hd :: destutter tl
  ;;
val destutter : 'a list -> 'a list = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 43) ∗ all code

We can make the code slightly terser now by using a when clause. A when clause allows us to add an extra precondition to a pattern in the form of an arbitrary OCaml expression. In this case, we can use it to include the check on whether the first two elements are equal:

# let rec destutter = function
    | [] | [_] as l -> l
    | hd :: (hd' :: _ as tl) when hd = hd' -> destutter tl
    | hd :: tl -> hd :: destutter tl
  ;;
val destutter : 'a list -> 'a list = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 44) ∗ all code

Polymorphic Compare

In the preceding destutter example, we made use of the fact that OCaml lets us test equality between values of any type, using the = operator. Thus, we can write:

# 3 = 4;;
- : bool = false
# [3;4;5] = [3;4;5];;
- : bool = true
# [Some 3; None] = [None; Some 3];;
- : bool = false
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 45) ∗ all code

Indeed, if we look at the type of the equality operator, we'll see that it is polymorphic:

# (=);;
- : 'a -> 'a -> bool = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 46) ∗ all code

OCaml comes with a whole family of polymorphic comparison operators, including the standard infix comparators, <, >=, etc., as well as the function compare that returns -1, 0, or 1 to flag whether the first operand is smaller than, equal to, or greater than the second, respectively.

You might wonder how you could build functions like these yourself if OCaml didn't come with them built in. It turns out that you can't build these functions on your own. OCaml's polymorphic comparison functions are built into the runtime to a low level. These comparisons are polymorphic on the basis of ignoring almost everything about the types of the values that are being compared, paying attention only to the structure of the values as they're laid out in memory.

Polymorphic compare does have some limitations. For example, it will fail at runtime if it encounters a function value:

# (fun x -> x + 1) = (fun x -> x + 1);;
Exception: (Invalid_argument "equal: functional value").
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 47) ∗ all code

Similarly, it will fail on values that come from outside the OCaml heap, like values from C bindings. But it will work in a reasonable way for other kinds of values.

For simple atomic types, polymorphic compare has the semantics you would expect: for floating-point numbers and integers, polymorphic compare corresponds to the expected numerical comparison functions. For strings, it's a lexicographic comparison.

Sometimes, however, the type-ignoring nature of polymorphic compare is a problem, particularly when you have your own notion of equality and ordering that you want to impose. We'll discuss this issue more, as well as some of the other downsides of polymorphic compare, in Chapter 13, Maps and Hash Tables.

Note that when clauses have some downsides. As we noted earlier, the static checks associated with pattern matches rely on the fact that patterns are restricted in what they can express. Once we add the ability to add an arbitrary condition to a pattern, something will be lost. In particular, the ability of the compiler to determine if a match is exhaustive, or if some case is redundant, is compromised.

Consider the following function, which takes a list of optional values, and returns the number of those values that are Some. Because this implementation uses when clauses, the compiler can't tell that the code is exhaustive:

# let rec count_some list =
    match list with
    | [] -> 0
    | x :: tl when Option.is_none x -> count_some tl
    | x :: tl when Option.is_some x -> 1 + count_some tl
  ;;


Characters 30-169: Warning 8: this pattern-matching is not exhaustive. Here is an example of a value that is not matched: _::_ (However, some guarded clause may match this value.)val count_some : 'a option list -> int = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 48) ∗ all code

Despite the warning, the function does work fine:

# count_some [Some 3; None; Some 4];;
- : int = 2
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 49) ∗ all code

If we add another redundant case without a when clause, the compiler will stop complaining about exhaustiveness and won't produce a warning about the redundancy.

# let rec count_some list =
    match list with
    | [] -> 0
    | x :: tl when Option.is_none x -> count_some tl
    | x :: tl when Option.is_some x -> 1 + count_some tl
    | x :: tl -> -1 (* unreachable *)
  ;;
val count_some : 'a option list -> int = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 50) ∗ all code

Probably a better approach is to simply drop the second when clause:

# let rec count_some list =
    match list with
    | [] -> 0
    | x :: tl when Option.is_none x -> count_some tl
    | _ :: tl -> 1 + count_some tl
  ;;
val count_some : 'a option list -> int = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 51) ∗ all code

This is a little less clear, however, than the direct pattern-matching solution, where the meaning of each pattern is clearer on its own:

# let rec count_some list =
    match list with
    | [] -> 0
    | None   :: tl -> count_some tl
    | Some _ :: tl -> 1 + count_some tl
  ;;
val count_some : 'a option list -> int = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 52) ∗ all code

The takeaway from all of this is although when clauses can be useful, we should prefer patterns wherever they are sufficient.

As a side note, the above implementation of count_some is longer than necessary; even worse, it is not tail recursive. In real life, you would probably just use the List.count function from Core:

# let count_some l = List.count ~f:Option.is_some l;;
val count_some : 'a option list -> int = <fun>
OCaml Utop ∗ lists-and-patterns/main.topscript , continued (part 53) ∗ all code