OCaml Coding Considerations

From Xen
Jump to: navigation, search

This page is a collection of things worth bearing in mind while developing in OCaml.

Garbage Collection

Read information from the tutorial here.

Performance Issues

Read information from the tutorial here.

Use Tail-Recursive Functions when Possible

Tail-recursion is a very important notion to understand in order to make recursive calls efficients. Basically, a function is tail recursive if all the recursive call is not followed by any operations - if there are some operations left, the compiler will store them in the stack to finish them later, which is not good for performances. For example, this function which sum up the element of a list is not tail-recursive : the function fun x -> h + x need to be stored in the stack, waiting for the return of sum t to be totally applied: if your run this example on a big list, you will obtain Stack overflow as too many elements will be stored on the stack, waiting for the recursive calls to return.

let rec sum l =
  | [] -> 0
  | h::t -> h + (sum t)

A basic transformation of non tail-recursive function to tail-recursive function is with the help of an accumulator which carry the intermediate computation and which is passed through the recursive calls. Using that, we can re-write the last function into:

let sum l =
  let rec aux accu =
    | [] -> accu
    | h::t -> aux (h+accu) t
  in
    aux 0 l

More examples can be found here.

Good Usage of List Operations

Complexity of pattern-matching and :: is O(1), and complexity of List.rev and @ is O. Thus, instead of doing

let rec rev = function
  | [] -> []
  | h::t -> rev t @ [h]

which is quadratic (and not tail-recursive), use the following code which is linear:

let rev l =
  let aux accu = function
    | [] -> accu
    | h::t -> aux (h :: accu) t


Do Not Allocate Big Data-Structure Whose Lifetime is Very Short

as in List.filter f (List.map g l) if l is very long.

Avoid Concatenating Constant Strings with Operator ^

The optimizing compiler will not optimize them to a single constant string, requiring the resulting binary to concatenate them every time.

indeed, if you compile the following program test.ml:

print_string ("hello " ^ "world!\n")

using 'ocamlopt -S test.ml -o test' in order to see the resulting assembly file test.s, you will find out that it results in two different .data fields:

.data
        .long   2300
camlTest__1:
        .ascii  "hello "
        .space  1
        .byte   1
        .data
        .long   2300
camlTest__2:
        .ascii  "world!\12"
        .byte   0

and a simple visualization of the resulting test binary using 'less' shows the unoptimized two constant strings in different locations, instead of one single constant string.

This behavior is unfortunate but the compiler is not able to detect pure functional functions from side-effects functions. As example, the following function cannot be optimized, which is why the last example is also not optimized:

let c = ref true
let (^) s t = if !c then s^t else t^s
let x () = print_string ("hello " ^ "world!\n")
let _ = x (); c := false; c ; x () (* this prints "hello world\n!world!\nhello " *)


Specify Types for Function Arguments as Much as Possible

Otherwise, the compiler will try to find the largest polymorphic type for the argument, resulting in very inefficient assembly code.

For instance, the following ocaml function

let max (a:int) (b:int) = if a > b then a else b

is compiled to a simple instruction "cmpl" in assembly to compare a and b, whereas the following ocaml function

let max a b = if a > b then a else b

is compiled to a complex and inefficient call to a polymorphic comparison written as a C library, because the ocaml operator > is polymorphic, with type 'a -> 'a -> bool. This tendency to compile to polymorphic code by default is present regardless if the function is exported as an external symbol to be used outside the module or if the function is restricted to local usage.