Advent of Code 2020: Day 2

Type signatures are not documentation, and compilers are not (quite) all-knowing

Today’s problems are about string parsing, which means we pull out the regular expressions. OCaml actually doesn’t have a regex engine built in, so I used the Re library, which is one of those libraries with absolutely no documentation. “Why do you need documentation? We have an interface file!" is what they say. I was having a hard time with it, so eventually I found Humane-re which advertises itself as a user-friendly wrapper around Re. It even has a blog post with examples! Unfortunately, it only seems to support Emacs-style regular expressions, which I know a lot less well than PCRE, so I went back to Re. Despite the disappointment, Humane-re’s source code served as a great example of how to use Re, and I was able to get my little parser written without much further difficulty. I will admit that I’m getting better at reading type signatures, so I can—at least a little bit—understand why a lot of OCaml users seem to consider interface files and documentation interchangeable. But even then, there are multiple types of documentation, and interface files cover at most one!

Rants aside, here’s my code for both problems:

open Core
open Re

let parse_line line =
  let re = Pcre.regexp "(\\d+)-(\\d+) ([a-z]): ([a-z]+)" in
  let[@warning "-8"] [| _; a; b; char; pass |] = Group.all (exec re line) in
  (int_of_string a, int_of_string b, char.[0], pass)

let is_valid_password_sled line =
  let min, max, char, pass = parse_line line in
  let count = String.count pass ~f:(Char.equal char) in
  count >= min && count <= max

let part1 lines = List.count lines ~f:is_valid_password_sled

let is_valid_password_toboggan line =
  let i, j, char, pass = parse_line line in
  let i_eq = Char.equal char pass.[i - 1] in
  let j_eq = Char.equal char pass.[j - 1] in
  not (Bool.equal i_eq j_eq)

let part2 lines = List.count lines ~f:is_valid_password_toboggan

OCaml’s compiler is incredibly smart, but in this case it actually got in my way slightly. Did you notice the [@warning "-8"] in parse_line? Let’s see what the compiler says if I omit it:

7 |   let [| _; a; b; char; pass |] = Group.all (exec re line) in
Error (warning 8): this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
[|  |]

It’s suggesting that Group.all could return an empty array… but could it really? Could it return an array with anything but five elements? As far as I know, the answer is no. Since my regex has four capture groups, it’ll return five elements: the whole string plus the four captures. However, OCaml doesn’t offer a way to express this with the type system. The compiler knows that Group.all returns a string array, but it has no way of knowing its length, even though a human can infer it pretty easily. In a dependently-typed language like Idris, this should be possible. But in any case, we’re able to silence the warning and the compiler assumes that we know what we’re doing.

So from all this, we learned that OCaml is not dependently typed. And that’s okay, but it begs another question… how the hell does sprintf work?

# open Printf;;
# sprintf "%d" 1;;
- : string = "1"

Okay, so sprintf must be string -> int -> string, right? Nope:

# sprintf "%d %d" 1 2;;
- : string = "1 2"

Okay, so it’s somehow variadic. Nifty, relative to my knowledge of the type system, but not implausible.

# sprintf "%d %c" 1 '2';;
- : string = "1 2"

It’s generic too! Again, not beyond belief, lots of static type systems can do this.

# sprintf "%d %d" 1 '2';;
Error: This expression has type char but an expression was expected of type

Wait… what? To be clear, this isn’t a runtime exception. It fails at compile time. The only thing we changed was the value of the format string, so how is this not a textbook example of a dependently-typed function? According to this web page, the “fold” technique can be used for variadic functions and “(seemingly) dependently-typed functional results”. It describes the technique for MLton, a Standard ML compiler, but I assume the same is used for OCaml. Either way, it’s miles beyond my current level, so I won’t worry about it for a long time.

Anyways, I stayed up too late and now Day 3’s problems are coming out in 3 minutes. Hopefully I have something equally interesting to share tomorrow!

Check out my code in full here.

Edit (12/03): Turns out there is actually some regexp support in the standard library (Str). However, it only supports a limited flavour of regex and the API is not great…

matched_group n s returns the substring of s that was matched by the nth group \(...\) of the regular expression that was matched by the last call to a matching or searching function (see Str.matched_string for details). The user must make sure that the parameter s is the same string that was passed to the matching or searching function.

I don’t plan on writing any multithreaded code, but still, this stinks. And since Re comes as a dependency of Core, I don’t really need to worry about taking on an unnecessary dependency.