Erlang, the language for network programming
Issue 2: binary pattern matching

José Pablo Ezequiel Pupeno Fernández Silva
http://pupeno.com
24th October 2006

Much is being said about the excellent capabilities of Erlang (http://erlang.org) for writing distributed, fault-tolerant programs, but little has been said about how easy and fun it is to write servers (those programs at the other end of the line). And by easy I don't just mean that you can put up a web server in two lines and hope it'll work; I mean it'll be easy to build robust servers. One example of this is ejabberd (http://ejabberd.jabber.ru), a free Jabber (http://jabber.org) server.

I'll start this second part, the one with real network programming, with a bet. Think about the IPv4 protocol; its header looks like this:
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Version|  IHL  |Type of Service|          Total Length         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Identification        |Flags|      Fragment Offset    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Time to Live |    Protocol   |         Header Checksum       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Source Address                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Destination Address                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Options                    |    Padding    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

You can check RFC 791, page 11, for more information. At a glance, the first 4 bits are the version, the next 4 bits the IHL (Internet Header Length), then we have a whole byte, 8 bits, of Type of Service. The next two bytes are the total length and I am already tired of it; you get the picture, right?

Pick whatever language you want (except Erlang, that's mine now, but it can be yours later) and think about how many lines of code it would take you to parse that beast, the IP header. Think about how much time it would take you to write those lines and test them. Done? Come on! Really think about it, otherwise the game is boring. Close your eyes, picture the lines of code. If you can't, go and write some pseudo-code similar to your favorite language to do the parsing. Done? OK.

Here's my bet: I bet that I can do it, in Erlang, in far fewer lines than you! I bet that I can code it so fast that I'd be finished writing the code to parse the whole header before you finish the code to parse the first line. And while you are testing I'll go to the beach, because I'll just trust my code to run without problems. Do you think I am crazy?
I'll confirm it with another bet: I bet that after reading this article you'll be able to do the same super-programming that I claimed to be capable of in the previous paragraph. Keep reading!

One of the Erlang features that really helps us write servers is binary pattern matching. To understand it, you first need to understand pattern matching; for that you can read the previous issue (http://pupeno.com/publications/erlang-network-programming-1.pdf/view).

Erlang provides a way to write binary data directly in the source code:
    <<"hello">>
That is a binary containing the string "hello" as ASCII. In hexadecimal notation it'd be: 68 65 6c 6c 6f; in decimal: 104 101 108 108 111; and in binary: 01101000 01100101 01101100 01101100 01101111. Another one:
    <<1, 2, 3>>

This one contains three bytes, the first one being 1, the second being 2 and the third being 3; in hexadecimal: 01 02 03. So far, nothing impressive; let's get there:
    <<1, 2, 3:16>>

It contains four bytes; the first and second are 1 and 2 respectively. The third and fourth together form a 3, so in hexadecimal: 01 02 00 03. The 16 after the colon specifies how many bits the preceding value will use (the default for integers is 8). What did you say? Integers? Yes. Is that a type? Yes. You can also specify types, for instance:
    <<1/integer, 2.34/float>>

will generate, in hexadecimal, 01 40 02 b8 51 eb 85 1e b8 (the default size for a float is 64 bits). We can also define endianness, signedness and unit. Enough of that.

Binaries, like any other structure such as lists or tuples, can participate in pattern matching. Let's do some pattern matching on binaries:
    Packet = <<"Erlang is a general-purpose programming language.">>,
    <<A:8, B:16, C:32, D:64, E/binary>> = Packet.

When developing a program, Packet would normally come from a file or a network connection; here I just defined it so you can see its contents. The second line breaks Packet into several different pieces and assigns identifiers to them (A, B, C, D and E). Let's see the results:
    > A.
    69
    > B.
    29292
    > C.
    1634625312
    > D.
    7598452597831722350
    > E.
    <<"eral-purpose programming language.">>

Isn't it cool? Think about how many lines of code it would have taken you to do the same in any other programming language (if you find anything that can beat Erlang at doing that in one line of code, I am interested in taking a look at it).

A typical problem when learning to code is: how do I check whether the third bit of that byte is 1 or 0? In Erlang:
    check_third(<<_:2, 1:1, _:5>>) -> "It's a one";
    check_third(_) -> "It's a zero".

If the parameter to check_third has a 1 in its third bit, we'll get "It's a one"; otherwise, "It's a zero". If you need to do this inside a function (without calling another function), you can use case, another Erlang construct that I am not going to describe here.

You can use the identifiers bound in a pattern in the pattern itself:
    <<Size:8, String:Size/binary-unit:8>>

reads as:
The first 8 bits, a byte, will be taken as an integer and named Size. String will be measured in binary units of 8 bits (bytes) and it will contain Size of those units. For example:
    > <<Size:8, String:Size/binary-unit:8>> = <<5, "hello">>.
    <<5,104,101,108,108,111>>
    > Size.
    5
    > String.
    <<"hello">>
    > binary_to_list(String).
    "hello"

The last call turns the binary into a list (strings are lists of characters in Erlang). And if the string is shorter or longer than Size says, we'll have a pattern mismatch, a clean exception.

By now, you may start to imagine why I made such a bet at the beginning of this document. Here you can see my line (broken due to space constraints) that will match an IPv4 header [1]:
    < > = Packet.

Impressive, isn't it? If you want to see more, I can recommend my unfinished DNS parser in the Serlvers (http://pupeno.com/software/serlvers) project.
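A single-pattern match of the fixed 20-byte part of the header could look like the sketch below. The field widths are taken from the RFC 791 diagram at the beginning of this article; the module name ipv4_sketch, the function name parse_header and all the variable names are assumptions made for this illustration, and the error tuple is just one possible way to handle a mismatch.

```erlang
-module(ipv4_sketch).
-export([parse_header/1]).

%% Match the fixed 20-byte part of an IPv4 header in a single pattern.
%% Widths follow the RFC 791 layout; every name here is hypothetical.
parse_header(<<Version:4, IHL:4, ToS:8, TotalLength:16,
               Identification:16, Flags:3, FragmentOffset:13,
               TTL:8, Protocol:8, Checksum:16,
               Source:32, Destination:32,
               Rest/binary>>) ->
    {ok,
     [{version, Version}, {ihl, IHL}, {tos, ToS},
      {total_length, TotalLength},
      {identification, Identification},
      {flags, Flags}, {fragment_offset, FragmentOffset},
      {ttl, TTL}, {protocol, Protocol}, {checksum, Checksum},
      {source, Source}, {destination, Destination}],
     Rest};
%% Anything shorter than 20 bytes (or not a binary) ends up here.
parse_header(_) ->
    {error, bad_header}.
```

Calling ipv4_sketch:parse_header(Packet) returns the fields as a property list together with Rest, which holds whatever follows the first 20 bytes (options, if any, and then the payload).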
[1] I've cheated: it does not correctly match the IP Options (an optional field). Since we have to calculate its size from the IHL, it has to be done in two separate matches. Just pick up Rest and continue working on it.
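The second match described in this footnote can be sketched as follows. In RFC 791 the IHL counts 32-bit words and includes the 20 fixed bytes, so a value of 5 means there are no options and the options take (IHL - 5) * 4 bytes of Rest. The module name ipv4_options_sketch, the function name split_options and the error atoms are hypothetical, chosen only for this example.

```erlang
-module(ipv4_options_sketch).
-export([split_options/2]).

%% IHL is the header length in 32-bit words (5 means "no options").
%% Rest is whatever followed the 20 fixed header bytes.
split_options(IHL, Rest) when is_integer(IHL), IHL >= 5 ->
    OptionsSize = (IHL - 5) * 4,
    %% A previously bound variable can be used as a size in a pattern.
    case Rest of
        <<Options:OptionsSize/binary, Payload/binary>> ->
            {ok, Options, Payload};
        _ ->
            {error, truncated}
    end;
%% An IHL below 5 cannot describe a valid header.
split_options(_, _) ->
    {error, bad_ihl}.
```

This also happens to show case, the construct mentioned earlier for matching inside a function without calling another one.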