simple s-expression parser in python
A few weeks back, I boasted in a bar that anyone could write a simple s-expression parser from scratch (no flex, no yacc, no bison, no ANTLR, etc.). Well, I could not boast without doing it myself, so I finally did. My s-expression lexer/parser weighs in at about 170 lines without tests (450 lines with them), and parses integers and symbols. I even created a little repl which takes an s-expression and converts it into a pretty-printed python list consisting of objects representing symbols and numbers.
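To illustrate the round trip, here's a minimal sketch. The Symbol and Number names follow the post, but this tiny recursive parser is my own stand-in, not the author's 170-line version:

```python
# Sketch of the repl's round trip: s-expression string in, nested
# python list of Symbol/Number objects out. Implementation is a
# stand-in, not the post's actual code.

class Symbol:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f"Symbol({self.name!r})"
    def __eq__(self, other):
        return isinstance(other, Symbol) and self.name == other.name

class Number:
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return f"Number({self.value})"
    def __eq__(self, other):
        return isinstance(other, Number) and self.value == other.value

def parse(text):
    """Parse one s-expression into nested lists of Symbol/Number."""
    # Pad parens with spaces so a plain split() tokenizes the input.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        token = tokens[pos]
        if token == "(":
            items = []
            pos += 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1  # skip the ")"
        if token.lstrip("-").isdigit():
            return Number(int(token)), pos + 1
        return Symbol(token), pos + 1

    result, _ = read(0)
    return result

print(parse("(+ 1 (* 2 3))"))
# → [Symbol('+'), Number(1), [Symbol('*'), Number(2), Number(3)]]
```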
I think the design is fairly readable, if a bit unusual. There is a class for each kind of token (instances of which represent both fully and partially formed tokens), and we “add” these tokens together to form a larger token, ultimately raising a next_token_exception that allows the lexer to yield the largest, greediest, most fully formed token possible.
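The token-adding idea above might be sketched like this. The post names a next_token_exception; the class and method names here are my own guesses at the shape, not the author's actual implementation:

```python
# Sketch of exception-driven, greedy lexing: each token instance
# absorbs characters until it can't, then signals the lexer to
# yield it and start a fresh token. Names are assumptions.

class NextTokenException(Exception):
    """Raised when the current token cannot absorb the next character."""

class Token:
    def __init__(self, text=""):
        self.text = text
    def accepts(self, char):
        raise NotImplementedError
    def __add__(self, char):
        # "Adding" a character grows the token, or tells the lexer
        # this token is as big as it gets.
        if not self.accepts(char):
            raise NextTokenException
        return type(self)(self.text + char)
    def __repr__(self):
        return f"{type(self).__name__}({self.text!r})"

class ParenToken(Token):
    def accepts(self, char):
        return self.text == "" and char in "()"

class NumberToken(Token):
    def accepts(self, char):
        return char.isdigit()

class SymbolToken(Token):
    def accepts(self, char):
        return char.isalpha() or char in "+-*/"

class WhitespaceToken(Token):
    def accepts(self, char):
        return char.isspace()

TOKEN_CLASSES = [ParenToken, NumberToken, SymbolToken, WhitespaceToken]

def start_token(char):
    for cls in TOKEN_CLASSES:
        if cls().accepts(char):
            return cls(char)
    raise ValueError(f"unexpected character: {char!r}")

def lex(text):
    current = None
    for char in text:
        if current is None:
            current = start_token(char)
            continue
        try:
            current = current + char  # grow the token greedily
        except NextTokenException:
            yield current             # fully formed; can't grow further
            current = start_token(char)
    if current is not None:
        yield current

print(list(lex("(ab 12)")))
# → [ParenToken('('), SymbolToken('ab'), WhitespaceToken(' '),
#    NumberToken('12'), ParenToken(')')]
```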
After lexing, we’re left with a stream of fully formed tokens; it’s a simple matter to filter out the whitespace tokens and start yielding instances of Symbol() and Number() classes with proper, pythonic types for the data.
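That filtering-and-conversion step could look something like this; the (kind, text) tuple representation of tokens is an assumption for illustration, not the post's actual classes:

```python
# Sketch of the post-lex step: drop whitespace tokens and turn the
# rest into objects carrying proper python types. Tokens here are
# assumed to be simple (kind, text) tuples.

class Symbol:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f"Symbol({self.name!r})"

class Number:
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return f"Number({self.value})"

def refine(tokens):
    """Filter whitespace; yield Symbol/Number objects (parens pass through)."""
    for kind, text in tokens:
        if kind == "whitespace":
            continue
        if kind == "number":
            yield Number(int(text))  # proper python int
        elif kind == "symbol":
            yield Symbol(text)
        else:
            yield text               # '(' and ')' pass through unchanged

tokens = [("paren", "("), ("symbol", "add"), ("whitespace", " "),
          ("number", "12"), ("paren", ")")]
print(list(refine(tokens)))
# → ['(', Symbol('add'), Number(12), ')']
```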
Although I used regexes (quite unnecessarily) and generators, this little thing could probably be ported to C. It’d be the fastest, most useless parser/lexer around :)
Happy Hacking!