Combine 1.0.0 and a simple INI parser
Combine 1.0.0 is released! This finally brings combine into stable semver territory, meaning you can now use it without worrying about breakage when updating to a newer 1.x version.
Rather than just posting an announcement with the few breaking changes that happened between the announcement of the beta and now, I figured I would write about how to create a parser from scratch using combine.
A simple INI parser
The INI file format is an informal standard for configuration files. INI files are quite simple to parse, which makes them a good introductory example that still demonstrates the flexibility of parser combinators.
From the wiki page and the example text we can see that we need parsers for three major elements: properties, sections and whitespace (including comments). We'll tackle each one separately, in the order in which they are needed.
Starting with properties, we can see that we need to parse the property name, followed by an equals sign, followed by the value, which reaches until the end of the line. In combine the easiest way of running parsers in sequence is to use tuples, which lets us write the property parser like so.
There are four different predefined parsers doing work in this line.
satisfy takes a function of type (Item) -> bool and creates a parser which only accepts the items for which that predicate returns true.
many1 is a function which takes a parser and creates a new parser which applies the given parser one or more times (hence the 1 suffix).
token creates a parser which accepts the given token.
map transforms the result of a successful parse into a new result. We use it here to return only a tuple of the key and value, discarding the equals sign matched in between.
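To make the four building blocks above concrete without depending on any particular version of combine, here is a minimal hand-rolled sketch in plain Rust. It is not combine's API: take_while1, expect_char and property are hypothetical helpers written for this illustration, and the character sets for keys and values are assumptions (a key stops at '=', '[', ';' or a newline; a value stops at a newline or ';').

```rust
/// Consume one or more leading chars that satisfy `pred`
/// (the role played by many1 + satisfy). Hypothetical helper.
/// Returns the matched prefix and the remaining input.
fn take_while1<F: Fn(char) -> bool>(input: &str, pred: F) -> Option<(&str, &str)> {
    // Byte index of the first char that fails the predicate.
    let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
    if end == 0 {
        None // "one or more" means an empty match is a failure
    } else {
        Some((&input[..end], &input[end..]))
    }
}

/// Accept exactly one expected char (the role played by token).
fn expect_char(input: &str, expected: char) -> Option<(char, &str)> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) if c == expected => Some((c, chars.as_str())),
        _ => None,
    }
}

/// key '=' value, run in sequence; the final line plays the role of map
/// by keeping the key and value and dropping the '='.
fn property(input: &str) -> Option<((String, String), &str)> {
    // Assumed key alphabet: anything except '=', '[', ';' and newline.
    let (key, rest) =
        take_while1(input, |c| c != '=' && c != '[' && c != ';' && c != '\n')?;
    let (_eq, rest) = expect_char(rest, '=')?;
    // Assumed value alphabet: anything up to the end of the line or a comment.
    let (value, rest) = take_while1(rest, |c| c != '\n' && c != ';')?;
    Some(((key.to_string(), value.to_string()), rest))
}
```

Each step returns the unconsumed rest of the input, which is what lets the next step pick up where the previous one stopped. That is the same sequencing the tuple expresses in combine.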
All put together we have constructed a property parser in just a few lines which we can pass around however we want, well almost. To create a parser with combine the Parser trait is used, and each of the parsers mentioned above is its own type which implements the Parser trait. This technique is great for ensuring that the compiler never loses any information when it comes to optimizing the code. This does however come at the cost of some rather large types.
//Type of the property parser
Map<(Many<_, Satisfy<_, [closure@examples\ini.rs:18:19: 18:31]>>, Token<_>, Many<_, Satisfy<_, [closure@examples\ini.rs:18:60: 18:73]>>), [closure@examples\ini.rs:19:14: 19:44]>
Even if we could return closure types from functions, that is still a long type we don't really want to type out. We could box the parser, but that requires an allocation each time we need the parser, which we do not actually need. Instead we can do one better and wrap the parser inside a function which directly calls the parser.
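The idea of hiding a large parser type behind a function can be sketched with a toy trait. Everything below is hypothetical and much simpler than combine's real Parser trait: Satisfy and Many1 are stand-in types defined here, and key is an illustrative function name.

```rust
/// A stripped-down parser trait (hypothetical, not combine's).
trait Parser {
    type Output;
    fn parse<'a>(&mut self, input: &'a str) -> Option<(Self::Output, &'a str)>;
}

/// Accepts a single char for which the predicate returns true.
struct Satisfy<F>(F);

impl<F: FnMut(char) -> bool> Parser for Satisfy<F> {
    type Output = char;
    fn parse<'a>(&mut self, input: &'a str) -> Option<(char, &'a str)> {
        let mut chars = input.chars();
        match chars.next() {
            Some(c) if (self.0)(c) => Some((c, chars.as_str())),
            _ => None,
        }
    }
}

/// Applies the inner parser one or more times, collecting the chars.
struct Many1<P>(P);

impl<P: Parser<Output = char>> Parser for Many1<P> {
    type Output = String;
    fn parse<'a>(&mut self, mut input: &'a str) -> Option<(String, &'a str)> {
        let mut out = String::new();
        while let Some((c, rest)) = self.0.parse(input) {
            out.push(c);
            input = rest;
        }
        if out.is_empty() { None } else { Some((out, input)) }
    }
}

/// The value built here has type Many1<Satisfy<[closure]>>, which cannot be
/// written in a signature. Constructing and running it inside a plain
/// function keeps that type out of sight, with no boxing and no allocation
/// for the parser itself.
fn key(input: &str) -> Option<(String, &str)> {
    Many1(Satisfy(|c: char| c != '=')).parse(input)
}
```

Callers only ever see the signature of key, so the nested combinator type stays an implementation detail while the compiler still sees every concrete type inside the function.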
Parsing whitespace and comments
Parsing whitespace is done similarly to the property parser. We do however need a few more parsers to write it. The skip_many parser parses in the same way as the many parser, only it ignores the result. This makes it slightly more efficient for cases like this where we do not care about the result. Next we use the or method to create a parser which parses either whitespace, through the predefined spaces parser, or comments.
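As a rough plain-Rust illustration of "skip whitespace or comments, as many times as they occur", the sketch below loops until neither is left. It does not use combine; the function name whitespace mirrors the text, and treating ';' as the comment marker that runs to the end of the line is an assumption based on common INI usage.

```rust
/// Skip any mix of whitespace and `;` comments, returning the rest of the
/// input. Nothing is collected, which is the point of a skip-style parser.
fn whitespace(mut input: &str) -> &str {
    loop {
        let trimmed = input.trim_start();
        if trimmed.starts_with(';') {
            // A comment runs to the end of the line (or the end of input).
            let end = trimmed.find('\n').unwrap_or(trimmed.len());
            input = &trimmed[end..];
        } else {
            // Neither whitespace nor a comment is next, so we are done.
            return trimmed;
        }
    }
}
```

The loop corresponds to skip_many and the two branches correspond to the two alternatives joined by or.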
The last parser we need before putting it all together is the parser for sections. A section starts with its name enclosed in brackets, so we can use the between parser here, which ignores the result of the enclosing parsers and only returns the result of what is in between. After the name we want to skip any whitespace between the name and the properties in the section, so here we need to use the parser function, which turns the ordinary whitespace function into a parser. We use the same parser function to also wrap the properties function.
Putting it all together
Finally we can put it all together into the complete INI parser. As each INI file can start with properties which do not belong to a named section, we create a small struct which contains the global properties as well as the ones contained in sections. We then reuse the properties parser, which we used above in the section parser, to parse any properties which appear before any section. Lastly we skip any whitespace before the first property/section.
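The shape of the result can be sketched as follows. This is a line-based stand-in in plain Rust rather than a combinator parser, and it is not the author's code: the struct name Ini, its field names, the function parse_ini, the ';' comment marker and the trimming of keys and values are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Global properties plus the properties of each named section.
#[derive(Debug, Default, PartialEq)]
struct Ini {
    global: HashMap<String, String>,
    sections: HashMap<String, HashMap<String, String>>,
}

/// Parse INI text line by line; returns None on a malformed line.
fn parse_ini(text: &str) -> Option<Ini> {
    let mut ini = Ini::default();
    // None until the first [section] header has been seen.
    let mut current: Option<String> = None;
    for line in text.lines() {
        // Drop a `;` comment, then any surrounding whitespace.
        let line = line.split(';').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        if let Some(rest) = line.strip_prefix('[') {
            // A header must be closed by ']' on the same line.
            let name = rest.strip_suffix(']')?.to_string();
            ini.sections.entry(name.clone()).or_default();
            current = Some(name);
        } else {
            let (key, value) = line.split_once('=')?;
            // Properties before any header go into the global map.
            let target = match &current {
                Some(name) => ini.sections.get_mut(name)?,
                None => &mut ini.global,
            };
            target.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    Some(ini)
}
```

Keeping the global properties in their own field means a file with no sections at all still parses into a useful value.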
All that is left is to try it with some data.
It's always a good idea to test error cases as well. This parser is quite liberal in what it accepts, so creating good error cases is a bit tricky, but we can test unclosed sections. If we parse “[error” we would expect a readable error message.
Parse error at line: 1, column: 7
Unexpected 'end of input'
Expected ']'
This is pretty good, but if we wanted to be a bit clearer we might want to say that we expected a section at this point. To do this we can use the expected combinator to attach an error message when the parser fails.
Parse error at line: 1, column: 7
Unexpected 'end of input'
Expected 'section'
This gives us an error message which may be a bit easier to understand.
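The effect of relabelling an error can be sketched in plain Rust. This is not how combine implements expected: ParseError, section_name and the free function expected below are hypothetical, and positions (line and column) are left out to keep the sketch short.

```rust
/// A tiny error type holding what was found and what was wanted.
#[derive(Debug, PartialEq)]
struct ParseError {
    unexpected: String,
    expected: String,
}

/// Parse `[name]`, returning the name and the remaining input.
fn section_name(input: &str) -> Result<(&str, &str), ParseError> {
    let body = match input.strip_prefix('[') {
        Some(body) => body,
        None => {
            return Err(ParseError {
                unexpected: input
                    .chars()
                    .next()
                    .map_or_else(|| "end of input".to_string(), |c| c.to_string()),
                expected: "[".to_string(),
            })
        }
    };
    match body.find(']') {
        Some(end) => Ok((&body[..end], &body[end + 1..])),
        // Ran out of input before the closing bracket.
        None => Err(ParseError {
            unexpected: "end of input".to_string(),
            expected: "]".to_string(),
        }),
    }
}

/// Replace the low-level "expected" part of an error with a friendlier
/// label, leaving the "unexpected" part untouched.
fn expected<T>(result: Result<T, ParseError>, label: &str) -> Result<T, ParseError> {
    result.map_err(|e| ParseError { expected: label.to_string(), ..e })
}
```

Calling expected(section_name("[error"), "section") keeps the "end of input" part of the error and reports "section" instead of "]" as what was expected.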
The final parser
Below is the parser in its entirety. It could use some improvements, but as the INI format isn’t exactly specified I prefer to keep it rather minimal, which should make it easy to extend to handle more specific things. You can find the complete source with tests here.
Hopefully this gives a good idea of how to go about building parsers using combine. Though we only parse a string here, this parser could equally well be used to parse any other input stream, such as an iterator. If you want to see more examples of parsing using combine, they can be found in the benches and tests directories of combine.
Though combine has reached 1.0.0, that does not mean that it is done. I already have two more features lying in wait: zero-copy parsing and using non-cloneable iterators. These should get merged in under an experimental feature pretty soon so that they can get some testing before being stabilized.
That's all for now. Depending on the interest, I may write more about combine or the programming language I am writing (also in Rust), in which you can find a much more advanced example of how to use combine.