Combine 1.0.0 and a simple INI parser
Combine 1.0.0 is released! This finally brings combine into stable semver territory meaning you can now use it without worrying about breakage when updating to a newer version.
Rather than just posting an announcement with the few breaking changes that happened between the announcement of the beta and now I figured I could write about how to create a parser from scratch using combine.
A simple INI parser
The INI file format is an informal standard for configuration files. As they are quite simple to parse they make a good introductory example while still demonstrating the flexibiliy of using parser combinators for parsing.
From the wiki page and the example text we can see that we need parsers for three major elements, properties, sections and whitespace (including comments). Well tackle each one separately in the order which they are needed.
Parsing properties
Starting with properties we can see that we need to parse the property name followed by an equals sign followed by the value which reaches until the end of the line. In combine the easiest way of running parsers in sequence is to use tuples which lets use write the property parser like so.
There are four different predefined parsers doing work in this line.
satisfy
takes a function of type(Item) -> bool
and creates a parser which only accepts the items which returns true for that predicate.many1
is a function which takes a parser and creates a new parser which applies the given parser one or more times (hence the 1 suffix).token
creates a parser which accepts the given token.map
transforms the result of a sucessful parse into a new result. We use it here to only return a tuple of the key and value, skipping the result of thetoken('=')
parser.
All put together we have constructed a property parser in just a few lines which we can pass around however we want, well almost. To create a parser with combine a the Parser trait is used and each of the parsers mentioned above is its own type which implement the Parser
trait. This technique is great for ensuring that the compiler never loses any information when it comes to optimizing the code. This does howoever come at the cost of some rather large types.
//Type of the property parser
Map<(Many<_, Satisfy<_, [closure@examples\ini.rs:18:19: 18:31]>>, Token<_>, Many<_, Satisfy<_, [closure@examples\ini.rs:18:60: 18:73]>>), [closure@examples\ini.rs:19:14: 19:44]>
Even if we could return closure types from functions that is still a long type we dont really want to type out. We could box the parser but that requires an allocation each time we need the parser which we do not actually need. Instead we can do one better and wrap the parser inside a function which directly calls the parser.
Parsing whitespace and comments
Parsing whitespace are done similiarily as the property parser. We do however need a few more parsers to write it. The skip_many
parses in the same way as the many
parser only it ignores the result. This makes it a slightly more efficient for cases like this when we do not care about the result. Next we use the or
method to create a parser which parses either whitespace through the predefined spaces
parser or comments.
Parsing sections
The last parser we need before putting it all together is the parser for section. A sections starts with its name enclosed in brackets so we can use the between
parser here which ignores the result of the enclosing parsers and only returns the result of what is inbetween. After the name we want to skip any whitespace that are between the name and the properties in the sections so here we need to use the parser
function which turns the ordinary whitespace function into a parser. We use the same parser
function to also wrap the property
and properties
function.
Putting it all together
Finally we can put it all together into the complete INI parser. As each INI file can start with properties which do not belong to a named section we create a small struct which contains the global properties as well as the ones contained in a section. We then reuse the properties
parser which we used above in the section parser to parse any properties which appear before any section. Lastly we skip any whitespace before the first property/section.
All that is left is to try it with some data.
Its always a good idea to test error cases as well. This parser is quite liberal in what it accepts so creating a good error cases is a bit tricky but we can test unclosed sections. If we parse “[error” we would expect a readable error message.
Parse error at line: 1, column: 7
Unexpected 'end of input'
Expected ']'
Which is pretty good but if we wanted to be a bit clearer we might want to say that we expected a section at this point. To do this we can use the expected
combinator to attach an error message when the parser fails.
Parse error at line: 1, column: 7
Unexpected 'end of input'
Expected 'section'
Which gives us an error message which may be a bit easier to understand.
The final parser
Below is the parser in its entirety. It could use some improvements but as the INI format isn’t exactly specified I prefer to keep it rather minimal which should let it be easy to extend to handle some more specific things. You can find the complete source with tests here.
Whats next
Hopefully this should give a good idea about to go about building parsers using combine. Though we only parse a string but this parser could equally well be used to parse any other input stream such as iterators. If you want to see more examples of parsing using combine they can be found in the benches and tests directories of combine.
Though combine has reached 1.0.0 that does not mean that it is done. I have already two more features lying in wait, zero copy parsing and using non-cloneable iterators. These should get merged in under an experimental feature pretty soon so that they can get some testing before stabilizing them.
Thats all for now, depending on the interest I may write more about combine or the [programming langauge][languager] I am writing (also in rust) in which you can find a much more advanced example on how to use combine.