The project I’m working on involves parsing a rather large (24+ MB) input file. At the moment I don’t know anything about the required output format, so I’m parsing the file and turning it into an almost verbatim XML representation of the original. The idea is that once it’s in XML, it will be a relatively trivial task to apply a transformation that converts it into the target format.
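Here is a rough illustration of that second step. XSLT would be the conventional tool for transforming XML, but Python’s standard library has no XSLT processor, so this sketch does the same kind of tree-to-text transformation with `xml.etree.ElementTree`. The input XML, its element names (`MEASUREMENT`, `value`, `string`) and the comma-separated target format are all hypothetical, since the real target format is still unknown.

```python
import xml.etree.ElementTree as ET

# Hypothetical "verbatim" XML as a parser might emit it: one element per
# input block, with the block's positional values as child elements.
VERBATIM = """
<a2l>
  <PROJECT>
    <value>Demo</value>
    <MODULE>
      <value>Engine</value>
      <MEASUREMENT>
        <value>rpm</value>
        <string>Engine speed</string>
        <value>UWORD</value>
      </MEASUREMENT>
      <MEASUREMENT>
        <value>temp</value>
        <string>Coolant temperature</string>
        <value>SWORD</value>
      </MEASUREMENT>
    </MODULE>
  </PROJECT>
</a2l>
"""


def to_target(xml_text):
    """Transform the verbatim tree into a hypothetical target format:
    one 'name,description,type' line per MEASUREMENT element."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for m in root.iter("MEASUREMENT"):  # document order, at any depth
        values = [v.text for v in m.findall("value")]
        description = m.findtext("string", default="")
        lines.append(f"{values[0]},{description},{values[1]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_target(VERBATIM))
```

Because parsing and transformation are separate stages, a different target format only needs a different transformation function (or stylesheet). The parser stays untouched.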
The input file is in ASAP2 format (an automotive engineering format for describing ECU measurement and calibration data). Fortunately, this file format is well specified, and the documentation includes its grammar in EBNF notation. This means it should be a relatively simple task to build a grammar file, feed it to a parser generator, and let it do all the hard work for me.
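A full ASAP2 grammar is far too large to show here, but the idea of turning EBNF rules into a parser can be sketched by hand. What follows is a minimal recursive-descent parser, hand-written in Python rather than generated by ANTLR or JavaCC, for a hypothetical and heavily simplified grammar loosely modelled on ASAP2’s /begin ... /end blocks. Each EBNF rule becomes one method, and each block becomes an XML element with its values as children, giving the kind of near-verbatim XML described above. The grammar, the element names (`a2l`, `value`, `string`) and the sample input are assumptions for illustration only. The real specification also covers comments, escaped strings and typed parameters, and this sketch ignores all of them.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified grammar (NOT the real ASAP2 grammar):
#   file  = { block } ;
#   block = "/begin" WORD { item } "/end" WORD ;
#   item  = block | STRING | WORD ;

# A keyword, a double-quoted string (no escapes), or any other run of non-space chars.
TOKEN_RE = re.compile(r'(/begin|/end)\b|"([^"]*)"|([^\s"]+)')


def tokenize(text):
    """Yield (kind, value) pairs where kind is KEYWORD, STRING or WORD."""
    pos = 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1  # skip whitespace between tokens
        if pos >= len(text):
            return
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize input at offset {pos}")
        if m.group(1):
            yield ("KEYWORD", m.group(1))
        elif m.group(2) is not None:
            yield ("STRING", m.group(2))
        else:
            yield ("WORD", m.group(3))
        pos = m.end()


class Parser:
    """Recursive-descent parser: one method per rule of the grammar above."""

    def __init__(self, text):
        self.tokens = list(tokenize(text))
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse_file(self):
        # file = { block } ;  wrapped in a hypothetical <a2l> root element
        root = ET.Element("a2l")
        while self.peek() is not None:
            root.append(self.parse_block())
        return root

    def parse_block(self):
        # block = "/begin" WORD { item } "/end" WORD ;
        if self.advance() != ("KEYWORD", "/begin"):
            raise SyntaxError("expected /begin")
        kind, name = self.advance()
        if kind != "WORD":
            raise SyntaxError("expected a block name after /begin")
        elem = ET.Element(name)  # the block keyword becomes the XML tag
        while self.peek() not in (None, ("KEYWORD", "/end")):
            if self.peek() == ("KEYWORD", "/begin"):
                elem.append(self.parse_block())  # item = block (recursion)
            else:
                kind, value = self.advance()  # item = STRING | WORD
                child = ET.SubElement(elem, "string" if kind == "STRING" else "value")
                child.text = value
        self.advance()  # consume "/end" (raises if the input ran out)
        if self.advance() != ("WORD", name):
            raise SyntaxError(f"/end does not match /begin {name}")
        return elem


# Illustrative input, loosely modelled on ASAP2 syntax (not checked against the spec).
SAMPLE = '''
/begin PROJECT Demo "Illustrative project"
  /begin MODULE Engine "Engine ECU"
    /begin MEASUREMENT rpm "Engine speed" UWORD NO_COMPU_METHOD 0 0 0 8000
    /end MEASUREMENT
  /end MODULE
/end PROJECT
'''

if __name__ == "__main__":
    print(ET.tostring(Parser(SAMPLE).parse_file(), encoding="unicode"))
```

A parser generator automates exactly this rule-to-method mapping (both ANTLR and JavaCC generate recursive-descent parsers), which is why a published EBNF grammar is such a useful starting point.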
Easier said than done
Now I find myself in the strange, dark woods of compiler theory, a place I never imagined I would end up. But apparently you can’t tackle parsing without getting your feet wet in compiler theory.
I’m hoping this work pays off in the long term, but right now I’m finding it hard going. The two parser generators I’ve been looking at, ANTLR and JavaCC, have appalling documentation, at least for a beginner.
I get pretty tired of the “if you need to ask, you aren’t worthy of the answer” attitude in many computing circles. Well, I need to ask, and right now I’m finding the “documentation” on the above-mentioned programs’ web sites next to useless.
What I did find was the University of Birmingham’s tutorial pages, which include a great tutorial on ANTLR. You guys rock. (This does make me wonder why our own CS department here at Swinburne doesn’t have such resources online.)