Why Does Apache NetBeans Need Its Own Parsers?
A question was asked on the Apache NetBeans mailing list: "I was just curious about the theoretical aspect of parsing. Isn't there a unified parsing API, using ANTLR/lex/yacc which can parse any language given a grammar for it? Why do we use a different parsing implementation (like the Graal JS parser in this instance) when a unified approach will help us support lots of languages easily?"
Tim Boudreau, involved in NetBeans from its earliest hours, responds, in the thread linked above:
First, in an IDE, you are *never* just "parsing". You are doing *a lot* with the results of the parse. An IDE doesn't have to just parse one file; it must also understand the context of the project that file lives in; how it relates to other files and those files interdependencies; multiple versions of languages; and the fact that the results of a parse do not map cleanly to a bunch of stuff an IDE would show you that would be useful. For example, say the caret is in a java method, and you want to find all other methods that call the one you're in and show the user a list of them. The amount of work that has to happen to answer that question is very, very large. To do that quickly enough to be useful, you need to do it ahead of time and have a bunch of indexing and caching software behind the scenes (all of which must be adapted to whatever the parser provides) so you can look it up when you need it. In short, a parser is kind of like a toilet seat by itself. You don't want to use it without a whole lot of plumbing attached to it.
Third, in an IDE with a 20 year history, a lot of parser generating technologies have come and gone - javacc, javacup, ANTLR, and good old hand-written lexers and parsers. Unifying them all would be an enormous amount of work, would break a lot of code that works just fine, and the end result would be - stuff we've already got, that already works, just with one-parser-generator-to-rule-them-all underneath. Other than prettiness, I don't know what problem that solves.
So, all of this is to say: We use different parsing implementations because parsing is just a tiny piece of supporting a language, so it wouldn't make the hard parts easier enough to be worth it. And there will be new cool parser-generating technologies that come along, and it's good to be able to use them, rather than be married to one-parser-generator-to-rule-them-all and have this conversation again, when they come along.
Posted at 12:11PM Aug 06, 2019 by Geertjan in General | |