parsing c++ without a symbol table!

"KNAPEN, GREGORY" <gregory.knapen@bell.ca>
27 Jul 1998 11:46:18 -0400

          From comp.compilers


From: "KNAPEN, GREGORY" <gregory.knapen@bell.ca>
Newsgroups: comp.compilers
Date: 27 Jul 1998 11:46:18 -0400
Organization: Bell Canada / Bell Sygma
Keywords: C++, parse

Hi,


I am building a C++ parser that recognizes C++ using the syntax only.
I don't use any semantic information, i.e. there is no need for a symbol
table. Of course, this parser cannot be used as a compiler because the
language contains ambiguities. This parser is intended to gather metrics
from source code.


While doing this project, I found that most of C++ can be parsed
using the syntax alone, except for three cases:


1. ambiguity between function call and variable declaration


ex: T(a); or T(*a); etc.


This would be a variable declaration if T is a type, or a function call
if T is a function.
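
To make case 1 concrete, here is a minimal sketch in which the same statement T(a); appears twice and means two different things. The namespaces, the member value, the counter calls and the helper names are made up for illustration only.

```cpp
#include <cassert>

namespace t_is_type {
    struct T { int value = 7; };
    int demo() {
        T(a);              // T names a type: declares a local variable a (same as "T a;")
        return a.value;
    }
}

namespace t_is_function {
    int calls = 0;         // counts how often the function T has been called
    void T(int) { ++calls; }
    int demo() {
        int before = calls;
        int a = 1;
        T(a);              // T names a function: this is a call with argument a
        return calls - before;
    }
}

// Small usage check: both readings compile, with different meanings.
void self_check_case1() {
    assert(t_is_type::demo() == 7);
    assert(t_is_function::demo() == 1);
}
```

The token sequence is identical in both namespaces; only the earlier declaration of T decides which reading applies.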


2. ambiguity between function declaration and variable declaration


ex: int X(A);


if A is a type, X is a function declaration
if A is a variable, X is a variable initialized with A
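
A small sketch of case 2, again with invented namespaces and values: the line int X(A); is written twice, once where A is a type and once where A is a variable.

```cpp
#include <cassert>

namespace a_is_type {
    struct A {};
    int X(A);                  // A is a type: declares a function X taking an A
    int X(A) { return 42; }    // definition, so that the function can be called
    int demo() { return X(A{}); }
}

namespace a_is_variable {
    int A = 5;
    int X(A);                  // A is a variable: defines an int X initialized with A
    int demo() { return X; }
}

// Small usage check for both readings.
void self_check_case2() {
    assert(a_is_type::demo() == 42);
    assert(a_is_variable::demo() == 5);
}
```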


3. ambiguous parameter


ex: int F(T(C));


if C is a type, the declaration becomes int F(T(*fp)(C c));
if C is a new id, it becomes int F(T C);
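
Case 3 can be checked at compile time. The sketch below declares int F(T(C)); twice and uses std::is_same to compare the resulting function types. The alias Callback and the constants takes_callback and takes_T are names introduced only for this example.

```cpp
#include <cassert>
#include <type_traits>

namespace c_is_type {
    struct T {};
    struct C {};
    int F(T(C));               // C is a type: the parameter is a function taking C
                               // and returning T, adjusted to a pointer to function
    using Callback = T (*)(C);
    constexpr bool takes_callback =
        std::is_same<decltype(F), int(Callback)>::value;
}

namespace c_is_new_id {
    struct T {};
    int F(T(C));               // C is unknown here: the parameter is named C, of type T
    constexpr bool takes_T =
        std::is_same<decltype(F), int(T)>::value;
}

// Small usage check: each declaration has the type described above.
void self_check_case3() {
    assert(c_is_type::takes_callback);
    assert(c_is_new_id::takes_T);
}
```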


I was wondering if there are other such cases where a sentence needs
semantic information to be made unambiguous. Any case that can be
recognized by the syntax alone does not qualify. I assume that I have
infinite lookahead (backtracking).


For example, a C-style type cast is usually recognized by checking whether
the identifier between parentheses is a type or not. However, it is possible
to find a type cast by the syntax alone.


var = (Type1)(Type2)...(TypeN)(expression);


An expression between () is a type cast if and only if it is followed by
another type cast or an expression. This requires a lot of
backtracking (inefficient), but it illustrates the point that the sentence
can be recognized without using semantic information.
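
The rule above can be sketched as a tiny backtracking recognizer. This is only an illustration under strong assumptions: the input is already split into tokens, every parenthesized group holds a single identifier or number, and operators are ignored. The names is_operand, parse_unary and count_casts are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if the token looks like an identifier or a number.
bool is_operand(const std::string& tok) {
    return !tok.empty() &&
           (std::isalnum(static_cast<unsigned char>(tok[0])) || tok[0] == '_');
}

// Reads one unary expression starting at pos. On success, advances pos
// past it and adds the number of casts found to casts.
bool parse_unary(const std::vector<std::string>& t, std::size_t& pos, int& casts) {
    if (pos + 2 < t.size() && t[pos] == "(" && is_operand(t[pos + 1]) &&
        t[pos + 2] == ")") {
        // First guess: "(X)" is a cast, so another cast or an expression must follow.
        std::size_t after = pos + 3;
        int inner = 0;
        if (parse_unary(t, after, inner)) {
            pos = after;
            casts += inner + 1;
            return true;
        }
        // Backtrack: nothing follows, so "(X)" is a parenthesized expression.
        pos += 3;
        return true;
    }
    if (pos < t.size() && is_operand(t[pos])) {
        ++pos;
        return true;
    }
    return false;
}

// Returns the number of casts in the token sequence, or -1 if it does not parse.
int count_casts(const std::vector<std::string>& tokens) {
    std::size_t pos = 0;
    int casts = 0;
    bool ok = parse_unary(tokens, pos, casts);
    return (ok && pos == tokens.size()) ? casts : -1;
}

// Small usage check on the right-hand side "(Type1)(Type2)(x)".
void self_check_casts() {
    assert(count_casts({"(", "Type1", ")", "(", "Type2", ")", "(", "x", ")"}) == 2);
}
```

For (Type1)(Type2)(x) the recognizer reports two casts: the last group has nothing after it, so it is taken as the expression being cast.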


So I was wondering if there are other families of sentences, besides the
ones listed, that require semantic information to be made unambiguous?




Greg Knapen
--

