Bottom-Up Parsing (Compiler Writing) Part 1

Error Detection and Recovery in LR Parsers

Acceptable error recovery, which often involves some error repair, has been difficult to incorporate within LR, SLR, and LALR parsers. This section outlines some of the approaches to error detection and recovery, concentrating on recent efforts in implementing error-repair strategies based upon the context surrounding the error point in a source program. We first describe some of the earlier methods of error recovery which have been used. The discussion mentions some advantages and disadvantages of these methods, providing the motivation for other error-recovery implementations. We next identify some of the difficulties in applying the Graham-Rhodes method of error recovery to LR parsers. We then present several approaches to error recovery that have been based on the Graham-Rhodes approach.

Early methods of error recovery. To be usable, a parser requires some method which it can use to recover from errors. This section examines three methods of error recovery which are applicable to LR parsers. While some of these methods have been used in parsers, they are not considered to provide satisfactory error recovery today. Error repair, a more stringent form of error recovery, involves changing the erroneous input sequence to one that is syntactically correct so that parsing may continue. Error-repair procedures tend to provide better diagnostics, giving the programmer hints as to how the parser recovered from the error and, hopefully, giving some hints on how to fix the error for a later run.

Error-recovery and error-repair methods can be considered unsatisfactory for a number of reasons. Some methods react badly to unanticipated situations; some require a large programming effort and careful anticipation of possible syntactic errors. Because of increasing software-development costs, it has become advantageous to use parser generators to generate a parser.
A few recovery methods cannot be included within a parser generator, requiring the parser writer to provide a significant portion of the coding effort. Finally, some error-recovery methods are overly simplistic, throwing away stack and input contents until it appears that parsing may continue. They do not take advantage of available information which would lead to better error diagnostics and recovery.

One of the earliest methods of error recovery is referred to as panic mode. Panic mode has been popular because it is very simple to implement. Furthermore, the method is compatible with automatic parser generation and can be easily included in parsers generated by parser generators. When an error is detected by a parser that uses panic mode for error recovery, the parser throws away input symbols and stack symbols until an input symbol and state on top of the stack permit parsing to continue.

The usual approach to implementing panic mode involves defining a set of synchronizing symbols. These symbols are terminal symbols of the language that constitute firm delimiters. Usually, firm delimiters end statements or structured constructs and only minimally constrain the strings which may follow. For most languages, the set of synchronizing symbols can be determined easily (Graham, 1. Panic mode proceeds as follows: Input tokens are thrown away until a synchronizing symbol is read. Then elements are popped off the parsing stack until a state that permits a move based on the synchronizing symbol ends up on top of the stack. Control is passed back to the parser once a parsing move can be performed.
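The two steps of panic mode described above can be sketched as follows. This is a minimal illustration, not the implementation used by any particular parser generator. It assumes a hypothetical representation in which the parsing stack is a list of states and the action table is a dictionary holding only the non-error entries; the name `panic_mode_recover` and its parameters are invented for the example.

```python
def panic_mode_recover(stack, tokens, pos, action, sync_symbols):
    """Return the input position at which parsing can resume, or None.

    Hypothetical representation (an assumption of this sketch):
      stack        -- list of parser states, top of stack last; mutated in place
      tokens       -- list of terminal symbols
      pos          -- index of the token at which the error was detected
      action       -- dict mapping (state, terminal) to a non-error parser move
      sync_symbols -- set of synchronizing terminals such as ";"
    """
    # Step 1: throw away input tokens until a synchronizing symbol is read.
    while pos < len(tokens) and tokens[pos] not in sync_symbols:
        pos += 1
    if pos == len(tokens):
        return None  # input exhausted without finding a synchronizing symbol

    # Step 2: pop states until the top state permits a move on that symbol.
    while stack and (stack[-1], tokens[pos]) not in action:
        stack.pop()
    if not stack:
        return None  # no state on the stack can continue from this symbol

    # Control returns to the parser, which resumes at tokens[pos].
    return pos
```

Called with the stack `[0, 1, 2, 3]`, an error at the second `id` in `id = id id ; $`, and a table in which only state 2 has a move on `;`, the function skips the offending `id`, pops state 3, and returns the position of the semicolon.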
Some improvements to this approach make use of predictions as to what symbols may follow the one on the stack and throw away input symbols until one is found or insert symbols into the input stream until parsing can continue. As an example of panic-mode error recovery, consider the following program statement: There is a missing operator between the variables .
After the terminal symbol for the variable . This error-recovery method throws away source tokens until a token which would permit a move is read. Hence the tokens for .

Table 7-45 Action table augmented with error routines.

The advantage of using panic mode is that this form of error recovery is fast and requires little code. However, because input symbols are thrown away, not all errors in a program may be detected, possibly necessitating several runs to detect all the errors. Finally, little information is provided to the user concerning the nature of the error.

A more ad hoc method of error recovery has been used in table-driven parsers, including LR parsers. Wherever an error entry would normally appear in the action table, the name of an error-recovery routine is placed instead. The error routines, which must be hand-coded, manipulate the input stream and the stack based upon the nature of the error (as indicated by which location is being accessed in the parse table). Table 7-45 contains an example of an action table which is augmented with the names of error routines. These names begin with e and end with a digit. The error routine whose name is the one returned by the action table is called when an error occurs. For example, the error routine e1 handles the case of a missing operand, generating a suitable error message and inserting an operand into the input stream. Likewise, e2 might generate a message stating that the parentheses are unbalanced and would delete the extra right parenthesis.

The advantage of using error routines is that compiler writers are allowed to use their intuitions about the likely causes of syntax errors, and error recovery and meaningful diagnostic error messages can be provided. Because the size of LR parsing tables tends to be large for most programming languages, it is not feasible for these error routines to be implemented by hand.
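The idea of naming error routines in the action table can be sketched as below. This is only an illustration under stated assumptions: the routine bodies, the placeholder operand `id`, the dictionary `ERROR_ROUTINES`, and the function `handle_entry` are all hypothetical and are not taken from the table in the text.

```python
def e1(stack, tokens, pos):
    """Missing operand: insert a placeholder operand before the current token."""
    tokens.insert(pos, "id")  # "id" is an assumed stand-in for any operand
    return "missing operand"


def e2(stack, tokens, pos):
    """Unbalanced right parenthesis: delete the extra ")" from the input."""
    del tokens[pos]
    return "unbalanced right parenthesis"


# Hypothetical mapping from the names stored in the action table to routines.
ERROR_ROUTINES = {"e1": e1, "e2": e2}


def handle_entry(entry, stack, tokens, pos):
    """Run the error routine named by an action-table entry.

    Returns the diagnostic message, or None when the entry is an ordinary
    shift, reduce, or accept move that the parser should perform itself.
    """
    routine = ERROR_ROUTINES.get(entry)
    if routine is None:
        return None
    return routine(stack, tokens, pos)
```

Each routine here edits the token list in place, so the parser can retry its move at the same input position after the routine returns.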
As this approach requires human insight to indicate the likely cause of an error, it is impossible to generate the error routines automatically. Also, unanticipated errors can cause the error-recovery process to collapse and cause the parser to become unstable.

Another approach which does provide automatic error recovery is used in YACC (yet another compiler-compiler), an LALR parser generator (Johnson, 1. The idea is similar to that of panic mode, where synchronizing symbols also provide a basis for recovering from erroneous input. Examples of major nonterminals include (program), (block), and (statement). The user adds to the grammar an error production of the form A → error α, where A is a major nonterminal and α is an often empty string of terminal and nonterminal symbols. When an error is detected by the parser, the error routine pops the stack until it finds an element whose state is associated with the production A → error α and then shifts the special token error onto the stack as though error had been read as input. Then, the parser attempts to resume parsing by discarding input symbols until a valid input symbol is read.

For example, a grammar might have the productions where (stmt) represents assignment statements, flow of control statements, and so forth. If an error is detected while a string of tokens which would reduce to (stmt) is being parsed, the parser first pops elements off the stack until a state which would permit the error token associated with the production to be shifted onto the stack is found. As error reduces to (stmt), input tokens are then discarded until a semicolon appears in the input text. At this point, parsing resumes.

The programmer can take advantage of intuitions concerning the likely causes of errors when including error productions in a grammar; if these error productions are improperly specified, the worst case for error recovery behaves like panic mode.
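The recovery sequence described for error productions (pop, shift error, discard input) can be sketched as follows. This is a simplified illustration and not YACC's actual code; it assumes a hypothetical table `shift_on_error` that gives, for each state with an item of the form A → error α, the state entered by shifting the error token. The function name and parameters are invented for the example.

```python
def recover_with_error_token(stack, tokens, pos, action, shift_on_error):
    """Return the input position at which parsing can resume, or None.

    Hypothetical representation (an assumption of this sketch):
      stack          -- list of parser states, top of stack last; mutated in place
      action         -- dict mapping (state, terminal) to a non-error parser move
      shift_on_error -- dict mapping a state to the state reached by shifting
                        the special token `error` from it
    """
    # Pop the stack until a state that can shift the error token is on top.
    while stack and stack[-1] not in shift_on_error:
        stack.pop()
    if not stack:
        return None  # no error production applies; the parser gives up

    # Shift `error` as though it had been read from the input.
    stack.append(shift_on_error[stack[-1]])

    # Discard input symbols until the new top state has a valid move.
    while pos < len(tokens) and (stack[-1], tokens[pos]) not in action:
        pos += 1
    return pos if pos < len(tokens) else None
```

Returning None when the stack empties corresponds to the failure case in which no state on the stack is associated with an error production.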
Furthermore, YACC has demonstrated that this method can be implemented in parser generators. While this method permits automatic error recovery, its failings include the poor handling of unanticipated errors. Also, if no state on the stack is associated with an error production, the parser dies after popping the contents of the entire stack.

The error-recovery methods discussed here are not considered totally satisfactory. We next describe a better method of error recovery, which is based on the Graham and Rhodes method introduced earlier (see Sec.

Application of the Graham-Rhodes method to LR parsers. Recall that the Graham-Rhodes method uses a two-phase approach. The first phase, or condensation phase, condenses the surrounding context by first trying to continue reducing the sentential form on the stack and then trying to continue to parse the unread portion of the input string without referring to what has been previously parsed before the error point. The second phase is the correction phase, which analyzes the context in which the error occurs and provides diagnostic information and a repair.

The condensation phase can further be subdivided into two parts. The first part tries to make further reductions on the stack preceding the point at which the error was detected. This is referred to as a backward move. The backward move endeavors to replace right-hand sides of some productions by their left-hand sides. Reductions continue until no more can be performed. A forward move is then tried. In this case, parsing beyond the point of error detection is performed. The parser must be started up without any left context; in some cases, this is relatively simple (such as in simple precedence parsing). Parsing continues until one of two possibilities arises. In one possibility a second error occurs.
One solution to resolving the second error is to recursively invoke the error-recovery scheme on the error in order to correct it; another solution tries to redo the parse using a different parsing sequence. The second possibility, which is the most likely according to Graham and Rhodes, occurs when a reduction that includes elements on the parsing stack that are below the point where the error was detected is required.