Saturday, 15 October 2016

lex I/O routines

Some actions may require reading another character, putting a character back into the input stream, or writing a character to the standard output. lex supplies three functions to handle these tasks: input(), unput(), and output(), respectively. input() takes no arguments; unput() and output() each take a single character-valued argument.

The following example illustrates the use of input() and unput(). The subroutine skipcmnts() is used to ignore comments in a language like C, where comments occur between /* and */:

    %{
    int skipcmnts(void);    /* defined in the subroutine section */
    %}
    %%
    "/*"                  skipcmnts();
    .
    .
    .
    %%
    /* Skip a comment body; the opening delimiter has already been matched. */
    int skipcmnts(void)
    {
        int c;

        for (;;) {
            /* Read up to the next asterisk; give up at end of input
               (input() returns 0 there in lex, EOF in some flex versions). */
            while ((c = input()) != '*')
                if (c == 0 || c == EOF)
                    return 0;
            c = input();
            if (c == '/' || c == 0 || c == EOF)
                return 0;       /* end of comment, or end of input */
            unput(c);           /* not the end: re-examine this character */
        }
    }

After the token "/*" is read, the lexical analyzer continues reading characters until an asterisk (*) is found. If the character after the asterisk is a "/", the function returns. Otherwise, that character is returned to the input stream and the function keeps on reading characters. The function also returns if input() reports the end of the input, so an unterminated comment cannot make it loop forever. The important thing to note here is that the analyzer does not try to match any patterns against the characters that are read by input(). When it resumes pattern matching, after the function returns, it starts with the first character in the input stream after the characters read by the subroutine.
There are three other things to note in this example. First, the unput() call is necessary to avoid missing the final / if the comment ends with **/. In that case, having read a *, the analyzer finds that the next character is not the terminating / but another *; putting it back lets the loop examine it again as the possible start of the closing delimiter. Second, the character to put back is the one held in the variable c. In the usual implementations input() does not append the characters it reads to yytext, so inside skipcmnts() the expression yytext[yyleng-1] is still the * of the opening /* and not the last character read. Third, this routine assumes that comments are not nested, as is indeed the case in the C language. If, unlike in C, comments are nested in the source text, then after reading the first */, which ends the innermost comment, the lexical analyzer treats the rest of the enclosing comment as ordinary input to be searched for patterns.
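For a language that does allow nested comments, the routine above could be adapted by counting the nesting depth. The following is only a minimal sketch under stated assumptions: so that it compiles on its own, it uses hypothetical stand-ins in_char() and unput_char() that read from a string, where a real lex specification would call input() and unput(). The names set_source() and skip_nested() are likewise illustrative and not part of lex.

```c
#include <stddef.h>

/* Hypothetical stand-ins for lex's input() and unput(): they read from a
   string so that this sketch can be compiled without lex. */
static const char *src = "";   /* text being scanned */
static size_t pos = 0;         /* index of the next unread character */
static int pushed[16];         /* small pushback stack */
static int npushed = 0;

void set_source(const char *s)
{
    src = s;
    pos = 0;
    npushed = 0;
}

/* Like input(): next character, or 0 at end of input. */
int in_char(void)
{
    if (npushed > 0)
        return pushed[--npushed];
    return src[pos] != '\0' ? (unsigned char)src[pos++] : 0;
}

/* Like unput(): push one character back onto the input. */
void unput_char(int c)
{
    if (npushed < 16)
        pushed[npushed++] = c;
}

/* Call after the opening delimiter has been consumed.
   Returns 1 when the matching close is found, 0 if input ends first. */
int skip_nested(void)
{
    int depth = 1;   /* one comment is already open */
    int c;

    while (depth > 0) {
        c = in_char();
        if (c == 0)
            return 0;                 /* unterminated comment */
        if (c == '*') {
            c = in_char();
            if (c == '/')
                depth--;              /* closes the innermost comment */
            else if (c != 0)
                unput_char(c);        /* may itself be another asterisk */
        } else if (c == '/') {
            c = in_char();
            if (c == '*')
                depth++;              /* a nested comment opens */
            else if (c != 0)
                unput_char(c);
        }
    }
    return 1;
}
```

With the input " a /* b */ c */x" (the text following an already matched opening delimiter), skip_nested() consumes both closing delimiters and leaves x as the next character, whereas a non-nesting routine would stop after the first */.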
To handle special I/O needs, such as writing to several files, standard I/O routines in C can be used to rewrite the functions input(), unput(), and output(). These and other programmer-defined functions should be placed in the subroutine section. The new routines will then replace the standard ones. lex's input() is equivalent to getchar(), and output() is equivalent to putchar().
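As a rough illustration of what such replacement routines might look like, the sketch below builds them from standard C I/O calls. It rests on several assumptions: the names my_input(), my_unput(), my_output(), and my_set_files() are hypothetical and chosen so the code compiles on its own; in an actual lex specification the routines would be named input(), unput(), and output(), and depending on the lex implementation the default definitions (often macros) may first have to be removed with #undef. flex handles custom input differently, through its YY_INPUT macro.

```c
#include <stdio.h>

#define PUSHBACK_MAX 64

static FILE *in_file;                 /* NULL means read from stdin  */
static FILE *out_file;                /* NULL means write to stdout  */
static int pushback[PUSHBACK_MAX];    /* characters given back by my_unput() */
static int n_pushback = 0;

/* Choose where characters come from and where they go (hypothetical helper). */
void my_set_files(FILE *in, FILE *out)
{
    in_file = in;
    out_file = out;
    n_pushback = 0;
}

/* Replacement for input(): pushed-back characters first, then the file.
   Returns 0 at end of input, as lex's input() does. */
int my_input(void)
{
    int c;

    if (n_pushback > 0)
        return pushback[--n_pushback];
    c = getc(in_file ? in_file : stdin);
    return c == EOF ? 0 : c;
}

/* Replacement for unput(): remember c so the next my_input() returns it. */
void my_unput(int c)
{
    if (n_pushback < PUSHBACK_MAX)
        pushback[n_pushback++] = c;
}

/* Replacement for output(): write c to the currently selected file. */
void my_output(int c)
{
    putc(c, out_file ? out_file : stdout);
}
```

Calling my_set_files() with a different output stream between actions is one way the "several files" case could be handled: each action writes through my_output(), and the current destination is switched as needed.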
