Lexer is not lexing numbers very well
Bugzilla Link: 1125
Created on: Jul 11, 2014 09:25
Resolution: FIXED
Resolved on: Sep 26, 2015 17:13
Version: svn
OS: All
Architecture: All
Extended Description
While fixing bug 1064 I found several inconsistencies in the way the lexer treats numbers. To reproduce them, build the standalone lexer binary:
$ cd $SAC2CBASE/src/libsac2c/scanparse
$ gcc -g -o lex -std=c99 -D_GNU_SOURCE -DLEXER_BINARY lex.c trie.c
Test[1]:
$ echo -n "123axa" | ./lex /dev/stdin
/dev/stdin 1:1 number ['123']
/dev/stdin 1:4 id ['axa']
This means that if we mistype a numeric literal, the lexer silently splits the sequence into two tokens instead of reporting an error. It shouldn't matter in practice, but GCC, for example, gives an error message like:
invalid suffix "axa" on integer constant
We could do something similar.
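A minimal sketch of such a check, assuming the lexer can look at the characters directly after the digits. The helper name `scan_int_checked` and its signature are hypothetical and are not taken from lex.c; the message text only mirrors the GCC wording quoted above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of lex.c.
   Scans a run of decimal digits at the start of TEXT.  If identifier
   characters follow the digits directly, a GCC-style diagnostic is
   written into MSG; otherwise MSG is set to the empty string.
   Returns the number of characters consumed (digits plus suffix),
   or 0 if TEXT does not start with a digit.  */
static size_t
scan_int_checked (const char *text, char *msg, size_t msg_size)
{
  size_t digits = 0, end;

  assert (text != NULL && msg != NULL && msg_size > 0);
  msg[0] = '\0';

  while (isdigit ((unsigned char) text[digits]))
    digits++;
  if (digits == 0)
    return 0;

  /* Swallow everything that would otherwise become a separate id token.  */
  end = digits;
  while (isalnum ((unsigned char) text[end]) || text[end] == '_')
    end++;

  if (end > digits)
    snprintf (msg, msg_size, "invalid suffix \"%.*s\" on integer constant",
              (int) (end - digits), text + digits);
  return end;
}
```

With this approach "123axa" would be consumed as one erroneous token rather than a number followed by an id. A real implementation would first have to accept whichever suffixes SaC decides to support before reporting the rest as invalid.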
Test[2]:
$ echo -n ".23f " | ./lex /dev/stdin
/dev/stdin 1:1 operator ['.']
/dev/stdin 1:2 number_float ['23']
/dev/stdin 1:5 whitespace [' ']
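This indicates that a constant with a leading dot such as .23, which is a valid number in C, is not supported: the dot is lexed as an operator and only the digits after it form the number.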
Test[3]:
$ echo -n ".23" | ./lex /dev/stdin
/dev/stdin 1:1 operator ['.']
/dev/stdin error:1:4 unexpected end of file
/dev/stdin 1:2 unknown ['23'] !unknown
That seems to be a bug! We never hit it because we don't use numbers at the end of a file, but it should *not* happen.
Test[4]:
$ echo -n "0.f" | ./lex /dev/stdin
/dev/stdin error:1:3 digit expexted, 'f' found instead
/dev/stdin 1:1 unknown ['0.'] !unknown
/dev/stdin 1:3 id ['f']
The lexer doesn't accept a suffix directly after the dot, which is legal in C.
It would be nice to make SaC treat numbers the same way C does, and to add suffixes for integer constants (b,l,u,...) on top.
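As a rough illustration of the C-like behaviour described above, here is a minimal sketch of a scanner for decimal constants. Everything in it is an assumption made for the example: the names `scan_number` and `num_kind` are hypothetical, the suffix sets are the single-letter C ones rather than whatever SaC ends up defining, and exponents and hexadecimal forms are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum num_kind { NUM_NONE, NUM_INT, NUM_FLOAT };

/* Hypothetical scanner, not part of lex.c.
   Recognises: digits, digits '.', digits '.' digits, '.' digits,
   each optionally followed by one suffix character.
   Stores the kind in *KIND and returns the number of characters
   matched; 0 means S does not start with a number.  The terminating
   NUL stops every loop, so a number at end of input is not an error.  */
static size_t
scan_number (const char *s, enum num_kind *kind)
{
  size_t i = 0, int_digits;
  int has_dot = 0;

  assert (s != NULL && kind != NULL);
  *kind = NUM_NONE;

  while (isdigit ((unsigned char) s[i]))
    i++;
  int_digits = i;

  if (s[i] == '.')
    {
      size_t j = i + 1;
      while (isdigit ((unsigned char) s[j]))
        j++;
      /* A lone '.' with no digits on either side stays an operator.  */
      if (int_digits == 0 && j == i + 1)
        return 0;
      has_dot = 1;
      i = j;
    }
  else if (int_digits == 0)
    return 0;

  if (has_dot)
    {
      *kind = NUM_FLOAT;
      /* Assumed float suffixes, as in C.  */
      if (s[i] == 'f' || s[i] == 'F' || s[i] == 'l' || s[i] == 'L')
        i++;
    }
  else
    {
      *kind = NUM_INT;
      /* Assumed integer suffixes; SaC may define others.  */
      if (s[i] == 'u' || s[i] == 'U' || s[i] == 'l' || s[i] == 'L')
        i++;
    }
  return i;
}
```

Under these assumptions the inputs from Test[2] to Test[4] (".23f ", ".23" and "0.f") are each matched as a single float constant, and a bare "." is still left for the operator rule.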