Lexer is not lexing numbers very well


Bugzilla Link	1125
Created on	Jul 11, 2014 09:25
Resolution	FIXED
Resolved on	Sep 26, 2015 17:13
Version	svn
OS	All
Architecture	All

Extended Description

While fixing bug 1064 I found several inconsistencies in the way the lexer treats numbers.
$ cd $SAC2CBASE/src/libsac2c/scanparse
$ gcc -g -o lex -std=c99 -D_GNU_SOURCE -DLEXER_BINARY lex.c  trie.c
Test[1]:
$ echo -n "123axa" | ./lex /dev/stdin 
/dev/stdin 1:1 number ['123']
/dev/stdin 1:4 id ['axa']
This means that if a number is mistyped with stray characters glued onto it, the lexer silently treats the sequence as two tokens.  It shouldn't matter in practice, but GCC, for example, gives an error message like:
       invalid suffix "axa" on integer constant
We could do something similar.
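A minimal sketch of how such a check could look, assuming a NUL-terminated input buffer. The helper scan_digits_and_suffix is hypothetical and not part of lex.c; it only measures the digit run and any identifier characters attached to it, so the caller can decide to report an invalid suffix instead of emitting a separate id token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not taken from lex.c.  Returns the length of the
   digit run at the start of s and stores in *suffix_len the length of
   any identifier characters (letters, digits, '_') glued directly onto
   that run.  */
size_t
scan_digits_and_suffix (const char *s, size_t *suffix_len)
{
  assert (s != NULL && suffix_len != NULL);

  size_t n = 0;
  while (isdigit ((unsigned char) s[n]))
    n++;

  size_t m = 0;
  /* Only look for a suffix when there actually was a number.  */
  if (n > 0)
    while (isalnum ((unsigned char) s[n + m]) || s[n + m] == '_')
      m++;

  *suffix_len = m;
  return n;
}
```

For the input of Test[1], "123axa", this gives a digit run of length 3 and a suffix of length 3.  A non-zero suffix length that is not a known suffix would be the place to print a message in the style of the GCC one.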
Test[2]:
$ echo -n ".23f " | ./lex /dev/stdin
/dev/stdin 1:1 operator ['.']
/dev/stdin 1:2 number_float ['23']
/dev/stdin 1:5 whitespace [' ']
That indicates that .23, which would be a valid number in C, is not supported.
Test[3]:
$ echo -n ".23" | ./lex /dev/stdin 
/dev/stdin 1:1 operator ['.']
/dev/stdin error:1:4 unexpected end of file
/dev/stdin 1:2 unknown ['23'] !unknown
That seems to be a bug!  We never hit this because we don't put numbers at the end of a file, but it should *not* happen.
Test[4]:
$ echo -n "0.f" | ./lex /dev/stdin 
/dev/stdin error:1:3 digit expexted, 'f' found instead
/dev/stdin 1:1 unknown ['0.'] !unknown
/dev/stdin 1:3 id ['f']
It doesn't like a suffix directly after the dot, which is legal in C.
It would be nice to make SaC treat numbers the way C does, and to add suffixes for integer constants (b, l, u, ...) on top.
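As a rough illustration of the C-like behaviour asked for in Tests [2] to [4], here is a sketch of a decimal literal scanner.  It is not the code in lex.c: the names scan_number and num_kind are made up for this example, the input is assumed to be a NUL-terminated string, only an 'f'/'F' suffix is handled, and exponents, hex forms and the proposed integer suffixes are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch only.  */
enum num_kind { NUM_NONE, NUM_INT, NUM_FLOAT };

/* Scans a decimal literal at the start of s.  Digits may appear on
   either side of the dot as long as at least one side has some, and an
   optional 'f'/'F' may follow a floating literal.  Stores the kind in
   *kind and returns the number of characters consumed (0 if no number
   starts here).  */
size_t
scan_number (const char *s, enum num_kind *kind)
{
  assert (s != NULL && kind != NULL);

  size_t i = 0, int_digits = 0, frac_digits = 0;

  while (isdigit ((unsigned char) s[i]))
    {
      i++;
      int_digits++;
    }

  if (s[i] != '.')
    {
      /* No dot: either a plain integer or not a number at all.  */
      *kind = int_digits ? NUM_INT : NUM_NONE;
      return i;
    }

  size_t j = i + 1;
  /* The terminating NUL is not a digit, so end of input simply ends
     the loop instead of being an error.  */
  while (isdigit ((unsigned char) s[j]))
    {
      j++;
      frac_digits++;
    }

  if (int_digits + frac_digits == 0)
    {
      /* A lone '.' is an operator, not a number.  */
      *kind = NUM_NONE;
      return 0;
    }

  if (s[j] == 'f' || s[j] == 'F')
    j++;

  *kind = NUM_FLOAT;
  return j;
}
```

With these rules ".23f " consumes 4 characters, ".23" consumes 3 even at end of input, and "0.f" consumes 3, all as floating literals, while a lone "." is still left to the operator rule.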
