The parser can still initially parse the "-" token separately from the
value "9223372036854775808", which cannot be represented as an integer
and is therefore given a double value (hoping that the double keeps its
precision; even though a double has fewer bits of mantissa, the bits
truncated in this case are all zeroes).
So the double value 9223372036854775808, which is still a constant, can
easily be tested when it appears next to the unary minus operator in the
syntax tree, which now holds two tokens during the "reduce" step of the
parser: the unary minus and the double value, which is still exactly
equal to 9223372036854775808.
This just requires an additional reduce rule in the syntactic parser (not
in the lexer) to resolve it, for this specific case, as if it were a
single precomputed negative integer constant.
However, this would mean that -922337203685477580.8e1 would end up being
parsed as an integer even though the source explicitly specified it as a
double.
There are two ways to handle this:
- either the lexer does not parse numeric strings itself, and leaves the
token as a string. The actual conversion to a datatype will be done in the
parser.
- or the lexer parses the numeric strings and returns not only the token
with its attributes set to the numeric value, but also a "hint" indicating
which parser (integer or floating point) it used to reduce the numeric
string to a resolved constant. This approach complicates only one parser
rule:
unaryexpression ::= '-' INTCONSTANT
unaryexpression ::= '-' DOUBLECONSTANT
  { if (tokens[1].doublevalue == 9223372036854775808.0) {
        tokens[1].type = INTCONSTANT;
        tokens[1].intvalue = -9223372036854775808;
    }
  }
unaryexpression ::= ...
unaryexpression ::= '-' (expression)
(here the "tokens[]" is some (C-like) array that access to properties of
tokens returned by the lexer, and being reduced in the parsiing rules
(whose number of tokens in the array is determined by the parsing rule,
here there are 2 tokens) and stored and modified by the parser in its
abstract syntax tree, assuming that tokens[i].type is one of the defined
token type constants returned by the lexer which can also set
tokens[i].intvalue or token[i].doublevalue, or token[i].stringvalue for
NAME token types or for LITTERALSTRING token types).
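As a rough illustration of that second approach, here is a Lua-level
sketch (Lua's actual parser is written in C; the token shape, field names
and "hint" flag below are invented for this example):

  -- Hypothetical fold for  unaryexpression ::= '-' DOUBLECONSTANT.
  -- tok.hint records whether the lexer originally saw an integer-shaped
  -- literal before it overflowed the value to a double.
  local function fold_negate(tok)
    if tok.type == "DOUBLECONSTANT" and tok.hint == "int"
       and tok.doublevalue == 2.0^63 then
      -- "9223372036854775808" overflowed in the lexer; with the unary
      -- minus applied it is exactly the minimum 64-bit integer again
      return { type = "INTCONSTANT", intvalue = math.mininteger }
    elseif tok.type == "INTCONSTANT" then
      return { type = "INTCONSTANT", intvalue = -tok.intvalue }
    else
      return { type = "DOUBLECONSTANT", doublevalue = -tok.doublevalue }
    end
  end

  local t = fold_negate{ type = "DOUBLECONSTANT", hint = "int",
                         doublevalue = 9223372036854775808.0 }
  assert(t.type == "INTCONSTANT" and t.intvalue == math.mininteger)

Because the fold also checks the "hint", -922337203685477580.8e1 (lexed
through the floating-point path) would keep its double type.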
Post by Muh Muhten
Post by pocomane
-- Min integer literal, BUG ???
assert(tostring(-9223372036854775808) == '-9.2233720368548e+018')
assert(math.type(-9223372036854775808) == 'float')
The issue appears to be that the lexer sees '-' separately from the
numeral itself. As such, when reading the number, it must fit in a
non-negative integer, and is then constant-folded to the actual
negative. Incidentally, it appears that this corner case has already been
noticed:
lstrlib.c
961: const char *format = (n == LUA_MININTEGER) /* corner case? */
962- ? "0x%" LUA_INTEGER_FRMLEN "x" /* use hexa */
963- : LUA_INTEGER_FMT; /* else use default format */
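If I read that fragment right, it is the handling of %q for integer
arguments (on Lua versions where %q accepts numbers, i.e. later 5.3
releases and 5.4): the decimal spelling of math.mininteger would be read
back as a float, so it falls back to hexadecimal:

  print(string.format("%q", math.maxinteger))  --> 9223372036854775807
  print(string.format("%q", math.mininteger))  --> 0x8000000000000000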
It seems to me that making this particular case work would be rather
involved, and essentially require negative numbers to be first-class
citizens in the grammar, rather than cobbled together through unary
minus and constant folding. I also don't see a satisfying solution
for, e.g. "- 9223372036854775808", "- --[[]]9223372036854775808",
"-(9223372036854775808)", though arguably those are *explicitly* the
negation of positive 9223372036854775808, which doesn't fit, and
really should be an integer.
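(A quick check on a stock 5.3/5.4 confirms that those spaced and
parenthesized forms do stay floats today:

  assert(math.type(- 9223372036854775808) == "float")
  assert(math.type(-(9223372036854775808)) == "float")
)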
In any case, the main uses I can see for having that number as a
constant and an integer are qua -2^63 and qua minimum integer. Perhaps
some variation on "-2^63|0", "-1<<63", "~(~0>>1)", or
"-0x8000000000000000" might be suitable? (Though that last one only
happens to work because 0x8000000000000000 == -0x8000000000000000 (64
bits) in 2's complement, and the unchecked overflow from casting to
signed after reading a hex literal may be undefined behavior.)
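For what it is worth, on a stock Lua 5.3/5.4 with 64-bit integers all of
the suggested spellings do evaluate to the minimum integer with integer
subtype (quick check):

  local min = math.mininteger
  assert((-2^63|0) == min   and math.type(-2^63|0) == "integer")
  assert((-1<<63) == min    and math.type(-1<<63) == "integer")
  assert((~(~0>>1)) == min  and math.type(~(~0>>1)) == "integer")
  -- the last one relies on the wrap-around noted above
  assert((-0x8000000000000000) == min
         and math.type(-0x8000000000000000) == "integer")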