Discussion:
Maybe bug - string representation of a literal mininteger
pocomane
2018-11-27 10:32:16 UTC
It seems to me that there is an inconsistency in how Lua handles the
min and max integer literals. The following code runs without any
error in both Lua 5.3 and Lua 5.4.

```
-- Max integer, both literal and reference are ok
assert(tostring(math.maxinteger) == '9223372036854775807')
assert(tostring(9223372036854775807) == '9223372036854775807')
assert(math.type(9223372036854775807) == 'integer')

-- Min integer+1, literal is ok
assert(tostring(-9223372036854775807) == '-9223372036854775807')
assert(math.type(-9223372036854775807) == 'integer')

-- Min integer, reference is ok
assert(tostring(math.mininteger) == '-9223372036854775808')
assert(math.type(math.mininteger) == 'integer')

-- Min integer literal, BUG ???
assert(tostring(-9223372036854775808) == '-9.2233720368548e+018')
assert(math.type(-9223372036854775808) == 'float')
```

Is it a bug?
Muh Muhten
2018-11-27 11:51:39 UTC
Post by pocomane
-- Min integer literal, BUG ???
assert(tostring(-9223372036854775808) == '-9.2233720368548e+018')
assert(math.type(-9223372036854775808) == 'float')
The issue appears to be that the lexer sees '-' separately from the
numeral itself. As such, when reading the number, it must fit in a
non-negative integer, and is then constant-folded to the actual
negative. Incidentally, it appears that this corner case has already
been noticed and is dealt with by the %q format:

lstrlib.c
961: const char *format = (n == LUA_MININTEGER) /* corner case? */
962- ? "0x%" LUA_INTEGER_FRMLEN "x" /* use hexa */
963- : LUA_INTEGER_FMT; /* else use default format */
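That excerpt can be observed from Lua itself: on a stock 5.3+ build with 64-bit integers, `%q` emits the minimum integer in hexadecimal, so the output reads back as an integer:

```lua
-- %q special-cases LUA_MININTEGER, printing it in hex so that
-- reloading the formatted output yields an integer again:
local s = string.format("%q", math.mininteger)
assert(s == "0x8000000000000000")
local n = load("return " .. s)()
assert(n == math.mininteger and math.type(n) == "integer")
```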

It seems to me that making this particular case work would be rather
involved, and essentially require negative numbers to be first-class
citizens in the grammar, rather than cobbled together through unary
minus and constant folding. I also don't see a satisfying solution
for, e.g. "- 9223372036854775808", "- --[[]]9223372036854775808",
"-(9223372036854775808)", though arguably those are *explicitly* the
negation of positive 9223372036854775808, which doesn't fit, and
really should be an integer.

In any case, the main uses I can see for having that number as a
constant and an integer are qua -2^63 and qua minimum integer. Perhaps
some variation on "-2^63|0", "-1<<63", "~(~0>>1)", or
"-0x8000000000000000" might be suitable? (Though that last one only
happens to work because 0x8000000000000000 == -0x8000000000000000 (64
bits) in 2's complement, and the unchecked overflow from casting to
signed after reading a hex literal may be undefined behavior.)
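For reference, those suggested spellings can be checked on a stock 5.3+ build with 64-bit integers (the `-0x8000000000000000` variant relies on hex literals wrapping around, which the Lua manual defines even if the underlying C cast may not):

```lua
-- Each expression evaluates to the minimum integer, as an integer:
assert(-1 << 63 == math.mininteger)      -- unary minus binds tighter than <<
assert(~(~0 >> 1) == math.mininteger)    -- >> is a logical shift in Lua
assert(-0x8000000000000000 == math.mininteger)  -- hex literal wraps around
-- -2^63 alone is a float (^ always yields a float); |0 converts it back:
assert(math.type(-2^63) == "float")
assert(-2^63 | 0 == math.mininteger and math.type(-2^63 | 0) == "integer")
```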
pocomane
2018-11-27 13:16:51 UTC
Post by Muh Muhten
It seems to me that making this particular case work would be rather
involved, and essentially require negative numbers to be first-class
citizens in the grammar, rather than cobbled together through unary
minus and constant folding. I also don't see a satisfying solution
for, e.g. "- 9223372036854775808", "- --[[]]9223372036854775808",
"-(9223372036854775808)", though arguably those are *explicitly* the
negation of positive 9223372036854775808, which doesn't fit, and
really should be an integer.
I am just using -9223372036854775807-1 in my code (in test suites, I
prefer explicit numbers over math.mininteger). If it is complex to fix,
I think it could stay as it is. Maybe just a warning in the manual
would be useful.
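The workaround can be checked directly (assuming a 64-bit-integer build):

```lua
-- -9223372036854775807 fits in an integer (it is mininteger+1),
-- and subtracting 1 lands exactly on mininteger with no overflow:
local minint = -9223372036854775807 - 1
assert(minint == math.mininteger)
assert(math.type(minint) == "integer")
assert(tostring(minint) == "-9223372036854775808")
```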
Coda Highland
2018-11-27 15:06:09 UTC
Post by pocomane
Post by Muh Muhten
It seems to me that making this particular case work would be rather
involved, and essentially require negative numbers to be first-class
citizens in the grammar, rather than cobbled together through unary
minus and constant folding. I also don't see a satisfying solution
for, e.g. "- 9223372036854775808", "- --[[]]9223372036854775808",
"-(9223372036854775808)", though arguably those are *explicitly* the
negation of positive 9223372036854775808, which doesn't fit, and
really should be an integer.
I am just using - 9223372036854775807-1 in my code (in test suites, I
prefer explicit number over math.mininteger). If it is complex to fix,
I think it could stay as it is. Maybe just a warning in the manual
could be useful.
The way Lua does it is consistent with how a number of other
programming languages handle tokenization. I know C++ does it that
way, having implemented a C++ parser once upon a time.

There's also the complication of expressions like "2-1" -- is that a
2, a -, and a 1, or is that a 2 and a -1? Obviously the intent is the
former, but any language that generates a token stream from a lexer
that it uses to subsequently construct the syntax tree is going to
have this problem.
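To make the ambiguity concrete: if the lexer greedily folded the sign into the numeral, "2-1" would tokenize as the numerals 2 and -1 with no operator between them and fail to parse. Lua resolves it by always lexing '-' as a separate operator token:

```lua
-- '-' is always lexed as an operator, never as part of a numeral,
-- so "2-1" parses as subtraction rather than two adjacent numbers:
assert(load("return 2-1")() == 1)
-- unary minus on a literal is folded by the compiler only afterwards:
assert(load("return -1")() == -1)
```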

There IS a solution available that isn't confused by your first two
examples, but it's not particularly elegant: when a unary minus
expression is parsed, and the operand is a constant, the parser folds
them together in the syntax tree instead of emitting an actual
negation operation. (Your last example isn't just "arguably" the
explicit negation, the use of parentheses makes it DEFINITELY an
explicit negation because the parentheses mean you're demanding that
the expression inside is computed BEFORE you apply the unary minus.)

/s/ Adam
Roberto Ierusalimschy
2018-11-27 15:19:04 UTC
Post by Coda Highland
There IS a solution available that isn't confused by your first two
examples, but it's not particularly elegant: when a unary minus
expression is parsed, and the operand is a constant, the parser folds
them together in the syntax tree instead of emitting an actual
negation operation. (Your last example isn't just "arguably" the
explicit negation, the use of parentheses makes it DEFINITELY an
explicit negation because the parentheses mean you're demanding that
the expression inside is computed BEFORE you apply the unary minus.)
Java singles out this particular case :-)

It is a compile-time error if a decimal literal of type int is
larger than 2147483648 (2^31), or if the decimal literal 2147483648
appears anywhere other than as the operand of the unary minus
operator (§15.15.4).
[...]
It is a compile-time error if a decimal literal of type long is
larger than 9223372036854775808L (2^63), or if the decimal literal
9223372036854775808L appears anywhere other than as the operand of
the unary minus operator (§15.15.4).

https://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.10.1

-- Roberto
Philippe Verdy
2018-11-29 14:57:52 UTC
The parser can still initially parse the "-" token separately from the
parsed value "9223372036854775808", which cannot be an integer and is
therefore given a double value (hoping that the double keeps its
precision; even though a double has fewer mantissa bits, the bits
truncated in this case are all zeroes).

So the double value 9223372036854775808, which is still a constant, can
easily be detected when it appears next to the unary minus operator
during the "reduce" step of the parser, where the syntax tree now has
two tokens: the unary minus and the double value, which is still exactly
equal to 9223372036854775808.
This just requires an additional reduce rule in the syntactic parser (not
in the lexer) to resolve it as a single precomputed negative integer
constant for this specific case.

However this would mean that -922337203685477580.8e1 would end up
parsed as an integer even though the source specifies it explicitly as a double.

There are two ways to handle this:
- either the lexer does not parse numeric strings itself and leaves the
token as a string; the actual conversion to a datatype is then done in the
parser;
- or the lexer parses the numeric strings and returns not only the token
with its attribute set to the numeric value, but also a "hint"
indicating which parser (integer or floating point) it used to reduce the
numeric string to a resolved constant. This approach
complicates only one parser rule:

unaryexpression ::= '-' INTCONSTANT
unaryexpression ::= '-' DOUBLECONSTANT
  { if (tokens[1].doublevalue == 9223372036854775808.0) {
      tokens[1].type = INTCONSTANT;
      tokens[1].intvalue = -9223372036854775808;
    }
  }
unaryexpression ::= ....
unaryexpression ::= '-' (expression)

(Here "tokens[]" is some C-like array giving access to the properties of
the tokens returned by the lexer and being reduced by the parsing rule
(the number of tokens in the array is determined by the rule; here there
are 2 tokens), stored and modified by the parser in its abstract syntax
tree. It is assumed that tokens[i].type is one of the defined token-type
constants returned by the lexer, which can also set tokens[i].intvalue,
tokens[i].doublevalue, or tokens[i].stringvalue for NAME or
LITERALSTRING token types.)
Philippe Verdy
2018-11-29 15:15:42 UTC
Better solution:

unaryexpression ::= '-' DOUBLECONSTANT
unaryexpression ::= '-' INTCONSTANT
  {
    // discard the AST generated in $0 for the reduction of $1 $2,
    // and replace it with a single token:
    $0.type = INTCONSTANT;
    if ($2.intvalue < 0)  // lexer already wrapped MININT to its negative value
      $0.intvalue = $2.intvalue;
    else
      $0.intvalue = -$2.intvalue;
  }
unaryexpression ::= ....
unaryexpression ::= '-' (expression)

This way there is no risk of interpreting "-922337203685477580.8e1" as an
integer; it will remain a double.

It is assumed that the lexer always returns positive values when parsing
numeric strings into DOUBLECONSTANT and INTCONSTANT tokens, except that
when it uses the integer parser to produce an INTCONSTANT token it sets
token.intvalue directly, which then turns out negative for the specific
MININT value. We can detect this easily during the reduction of the rule
and discard the unary minus, because the intvalue is already correct and
does not need to be negated.
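For comparison, the current behavior that such a rule would change can be pinned down in stock Lua (5.3/5.4, 64-bit integers):

```lua
-- Today the literal overflows the integer range and becomes a float,
-- while the equivalent explicit arithmetic stays integral:
assert(math.type(-9223372036854775808) == "float")
assert(math.type(-9223372036854775807 - 1) == "integer")
-- the float still compares equal to mininteger (it is an exact power of two):
assert(-9223372036854775808 == math.mininteger)
```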
Dirk Laurie
2018-11-27 19:31:30 UTC
Post by pocomane
It seems to me that there is an inconsistency in how lua handles the
min and the max integer literals. The following code ends without any
error both in Lua 5.3 and Lua 5.4.
```
-- Max integer, both literal and reference are ok
assert(tostring(math.maxinteger) == '9223372036854775807')
assert(tostring(9223372036854775807) == '9223372036854775807')
assert(math.type(9223372036854775807) == 'integer')
-- Min integer+1, literal is ok
assert(tostring(-9223372036854775807) == '-9223372036854775807')
assert(math.type(-9223372036854775807) == 'integer')
-- Min integer, reference is ok
assert(tostring(math.mininteger) == '-9223372036854775808')
assert(math.type(math.mininteger) == 'integer')
-- Min integer literal, BUG ???
assert(tostring(-9223372036854775808) == '-9.2233720368548e+018')
assert(math.type(-9223372036854775808) == 'float')
```
Is it a bug?
On my system (Ubuntu 18.04, amd64 processor), the second-last
assertion fails, but the last one succeeds.
Andrew Gierth
2018-11-28 00:00:09 UTC
Post by pocomane
-- Min integer literal, BUG ???
assert(tostring(-9223372036854775808) == '-9.2233720368548e+018')
assert(math.type(-9223372036854775808) == 'float')
Dirk> On my system (Ubuntu 18.04, amd64 processor), the second-last
Dirk> assertion fails, but the last one succeeds.

The assert(tostring(-9223372036854775808) == '-9.2233720368548e+018')
is fragile in that it makes assumptions about the float output format
that may not be true - e.g. whether the exponent is given as e+018
or e+18 (which is what I get on FreeBSD) or e18.
--
Andrew.
pocomane
2018-11-28 07:14:08 UTC
On Wed, Nov 28, 2018 at 1:00 AM Andrew Gierth
Post by Andrew Gierth
Dirk> On my system (Ubuntu 18.04, amd64 processor), the second-last
Dirk> assertion fails, but the last one succeeds.
The assert(tostring(-9223372036854775808) == '-9.2233720368548e+018')
is fragile in that it makes assumptions about the float output format
that may not be true - e.g. whether the exponent is given as e+018
or e+18 (which is what I get on freebsd) or e18.
Sorry, I think a better way to test it is:

```
assert(tostring(-9223372036854775808) ~= '-9223372036854775808')
```