Discussion:
What would you remove from Lua - a case of regression?
Dibyendu Majumdar
2018-11-25 17:40:06 UTC
Hi,

One of the difficulties of generating high performance code for Lua in
a _naive_ way is that many op codes have a lot of branching. A simple
add instruction may do:

    if (isint(a) && isint(b))
      integer add a, b
    else if (isfloat(a) && isfloat(b))
      float add a, b
    else if (hasaddmeta(a) || hasaddmeta(b))
      invoke meta add a, b

This is really inefficient when JIT compiled.

Added to this is the need to support coroutines, which can yield in
the middle of a metamethod, so that complicates life further because
after a metamethod call, you cannot assume that the Lua stack is the
same.

So I am thinking of experimenting with a minified Lua - regress back
to 5.1 but on top of that:

* Eliminate metamethods except for __gc.
* No more coroutines

So back to only one number type. As you can imagine this would mean
add can simply be:
a + b

Now, I do not know how often people implement operator overloading -
but given some languages do not have this feature it may not be a big
loss.

I personally can do without coroutines too; I am not sure how widely
they are used. But the trade-off is that we eliminate a bunch of
complexity from mini Lua.

I think there are other things I could eliminate, such as varargs. But
for now, I may start with the above. I don't know right now if I should
try to trim down the current Ravi code or start with Lua 5.1.

The difficult bit is getting time to do all this.

Regards
Dibyendu
Sean Conner
2018-11-25 22:04:21 UTC
Post by Dibyendu Majumdar
Hi,
One of the difficulties of generating high performance code for Lua in
a _naive_ way is that many op codes have a lot of branching. A simple
if (isint(a) && isint(b))
integer add a, b
else if (isfloat(a) && isfloat(b))
float add a, b
else if (hasaddmeta(a) || hasaddmeta(b))
invoke meta add a, b
This is really inefficient when JIT compiled.
There are languages that do type inference, where from the context of the
code it can determine the type of a variable. So for instance, if all calls
to a function always pass in integers, you can specialize the code for that.
For example, this function:

-- written for Lua 5.1
function fromjulianday(jd)
  local a = jd + 32044
  local b = math.floor((4 * a + 3) / 146097)
  local c = a - math.floor((b * 146097) / 4)
  local d = math.floor((4 * c + 3) / 1461)
  local e = c - math.floor((1461 * d) / 4)
  local m = math.floor((5 * e + 2) / 153)

  return {
    day = e - math.floor((153 * m + 2) / 5) + 1,
    month = m + 3 - 12 * math.floor((m / 10)),
    year = b * 100 + d - 4800 + math.floor(m / 10)
  }
end

Given the presence of math.floor(), this indicates the code wants integer
results. Also, jd is used as if an integer, so this function can be flagged
as taking an integer as its parameter.

Now, having only heard of type inference (and very rarely used a language
that has it) I don't know how difficult this would be.
Post by Dibyendu Majumdar
* Eliminate metamethods except for __gc.
I looked over my own code for uses of metatables. I use them extensively
for userdata, very rarely for tables, and not at all for the other types.
Furthermore, there are a few methods I've repeatedly used (for both, unless
otherwise noted):

__mode (table)
__index
__newindex
__tostring (userdata)
__gc
__len

I did find one or two uses of the following in all my code:

__call (userdata)
__add (userdata [1])
__sub (userdata [1])
__unm (userdata [1])
__eq (userdata)
__lt (userdata)
__le (userdata)
__concat (userdata)

But this is just my code. Take it with a grain of salt.

At the very least, consider keeping __mode (which is used in conjunction
with the garbage collector), and possibly making a cheap check for __index,
__newindex and __len (if possible).
Post by Dibyendu Majumdar
* No more coroutines
This is a deal-breaker for me (more on this below).
Post by Dibyendu Majumdar
So back to only one number type. As you can imagine this would mean
a + b
Now, I do not know how often people implement operator overloading -
but given some languages do not have this feature it may not be a big
loss.
Personally, almost never. But I do use LPeg (a lot) and that does use
operator overloading. And again, for me, this would be a big loss.
Post by Dibyendu Majumdar
I personally can do without coroutines too, I am not sure how widely
they are used; but the trade off is that we eliminate a bunch of
complexity from mini Lua.
I use them. Maybe not as often as LPeg, but in the context I use them,
they are invaluable. I use coroutines to implement servers. A network
request comes in, I create a coroutine to handle that request. A simple
echo service:

local signal = require "org.conman.signal"
local nfl = require "org.conman.nfl"
local tcp = require "org.conman.nfl.tcp"
local addr,port = ...

signal.catch('int')
signal.catch('term')

tcp.listen(addr or "0.0.0.0", port or 8000, function(ios)
  repeat
    local line = ios:read("*L") -- potential yield spot
    ios:write(line)             -- potential yield spot
  until line == ""
  ios:close()
end)

nfl.eventloop() -- this manages scheduling of coroutines

It makes the actual code to handle the connection straightforward instead
of chopping it up into an incomprehensible nest of callback functions.

There are a few other instances where coroutines are nice (such as the
Same Fringe Problem [3]) but for me, it's handling network services. I
think this would be the common use for coroutines today.

-spc

[1] One module and that one related to signals [2], used to add and
remove signals from a signal set (a POSIX thing).

[2] https://github.com/spc476/lua-conmanorg/blob/master/src/signal.c

[3] http://wiki.c2.com/?SameFringeProblem
Philippe Verdy
2018-11-25 23:24:31 UTC
Note: math.floor() is not required to return a native binary integer. It
may return a variant type using integers for some ranges, and doubles
outside the range. So even in this case the inference would not help
eliminate the necessary tests on the effective type of the variant... The
compiler also has to know the value range.
But your existing function fromjulianday(jd) does not check the value range
of jd, so it's not possible to determine the value range for math.floor()
and then reduce it to code using only native integers.

What your code can do, however, is favor the integer case by making
branch predictions and avoiding most branches for that case. The CPU will
itself make its own branch prediction anyway, and possibly adapt itself
using a branch prediction cache.

What is much more useful is to know how instructions are scheduled and
pipelined: increase the number of distinct instructions that can handle data
in parallel, according to the number of native registers you have, and avoid
scheduling two costly execution units (like FPUs) in parallel.

Instruction scheduling matters as well: a correct understanding of how CPU
pipelines work, and of how the central pipeline stage (execution in an
ALU/FPU/VPU port, or memory access) can avoid using the same cycles, also
mitigates the timing attacks based on dynamic branch prediction and
speculative execution.

A good optimizer can also know equivalent patterns of instructions that
avoid branch predictions (e.g. it's perfectly possible to implement the
function abs(x) without using any conditional branch, by computing the two
parallel branches and combining them with a binary OR: this uses two
parallel pipelines and a final "rendez-vous" where some other independent
instructions are scheduled just before computing the binary OR). It is
difficult to find other instructions in an isolated abs(x) function, but
not if the function is inlined within more complex expressions. Detecting
such patterns requires a good database of equivalent binary expressions
(e.g. optimizing multiplications by shifts, and optimizing shifts by
additions, and avoiding patterns like "store Register[n] to [x]; read
Register[m] from [x]" by replacing them with "move Register[n] to [x]; move
Register[n] to Register[m]", where the two instructions can run in
parallel...). Such rewriting requires analyzing the datapaths and which
virtual intermediate result is reused later or not, so that you can reduce
the number of registers needed. This will then largely reduce the footprint
on the data caches (from the stack or actual native registers).

Note that the compiler must still be correct semantically (notably for
handling exceptions/errors and pcall() correctly).

As well, a compiler can autodetect common subexpressions like "(a+b)" in
"(a+b)*(c-(a+b))". But this is not easy at all if some elements are
actually function calls, as in "Rand()+Rand()", where not only are the two
calls to Rand() expected to return different numbers, but the function
must also be called twice (the number of calls may be significant if the
function uses variables kept across calls in its "closure").
Muh Muhten
2018-11-26 00:20:40 UTC
Post by Philippe Verdy
Note: math.floor() is not required toi return a native binary integer. It
may return a variant type using integers for some ranges, and doubles
outside the range. So even in this case the the inference would not help
eliminate the necessary tests on the effective type of the variant... The
math.floor can return literally anything. It can be redefined.

In the case where it _isn't_, this is pointless pedantry since it's
not unreasonable to expect the compiler to be aware of (idioms
involving) the stdlib. (And vice versa.)
Post by Philippe Verdy
Note that the ompiler must still be correct semantically (notably for
handling exceptions/errors and pcall() correctly).
Pray tell, what does this entail under the pretext of _changing the
semantics_ to make things easier for the compiler?
Philippe Verdy
2018-11-26 06:42:42 UTC
Post by Muh Muhten
Post by Philippe Verdy
Note: math.floor() is not required toi return a native binary integer. It
may return a variant type using integers for some ranges, and doubles
outside the range. So even in this case the the inference would not help
eliminate the necessary tests on the effective type of the variant... The
math.floor can return literally anything. It can be redefined
Here I was speaking about the standard library function, without any
redefinition: it takes a 'number' (usually a double when compiling Lua
itself) and returns a 'number', which is not limited to a 53-bit integer.

So the function computing the Julian day MUST still contain code testing
the value to make sure it is in range. Technically the given sample
function was NOT correct, as it returns meaningless values if it is not. But it
will be correct if "number" is a 64-bit double, and the function first
checks that the parameter is within +/-2^31 before performing all the rest
in the "if" branch (the else branch may just return nil to indicate the
error, or call error()); then the compiler knows that the given number is not
a NaN and not infinite, and the standard function math.floor (the compiler
can know that it is not overloaded) is guaranteed to return a 32-bit
integer. Then it can perform optimization of the arithmetic code that
follows (it still needs to check that arithmetic operators are also not
overloaded).

Operators can be redefined too! The code "1+1" in Lua is not even
guaranteed to return 2 if "+" is redefined (in contexts where the
metatable contains an overload).

In summary, the compiler needs to check the environment (it is easy to
generate two compiled versions of the code depending on whether the
environment uses the standard library or not; that's not so complex
because it is determined only by the standard Lua language semantics)
before doing any kind of type inference (which is harder: tracing the
possible value range is much more complex, as it depends on
pre-evaluating all possible limits!).

A simple check like "if (d>>0 == d) { ... } else return nil" within the Lua
function to compile would do the trick, where the compiler can determine
that d is effectively an integer in the correct range for integers. But
the actual test **in Lua** is a bit more complex, because "local a = jd +
32044" can overflow in 32-bit integers, just like the subexpression "(4 * a
+ 3)" used in the next statement of the sample: a 32-bit check is not
enough to ensure the result would be correct. The result of the function
is actually correct if "jd" is in the range of 29-bit integers, so the
initial test should be "if (d>>3<<3 == d) { ... } else return nil" ...

So the compiler would need to track all constant and variable value ranges
to make inferences. Type inference is not enough!


Post by Muh Muhten
In the case where it _isn't_, this is pointless pedantry since it's
not unreasonable to expect the compiler to be aware of (idioms
involving) the stdlib. (And vice versa.)
So you're creating another language which is not Lua, if you remove parts
of its core features. The compiler you're creating is not a compiler for
Lua...
Post by Muh Muhten
Post by Philippe Verdy
Note that the ompiler must still be correct semantically (notably for
handling exceptions/errors and pcall() correctly).
Pray tell, what does this entail under the pretext of _changing the
semantics_ to make things easier for the compiler?
Changing the semantics of the language means you're creating a compiler for
another language. Lua cannot work at all without exception/error handling.
Sean Conner
2018-11-26 06:59:36 UTC
Post by Muh Muhten
Post by Philippe Verdy
Note: math.floor() is not required toi return a native binary integer. It
may return a variant type using integers for some ranges, and doubles
outside the range. So even in this case the the inference would not help
eliminate the necessary tests on the effective type of the variant... The
math.floor can return literally anything. It can be redefined.
Of course, one could rewrite the function to remove the call to
math.floor() to appease the Great Pedantic One:

function fromjulianday(jd)
  local a = jd + 32044
  local b = (4 * a + 3) // 146097
  local c = a - (b * 146097) // 4
  local d = (4 * c + 3) // 1461
  local e = c - (1461 * d) // 4
  local m = (5 * e + 2) // 153

  return {
    day = e - (153 * m + 2) // 5 + 1,
    month = m + 3 - 12 * (m // 10),
    year = b * 100 + d - 4800 + (m // 10)
  }
end

This is Lua 5.3 specific code where '//' is integer division.

-spc
Philippe Verdy
2018-11-26 10:53:33 UTC
You've not understood the "Great Pedantic" (sic! visibly you use that "The
Great" expression as a directed insult) at all, nor what I really wanted to
show: your suggestion does not change anything about what I meant. You've
removed the call to the library function math.floor, which does not matter
at all; I was speaking about what it meant, not the fact that it could be
overridden. Even without changing/overriding it, it is still not
guaranteed to return a valid 32-bit integer.

Now you've used the "//" operator, which also does not guarantee it (even
without changing/overriding it): it still operates on Lua "number"s, which
are floating point, and returns a floating-point value which may
STILL be out of range for native integers. So it's still not safe to
"optimize" the compilation of this code using native integers, because this
is still not what the Lua code effectively does: such compilation would be
broken as well (and I insist: this is true EVEN if there's no override at
all for the Lua arithmetic operators).

The "//" operator can return a value like 1e99, which is an "integer" but
given as a floating-point value, completely out of range for native
integers (even if these native integers are 64-bit!). And the same applies
to the math.floor() operation.

Do you see now what I mean?

The rest of the function can still be computed, but will give "almost
random"/"non-meaningful" date results if you don't take into account the
maximum precision of floating-point values in "Lua number" (which is 53 bits
if "number" is compiled as an IEEE 64-bit double); it successfully
returns correct values when "jd" is in the range of a 50-bit signed integer.

Note also that this function is not correct if "jd" is negative: the "a //
b" operation in Lua is NOT a "floor(a/b)" operation but a
"floor(abs(a)/abs(b))*sign(a)*sign(b)" operation. But it is still defined
for the whole range of Lua "numbers" (even if their absolute value is
higher than 2^52, so that they cannot store any fractional bits and can only
represent *some* integers by rounding them to some *other* integers).

You are then making false assumptions about what Lua numbers are: they are
not a simple mix of native binary integers and native floating points; they
only represent floating-point values with limited precision but much
wider value ranges (plus special values for NaNs, infinities, and negative
zero, plus denormal values, which are finite numbers with even more
limited precision; none of these exist as native binary integers).

The Lua code as it is written is not simply optimizable using 32-bit or
64-bit native integers without taking into account the precision and value
range of numbers (independently of the fact that the function does not
really test the actual Lua type of its parameter).

So the real Lua code actually makes tests everywhere about types to infer
which function will be used for every step (including in the Lua
virtual machine, which operates only on "numbers"), and will then determine
how to handle the value range and precision correctly: there is type
checking and bounds checking at every step.
Dibyendu Majumdar
2018-11-26 19:33:14 UTC
Post by Sean Conner
There are languages that do type inference, where from the context of the
code it can determine the type of a variable.
Yes, that is an option that I would like to pursue. One problem is that
whenever you have a return value from a function call you don't know
what the result will be ... so at that point you have to go back to
not knowing the type.

The ideal solution is to specialize functions by type. I used to
think the function should be specialized upon entry - but it may make
more sense to specialize at the call site. But return values are still a
problem.
Post by Sean Conner
Post by Dibyendu Majumdar
* Eliminate metamethods except for __gc.
I looked over my own code for uses of metatables. I use them extensively
for userdata, very rarely for tables, and not at all for the other types.
Furthermore, there are a few methods I've repeatedly used (for both, unless
__mode (table)
__index
__newindex
__tostring (userdata)
__gc
__len
Perhaps I should retain these. Certainly I intend to get rid of the
arithmetic operators to start with.
Post by Sean Conner
Post by Dibyendu Majumdar
Now, I do not know how often people implement operator overloading -
but given some languages do not have this feature it may not be a big
loss.
Personally, almost never. But I do use LPeg (a lot) and that does use
operator overloading. And again, for me, this would be a big loss.
Yes LPeg is one of the heavy users of operator overloading.
Post by Sean Conner
Post by Dibyendu Majumdar
I personally can do without coroutines too, I am not sure how widely
they are used; but the trade off is that we eliminate a bunch of
complexity from mini Lua.
I use them. Maybe not as often as LPeg, but in the context I use them,
they are invaluable. I use coroutines to implement servers.
I think it would be nice to have two VMs built in - a full-featured
one and a cut-down one, with the user able to choose the one they
want to use. But it is harder to switch between a dual number type and a
single number type.


Thanks for the feedback.

Regards
Dibyendu
Philippe Verdy
2018-11-26 21:21:19 UTC
Scientific apps are very likely to overload operators to work on less
limited numeric systems than just plain numbers, whose precision and value
range are unspecified, or which have undesired values that we want to
restrict, or when we need more dimensions than just one. This requires
developing a new class where all operators will be overridden, even if
internally they are implemented with numbers.

Unfortunately we still cannot use any strict integer type in Lua without
the cost of IEEE doubles, and Lua is already inconsistent with some number
operators like shifts and binary ops, which truncate operands to ints but
return the result by reconverting it to doubles, losing their initial
restriction of range and precision.

As well, Lua still does not standardize (unlike C, C++, C#, JavaScript,
Java, or even COBOL, SQL, and RDF) any way to know the limits of its own
type system. This creates portability problems. This is a problem also
shared by PHP.
Dibyendu Majumdar
2018-11-26 21:32:17 UTC
Post by Philippe Verdy
Unfortunately we still cannot use any strict integer type in Lua without the cost of IEEE doubles, and Lua is already inconsistent with some number operators like shifts and binary ops which truncate operands to ints but return the result by reconverting it to doubles, losing their initial restriction of range and precision.
Hi, Lua 5.3 has 64-bit integer types - I was not sure whether you are
thinking of an earlier version of Lua above.

Regards
Dibyendu
Philippe Verdy
2018-11-26 21:46:53 UTC
That is not true. Lua is designed to allow its number type to be
configurable but in its default configuration it just uses what C gives as
a double.

Nothing indicates it can store any 64-bit integer, and in fact it can almost
never store them in a single number, unless Lua is specifically compiled
using long double in C and they are represented as 80-bit values with a
64-bit mantissa part...

No, there's no standard way to know the limits of numbers in Lua. You need
to test them by trying some operations in test loops during initialization
of your code, and then use these dynamic results as if they were constants.
But a Lua interpreter or compiler will never know that they are constants
after the initialization, given that they have to be variables during the
initialization, where their final value is still not known.
Philippe Verdy
2018-11-26 21:50:36 UTC
The same is true for Lua 5.3, which also adds integers with unspecified
limits that are just by default bound to what the C compiler gives in its
int datatype, but also allows configuring that type.

Lua numeric types have unpredictable limits.
Dibyendu Majumdar
2018-11-26 21:51:59 UTC
Post by Philippe Verdy
That is not true. Lua is designed to allow its number type to be configurable but in its default configuration it just uses what C gives as a double.
Nothing indicates it can store any 64 bit integer and in fact it can almost never store them in a single number, unless Lua is specifically compiled using long double in C and they are represented as 80 bit with a 64 bit mantissa part...
Hi, the above is incorrect. I know, as I maintain Ravi, a derivative of Lua
5.3. Integers are maintained natively as integer values in 5.3.
Conversion to double happens as per C promotion rules - when you do
operations that involve integer and double values. I am a bit
concerned that you may be stating things as if they are true when they
are not.

I would suggest you try out Lua 5.3 and look at its source code.

Regards
Dibyendu
Sean Conner
2018-11-26 21:56:03 UTC
Permalink
Post by Dibyendu Majumdar
Post by Philippe Verdy
That is not true. Lua is designed to allow its number type to be
configurable but in its default configuration it just uses what C gives
as a double.
Nothing indicates it can store any 64 bit integer and in fact it can
almost never store them in a single number, unless Lua is specifically
compiled using long double in C and they are represented as 80 bit with
a 64 bit mantissa part...
Hi, above is incorrect. I know as I maintain Ravi a derivative of Lua
5.3. Integers are maintained natively as integer values in 5.3.
Conversion to double happens as per C promotion rules - when you do
operations that involve integer and double values. I am a bit
concerned that you may be stating things as if they are true when they
are not.
I would suggest you try out Lua 5.3 and look at its source code.
It's even stated in the Lua 5.3 Reference Manual, section 2.1:

Standard Lua uses 64-bit integers and double-precision (64-bit)
floats, but you can also compile Lua so that it uses 32-bit integers
and/or single-precision (32-bit) floats.

http://lucy/~spc/docs/Lua-5.3/manual.html

Also, the constants math.maxinteger and math.mininteger are defined to
determine the limits of integers in Lua 5.3.

-spc
Sean Conner
2018-11-26 21:48:57 UTC
Permalink
Post by Dibyendu Majumdar
Post by Sean Conner
There are languages that do type inference, where from the context of the
code it can determine the type of a variable.
Yes that is an option that I would like to persue. One problem is
whenever you have return value from a function call you don't know
what the result will be ... so at that point you have to go back to
not knowing the type
Only for functions called before their definition. My gut says that
doesn't happen as often as one may think, but as always, measure. Perhaps
look at luarocks and luacheck (two sizable Lua applications, probably a good
representation of Lua usage) and check to see how often a function is called
prior to its definition.

Also, isn't Ravi a JIT? I would think the first time through encountering
a situation, you check the return code, but on subsequent visits you may
have the information to re-JIT that part of the code.

Again, not doing this I have no idea of the level of effort involved, I'm
just throwing out ideas here.
Post by Dibyendu Majumdar
Post by Sean Conner
Post by Dibyendu Majumdar
* Eliminate metamethods except for __gc.
I looked over my own code for uses of metatables. I use them extensively
for userdata, very rarely for tables, and not at all for the other types.
Furthermore, there are a few methods I've repeatedly used (for both, unless noted):
__mode (table)
__index
__newindex
__tostring (userdata)
__gc
__len
Perhaps I should retain these. Certainly I intend to get rid of the
arithmetic operators to start with.
At the very least, __mode should be retained since it helps guide the GC.
Post by Dibyendu Majumdar
Thanks for the feedback.
You're welcome.

-spc
Gavin Wraith
2018-11-26 22:01:14 UTC
Permalink
Post by Sean Conner
There are languages that do type inference, where from the context of the
code it can determine the type of a variable.
I have always been a fan of Lua, and have great respect for the decisions
made in its development; so a thread like this about what one would like
to remove, add or change is inescapably invidious for me.

I would like to see a reversal of the 'numbers are just numbers' policy;
that is, I would like more of a type barrier between integers and floats
for example. I would like to see
type (57) --> integer
type (57.0) --> float
Integers are always needed by the interpreter for its own internal
purposes - the "necessary numbers", but floats, or other number
systems (big numbers, complex numbers, modular arithmetic, ... ) are better suited for libraries, and their suitability is platform dependent - they
are "optional number systems".
I would like to see the type function capable of extension to tables with metatables with a __type index.

With numbers that take arbitrary amount of storage, there is no
avoiding them using heap storage. But with numbers that use a fixed
number of bytes, but more than are required for a value on the stack,
there is a choice: either you bung them in a union type with other
values on the stack, so that a lot of the stack is wasted space, or
you have an auxiliary stack for them, and every time you push a number
to the auxiliary stack you also push a pointer to it onto the ordinary
stack. There has to be some jargon for this - sorry I do not know it.
There is an obvious space/time tradeoff between the two approaches.
Somebody's thesis somewhere?
The latter has the advantage that it makes for more uniform
handling of optional number systems with fixed size storage.
The auxiliary stack requires no garbage collection, of course.
You might have 32-bit integers internally, and 256-bit bigintegers for cryptographic purposes, for example.
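The two layouts described above can be sketched in C; this is a minimal illustration with invented names (`UnionValue`, `Ref`, `push_double`), not an actual Lua internal:

```c
#include <assert.h>
#include <stdint.h>

/* Layout 1: every stack slot is a tagged union as wide as the widest
   payload, so small values waste space but access is one direct load. */
typedef struct {
    int tag;                          /* e.g. 0 = integer, 1 = float */
    union { int64_t i; double d; } u;
} UnionValue;

/* Layout 2: the main stack holds only a tag plus an index into an
   auxiliary stack where the wide payload actually lives. */
typedef struct {
    int tag;
    int aux_index;
} Ref;

static double aux_stack[256];         /* auxiliary stack; no GC needed */
static int aux_top = 0;

static Ref push_double(double d) {
    aux_stack[aux_top] = d;           /* push payload to the side stack */
    return (Ref){ .tag = 1, .aux_index = aux_top++ };
}
```

The space/time tradeoff is visible in the struct sizes: a `Ref` slot is smaller than a `UnionValue` slot, but reading its payload costs an extra indirection through `aux_stack`.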

--
Gavin Wraith (***@wra1th.plus.com)
Home page: http://www.wra1th.plus.com/
Dibyendu Majumdar
2018-11-26 22:04:58 UTC
Permalink
Post by Gavin Wraith
I would like to see a reversal of the 'numbers are just numbers' policy;
that is, I would like more of a type barrier between integers and floats
for example. I would like to see
type (57) --> integer
type (57.0) --> float
This is already true. Try this:

math.type(57)
math.type(57.0)
Coda Highland
2018-11-26 22:22:09 UTC
Permalink
On Mon, Nov 26, 2018 at 1:33 PM Dibyendu Majumdar
Post by Dibyendu Majumdar
I think it would be nice to have two VMs built-in - a full featured
one and a cut-down one, with user being able to choose the one they
want to use. But it is harder to switch between dual number type to a
single number type.
Have you considered doing the work at a different level? One common
bytecode format, one common VM, two parsers? Or possibly even a
source-to-source transpiler that compiles full-Lua down to mini-Lua?

This still doesn't help with the number typing, but for that you could
consider using a polymorphic approach, emitting multiple variations of
a function based on the function parameter types and dispatching to
them at call or JIT time based on the known types of the parameters.
LuaJIT does something similar to this. The checks do imply a small
performance hit if you can't statically determine the types at compile
time, but it's probably less of a hit than checking for every
operation.
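The specialization idea above can be sketched as follows; this is an illustrative guess at the mechanism, with made-up names (`add_generic`, `add_int_int`), not LuaJIT's actual code:

```c
#include <assert.h>
#include <stdint.h>

typedef enum { T_INT, T_FLT } Tag;
typedef struct { Tag tag; union { int64_t i; double d; } u; } Value;

/* slow generic path: walks the full dispatch ladder */
static Value add_generic(Value a, Value b) {
    Value r;
    if (a.tag == T_INT && b.tag == T_INT) {
        r.tag = T_INT;
        r.u.i = a.u.i + b.u.i;
    } else {
        r.tag = T_FLT;
        r.u.d = (a.tag == T_INT ? (double)a.u.i : a.u.d)
              + (b.tag == T_INT ? (double)b.u.i : b.u.d);
    }
    return r;
}

/* variant a JIT might emit for a call site seen with (int, int):
   one cheap guard branch instead of the whole ladder, with fallback */
static Value add_int_int(Value a, Value b, int *guard_failed) {
    if (a.tag != T_INT || b.tag != T_INT) {
        *guard_failed = 1;            /* guard failed: take generic path */
        return add_generic(a, b);
    }
    *guard_failed = 0;
    return (Value){ .tag = T_INT, .u = { .i = a.u.i + b.u.i } };
}
```

The `guard_failed` flag is where the recompile-or-give-up policy would hook in: a caller could count failures and switch this call site back to the generic routine.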

/s/ Adam
Dibyendu Majumdar
2018-11-26 22:40:53 UTC
Permalink
Post by Coda Highland
On Mon, Nov 26, 2018 at 1:33 PM Dibyendu Majumdar
Post by Dibyendu Majumdar
I think it would be nice to have two VMs built-in - a full featured
one and a cut-down one, with user being able to choose the one they
want to use. But it is harder to switch between dual number type to a
single number type.
Have you considered doing the work at a different level? One common
bytecode format, one common VM, two parsers? Or possibly even a
source-to-source transpiler that compiles full-Lua down to mini-Lua?
Hi, that would not solve the problem as the problem I am trying to
solve is to have simpler code to execute with fewer unpredictable
branches. A common VM would have to handle the worst case. However
what I could do is fall back to the full featured VM when a type check
fails. This would cause a small performance hit for Lua code that
relies upon the bigger feature set, but in theory this cost can be
minimized by blacklisting the function so that next time it goes
immediately to the fallback VM.
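The blacklisting scheme could look roughly like this; a hypothetical sketch, with stand-in interpreters and invented names (`Proto`, `run_fast_vm`, `run_full_vm`):

```c
#include <assert.h>

typedef struct {
    int blacklisted;      /* set once a fast-VM type guard has failed */
} Proto;

/* stand-in for the full-featured VM: always succeeds */
static int run_full_vm(Proto *p, int arg) {
    (void)p;
    return arg * 2;
}

/* stand-in for the cut-down VM: pretend negative args fail its guards */
static int run_fast_vm(Proto *p, int arg, int *guard_ok) {
    (void)p;
    *guard_ok = (arg >= 0);
    return *guard_ok ? arg * 2 : 0;
}

static int call(Proto *p, int arg) {
    if (!p->blacklisted) {
        int guard_ok;
        int r = run_fast_vm(p, arg, &guard_ok);
        if (guard_ok)
            return r;
        p->blacklisted = 1;   /* pay the fallback cost only once */
    }
    return run_full_vm(p, arg);
}
```

After the first guard failure the function is marked, so subsequent calls skip the fast VM entirely, which is exactly the "next time it goes immediately to the fallback VM" behaviour.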

Regards
Coda Highland
2018-11-26 23:39:01 UTC
Permalink
On Mon, Nov 26, 2018 at 4:41 PM Dibyendu Majumdar
Post by Dibyendu Majumdar
Post by Coda Highland
On Mon, Nov 26, 2018 at 1:33 PM Dibyendu Majumdar
Post by Dibyendu Majumdar
I think it would be nice to have two VMs built-in - a full featured
one and a cut-down one, with user being able to choose the one they
want to use. But it is harder to switch between dual number type to a
single number type.
Have you considered doing the work at a different level? One common
bytecode format, one common VM, two parsers? Or possibly even a
source-to-source transpiler that compiles full-Lua down to mini-Lua?
Hi, that would not solve the problem as the problem I am trying to
solve is to have simpler code to execute with less unpredictable
branches. A common VM would have to handle the worst case. However
what I could do is fall back to the full featured VM when a type check
fails. This would cause a small performance hit for Lua code that
relies upon the bigger feature set, but in theory this cost can be
minimized by black listing the function so that next time it goes
immediately to the fallback VM.
That's consistent with the polymorphic variation I was describing.

LuaJIT tries to be a little bit more forgiving about it. If it fails
the type check, it goes ahead and JITs it again with the new types. If
it fails AGAIN (I don't know exactly how many possible variations it
allows), it gives up and assumes it's a megamorphic function and just
always sends it through the slow-and-steady route.

Assuming your two VMs are still able to operate on the same in-memory
data structures so execution can freely switch between them, this
isn't an unreasonable idea at all.

/s/ Adam
Philippe Verdy
2018-11-27 09:58:33 UTC
Permalink
It's interesting to know how Google's V8+ engine handles Javascript's
dynamic types: for each object defined or each time a property is
added/modified/removed, it checks its datatype and creates/updates an
"interface" internal structure; all new objects created from the same
object immediately reuse the same interface structure; the interface is
made so that it can access to the actual object's properties or to its
compiled version based on the interface's signature (this is only a
superficial description, more details are documented, but the principle is
that it does not need to compile code for each object or each time one of
its properties is set/modified/removed).

Many objects share the same interface, and it's actually rare that new
interfaces need to be created, so JIT compilation of new interfaces is
rare (except at the start of an entirely new script which defines new
objects). JIT compilation then only occurs the first time a method in the
object is accessed; the compilation then remains cached in the interface
object. In some cases these cached entries may still be freed when needed,
because the cache stores the precompiled binary fragments using weak
pointers, so the cache will not necessarily be permanent when an
object is created once, a method is compiled, called once but (almost)
never reused later, and the VM is short of memory: these compiled fragments
are then garbage-collectable using some cache eviction strategy.

(I've not studied which strategy is used, but Google is probably aware that
the simple global LRU eviction strategy is now a severe security risk, and
may isolate these caches with one for each thread; other isolation
mechanism are possible to avoid time-based attacks between threads not
working within the same security context and there are certainly security
contexts which may include multiple threads with the same privileges, where
such separation of caches is not necessary as it would cost a lot in terms
of global memory usage, allowing possible DOS attacks). Since V8, the
engine has had several new major versions to refine how it works.

But the principle is there: the dynamic type of any object is converted to
a set of static types determined by the last state of an object. The
polymorphic object may be of one type at one time and then another, but
generally each object only has a small finite set of possible actual static
types it can "adopt" during its lifetime. And the JIT is able to
automatically determine the signature of each type and cache as many
compiled versions of its methods as needed, and only compile what is
needed, method by method (and not necessarily all properties of the object
at once).
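The interface-sharing idea described above (hidden classes, or "maps" in V8 terminology) can be sketched in a toy form; this is purely illustrative, with invented names, and not V8's actual data structures:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TRANS 8

/* Objects built by adding the same properties in the same order end up
   sharing one Shape, so anything cached on the Shape (e.g. compiled
   method stubs) is reused across all of them. */
typedef struct Shape Shape;
struct Shape {
    const char *added;    /* property whose addition produced this shape */
    Shape *parent;
    int ntrans;
    struct { const char *name; Shape *child; } trans[MAX_TRANS];
};

static Shape root;        /* the shape of the empty object */

static Shape *add_prop(Shape *s, const char *name) {
    int i;
    for (i = 0; i < s->ntrans; i++)     /* reuse a cached transition */
        if (strcmp(s->trans[i].name, name) == 0)
            return s->trans[i].child;
    Shape *child = calloc(1, sizeof *child);  /* first time: create it */
    child->added = name;
    child->parent = s;
    s->trans[s->ntrans].name = name;
    s->trans[s->ntrans].child = child;
    s->ntrans++;
    return child;
}
```

Two objects that gain `x` then `y` end up with pointer-identical shapes, while a different insertion order produces a different shape; that identity is what makes per-shape compilation caches effective.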

The compiler is also smart enough to not precompile a method if it does not
get reused at all: the first invocation of the method can just mark the
method to be compiled on the next invocation, but then it can be interpreted (I
think that it is more granular than just a single method, it may precompile
smaller fragments, such as separate conditional branches, or loops, so that
only the first loop or first use of the branch will be interpreted, then the
second use will be compiled; the compiler may also work in a background
thread instead of being blocking, using a cache of candidate fragments: the
compiler can do its work in the background without blocking the actual
threads which can continue running immediately in interpreted mode after
"suggesting" rather than "instructing" the compiler to transform the code
which may be a mix of interpreted virtual opcodes and native instructions
that are inserted to progressively replace some fragments in interpreted
mode)

The V8 engine is opensourced. You can see a summary description of how it
works on
http://thibaultlaurens.github.io/javascript/2013/04/29/how-the-v8-engine-works/
However the cache itself is not described and may need further inspection
because it can become the target of time-based attacks. In reality there
are at least two compilers, one is non-optimizing but produces code that is
automatically profiled, and then a second more complex compiler can run to
perform optimizations which cannot be made immediately (such as branch
prediction, or inlining called methods, and then detecting tests that are
always true/false to eliminate dead code and detect new constant
subexpressions, or moving common subexpressions out of loops, or better
scheduling the allocation of native registers and compact the set of
upvalues/local variables in stack with better placement to improve data
locality and maximize the efficiency of caches, or find ways to parallelize
the instructions into more pipelines with a minimum of "idle" states and less
contention between them, if their execution causes them to require access to
some limited shared resources, like internal ALUs/FPUs, or external bus
ports for I/O and L2/L3/memory accesses).

An "optimizing" compiler is a real challenge as it exposes many risks that
are much harder to check and secure (especially with very complex
instructions sets like x86 and x64, where the actual implementation in
silicon varies a lot across CPU versions or manufacturers).
Post by Coda Highland
[snip]
Coda Highland
2018-11-27 12:55:07 UTC
Permalink
Post by Philippe Verdy
[snip]
Actually, V8 doesn't have an interpreter. It always compiles. It has a
dumb, fast one that doesn't optimize at all that it uses for the first run
of everything, and a slow, smart one that it uses to optimize hot spots.

/s/ Adam
Gé Weijers
2018-11-25 23:28:31 UTC
Permalink
Post by Dibyendu Majumdar
So I am thinking of experimenting with a minified Lua - regress back
* Eliminate metamethods except for __gc.
* No more coroutines
So back to only one number type. As you can imagine this would mean
a + b
You would still have to do a type check before adding the values,
because you may be adding a table to nil instead of two numbers. Why
not call a runtime system routine that implements the full add
functionality if the type check fails?
Post by Dibyendu Majumdar
Now, I do not know how often people implement operator overloading -
but given some languages do not have this feature it may not be a big
loss.
I have used it in a module implementing date and calendar arithmetic,
and in some other cases.
Post by Dibyendu Majumdar
I personally can do without coroutines too, I am not sure how widely
they are used; but the trade off is that we eliminate a bunch of
complexity from mini Lua.
Coroutines are very useful in event driven programs, I use them all
the time to implement protocols/state machines. You can use callbacks
like you see in Javascript programs, but coroutines make the code much
more readable, the code is not littered with lambdas.

If removing complexity from the language adds complexity to the
programs written in the language you may be going down the wrong path.

An implementation of “mini Lua” would not be all that useful to me,
Lua 5.x has just about the right compromise of simplicity and
expressive power for me, and I can add any missing pieces through the
C interface. I’m currently using Lua and some custom C code for
in-house test scripts that interface to prototype hardware, and it’s
using about 1% of 1 CPU, so a JIT compiler would not be particularly
useful to have in this case.


Dibyendu Majumdar
2018-11-26 19:46:48 UTC
Permalink
Post by Gé Weijers
Post by Dibyendu Majumdar
So back to only one number type. As you can imagine this would mean
a + b
You would still have to do a type check before adding the values,
because you may be adding a table to nil instead of two numbers. Why
not call a runtime system routine that implements the full add
functionality if the type check fails?
It seems compilers are happier if your code is transparent, i.e.
compiler can see what is going on. The moment you call a function, it
goes into a blackhole so to speak.
You are right that type checks are still needed, but they can be of the form:

if (isnum(a) && isnum(b))
    c = a + b;
else
    goto error;
if (isnum(d) && isnum(f))
    e = d / f;
else
    goto error;

The compiler can see what is happening and having a single error
handling block helps too.
Post by Gé Weijers
If removing complexity from the language adds complexity to the
programs written in the language you may be going down the wrong path.
There is always a tradeoff otherwise why use Lua? Why not use the most
powerful language available. I am considering whether it is possible
to have a dual VM system, a full featured VM and a cutdown VM,
selectable by user.
Post by Gé Weijers
An implementation of “mini Lua” would not be all that useful to me,
Lua 5.x has just about the right compromise of simplicity and
expressive power for me, and I can add any missing pieces through the
C interface. I’m currently using Lua and some custom C code for
in-house test scripts that interface to prototype hardware, and it’s
using about 1% of 1 CPU, so a JIT compiler would not be particularly
useful to have in this case.
To be honest as I have posted before Lua's performance is usually
fine; I have not had any issues. But it does seem that LuaJIT has made
people think performance matters. It does maybe in some restricted use
cases where Lua is used almost like a proxy for C. But looking at
JavaScript as an example, performance and JIT compilation has no doubt
been part of the reason for its success. The problem with Lua is how
to achieve the same level of performance while keeping to the 'spirit
of Lua' - i.e. small and beautiful rather than large and ugly
implementation. LuaJIT is one way - and might have been the only way
if the effort to understand and maintain it were in the realms of what
is human.

Regards
Dibyendu
Dirk Laurie
2018-11-26 05:00:07 UTC
Permalink
Op So. 25 Nov. 2018 om 19:40 het Dibyendu Majumdar
[snip - TLDR]
We've been here before.
http://lua-users.org/lists/lua-l/2018-07/msg00700.html
I'll just repeat a remark I made then and not play along again this time.

Look at the specs of Lua 4.0 to find a quite decent little language
with far fewer features than Lua 5.3 has.
Hugo Musso Gualandi
2018-11-27 14:10:28 UTC
Permalink
Actually, V8 doesn't have an interpreter. It always compiles. It has a dumb, fast one that doesn't optimize at all that it uses for the first run of everything, and a slow, smart one that it uses to optimize hot spots.
These Javascript JITs change all the time. Nowadays V8 uses an interpreter as its first stage. One of the biggest reasons for the change was to reduce memory usage (the machine code generated by the baseline compiler was much larger than the equivalent interpreter bytecodes).

https://v8.dev
Javier Guerra Giraldez
2018-11-27 17:27:07 UTC
Permalink
On Tue, 27 Nov 2018 at 14:10, Hugo Musso Gualandi
Post by Hugo Musso Gualandi
These Javascript JITs change all the time. Nowadays v8 uses an interpreter as its first stage.
heh, i remember when V8 was introduced, they bragged how it "compiles
everything straight to optimized machine code" on first load. but
yes, it has to change continuously to keep up.

mozilla also has done a _lot_ of variations of JIT, classic function,
tracing, metatracing, procedure-based (again), pseudoclasses, phased
optimizations.... all while adopting asm.js and turning it from a bad
joke (hey, i can compile C to JS!), to a nice VM in WASM....

there's no limit on what people are willing to do to keep using JS
without actually writing JS!
--
Javier
Philippe Verdy
2018-11-27 23:56:18 UTC
Permalink
Well, it's just a proof that Javascript is extremely well designed and
powerful as it allows supporting any existing language, and even
virtualization (it can also virtualize itself) independantly of how the
native machine was actually built. This means that we'll now see machines
specially optimized to support Javascript almost natively instead of being
built natively to give natural support to C. This could also change the
paradigm about how processors are specially configured in their firmware:
it could be as well reconfigured instantly or dynamically to support
multiple instruction sets.

Virtual machines are now the goal in many new developments: because
Javascript is still much easier to formalize, it can be used to detect bugs
in software that would be hard to find manually.
Lua is a good language but for now it still does not propose a very good
implementation of its VM. And maybe, instead of porting Lua to native C,
it would be highly valuable to port it to successful VMs: for Javascript
(V8, Mozilla), Java (Oracle JVM, or Google's VM for Android), Python...
When we know that all these VMs can also integrate the other ones.

The Lua VM however still has some progress to make: its model for
pseudo-"threads" (coroutines) is limited to cooperation, and still does not
work really with full reentrance of Lua programs themselves, even if there
can be several Lua engines working in true threads, but completely
separately, it is still not reentrant except for its integration on a true
multithreading system, because it does not have native support for basic
elements: notably mutexes, or critical sections to build mutexes and create
atomic operations that cannot be broken or split up by competing
(non-cooperating) threads.

The cooperating-only model suffers from a well-known problem: a Lua thread can
bring competing threads to starvation, stealing all resources for itself (so
it is highly exposed to DOS attacks). As well, Lua still does not allow
limiting the resources allocated to a coroutine (the only limit is their
initial stack size, and we know now that objects can be allocated in the
heap without real limits to each one, by giving them a limit to their own
"local" heap)

As well there's no limit on the number of "threads"/coroutines a single
"thread"/coroutine can create; this is not controlled by the parent thread
creating a new thread. They should be created or configured in their
"*State" by using their own dedicated allocator before yielding to them,
and no thread should be allowed to change these limits for themselves.
However this should be possible by using "void lua_setallocf (lua_State *L,
lua_Alloc f, void *ud);" just after using "lua_State *lua_newthread
(lua_State *L);" so that the parent thread can enforce these limits (only
the child thread will fail if its allowed resources are exhausted, and in
such a case the configured allocator should be able to give control to other
threads so they are no longer blocked, so all other threads will continue
without being affected). The child thread will have to use "pcall()" to
recover from this memory exhaustion, or in case the allocator returns nil
to them, causing them to call error() and then exit if there's no other
error handler (i.e. a parent pcall() in their stack).

Finally the GC will collect back all these resources, trying to finalize
them (it may run in an infinite loop unless the parent thread has
configured the allocator to delay repeated resume() calls with an
exponential time between each GC cycle for the same "dead" thread; the loop
can then be controlled, the dead thread will rapidly become permanently
blocked, the OS will page out this unused memory and nothing worse will
happen, until the Lua's host process is terminated).

How can we control the resources used in a child thread/coroutine? We can
use a C function like this:

typedef void * (*lua_Alloc) (void *ud, void *ptr, size_t osize, size_t
nsize);
"The type of the memory-allocation function used by Lua states. The
allocator function must provide a functionality similar to realloc, but not
exactly the same. Its arguments are ud, an opaque pointer passed to
lua_newstate; ptr, a pointer to the block being
allocated/reallocated/freed; osize, the original size of the block or some
code about what is being allocated; and nsize, the new size of the block.
When ptr is not NULL, osize is the size of the block pointed by ptr, that
is, the size given when it was allocated or reallocated. When ptr is NULL,
osize encodes the kind of object that Lua is allocating. osize is any of
LUA_TSTRING, LUA_TTABLE, LUA_TFUNCTION, LUA_TUSERDATA, or LUA_TTHREAD when
(and only when) Lua is creating a new object of that type. When osize is
some other value, Lua is allocating memory for something else."
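As a concrete illustration of the limits described above, here is a minimal sketch of an allocator that enforces a per-thread byte budget. It is plain C with the lua_Alloc signature reproduced from the manual so the example stands alone; the Budget struct (passed as the ud pointer) is a hypothetical accounting structure, not part of the Lua API:

```c
#include <stdlib.h>
#include <stddef.h>

/* Signature reproduced from the Lua manual; everything else below is a
   hypothetical sketch of the per-thread budget idea, not a Lua API. */
typedef void *(*lua_Alloc)(void *ud, void *ptr, size_t osize, size_t nsize);

typedef struct {
    size_t used;   /* bytes currently allocated through this allocator */
    size_t limit;  /* hard cap imposed by the parent thread */
} Budget;

static void *limited_alloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    Budget *b = (Budget *)ud;
    /* When ptr is NULL, osize encodes an object-type tag, not a size. */
    size_t old = (ptr != NULL) ? osize : 0;

    if (nsize == 0) {               /* free request: always honored */
        free(ptr);
        b->used -= old;
        return NULL;
    }
    /* Refuse any (re)allocation that would exceed the budget: Lua then
       raises a memory error in the child, catchable there with pcall(). */
    if (b->used - old + nsize > b->limit)
        return NULL;

    void *p = realloc(ptr, nsize);
    if (p != NULL)
        b->used += nsize - old;
    return p;
}
```

The parent would install this with lua_setallocf(child, limited_alloc, &budget) right after lua_newthread, as suggested above. Note that in stock Lua all threads of a state share one allocator, so a truly per-coroutine budget is an assumption of this proposal rather than something the current API guarantees.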

This is possible from C only; I can't find any equivalent API in the
standard Lua library that can be used from Lua instead of C (no equivalent
in the "coroutine.*" package or in metamethods for threads according to
https://www.lua.org/work/doc/manual.html#2.4).

We should be able to configure in Lua the thread object returned by
"thread coroutine.create(function f)" to set its allocator, or a
metamethod that will be called each time the thread tries to allocate
something. The only thing we can configure for now is its "__gc"
metamethod for finalizers.

We need some "__alloc" metamethod used by the Lua VM just before
effectively using the parent thread's allocator, or that will be used by
the default allocator C function configured in the thread's "State", even
before really allocating the object and initializing it. That metamethod
in Lua should receive at least the three parameters "ptr, osize, nsize",
specifying the reference of the object being reallocated/freed (or nil
otherwise for new objects), and the old and new sizes. When the first
parameter is nil, the second specifies the type of object being allocated:
"string", "table", "function", "userdata" or "thread". Normally strings,
functions, and threads cannot be reallocated but only freed, so the old
size is not significant for them; only a new size of 0 indicates they are
being freed. But for tables or userdata, the old size makes sense as well
as the new size, which will be 0 when freeing objects before finalizing
them with the "__gc" metamethod. This "__alloc" metamethod cannot perform
the actual allocation or freeing; it just has to return a boolean status
indicating whether the object (still not initialized when oldsize is 0)
will be allocated by the parent, or whether the parent allocator will not
be used and a "nil" value or error() will be returned to the child thread.
(If this "__alloc" metamethod allowed the allocation, the parent's
allocator may still fail to allocate the object, in which case it must
call the "__alloc" metamethod once again to notify that the desired size
for the object was in fact not allocated and must no longer count as used
resources.)
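The control flow being proposed here — consult the hook first, then try the real allocator, and on failure notify the hook again so the refused size is uncounted — can be sketched in plain C. All names below (the alloc_veto callback type, alloc_with_veto, budget_veto) are hypothetical illustrations of the proposal, not an existing Lua API:

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical veto callback standing in for the proposed "__alloc"
   metamethod: returns nonzero to allow the allocation, 0 to refuse.
   A negative delta notifies that a previously allowed size was in
   fact not allocated and must be uncounted. */
typedef int (*alloc_veto)(void *ud, void *ptr, size_t osize, long delta);

typedef struct {
    alloc_veto veto;  /* the "__alloc"-like hook */
    void *ud;         /* its context (e.g. an accounting structure) */
} VetoAlloc;

/* Example hook: a simple 100-byte budget (hypothetical). */
static long g_counted = 0;
static int budget_veto(void *ud, void *ptr, size_t osize, long delta) {
    (void)ud; (void)ptr; (void)osize;
    if (delta > 0 && g_counted + delta > 100)
        return 0;                         /* refuse: over budget */
    g_counted += delta;                   /* account (or uncount) */
    return 1;
}

/* The allocation path described above; the free path (nsize == 0)
   is omitted in this sketch. */
static void *alloc_with_veto(VetoAlloc *a, void *ptr, size_t osize, size_t nsize) {
    if (!a->veto(a->ud, ptr, osize, (long)nsize))
        return NULL;                      /* refused: child sees an error */
    void *p = realloc(ptr, nsize);
    if (p == NULL)                        /* real allocator failed anyway: */
        a->veto(a->ud, ptr, osize, -(long)nsize);  /* roll back accounting */
    return p;
}
```

In the proposal this hook would live in the thread's metatable and the second call would happen inside the parent's allocator itself, but the two-phase allow/roll-back shape is the same.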

The "__alloc" metamethod would be useful mostly for "thread" objects but
should be used as well for "string", "table", "function" and "userdata"
objects. A "string" has no metatable of its own: it is instead stored in
an index position of a "table" storing all strings. But we still have a
way to assign metamethods to strings, userdata, threads and functions
using "debug.setmetatable (string/userdata/thread/function, metatable)"
instead of "setmetatable (table, metatable)", and can then measure,
control and limit their allocated sizes in all cases.

Another thing missing in Lua thread states is a "paused" status, which
would allow a parent thread to know that it must not resume() a thread
because the delay before it may run again has not yet passed. But this
delay can be configured and tested in the metatable of the thread (using
the debug library to define it).

An alternative would be to set a custom "registry" (see
https://www.lua.org/work/doc/manual.html#4.5) instead of the (debug)
metatable associated to each thread, but this can only work at global level
to provide some defaults for all threads if the "__alloc" and "__paused"
metamethods are not defined in their metatable; the registry cannot be
directly used in Lua in a safe way to properly isolate child threads
from each other. The registry is reserved for use in the C environment
in which the Lua VM instance is configured and running, and it should not
be exposed to Lua programs.
Post by Javier Guerra Giraldez
On Tue, 27 Nov 2018 at 14:10, Hugo Musso Gualandi
Post by Hugo Musso Gualandi
These Javascript JITs change all the time. Nowadays v8 uses an
interpreter as its first stage.
heh, i remember when V8 was introduced, they bragged how it "compiles
everything straight to optimized machine code" on first load. but
yes, it has to change continuously to keep up.
mozilla also has done a _lot_ of variations of JIT, classic function,
tracing, metatracing, procedure-based (again), pseudoclasses, phased
optimizations.... all while adopting asm.js and turning it from a bad
joke (hey, i can compile C to JS!), to a nice VM in WASM....
there's no limit on what people are willing to do to keep using JS
without actually writing JS!
--
Javier
Tim Hill
2018-11-28 01:55:50 UTC
Permalink
Well, it's just proof that Javascript is extremely well designed and powerful, as it allows supporting any existing language, and even virtualization (it can also virtualize itself), independently of how the native machine was actually built. This means that we'll now see machines specially optimized to support Javascript almost natively, instead of being built natively to give natural support to C. This could also change the paradigm of how processors are configured in their firmware: they could as well be reconfigured instantly or dynamically to support multiple instruction sets.
I’m not sure I (or many others) would agree that “Javascript is extremely well designed and powerful” [1]. And in general attempts to add language-specific optimizations to CPU instruction sets have rarely fared well (e.g. Java native bytecode on ARM). The whole point (to my mind) of RISC was to move to a model where a very streamlined CPU + low-latency L1 cache was, in effect, an engine for efficient execution of VMs like Lua’s.

In fact we were able to realize this very thing in our last project with Lua, in which pretty much the whole of the Lua VM, libraries, CPU stack and current set of Lua registers (stack frame) were all held in L1 cache on the CPU. In effect, the CPU at that point *IS* micro-coded for the Lua VM, and we saw spectacular performance from the VM as a result.

—Tim

[1] https://www.destroyallsoftware.com/talks/wat <https://www.destroyallsoftware.com/talks/wat>
Muh Muhten
2018-11-28 07:29:45 UTC
Permalink
Post by Tim Hill
In fact we were able to realize this very thing in our last project with
Lua, in which pretty much the whole of the Lua VM, libraries, CPU stack and
current set of Lua registers (stack frame) were all held in L1 cache on the
CPU. In effect, the CPU at that point *IS* micro-coded for the Lua VM, and
we saw spectacular performance from the VM as a result.
Wow! How much tuning did it take to get all that to fit in the L1? I'm
guessing you could strip down much of the libraries and wouldn't need
e.g. the parser at runtime, but just the VM alone looks like it'd
almost fill the cache without some extra work.
Philippe Verdy
2018-11-28 08:37:23 UTC
Permalink
I think that the L1 argument is valid when Lua is used with a JIT compiler,
so that the native instruction set is extremely easy to streamline without
extra overheads.
However full support of Lua semantics on a CPU still requires type
checking and bound checking at almost every instruction (something that a
CPU instruction set does not usually perform, except possibly bound
checking in a limited way, but not type checking at all).

Type checking for Lua does not require much space in L1 code cache (even if
it will be a significant constant cost), but it involves a cost in terms of
branch prediction and then indirectly on the L1 data cache. To solve this
cost would require a much smarter optimizing JIT compiler, capable of
doing itself what the CPU has to guess (on CISC CPUs) or just decides
arbitrarily (on RISC machines).

If security is a concern, branch prediction has to be taken into account
and instead of "guessing" or trying to be accurate (~20% vs. ~80% on RISC)
or using assumption (=0% vs. =100% on CISC), it will be safer to just use
pure randomized branch prediction (~50% vs. ~50% between branching vs. no
branch: a strong random generator will be needed in processors, because a
simple "binary toggle" is very easy to exploit).

And I don't see how a native CPU instruction set can easily implement such
strategy without additional software and hardware support (a solid random
generator would be very costly to implement by software only for every
conditional branch in the compiled code, it would most probably fill up the
L1 code cache or the L1 data cache or both; performance would become
extremely poor, as if there were no L1 cache at all). This could change if
the CPU had a native instruction to return a strong random number from an
internal source (possibly combining several things like high-performance
counters, temperature sensors, or measurement of chaotic physical responses
in clock generators and amplifiers, such as stabilization currents when a
numeric gate switches state; a military-grade machine could use an external
atomic source or an electron beam accelerator with electron counter, or a
tunnel-effect transistor, or a zener diode; in fact modern CPUs are so much
integrated that they constantly have to "fight" chaotic effects, so there
are good random sources everywhere and instead of filtering and dropping
them, they could be detected and used to feed a register accessible
directly by a usable instruction...)
Post by Muh Muhten
Post by Tim Hill
In fact we were able to realize this very thing in our last project with
Lua, in which pretty much the whole of the Lua VM, libraries, CPU stack
and
Post by Tim Hill
current set of Lua registers (stack frame) were all held in L1 cache on
the
Post by Tim Hill
CPU. In effect, the CPU at that point *IS* micro-coded for the Lua VM,
and
Post by Tim Hill
we saw spectacular performance from the VM as a result.
Wow! How much tuning did it take to get all that to fit in the L1? I'm
guessing you could strip down much of the libraries and wouldn't need
e.g. the parser at runtime, but just the VM alone looks like it'd
almost fill the cache without some extra work.
Gé Weijers
2018-11-28 08:33:52 UTC
Permalink
On Tue, Nov 27, 2018, 17:56 Tim Hill <***@gmail.com wrote:

I’m not sure I (or many others) would agree that “Javascript is extremely
Post by Tim Hill
well designed and powerful” [1]. And in general attempts to add
language-specific optimizations to CPU instruction sets have rarely fared
well (e.g. Java native bytecode on ARM). The whole point (to my mind) of
RISC was to move to a model where a very streamlined CPU + low-latency L1
cache was, in effect, an engine for efficient execution of VMs like Lua’s.
Another example of custom architectures that did not survive:
The LMI and Symbolics Lisp machines became obsolete when Lisp
implementations on commodity microprocessors started to outperform them, at
only a small fraction of the price.


Gé
Philippe Verdy
2018-11-28 08:51:03 UTC
Permalink
Price was a factor influenced by the market, because niche processors are
necessarily more costly.
This can still change very rapidly if there's a market need for such
processors to fill a new need (notably for safe virtualization): this
could start with CPUs made for application servers on the web (which can
support a higher initial cost per unit), but rapidly what is available on
servers would be used on smaller devices and in home and mobile usage
(which now depends more on distributed and cloud services).
We're reaching a point where purely local implementations are not viable
(and economically not the most cost-effective solution) to get the
performance we need in modern applications (which now need scalability and
hardware redundancy). Mobile computing also cannot use infinite amounts of
energy; it needs support from other computing units available via a fast
network (but not at the price of security: servers will need to be the
first equipped systems to support safe and efficient virtualization).
The traditional stack-based CPUs (with more or less native registers) can
be easily replaced, because they can be emulated completely by
virtualization, without major impact on existing software written for them.
Post by Tim Hill
I’m not sure I (or many others) would agree that “Javascript is extremely
Post by Tim Hill
well designed and powerful” [1]. And in general attempts to add
language-specific optimizations to CPU instruction sets have rarely fared
well (e.g. Java native bytecode on ARM). The whole point (to my mind) of
RISC was to move to a model where a very streamlined CPU + low-latency L1
cache was, in effect, an engine for efficient execution of VMs like Lua’s.
The LMI and Symbolics Lisp machines became obsolete when Lisp
implementations on commodity microprocessors started to outperform them, at
only a small fraction of the price.
Gé