Discussion:
NaN trick
Roberto Ierusalimschy
2011-07-06 14:45:09 UTC
Permalink
Lua 5.2 beta implements the "NaN trick": it packs all Lua values into a
single double value by representing non-numeric values as signaled NaNs,
which are (usually) not produced by the system.

This trick should work on most 32-bit machines that follow IEEE
754-2008 for double representation. However, currently it is being
enabled only by predefined macro __i386__ (and similars). We would
like to know what other platforms could benefit from this trick
(and what macros should we check in luaconf.h). You can force the
trick by compiling Lua with -DLUA_NANTRICKLE (for little endian
systems) or -DLUA_NANTRICKBE (for big endian).

(Of course, we would also like to know whether there are problems
with this implementation.)

-- Roberto
Michael Rose
2011-07-06 15:57:06 UTC
Permalink
The NaN trick is very nice!

I think it should also work on current AMD 64bit platforms. This would reduce a Lua value
from 16 bytes down to 8 bytes on the Lua stack and within tables!

As far as I've learned, the current 64bit processors for these platforms only use 48 bit's for a pointer
(the highest bit number 47 -- counting from 0 -- has to be sign extended to 64 bit, dividing the virtual
ram into 2^47 bytes of user data and 2^47 bytes of system data), because the address bus is
only 48 bits long.
Mike Pall uses this NaN trick in Luajit as well and he packs pointers for light user data
into a double with just enough bits left to represent all the different Lua types.
Because he restricts Lua pointers to be within the lower Gig of memory (32 bit),
he only needs this trick for external C pointers, but it could be as well applied to
all pointers Lua has to work with.
For details have a look into his sources, especially into "lj_obj.h".

Regards,

Michael Rose

-----Ursprüngliche Nachricht-----
Von: "Roberto Ierusalimschy" <***@inf.puc-rio.br>
Gesendet: 06.07.2011 16:45:09
An: "Lua list" <lua-***@lists.lua.org>
Betreff: NaN trick
Post by Roberto Ierusalimschy
Lua 5.2 beta implements the "NaN trick": it packs all Lua values into a
single double value by representing non-numeric values as signaled NaNs,
which are (usually) not produced by the system.
This trick should work on most 32-bit machines that follow IEEE
754-2008 for double representation. However, currently it is being
enabled only by predefined macro __i386__ (and similars). We would
like to know what other platforms could benefit from this trick
(and what macros should we check in luaconf.h). You can force the
trick by compiling Lua with -DLUA_NANTRICKLE (for little endian
systems) or -DLUA_NANTRICKBE (for big endian).
(Of course, we would also like to know whether there are problems
with this implementation.)
-- Roberto
___________________________________________________________
Schon gehört? WEB.DE hat einen genialen Phishing-Filter in die
Toolbar eingebaut! http://produkte.web.de/go/toolbar
Peter Cawley
2011-07-06 16:24:54 UTC
Permalink
Post by Michael Rose
I think it should also work on current AMD 64bit platforms. This would reduce a Lua value
from 16 bytes down to 8 bytes on the Lua stack and within tables!
Though you then need to get slightly more creative about encoding the
type. A double has 52 bits of mantissa; with 48 taken for a
pointer-sized value, and 1 taken to ensure a quiet NaN, you only have
3 bits left to encode the type. If you borrow the sign bit as well,
then you have 4 bits. For reference, the current implementation in 5.2
beta uses 6 bits for encoding type.
David Given
2011-07-06 16:38:16 UTC
Permalink
Peter Cawley wrote:
[...]
Post by Peter Cawley
Though you then need to get slightly more creative about encoding the
type. A double has 52 bits of mantissa; with 48 taken for a
pointer-sized value, and 1 taken to ensure a quiet NaN, you only have
3 bits left to encode the type. If you borrow the sign bit as well,
then you have 4 bits. For reference, the current implementation in 5.2
beta uses 6 bits for encoding type.
If the thing you're pointing at is 4-aligned, then you get two free bits
at the bottom of the pointer. If the thing you're pointing at has been
allocated with malloc(), then you get more --- the result of malloc() is
guaranteed to be correctly aligned for any kind of variable, which in
practice usually gives you four bits.
--
┌───  ───── http://www.cowlark.com ─────
│ "I have always wished for my computer to be as easy to use as my
│ telephone; my wish has come true because I can no longer figure out
│ how to use my telephone." --- Bjarne Stroustrup
Peter Cawley
2011-07-06 16:44:47 UTC
Permalink
Post by David Given
[...]
Post by Peter Cawley
Though you then need to get slightly more creative about encoding the
type. A double has 52 bits of mantissa; with 48 taken for a
pointer-sized value, and 1 taken to ensure a quiet NaN, you only have
3 bits left to encode the type. If you borrow the sign bit as well,
then you have 4 bits. For reference, the current implementation in 5.2
beta uses 6 bits for encoding type.
If the thing you're pointing at is 4-aligned, then you get two free bits
at the bottom of the pointer. If the thing you're pointing at has been
allocated with malloc(), then you get more --- the result of malloc() is
guaranteed to be correctly aligned for any kind of variable, which in
practice usually gives you four bits.
Light userdata need not be aligned, light C functions need not be
aligned, and memory for garbage collectable objects can come from a
custom allocator, which might give unaligned addresses.
David Kastrup
2011-07-06 16:45:16 UTC
Permalink
Post by David Given
[...]
Post by Peter Cawley
Though you then need to get slightly more creative about encoding the
type. A double has 52 bits of mantissa; with 48 taken for a
pointer-sized value, and 1 taken to ensure a quiet NaN, you only have
3 bits left to encode the type. If you borrow the sign bit as well,
then you have 4 bits. For reference, the current implementation in 5.2
beta uses 6 bits for encoding type.
If the thing you're pointing at is 4-aligned, then you get two free bits
at the bottom of the pointer. If the thing you're pointing at has been
allocated with malloc(), then you get more --- the result of malloc() is
guaranteed to be correctly aligned for any kind of variable, which in
practice usually gives you four bits.
Since an 80x86 as far as I know will accept any variable at _byte_
boundaries, the result of malloc is not guaranteed to be of any
granularity coarser than a byte. Actual implementations will likely
offer smaller granularity. But since the native floating point format
of the x86 is _10_ bytes, a granularity coarser than 2 bytes will waste
storage in full precision floating point arrays.
--
David Kastrup
Michael Rose
2011-07-06 23:54:56 UTC
Permalink
Post by Peter Cawley
Post by Michael Rose
I think it should also work on current AMD 64bit platforms. This would reduce a Lua value
from 16 bytes down to 8 bytes on the Lua stack and within tables!
Though you then need to get slightly more creative about encoding the
type. A double has 52 bits of mantissa; with 48 taken for a
pointer-sized value, and 1 taken to ensure a quiet NaN, you only have
3 bits left to encode the type. If you borrow the sign bit as well,
then you have 4 bits. For reference, the current implementation in 5.2
beta uses 6 bits for encoding type.
Of course You have to compress the type information a bit to be successful.

NaN's to be abused are restricted to be
"FFF8 0000 0000 0000" + X or
"7FF8 0000 0000 0000" + X with X > 0
according to the IEEE specs.

The topmost sign bit 63 "s" can be used, as well as bits 48-50 "ttt"
Bit 47 could be used for all GC data, because they are never system
pointers.
All immediate values which use only 32 data bits could
also be combined under one major type number, because they
can be distinguished by minor type numbers in the bit range
32-47.
If allocators could be restricted to some alignment, also bits 0-2
are useable, but if generality is of more concern (especially to not
restrict custom allocators), these bits are not available.

s ttt
0 000 LUA_TNUMBER
0 001 LUA_TNIL
0 010 LUA_TBOOLEAN
0 011 LUA_TLIGHTUSERDATA
1 000 LUA_TSTRING
1 001 LUA_TTABLE
1 100 LUA_TFUNCTION LUA_TLCL
1 101 LUA_TFUNCTION LUA_TLCF
1 110 LUA_TFUNCTION LUA_TCCL

Some free combinations 0100-0111, 1010-1011 are still available.
The sign bit indicates collectable data. In case of functions,
bits 48-49 codes the variant tag.
LUA_TNUMBER has coding 0 000, because then all other
data represent quiet NaN's, which are free to be abused
as indicated in the specification.

Regards,

Michael Rose
Peter Cawley
2011-07-07 14:08:18 UTC
Permalink
Post by Michael Rose
Bit 47 could be used for all GC data, because they are never system
pointers.
What if Lua was embedded into the kernel and hence got allocated some
memory in kernel space? I know of at least two cases of 5.1 being
embedded into a kernel.
Roberto Ierusalimschy
2011-07-07 14:30:19 UTC
Permalink
Post by Peter Cawley
What if Lua was embedded into the kernel and hence got allocated some
memory in kernel space? I know of at least two cases of 5.1 being
embedded into a kernel.
Lua embedded in the kernel does not use double for numbers. (Most
kernels do not allow floating-point manipulation to avoid having to save
the floating-point registers.)

-- Roberto
Miles Bader
2011-07-07 04:54:00 UTC
Permalink
Post by Peter Cawley
Post by Michael Rose
I think it should also work on current AMD 64bit platforms. This
would reduce a Lua value from 16 bytes down to 8 bytes on the Lua
stack and within tables!
Though you then need to get slightly more creative about encoding the
type. A double has 52 bits of mantissa; with 48 taken for a
pointer-sized value, and 1 taken to ensure a quiet NaN, you only have
3 bits left to encode the type. If you borrow the sign bit as well,
then you have 4 bits. For reference, the current implementation in 5.2
beta uses 6 bits for encoding type.
The way other systems seem to do this, is to divide types into "true
immediates", and "pointer + language heap obj".

The former must have their tag uniquely encoded using only the bits
available in immediate values, but the latter can have extra tag
information in the header of the "language heap obj" pointed to by the
pointer (and of course gc flags etc can also be stored there).

For Lua, there seem to be 6 "true immediate" types: number, nil,
boolean, lightuserdata, and lightcfunction (anything else?). So,
adding a 7th immediate tag for the "lua_heap_object" case (which
handles string, table, function, userdata, and thread types), that
would seem to require only 3 tag bits in immediates...

[nil and boolean could be combined using other bits in the immediate
to distinguish them, meaning 6 immediate tags are used, leaving 2 for
future use...]

-Miles
--
Abstainer, n. A weak person who yields to the temptation of denying himself a
pleasure. A total abstainer is one who abstains from everything but
abstention, and especially from inactivity in the affairs of others.
Patrick Donnelly
2011-07-07 05:14:16 UTC
Permalink
Post by Miles Bader
Post by Peter Cawley
Post by Michael Rose
I think it should also work on current AMD 64bit platforms. This
would reduce a Lua value from 16 bytes down to 8 bytes on the Lua
stack and within tables!
Though you then need to get slightly more creative about encoding the
type. A double has 52 bits of mantissa; with 48 taken for a
pointer-sized value, and 1 taken to ensure a quiet NaN, you only have
3 bits left to encode the type. If you borrow the sign bit as well,
then you have 4 bits. For reference, the current implementation in 5.2
beta uses 6 bits for encoding type.
The way other systems seem to do this, is to divide types into "true
immediates", and "pointer + language heap obj".
The former must have their tag uniquely encoded using only the bits
available in immediate values, but the latter can have extra tag
information in the header of the "language heap obj" pointed to by the
pointer (and of course gc flags etc can also be stored there).
For Lua, there seem to be 6 "true immediate" types:  number, nil,
boolean, lightuserdata, and lightcfunction (anything else?).  So,
adding a 7th immediate tag for the "lua_heap_object" case (which
handles string, table, function, userdata, and thread types), that
would seem to require only 3 tag bits in immediates...
[nil and boolean could be combined using other bits in the immediate
to distinguish them, meaning 6 immediate tags are used, leaving 2 for
future use...]
There are other types as well: LUA_TPROTO, LUA_TUPVAL, LUA_TDEADKEY
(see lobject.h). The latter would be an immediate value I would think.
--
- Patrick Donnelly
Axel Kittenberger
2011-07-07 18:39:59 UTC
Permalink
Post by Peter Cawley
Though you then need to get slightly more creative about encoding the
type. A double has 52 bits of mantissa; with 48 taken for a
pointer-sized value, and 1 taken to ensure a quiet NaN, you only have
3 bits left to encode the type. If you borrow the sign bit as well,
then you have 4 bits. For reference, the current implementation in 5.2
beta uses 6 bits for encoding type.
Dont you need just 1 bit to encode if its a pointer or something else.
Or 2 bits for the various different pointer types Lua has (light user
data vs "heavy" user data, etc.).

All other types who only need a 32bit data load have enough bits left
to get encoded what that load is.
Miles Bader
2011-07-08 04:18:15 UTC
Permalink
Post by Axel Kittenberger
Post by Peter Cawley
Though you then need to get slightly more creative about encoding the
type. A double has 52 bits of mantissa; with 48 taken for a
pointer-sized value, and 1 taken to ensure a quiet NaN, you only have
3 bits left to encode the type. If you borrow the sign bit as well,
then you have 4 bits. For reference, the current implementation in 5.2
beta uses 6 bits for encoding type.
Dont you need just 1 bit to encode if its a pointer or something else.
Or 2 bits for the various different pointer types Lua has (light user
data vs "heavy" user data, etc.).
There are a fair number of immediate types that contain pointers.

Many of them point to Lua internal objects (table, string, function,
etc), which can all share a header in the object that contains another
tag field to distinguish among them. However pointer immediates which
point to external objects need to have unique tags in the immediate
value.

[See my previous post in this thread for a bit more detail.]

-Miles
--
"Yorton, Wressle, and Gospel Oak, the richness of your heritage is ended.
We shall not stop at you again; for Dr Beeching stops at nothing."
Axel Kittenberger
2011-07-08 07:54:38 UTC
Permalink
Lua has 10 types (as far "none" is a type too?)

Nil and boolean and none can share one type. In the >48bit field.
#define LUA_TNONE (-1)
#define LUA_TNIL 0
#define LUA_TBOOLEAN 1

Number doesnt need an encoding therre, its everything non NAN, or NAN as in NAN.
#define LUA_TNUMBER 3

Leaves 6 types with pointers.

#define LUA_TLIGHTUSERDATA 2
#define LUA_TSTRING 4
#define LUA_TTABLE 5
#define LUA_TFUNCTION 6
#define LUA_TUSERDATA 7
#define LUA_TTHREAD 8

6+1 (for NIL; BOOLEAN and NONE) = 7, 3 bit suffice ;-)
Post by Miles Bader
Post by Axel Kittenberger
Post by Peter Cawley
Though you then need to get slightly more creative about encoding the
type. A double has 52 bits of mantissa; with 48 taken for a
pointer-sized value, and 1 taken to ensure a quiet NaN, you only have
3 bits left to encode the type. If you borrow the sign bit as well,
then you have 4 bits. For reference, the current implementation in 5.2
beta uses 6 bits for encoding type.
Dont you need just 1 bit to encode if its a pointer or something else.
Or 2 bits for the various different pointer types Lua has (light user
data vs "heavy" user data, etc.).
There are a fair number of immediate types that contain pointers.
Many of them point to Lua internal objects (table, string, function,
etc), which can all share a header in the object that contains another
tag field to distinguish among them.  However pointer immediates which
point to external objects need to have unique tags in the immediate
value.
[See my previous post in this thread for a bit more detail.]
-Miles
--
"Yorton, Wressle, and Gospel Oak, the richness of your heritage is ended.
We shall not stop at you again; for Dr Beeching stops at nothing."
Miles Bader
2011-07-08 08:20:48 UTC
Permalink
Post by Axel Kittenberger
6+1 (for NIL; BOOLEAN and NONE) = 7, 3 bit suffice ;-)
Lua 5.2 also has "light C function" (LUA_TLCF), which I think is an
immediate-only pointer type, so that needs its own immediate tag.
Also, as Patrick pointed out, there are 3 internal object types that
might need their own tags with pointers (LUA_TPROTO, LUA_TUPVAL,
LUA_TDEADKEY), but I'm a bit fuzzy on exactly what constraints they
have.

Luckily, though, all the "Lua heap object" types (string, table, function,
userdata, thread) can use one immediate tag value, with a separate tag
field in the Lua object header to distinguish them.

So the immediate tags might be:

LUA_IMM_TNUMBER 0
LUA_IMM_TMISCVAL 1 /* NIL, BOOL, NONE, using lower imm bits */
LUA_IMM_THEAPVAL 2 /* TABLE, STRING, FUNCTION, USERDATA, etc */
LUA_IMM_TLIGHTUSERDATA 3
LUA_IMM_TLIGHTCFUNC 4
...

And the tags in heap object headers:

LUA_HEAP_TTABLE 0
LUA_HEAP_TSTRING 1
LUA_HEAP_TFUNCTION 2
LUA_HEAP_TUSERDATA 3
LUA_HEAP_TTHREAD 4
...

I dunno offhand where TPROTO, UPVAL, and DEADKEY should go, but
there seems to be room either way...

-Miles
--
Cat is power.  Cat is peace.
Patrick Rapin
2011-07-07 08:20:48 UTC
Permalink
Post by Roberto Ierusalimschy
Lua 5.2 beta implements the "NaN trick": it packs all Lua values into a
single double value by representing non-numeric values as signaled NaNs,
which are (usually) not produced by the system.
Thank you for pointing this out. I previously saw references to such a
"NaN Trick", but without realizing what it means.
While trying to compare the memory usage of Lua 5.1 and 5.2, I
realized that the trick was not active on my build, because Microsoft
compilers do not define any of the tokens from (__i386__, __i386,
__X86__). So you should add _M_IX86 to that list.
Roberto Ierusalimschy
2011-07-07 12:34:32 UTC
Permalink
Post by Patrick Rapin
Thank you for pointing this out. I previously saw references to such a
"NaN Trick", but without realizing what it means.
While trying to compare the memory usage of Lua 5.1 and 5.2, I
realized that the trick was not active on my build, because Microsoft
compilers do not define any of the tokens from (__i386__, __i386,
__X86__). So you should add _M_IX86 to that list.
Thanks. We will add it.

-- Roberto
Loading...