Lua bytecodes and endian-ness

Discussion:

Peter Hull

2006-06-10 13:30:29 UTC

Dear all,
I'm trying to compile some Lua scripts with luac so that they will run
on both big- and little-endian machines (I want to make a Mac Universal
Binary for KQ [0]). I am using the latest version of Lua, 5.1.1. I see
from older messages ([1], [2]) that Lua's bytecodes were
endian-independent. However, as far as I can tell from looking at the
code (specifically, LoadHeader in lundump.c) and from trying to run the
compiled scripts, this is not the case.

Am I correct that Lua binary files are not endian-independent, and if
so, what's the best way to deal with this?

Thanks,

Peter

[0] http://kqlives.sourceforge.net/
[1] http://thread.gmane.org/gmane.comp.lang.lua.general/11787/focus=11790
[2] http://thread.gmane.org/gmane.comp.lang.lua.general/15684/focus=15725

Luiz Henrique de Figueiredo

2006-06-10 13:49:55 UTC

Post by Peter Hull
Am I correct that Lua binary files are not endian-independent

Yes. See http://lua-users.org/lists/lua-l/2006-01/msg00024.html

Post by Peter Hull
so, what's the best way to deal with this?

See http://lua-users.org/lists/lua-l/2006-02/msg00507.html

--lhf

Peter Hull

2006-06-10 13:59:22 UTC

Thanks, I understand now. I should have found those threads - I
suppose I just used the wrong search terms!

There is a mention of a more fully featured dump/undump. Is that being
worked on at the moment?

Pete

Luiz Henrique de Figueiredo

2006-06-10 14:10:04 UTC

Post by Peter Hull
There is a mention of a more fully featured dump/undump. Is that being
worked on at the moment?

I'm working on that external tool for rewriting precompiled files right now.
No promises about when it'll be ready though...
--lhf

Greg McCreath

2006-06-10 22:19:11 UTC

Hi All,

I didn't realise that endianness was an issue with Lua. Actually, I
thought quite the opposite: that it did not matter. We're
anticipating multiple client types communicating with a host, with that
host delivering scripts to the clients in real time.

It sounds like we've got to have multiple compiled versions of code at
the host for each endian type then ... ?

We certainly want to pre-compile at the host, since the clients are only
low-resource units and we want to avoid client-side compilation.

Greg.

Greg McCreath
Chief Technical Officer
TAFMO Limited
ABN: 94 109 766 592

Level 8, 342 Flinders Street
Melbourne
Victoria, 3000
Australia

http://www.tafmo.com
Ph : +61 (0) 3 9018 6824
Fax : +61 (0) 3 9018 6899
Mobile : +61 (0) 401 988 957

This email and any files transmitted with it may be confidential and are intended solely for the use of the individual or entity to whom they are addressed. This email may contain personal information of individuals, and be subject to Commonwealth and/or State privacy laws in Australia. This email is also subject to copyright. If you are not the intended recipient, you must not read, print, store, copy, forward or use this email for any reason, in accordance with privacy and copyright laws. If you have received this email in error, please notify the sender by return email, and delete this email from your inbox.

Luiz Henrique de Figueiredo

2006-06-10 22:24:20 UTC

Post by Greg McCreath
I didn't realise that endianness was an issue with Lua.

It wasn't an issue up to 5.0, but this changed in 5.1. Sorry.

Post by Greg McCreath
It sounds like we've got to have multiple compiled versions of code at
the host for each endian type then ... ?

If endianness is the only difference in the clients, then you can use
the lundump.c that I posted earlier today.
--lhf

Greg McCreath

2006-06-10 22:30:45 UTC

Thanks Luiz,

This might be a silly question, but why did such an important thing
change?

Greg.

Luiz Henrique de Figueiredo

2006-06-10 22:40:17 UTC

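Post by Greg McCreath
This might be a silly question, but why did such an important thing
change?

Because we aim at simplicity. We do understand that there is a need for
avoiding endianness issues; hence the modified lundump.c that I posted.
But that is not part of the core (perhaps we could add it to etc/).
See the threads below:

http://lua-users.org/lists/lua-l/2005-06/msg00019.html
http://lua-users.org/lists/lua-l/2006-01/msg00024.html
http://lua-users.org/lists/lua-l/2006-02/msg00507.html

Now, endianness is not the only issue. See

http://lua-users.org/lists/lua-l/2005-06/msg00048.html

I'm sorry for the inconvenience.
--lhf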

Greg McCreath

2006-06-10 23:16:47 UTC

Hi Luiz,

My apologies for monopolizing your time on a Brazilian Saturday night.

Roger. So the 5.1.x luac will always generate (say) little-endian
bytecode, and any other platform will need to cross-compile that
bytecode (or else the client platform requires the modified C code)?

Greg.

-----Original Message-----
From: Luiz Henrique de Figueiredo [mailto:***@tecgraf.puc-rio.br]
Sent: Sunday, 11 June 2006 8:40 AM
To: Lua list
Subject: Re: Lua bytecodes and endian-ness

Post by Greg McCreath
This might be a silly question, but why did such an important thing
change?

Because we aim at simplicity. We do understand that there is a need for
avoiding endianness issues; hence the modified lundump.c that I posted.
But that is not part of the core (perhaps we could add it to etc/).
See the threads below:

http://lua-users.org/lists/lua-l/2005-06/msg00019.html
http://lua-users.org/lists/lua-l/2006-01/msg00024.html
http://lua-users.org/lists/lua-l/2006-02/msg00507.html

Now, endianness is not the only issue. See

http://lua-users.org/lists/lua-l/2005-06/msg00048.html

I'm sorry for the inconvenience.
--lhf

Luiz Henrique de Figueiredo

2006-06-10 23:23:28 UTC

Post by Greg McCreath
Roger. So the 5.1.x luac will always generate (say) little-endian
bytecode, and any other platform will need to cross-compile that
bytecode (or else the client platform requires the modified C code)?

luac will always generate native-endian bytecode. There is no cross-compiler
(yet). If you want clients to load bytecode of any endianness, just build Lua
with the modified lundump.c that I posted. I'm sorry if I wasn't clear about
that before.
--lhf

Asko Kauppi

2006-06-11 07:57:02 UTC

Could you reconsider the endianness issue? That is, include the
modified lundump.c in the main sources. It could be #ifdef'ed the way
readline support is, which would be fine (we would need to enable
multi-endianness support explicitly).

This comes up at least in fink (OS X) packaging: as things stand, we'll
have to include and deliver a separate patch, whereas otherwise we could
build straight from the official .tar.gz.

Thanks for the consideration, :)

-asko

D Burgess

2006-06-11 09:38:18 UTC

I am of the same view as Asko.
Is it also possible to make the bytecode either independent
or cross-compilable for sizeof(lua_Integer)?

DB

Doug Rogers

2006-06-12 13:32:48 UTC

Post by D Burgess
Is it also possible to make the bytecode either independent
or cross-compilable for sizeof(lua_Integer)?

Originally I held the same opinion, but now I acknowledge that the
problem is more difficult than just endianness. Even floating point
representations might differ (Alpha supports both IEEE and VAX floating
point!). So... for which platforms will the SEALs (Supreme and Esteemed
Authors of Lua) provide this universal bytecode dumping and loading?

In my own embedded target arena, a universal bytecode format would be
very beneficial since I could then exclude the parser. But embedded
targets are also the ones that require the most varied implementations
of the loader. Let those implementations modify lundump.c to their own
benefit, rather than force the SEALs to maintain packing and unpacking
code for a bunch of implementations. More powerful platforms could load
strings (perhaps compressed) and compile them. A weakly keyed table
could be used to map the string representation with the compiled chunk.

As usual there are lots of options, many already discussed.

I have chosen to include the parser, even though it chews up a lot of
precious FLASH. That allows me to run the Lua interpreter directly
through the serial port.

Doug

--
--__-__-____------_--_-_-_-___-___-____-_--_-___--____
Doug Rogers - ICI - V:703.893.2007x220 www.innocon.com
-_-_--_------____-_-_-___-_--___-_-___-_-_---_--_-__-_

D Burgess

2006-06-12 15:36:06 UTC

From my simple view I would be happy for there to be no changes to the
base Lua loader, but with an extended luac that cross-compiles, e.g.

luac -e l -o luac.out bytecode.lua -- little endian
luac -e b -o luac.out bytecode.lua -- big endian

Adding further options in the future for numeric size and format would be
nice.

As I read Luiz's code, the endian thing has been done for luac.
I don't seek true bytecode portability, just an ability to generate/convert
bytecode for foreign platforms.

Or have I completely missed it?
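
A rough sketch of what the fixed-endianness dumper behind such an option might look like. It is not taken from ldump.c: `Sink`, `sink_write`, and `dump_mem_fixed` are hypothetical names standing in for the writer, DumpBlock, and a rewritten DumpMem, and `want_little` stands in for the proposed (non-existent) `-e l` / `-e b` switch. It assumes every multi-byte item is written as `n` elements of `size` bytes each, and it only addresses byte order, not differing integer sizes or number formats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the writer that a dumper calls through
   DumpBlock: it appends raw bytes to a caller-supplied buffer. */
typedef struct {
  unsigned char *buf;
  size_t len;
  size_t cap;
} Sink;

static void sink_write(Sink *s, const void *p, size_t n) {
  assert(s->len + n <= s->cap);   /* a real dumper would report an error */
  memcpy(s->buf + s->len, p, n);
  s->len += n;
}

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void) {
  int x = 1;
  return *(const char *)&x;
}

/* Analogue of DumpMem: write n items of `size` bytes each in the byte
   order selected by want_little (1 = little endian, 0 = big endian). */
static void dump_mem_fixed(Sink *s, const void *b, size_t n, size_t size,
                           int want_little) {
  const unsigned char *p = (const unsigned char *)b;
  size_t i, j;
  if (size == 1 || want_little == host_is_little_endian()) {
    sink_write(s, p, n * size);   /* already in the requested order */
    return;
  }
  for (i = 0; i < n; i++, p += size) {
    for (j = size; j > 0; j--) {  /* emit each item's bytes reversed */
      sink_write(s, p + j - 1, 1);
    }
  }
}
```

With something like this in place of DumpMem, the endianness byte written into the header would also have to record the chosen order rather than the host's.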

DB

Luiz Henrique de Figueiredo

2006-06-12 16:41:17 UTC

Post by Doug Rogers
In my own embedded target arena, a universal bytecode format would be
very beneficial since I could then exclude the parser. But embedded
targets are also the ones that require the most varied implementations
of the loader. Let those implementations modify lundump.c to their own
benefit, rather than force the SEALs to maintain packing and unpacking
code for a bunch of implementations.

As has already been discussed several times, there are many issues here.
I'm sorry for the long post, but please bear with me if you're interested
in these issues.

0. Our philosophy regarding precompiled scripts is that loading them
should be as fast as possible. That goal dictates the current
implementation, which only works for native formats. Also, if the
bytecode loader is to have a size advantage over the full parser,
then the loader must be quite small.

1. It is very convenient to precompile scripts once and load them on
multiple platforms. The solution is a universal bytecode loader,
not a cross-compiler. However, such a loader is bound to be complex
and possibly hard to test and maintain, if it caters for many platforms
(but that's the whole point).

2. The endianness issue. If platforms differ *only* on endianness, then
the modified lundump.c that I have posted is the solution. This goes
against simplicity of the loader, but the convenience may offset this.
Perhaps we could make it more official and distribute it in etc/.

3. A cross-compiler can be useful if you cannot run luac on the target
platform. The solution is a modified ldump.c suited to the target;
this will keep the loader as simple as possible, which is the original
goal.

4. Lua does not depend on what ldump.c and lundump.c do, as long as the
loader builds the correct internal data structures. You can replace
ldump.c and lundump.c by anything you want, as long as they agree on
the external format.

5. In the current implementation, this format has two levels: a
structural level and a physical level. This should make it simple to
replace the physical level, which only contains a handful of simple
routines. Take ldump.c. The lowest physical level is implemented
by DumpBlock and DumpMem. DumpMem is for data that could depend on
endianness. So you can, say, write a dumper that saves files in a
fixed endianness by simply rewriting DumpMem (which is currently
a macro over DumpBlock). The other half of the physical level is
implemented by DumpChar, DumpInt, DumpNumber, DumpVector, and
DumpString. You can use a different external representation for
integers, Lua numbers, and strings by rewriting some of those. The
rest of the code in ldump.c implements the structural level. You can
use a different structure by rewriting it. The loader is implemented
in a similar way. LoadMem is for data that depend on endianness. The
swap-aware loader that I posted earlier implements byte swapping
in LoadMem, and that's the only real change (except for testing the
endianness of the file being loaded). Again, you can cater for
different external representations by rewriting the corresponding
routines in the loader.
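
The byte-swapping idea from item 5 can be sketched roughly as follows. This is not the lundump.c that was posted to the list: `Source`, `load_block`, and `load_mem` are hypothetical names standing in for the loader state, LoadBlock, and a swap-aware LoadMem, and the `swap` flag is assumed to have been set once after comparing the file's endianness with the host's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state; swap is nonzero when the file's byte
   order differs from the host's. */
typedef struct {
  const unsigned char *data;
  size_t len;
  size_t pos;
  int swap;
} Source;

/* Analogue of LoadBlock: copy raw bytes with no interpretation. */
static void load_block(Source *s, void *b, size_t size) {
  assert(s->pos + size <= s->len);  /* a real loader would raise an error */
  memcpy(b, s->data + s->pos, size);
  s->pos += size;
}

/* Analogue of LoadMem: read n items of `size` bytes each, then reverse
   the bytes of every item in place if the file needs swapping. */
static void load_mem(Source *s, void *b, size_t n, size_t size) {
  unsigned char *p = (unsigned char *)b;
  size_t i, lo, hi;
  load_block(s, b, n * size);
  if (!s->swap || size == 1) {
    return;                         /* native order: nothing else to do */
  }
  for (i = 0; i < n; i++, p += size) {
    for (lo = 0, hi = size - 1; lo < hi; lo++, hi--) {
      unsigned char t = p[lo];
      p[lo] = p[hi];
      p[hi] = t;
    }
  }
}
```

The native case stays a plain block copy, so a loader built this way only pays for swapping when it actually meets a foreign-endian file.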

Finally, the header of precompiled scripts contains information about the
internal format. This can be used to make the necessary decisions for
complicated loaders and bytecode transformers. The header also contains
a format number, which can and should be used by anyone writing a different
external format.
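
As an illustration of how a loader or bytecode transformer might read that header, here is a minimal sketch. It assumes the 12-byte Lua 5.1 layout (signature, version, format, endianness, four size bytes, integral flag); `ChunkHeader`, `parse_chunk_header`, and `chunk_needs_swap` are hypothetical names, not part of Lua.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_HEADER_SIZE 12  /* assumed size of a Lua 5.1 chunk header */

typedef struct {
  unsigned char version;          /* 0x51 for Lua 5.1 */
  unsigned char format;           /* 0 for the official format */
  unsigned char little_endian;    /* 1 if written on a little-endian host */
  unsigned char size_int;         /* sizeof(int) on the writing host */
  unsigned char size_size_t;      /* sizeof(size_t) */
  unsigned char size_instruction; /* sizeof(Instruction) */
  unsigned char size_number;      /* sizeof(lua_Number) */
  unsigned char integral;         /* 1 if lua_Number is an integral type */
} ChunkHeader;

/* Returns 1 and fills *h on success; returns 0 if the buffer is too
   short or does not start with the "\033Lua" signature. */
static int parse_chunk_header(const unsigned char *p, size_t len,
                              ChunkHeader *h) {
  static const unsigned char sig[4] = { 0x1b, 'L', 'u', 'a' };
  if (len < CHUNK_HEADER_SIZE || memcmp(p, sig, sizeof sig) != 0) {
    return 0;
  }
  h->version = p[4];
  h->format = p[5];
  h->little_endian = p[6];
  h->size_int = p[7];
  h->size_size_t = p[8];
  h->size_instruction = p[9];
  h->size_number = p[10];
  h->integral = p[11];
  return 1;
}

/* Returns 1 if the chunk's recorded byte order differs from the host's. */
static int chunk_needs_swap(const ChunkHeader *h) {
  int x = 1;
  int native_little = *(const char *)&x;
  assert(h != NULL);
  return (h->little_endian != 0) != (native_little != 0);
}
```

A tool could compare the size fields against the target's `sizeof` values in the same way, to decide whether byte swapping alone is enough or a fuller conversion is needed.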

All the modifications discussed above are simple to make, given a specific
goal or target platform. It's just that making all of them is too much to
include in the core Lua distribution.

If you want to modify ldump.c or lundump.c for a specific task, please feel
free to contact me (or post here) if you have any questions.

--lhf

Peter Hull

2006-06-13 09:33:51 UTC

This does mean that you'd have to freeze and document some part of the
Lua internals to give a stable interface for the user's implementation
of dump/undump. Otherwise it would be annoying to have to re-implement
it for every Lua release. And, I suppose there would be some overhead
to this. Would this be feasible?

My second comment is that a platform independent dump/undump would
certainly be useful for me, and I suspect would be useful or neutral
for most people, apart from the embedded guys, where every byte/cycle
counts.

Pete

Luiz Henrique de Figueiredo

2006-06-13 11:16:41 UTC

Post by Peter Hull
This does mean that you'd have to freeze and document some part of the
Lua internals to give a stable interface for the user's implementation
of dump/undump.

I don't think so. The internals of dump/undump are pretty simple.

Post by Peter Hull
Otherwise it would be annoying to have to re-implement
it for every Lua release.

A bit annoying, yes. But Lua releases do not happen frequently...
Moreover, the number of useful dump/undump is not too large and ideally
each such pair would have a maintainer that would do the work each time
it is needed.
--lhf

a***@dnainternet.net

2006-06-14 05:34:26 UTC

On 6/12/06, Luiz Henrique de Figueiredo

...

Post by Luiz Henrique de Figueiredo
0. Our philosophy regarding precompiled scripts is that loading them
should be as fast as possible. That goal dictates the current
implementation, which only works for native formats. Also, if the
bytecode loader is to have a size advantage over the full parser,
then the loader must be quite small.
1. It is very convenient to precompile scripts once and load them on
multiple platforms. The solution is a universal bytecode loader,
not a cross-compiler. However, such a loader is bound to be complex
and possibly hard to test and maintain, if it caters for many platforms
(but that's the whole point).

1. Hmm.. with both a cross-compiler and a universal loader in
existence, it would be easy to make a regression test suite that
automatically finds any bugs introduced. About one day's work?

Post by Luiz Henrique de Figueiredo
2. The endianness issue. If platforms differ *only* on endianness, then
the modified lundump.c that I have posted is the solution. This goes
against simplicity of the loader, but the convenience may offset this.
Perhaps we could make it more official and distribute it in etc/.

2. Having the modified lundump.c in etc/ (or a patch) would actually
be enough for the fink case I mentioned. Build code could then simply
copy etc -> src before building, without needing to generate or fetch
the patch from elsewhere. That would do. :) But I still think the
#ifdef'ed approach is actually better (basically, it is a question of
how official you want to be with the approach).

Post by Luiz Henrique de Figueiredo
3. A cross-compiler can be useful if you cannot run luac on the target
platform. The solution is a modified ldump.c suited to the target;
this will keep the loader as simple as possible, which is the original
goal.

3. A cross-compiler has been requested frequently enough, in my
opinion, that I fail to see why luac wouldn't be developed in that
direction. Possibly in 5.2? :)

Perhaps we need someone interested enough in this to produce and
maintain the cross-compiler and deal with these other issues?

Then again, some parts (the cross-compiler, and endianness support
when reading in) should at least be covered by the authors. Sorry if
we're making your life harder, that's what the users are for!! :)
But, they are also fun, aren't they? :P

-asko

Sam Roberts

2006-06-14 16:30:46 UTC

Post by a***@dnainternet.net

Post by Luiz Henrique de Figueiredo
1. It is very convenient to precompile scripts once and load them on
multiple platforms. The solution is a universal bytecode loader,
not a cross-compiler. However, such a loader is bound to be complex
and possibly hard to test and maintain, if it caters for many platforms
(but that's the whole point).

1. Hmm.. with the existence of both cross compiler, and
universal loader, it would be easy to make a regression
test suite that finds any bugs introduced automatically.
About one day's work?

I think you missed the point. Writing the test suite is easy; running it
on dozens of platforms you don't have access to is not so easy. Lua in
particular is used on machines other than the typical sparc/ppc/intel.

Also, the point of loading pre-compiled Lua code is that the Lua core
can be stripped to its simplest, and to save a little bit of time during
load.

If you don't want smallest and fastest, and instead want most portable
and most flexible, Lua source is already that.

So, the two extremes are supported, and in between there is a whole
range, from simply byte-swapping on similar machines, to more
complicated manipulations. People with compelling reasons to be
somewhere between these two extremes will likely want to be in
a precise place, and might not want all the baggage of a universal
loader; they might just want byte swapping, for example, or just
FP representation independence. They have lots of hooks in the Lua
core to enable building exactly what they want.

Also, it's not clear to me that a universal loader would be smaller or
faster than the Lua parser, and if it weren't, I don't see what its point
would be for anybody. Maybe just that the bytecode would be smaller
than the source?

Cheers,
Sam