Discussion:
LuaJIT 2 ffi (casting / force hotpath)
CrazyButcher
2011-01-15 15:07:10 UTC
Excellent work, Mike!

I just gave the new ffi functionality a quick spin, as the DLL loading
mechanism seems to have had its core functionality for a couple of days now.

http://crazybutcher.luxinia.de/wip/luajit_ffi_ogl.gif (yeah I mistyped
some stuff in there hehe)

Now some questions:

blubb_data = ffi.new("blubb")
blah_data = ffi.cast("blah*",blubb_data)
This works fine. Does the returned object also reference (in the GC sense)
the ffi object it originated from (blubb_data)? Or is the returned object
just a pointer, and only objects created by ffi.new are cleaned up by
the GC?

Would the JIT optimize away the safety checks and table lookups if I
always call some DLL function in the same manner, e.g.
"ogl32.glClear(...)", or would it still be preferable to store functions
locally?

local clr = ogl32.glClear
function oftenCalled()
  ...
  clr(...)
  ...
end

Given the 2008 roadmap description of LuaJIT 2 [1]:
- constant folding could not apply, as ogl32 is just a table, hence
  the "upvalue" case would be favorable
- on the other hand, guard-moving would result in upvalue-like code...

You also wrote that there is a trace-tree, so "hot switches/branches"
would result in dedicated trace paths... Basically, all that means
there is not really much left that the user would have to do manually
to get optimal results?

Is there a way to "pre-define" a hotpath? Say I want no delays due to
runtime compilation at a later time, but as I have knowledge of
the hotpath, I want to trigger the compilation for it manually.

Say that I know the first time the function is called, I want to force
it "over the threshold" to do the trace recording and so on. That way the
first "frame" would take a bit longer, but the others would be faster.

- Christoph

[1] http://lua-users.org/lists/lua-l/2008-02/msg00051.html
Mike Pall
2011-01-15 17:25:48 UTC
Post by CrazyButcher
http://crazybutcher.luxinia.de/wip/luajit_ffi_ogl.gif (yeah I mistyped
some stuff in there hehe)
Thanks for testing the LuaJIT FFI! I guess I have to clear up some
misconceptions (my fault, since I haven't written the docs yet):

ffi.cdef doesn't return a value, but it takes multiple declarations
('extern' by default). So it's usually used only once to declare
everything at the start of a module. Something like this:

local ffi = require("ffi")

ffi.cdef[[
typedef struct foo { int a,b; } foo_t;
int foo(foo_t *x);
int MessageBoxA(void *w, const char *txt, const char *cap, int type);
]]

The ffi.C default namespace and the namespaces returned by ffi.load
can be indexed like any Lua object. So t["a"] is the same as t.a.
I.e. you can shorten function calls:

ffi.C.MessageBoxA(nil, "Hello world!", "Test", 0)

[Note that MessageBoxA is a __stdcall. The LuaJIT FFI auto-detects
this, so you don't have to deal with this mess. :-) ]

Or for your example:

ffi.cdef[[
int glfwInit(void);
]]

local glfw = ffi.load("glfw")
glfw.glfwInit()
Post by CrazyButcher
blubb_data = ffi.new("blubb")
blah_data = ffi.cast("blah*",blubb_data)
works fine, does the object returned also reference (in GC sense) the
ffi object it originated from (blubb_data)? Or is the returned object
just a pointer, and only objects created by ffi.new have GC to cleanup
memory.
Casts, constructors or any implicit references do NOT create any
GC references. Neither could this be the case for any objects you
get back from C calls. You need to take care to keep references to
GC objects yourself (e.g. the results of ffi.new() or ffi.cast()).
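
A minimal sketch of this lifetime rule, for illustration only. The type
names blubb and blah come from the question; their field layout and the
helper make_view are assumptions made up for this example.

```lua
local ffi = require("ffi")

ffi.cdef[[
typedef struct { int a, b; } blubb;  /* field layout is an assumption */
typedef struct { int a, b; } blah;   /* field layout is an assumption */
]]

-- Hypothetical helper: returns the pointer together with its owner,
-- because the pointer alone does not keep the owner alive.
local function make_view()
  local owner = ffi.new("blubb", 1, 2)    -- GC-managed memory
  local view = ffi.cast("blah *", owner)  -- plain pointer, no GC reference
  return view, owner
end

local view, owner = make_view()
collectgarbage()  -- 'owner' is still referenced, so 'view' stays valid
print(view.a, view.b)
```

Returning only the pointer and dropping the owner would leave the pointer
dangling once the collector reclaims the ffi.new() object.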

Some more hints:

It's faster/easier to use constructors instead of ffi.new():

local foo_t = ffi.typeof("foo_t") -- Do this once.

-- Some often called part:
local x = foo_t()
local y = foo_t(1, 2) -- Takes initializers, just like ffi.new().

There's little need to cast arrays/structs to pointers. All
conversions that result in a pointer to an aggregate accept either
a pointer to an aggregate _or_ an aggregate itself. So you can
just write this:

C.foo(x)

Even though 'x' is a struct and foo() wants a pointer to a struct.

In general you'll only need ffi.cast() to cast between pointer types
to satisfy external API constraints or to force a specific type for
vararg parameters (though you could use a scalar constructor, too):

ffi.cdef[[
int printf(const char *fmt, ...);
]]

local x = 12
ffi.C.printf("double=%g int=%d\n", x, ffi.cast("int", x))

[Lua numbers are doubles. It doesn't matter whether they have a
fractional value or how you write the number literal. '12.0' is
absolutely identical to '12' from the view of the parser. The
conversion to integers is automatic for fixed C function
parameters -- only varargs need special handling.]

If any C call wants a pointer to a scalar (to return something),
just pass it a one-element array:

ffi.cdef[[
int sscanf(const char *str, const char *fmt, ...);
]]

local pn = ffi.new("int[1]")
ffi.C.sscanf("foo 123", "foo %d", pn)
print(pn[0]) --> 123

A slightly more involved example, showing how to use mutable buffers:

ffi.cdef[[
int uncompress(uint8_t *dest, unsigned long *destLen,
const uint8_t *source, unsigned long sourceLen);
]]

local zlib = ffi.load("z")

local function uncompress_string(comp, origsize)
  local buf = ffi.new("uint8_t[?]", origsize)
  local buflen = ffi.new("unsigned long[1]", origsize)
  assert(zlib.uncompress(buf, buflen, comp, #comp) == 0)
  return ffi.string(buf, tonumber(buflen[0]))
end
Post by CrazyButcher
The jit would optimize away the safety checks and table lookups if I
call some dll function always in the same manner
"ogl32.glClear(...)" or would it still be favorable to store functions locally?
local clr = ogl32.glClear
function oftenCalled()
...
clr(...)
...
end
No, please don't do this. The namespace lookup will be shortcut,
so there's no need (and in fact it's counter-productive) to keep
local references to functions. Also, please don't keep local
references to intermediate parts of nested arrays/structs (always
use 'y = foo[10].x' and not: 'local s = foo[10]; ...; y = s.x')

OTOH you should keep the namespace itself ('ogl32') in a local or
upvalue. So the recommendation is to do it like this:

local ogl = ffi.load("OpenGL32")

local function oftenCalled()
  ...
  ogl.glClear(...)
  ...
end

[
Currently the JIT compiler doesn't compile C function calls (needs
some redesign first), so they are still interpreted. But this is
going to be the behavior whenever I implement it.

And I should note that converting Lua functions to C callbacks
doesn't work yet. This is quite tricky and will likely be one of
the last things I'll implement.
]
Post by CrazyButcher
Given the 2008 roadmap description of LuaJIT 2[1]
- constant folding could not apply, as ogl32 is just a table, hence
the "upvalue" case would be favorable
- On the other hand guard-moving would result into upvalue like code...
Namespaces are not tables, they are tagged userdata objects. The
JIT compiler detects this and specializes to the key ("glClear").
Thus the value (the cdata function object) becomes a constant,
which in turn makes its type and address a constant. This
eliminates all lookups at runtime.
Post by CrazyButcher
You also wrote that there is a trace-tree, so "hot switches/branches"
would result into dedicated trace paths... basically all that means
there is not really much left the user would have to manually do, to
get optimal results?
Is there a way to "pre-define" a hotpath. Say I want no delays due to
runtime compilation at a later time, but as I have the knowledge of
the hotpath I want to trigger the compilation for it manually?
say that I know the first time the function is called, I want to force
it "over the threshold" to do the trace record and so on. That way the
first "frame" would take a bit longer, but the others would be faster.
The JIT compiler is _very_ fast. It's operating in the microsecond
range and the load is spread out over the bytecode execution (the
compiler pipeline is fed incrementally). I'd be very surprised if
you'd be able to see a frame glitch or even notice that the JIT
compiler is running at all.

Also, it's not a good idea to second-guess the region selection
heuristics. I tried this many times ('This can't be the best
path!') and failed miserably. The selected traces are sometimes
very strange, but so far they turned out to be near optimal. It's
quite embarrassing if your own creation turns out to make smarter
decisions than yourself. ;-)

--Mike
David Kastrup
2011-01-15 17:52:44 UTC
Post by Mike Pall
It's quite embarrassing if your own creation turns out to make smarter
decisions than yourself. ;-)
But it's pretty much the main hope for humanity.
--
David Kastrup
Miles Bader
2011-01-15 18:21:21 UTC
Post by Mike Pall
No, please don't do this. The namespace lookup will be shortcut,
so there's no need (and in fact it's counter-productive) to keep
local references to functions. Also, please don't keep local
references to intermediate parts of nested arrays/structs (always
use 'y = foo[10].x' and not: 'local s = foo[10]; ...; y = s.x')
Why is it counter-productive? _Not_ being able to do this will be very
counter-intuitive for most programmers I think...

-Miles
--
My spirit felt washed. With blood. [Eli Shin, on "The Passion of the Christ"]
Mike Pall
2011-01-15 19:11:06 UTC
Post by Miles Bader
Post by Mike Pall
No, please don't do this. The namespace lookup will be shortcut,
so there's no need (and in fact it's counter-productive) to keep
local references to functions. Also, please don't keep local
references to intermediate parts of nested arrays/structs (always
use 'y = foo[10].x' and not: 'local s = foo[10]; ...; y = s.x')
Why is it counter-productive? _Not_ being able to do this will be very
counter-intuitive for most programmers I think...
Err, no. Please read again. You _can_ do this, of course. But it's
not helpful to create intermediate pointers/refs where a single
expression would suffice.

Neither is it a good idea to do this with any modern C compiler.
The general rule is _not_ to create pointers where you can use
indexes, because the compiler (whether C or LuaJIT) has to work
hard to undo your manual 'optimization' (which really isn't one).

So if you have this struct of arrays:

ffi.cdef[[
struct foo { int x[100], y[100], z[100]; };
]]
local s = ffi.new("struct foo")

... and want to copy two components, then do it like this:

for i=0,99 do s.x[i] = s.y[i] end -- Good!

... and NOT like this:

local x, y = s.x, s.y -- Creates two intermediate references.
for i=0,99 do x[i] = y[i] end -- Works, but please don't!

In LuaJIT's case, intermediate references are always created in
the interpreter, but the JIT compiler can easily eliminate them if
they are consumed right away. This doesn't work if you store them
in some local variable and this variable escapes to some side
exit. Then it gets really difficult or impossible to eliminate the
allocation of the intermediate reference (and allocation sinking
is not yet implemented, either).

A C compiler easily gets into trouble with pointer aliasing and
needs some expensive analysis to turn this back into the original
index expression the programmer tried to 'optimize'. :-)

Moral: what a programmer may think is helpful for the compiler,
often is not. So write things in the most straightforward way.
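
How much the two styles differ in practice is best measured rather than
guessed. The following is only a rough timing sketch: the struct name
soa_demo, the repeat count and the helper names are assumptions for this
example, and the numbers will vary with the LuaJIT version and machine.

```lua
local ffi = require("ffi")

ffi.cdef[[
struct soa_demo { int x[100], y[100], z[100]; };  /* same layout as 'struct foo' above */
]]

local s = ffi.new("struct soa_demo")
for i = 0, 99 do s.y[i] = i end

-- Direct indexing, the form recommended above.
local function copy_direct(n)
  for _ = 1, n do
    for i = 0, 99 do s.x[i] = s.y[i] end
  end
end

-- Intermediate references held in local variables.
local function copy_via_locals(n)
  local x, y = s.x, s.y
  for _ = 1, n do
    for i = 0, 99 do x[i] = y[i] end
  end
end

-- Hypothetical helper: CPU time taken by f(n), in seconds.
local function time_it(f, n)
  local t0 = os.clock()
  f(n)
  return os.clock() - t0
end

local N = 20000  -- arbitrary repeat count
print("direct:     ", time_it(copy_direct, N))
print("via locals: ", time_it(copy_via_locals, N))
```

Both functions produce the same result; only the timings are of interest.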

--Mike
Quae Quack
2011-01-15 20:11:04 UTC
Post by Mike Pall
for i=0,99 do s.x[i] = s.y[i] end -- Good!
local x, y = s.x, s.y -- Creates two intermediate references.
for i=0,99 do x[i] = y[i] end -- Works, but please don't!
In LuaJIT's case, intermediate references are always created in
the interpreter, but the JIT compiler can easily eliminate them if
they are consumed right away. This doesn't work if you store them
in some local variable and this variable escapes to some side
exit. Then it gets really difficult or impossible to eliminate the
allocation of the intermediate reference (and allocation sinking
is not yet implemented, too).
A C compiler easily gets into trouble with pointer aliasing and
needs some expensive analysis to turn this back into the original
index expression the programmer tried to 'optimize'. :-)
Morale: what a programmer may think is helpful for the compiler,
often is not. So write things in the most straightforward way.
--Mike
This seems so strange to me. In the regular Lua interpreter we
obviously gain (by removing 2 hash lookups in your example).
Additionally, I DO find the latter example much more straightforward:
in more complex expressions you end up typing 's' many more times,
which I would find ugly.

What sort of performance do we lose by continuing this practice?

Daurn.
Wesley Smith
2011-01-15 20:13:16 UTC
Post by Quae Quack
This seems so strange to me; In the regular lua interpreter we
obviously gain (by removing 2 hash lookups in your example)
Yes, but the Lua interpreter does not do the kinds of optimizations a
typical C compiler will do.
Miles Bader
2011-01-15 23:44:12 UTC
Post by Mike Pall
A C compiler easily gets into trouble with pointer aliasing and
needs some expensive analysis to turn this back into the original
index expression the programmer tried to 'optimize'. :-)
Morale: what a programmer may think is helpful for the compiler,
often is not. So write things in the most straightforward way.
Sure, but often the goal is not "optimization", but clarity -- if the
factored-out expression is long, it can be significantly harder to read
code without the factoring-out. [I.e., "straightforward" doesn't
always equal "readable"]

-miles
--
Admiration, n. Our polite recognition of another's resemblance to ourselves.
CrazyButcher
2011-01-15 23:52:51 UTC
Post by Miles Bader
Sure, but often the goal is not "optimization", but clarity -- if the
factored-out expression is long, it can be significantly harder to read
code without the factoring-out.   [I.e., "straightforward" doesn't
always equal "readable"]
-miles
that's why Mike stressed the fact that both ways will work correctly.
As I was asking for optimal use, it's the kind of reply I wanted.
Miles Bader
2011-01-16 03:30:56 UTC
Post by CrazyButcher
Post by Miles Bader
Sure, but often the goal is not "optimization", but clarity -- if the
factored-out expression is long, it can be significantly harder to read
code without the factoring-out.   [I.e., "straightforward" doesn't
always equal "readable"]
that's why Mike stressed the fact that both ways will work correctly.
As I was asking for optimal use, it's the kind of reply I wanted.
Er, right, but the _degree_ to which such practices might cause less
efficient code is important in many cases.

Maybe for your usage, you don't care, and just want what's most
efficient, but I think most programmers will to some degree be looking
to understand the tradeoffs involved. I think most experienced
programmers have a reasonable intuition about these tradeoffs for more
traditional environments (base Lua, C/C++, etc).

In C/C++, I know that while aliasing can sometimes be an issue with such
code, it's generally not a big deal as long as the scope is kept small
(e.g., within a loop body, not across a function call, etc.), and
certain specific circumstances are avoided.

It's tempting to think of LuaJIT as "just another compiler" -- and thus
that the tradeoffs might be similar -- but in fact I guess that's not
necessarily true.

Really what I want is some good intuition to latch onto. With
traditional compilers, one can often get this by examining assembly
output and doing benchmarking, but this seems harder with a JIT
compiler....
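
One way to build some of that intuition is to let LuaJIT report which
traces it compiles. The sketch below assumes the jit.v module that ships
with LuaJIT is on the module path (the pcall guards against it being
absent); the sum function is only a stand-in workload.

```lua
-- Try to load the verbose trace logger bundled with LuaJIT.
local ok, v = pcall(require, "jit.v")
if ok then v.on() end  -- logs one line per trace event, to stderr by default

-- Stand-in workload: a simple hot loop.
local function sum(n)
  local s = 0
  for i = 1, n do s = s + i end
  return s
end

local total = sum(1e6)

if ok then v.off() end
print(total)
```

The same log is available without code changes via "luajit -jv", and
"luajit -jdump" additionally shows the IR and machine code of each trace.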

-Miles
--
The car has become... an article of dress without which we feel uncertain,
unclad, and incomplete. [Marshall McLuhan, Understanding Media, 1964]
Mike Pall
2011-01-20 21:23:30 UTC
Post by Mike Pall
Thanks for testing the LuaJIT FFI! I guess I have to clear up some
Ok, for anyone still interested: I've just added a first (rough)
cut of the documentation for the FFI library. The tutorial and the
section on semantics are still missing.

So point a browser to:

doc/ext_ffi.html

after pulling from LuaJIT git HEAD.

I hope the docs clear up many of your questions. If not, then
please ask here. Feedback welcome!

--Mike
Enrico Tassi
2011-01-20 21:57:07 UTC
Post by Mike Pall
I hope the docs clear up many of your questions.
Hope you don't mind if I raise a (new?) one, not related to the docs,
but that came up reading:

The FFI library is tightly integrated into LuaJIT (it's not available
as a separate module).

Is it impossible, doable, planned or a completely crazy idea... to have
that very nice feature working with plain Lua? I mean, generating C
glue, instead of asm glue?

Cheers
--
Enrico Tassi
Quae Quack
2011-01-20 22:25:23 UTC
Post by Enrico Tassi
Post by Mike Pall
I hope the docs clear up many of your questions.
Hope you don't mind if I raise a (new?) one, not related to the docs,
but that came up reading:
The FFI library is tightly integrated into LuaJIT (it's not available
as a separate module).
Is it impossible, doable, planned or a completely crazy idea... to have
that very nice feature working with plain Lua? I mean, generating C
glue, instead of asm glue?
Cheers
--
Enrico Tassi
Mike has previously said that porting it to plain Lua is not possible
with a reasonable amount of effort.
Otherwise, I would propose wrappers around the tinycc or alien
libraries to provide API-compatible code... Something I wish I had
the time to get around to doing.
Alex Bradbury
2011-01-20 22:36:25 UTC
Post by Mike Pall
I hope the docs clear up many of your questions. If not, then
please ask here. Feedback welcome!
Unless I missed it, I don't think there's currently a way to iterate
over a C library namespace. This would be useful, for instance, to
enable tab-completion in interactive Lua sessions.

Alex
Mike Pall
2011-01-20 22:46:48 UTC
Post by Alex Bradbury
Unless I missed it, I don't think there's currently a way to iterate
over a C library namespace. This would be useful, for instance, to
enable tab-completion in interactive Lua sessions.
Sorry, but that cannot work. C library namespaces are auto-binding:
only if you actually index them with a symbol, does the symbol get
bound (loaded from the library and associated with its declaration).
Internally the namespace is a userdata plus a cache. And the cache
contents are unsuitable for that purpose (empty at start of course).

Also, C declarations have no concept of origin. They could be valid
symbols in any and all libraries. Until you actually try to bind
them by indexing the namespace, they are not associated with any
particular library. In fact, you can bind the same symbol multiple
times in different namespaces. So iterating over all symbol
declarations wouldn't make much sense either.
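
The auto-binding behaviour can be observed with a small sketch. The
second declaration uses a deliberately made-up symbol name; declaring it
is expected to succeed, and only indexing the namespace with it should
fail.

```lua
local ffi = require("ffi")

ffi.cdef[[
size_t strlen(const char *s);
int hypothetical_missing_symbol(void);  /* made-up name, assumed to exist in no library */
]]

-- Both declarations are accepted; nothing has been looked up yet.
-- Indexing the namespace binds strlen, the call then runs it.
local len = tonumber(ffi.C.strlen("hello"))
print(len)

-- Indexing with a symbol that cannot be resolved raises an error.
local ok, err = pcall(function()
  return ffi.C.hypothetical_missing_symbol
end)
print(ok, err)
```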

--Mike
Alex Bradbury
2011-01-21 11:12:58 UTC
Post by Mike Pall
I hope the docs clear up many of your questions. If not, then
please ask here. Feedback welcome!
I can imagine binding C APIs by creating Lua objects that hold a
reference to the C object. If the library lets you allocate the memory
yourself, then garbage collection can be handled for you. Otherwise
just the pointer cdata will be gced, so you must write your own __gc
metamethod to call the library's free function - but this isn't
supported on tables. Is there a better way than creating a newproxy
for each object and giving it a __gc metamethod that makes the
appropriate free call?

Alex
Mike Pall
2011-01-21 11:47:32 UTC
Post by Alex Bradbury
I can imagine binding C APIs by creating Lua objects that hold a
reference to the C object. If the library lets you allocate the memory
yourself, then garbage collection can be handled for you. Otherwise
just the pointer cdata will be gced, so you must write your own __gc
metamethod to call the library's free function - but this isn't
supported on tables. Is there a better way than creating a newproxy
for each object and giving it a __gc metamethod that makes the
appropriate free call?
You raise a valid point. Currently wrapping it in userdata is the
only way to trigger the __gc metamethod. Although cdata objects
will gain user-definable metamethods (per type, not per instance),
the __gc metamethod won't work on them with the current GC. You'll
have to wait for LuaJIT 2.1.
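
A minimal sketch of the userdata-proxy approach discussed above, using
newproxy from the Lua 5.1 base library. The wrap and counting_free
helpers are hypothetical names for this example, and malloc/free merely
stand in for a library's own allocate/free pair.

```lua
local ffi = require("ffi")

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

-- Hypothetical wrapper: ties a finalizer to a C pointer via a proxy userdata.
local function wrap(ptr, free_fn)
  local proxy = newproxy(true)          -- userdata with a fresh metatable
  getmetatable(proxy).__gc = function() free_fn(ptr) end
  return { ptr = ptr, _proxy = proxy }  -- the table keeps the proxy alive
end

-- Counts finalizer calls so the effect is visible.
local freed = 0
local function counting_free(p)
  freed = freed + 1
  ffi.C.free(p)
end

local obj = wrap(ffi.C.malloc(64), counting_free)
obj = nil         -- drop the only reference to the wrapper
collectgarbage()  -- full collection cycles run pending finalizers
collectgarbage()
print("finalizers run:", freed)
```

The cost is one extra userdata and one closure per wrapped object, which
is the overhead the question is trying to avoid.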

--Mike
Michal Kottman
2011-01-21 11:39:00 UTC
Post by Mike Pall
Post by Mike Pall
Thanks for testing the LuaJIT FFI! I guess I have to clear up some
Ok, for anyone still interested: I've just added a first (rough)
cut of the documentation for the FFI library. The tutorial and the
section on semantics is still missing.
So point a browser to:
doc/ext_ffi.html
after pulling from LuaJIT git HEAD.
I hope the docs clear up many of your questions. If not, then
please ask here. Feedback welcome!
The docs are pretty clear. I imagine a great deal can be done with these
C bindings, thanks!

I have a question I wanted to ask for a long time - are there any plans
to support C++? Using a database of classes/methods/arguments, I could
replace the whole 25 MB of generated Qt bindings with a single library that
generates bindings on-the-fly. This is just a dream, but I hope there is
a faint chance of it becoming true :)
Mike Pall
2011-01-21 12:05:19 UTC
Post by Michal Kottman
I have a question I wanted to ask for a long time - are there any plans
to support C++? Using a database of classes/methods/arguments, I could
replace the whole 25Mb generated Qt bindings with a single library that
generates bindings on-the-fly. This is just a dream, but I hope there is
a faint chance of it becoming true :)
Everything in the LuaJIT FFI is prepared to support C++, but I
think implementing it will be a nightmare rather than a dream. :-)

Getting simple textbook examples of C++ up and running is
certainly doable. The problem is that none of the real-world code
out there works that way. Inline functions and templates are just
a few of the main stumbling blocks. E.g. Boost is hopeless (not
that it would be that useful for Lua).

So if and when I add C++ support, it'll be a handpicked subset.
The problem is that everyone needs a different subset. And Qt has
a very quaint idea of that, too.

Alas, I do have other plans for the next ten years of my life than
implementing a full-blown C++ compiler. So don't hold your breath.

--Mike
Michal Kottman
2011-01-21 12:24:11 UTC
Post by Mike Pall
Post by Michal Kottman
I have a question I wanted to ask for a long time - are there any plans
to support C++? Using a database of classes/methods/arguments, I could
replace the whole 25Mb generated Qt bindings with a single library that
generates bindings on-the-fly. This is just a dream, but I hope there is
a faint chance of it becoming true :)
Everything in the LuaJIT FFI is prepared to support C++, but I
think implementing it will be a nightmare rather than a dream. :-)
Getting simple textbook examples of C++ up and running is
certainly doable. The problem is that none of the real-world code
out there works that way. Inline functions and templates are just
a few of the main stumbling blocks. E.g. Boost is hopeless (not
that it would be that useful for Lua).
You're right, I was willing to miss templates, but I totally forgot
about inline functions. Looks like I should stop flying in dreams and
land in reality :)
Post by Mike Pall
So if and when I add C++ support, it'll be a handpicked subset.
The problem is that everyone needs a different subset. And Qt has
a very quaint idea of that, too.
Alas, I do have other plans for the next ten years of my life than
implementing a full-blown C++ compiler. So don't hold your breath.
No one in his right mind should expect that from you :) Essentially all
I wanted to know is whether the thiscall convention is supported in
LuaJIT. Anyway, thanks for the great work!
steve donovan
2011-01-21 12:24:46 UTC
Post by Mike Pall
Alas, I do have other plans for the next ten years of my life than
implementing a full-blown C++ compiler. So don't hold your breath.
Having been there, I'd say it's a road of pain. First you have to
handle all the different name-mangling schemes. Then the actual ABI
(does 'this' get passed in a register? How are value objects returned?).
C is a walk in the park in comparison.

steve d.
Florian Weimer
2011-01-22 20:49:18 UTC
Post by Michal Kottman
I have a question I wanted to ask for a long time - are there any plans
to support C++? Using a database of classes/methods/arguments, I could
replace the whole 25Mb generated Qt bindings with a single library that
generates bindings on-the-fly. This is just a dream, but I hope there is
a faint chance of it becoming true :)
Wouldn't the humongous size of C++ header files cause problems with
application startup time?
Michal Kottman
2011-01-23 13:47:07 UTC
Post by Florian Weimer
Post by Michal Kottman
I have a question I wanted to ask for a long time - are there any plans
to support C++? Using a database of classes/methods/arguments, I could
replace the whole 25Mb generated Qt bindings with a single library that
generates bindings on-the-fly. This is just a dream, but I hope there is
a faint chance of it becoming true :)
Wouldn't the humongous size of C++ header files causes problems with
application startup time?
The idea was to prepare the headers beforehand and convert them into some
easily digestible format, like some form of a key-value database. We
already do the parsing while generating the bindings, and it's
relatively fast (i.e. not more than a second of processing on my 2.4GHz
processor).

Let's say that a user wanted to create qtgui.QPushButton. An __index
metamethod would be triggered and the definitions would be read from the
database and constructed so that LuaJIT FFI could parse it. The
resulting type would be saved back in qtgui.QPushButton and used to
construct new objects.

This way, we would not have to distribute the whole generator, only a
single 'live-binding' module and a pre-processed database of types.
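
The lazy lookup described above could be sketched roughly as follows. A
plain Lua table stands in for the pre-processed database, and two libc
functions stand in for Qt types; lazy_namespace and decls are
hypothetical names, and this covers C declarations only, not C++.

```lua
local ffi = require("ffi")

-- Stand-in for the pre-processed database: symbol name -> C declaration.
local decls = {
  strlen = "size_t strlen(const char *s);",
  abs    = "int abs(int x);",
}

-- Hypothetical 'live-binding' namespace: declares and binds a symbol
-- the first time it is indexed, then caches the result.
local function lazy_namespace(lib, db)
  return setmetatable({}, {
    __index = function(t, name)
      local decl = db[name]
      if not decl then
        error("no declaration for '" .. tostring(name) .. "'", 2)
      end
      ffi.cdef(decl)         -- parse the declaration on demand
      local sym = lib[name]  -- bind the symbol from the library
      rawset(t, name, sym)   -- later lookups bypass __index
      return sym
    end,
  })
end

local libc = lazy_namespace(ffi.C, decls)
local n = tonumber(libc.strlen("hello"))
local a = libc.abs(-5)
print(n, a)
```

Caching the bound function in a Lua table is a simplification here; it
gives up the namespace specialization Mike describes earlier in the
thread, so a real implementation might prefer to index the library
namespace directly after declaring the symbol.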

I'm sure this idea could work on smaller libraries, but now it seems to
me that Qt is too complicated for this (it has its own nuances, like
the signal-slot mechanism, parent-child memory management etc.)
Mike Pall
2011-01-23 14:43:31 UTC
Post by Michal Kottman
Post by Florian Weimer
Wouldn't the humongous size of C++ header files causes problems with
application startup time?
The idea was to prepare the headers beforehand and convert it into some
easily digestable format, like some form of a key-value database. We
already do the parsing while generating the bindings, and it's
relatively fast (i.e. not more than a second of processing on my 2.4GHz
processor).
Let's say that a user wanted to create qtgui.QPushButton. An __index
metamethod would be triggered and the definitions would be read from the
database and constructed so that LuaJIT FFI could parse it. The
resulting type would be saved back in qtgui.QPushButton and used to
construct new objects.
This way, we would not have to distribute the whole generator, only a
single 'live-binding' module and a pre-processed database of types.
Thanks for the inspiration! I think I know how to handle the
deployment problem for the LuaJIT FFI now:

On the developer's machine:
- Feed all of the headers to the C/C++ parser of the LuaJIT FFI.
This may be somewhat slow of course.
- Then dump the whole internal database of C type declarations in
some kind of output format.
- To enable fast access, this would need to be a mix of a hash
table (for names) and a direct index (for C type IDs). The FFI
internally uses something like this, but it needs to be extended
to work with non-interned strings.
- The ideal format would be an object file, which can either be
statically linked or turned into a shared library. The data
structures should be pure read-only data without the need for
relocations (indexes instead of pointers).
- Generating C code that holds a huge const array is probably the
easiest way. A small luaopen_*() wrapper could be added, so it
behaves like any Lua/C module.

On the user's machine:
- Just load the module containing the pre-compiled database with
require.
- Internally, whenever the namespace of the module is indexed, it
recursively copies/interns the needed declarations from the
read-only data to the FFI declarations database and then runs
the standard C library namespace handler.
- This should be quite fast, because the internal FFI data
structures already have support for interning. Also, it works
incrementally and only the needed definitions end up in
read-write memory.
- Since the read-only data is shared across processes and pages
are loaded on-demand by the kernel, the impact on the memory
footprint would be limited.
- One could shrink the declaration database further by adding some
kind of post-execution analysis tool.

Alas, this sounds like quite a bit of work ...
Post by Michal Kottman
I'm sure this idea could work on smaller libraries, but now it seems to
me that Qt is too complicated for this (it has it's own nuances, like
the signal-slot mechanism, parent-child memory management etc.)
That's why I said Qt has a rather quaint idea of C++. :-)
But given the rigid rules for the Qt code base, it's still more
manageable than trying to handle all of C++.

--Mike
Florian Weimer
2011-01-22 09:13:46 UTC
Post by Mike Pall
I hope the docs clear up many of your questions. If not, then
please ask here. Feedback welcome!
I think it could be useful if you could create a cdata/userdata
combination. This would allow access to the blob in a userdata as a C
structure, and you could still provide an object-specific metatable.

I think this is surprising (to a C++ programmer at least):

| Objects which are passed as an argument to an external C function
| are kept alive until the call returns.

I think the lifetime should extend to full expression, that is, beyond
the call. The reason is that a function might return a pointer that
is passed in, such as:

char *check_string(char *);

If this is called as

do_something(check_string(ffi.new(...)))

then the returned pointer will not keep the original pointer live.
CrazyButcher
2011-01-22 10:42:59 UTC
Post by Florian Weimer
I think the lifetime should extend to full expression, that is, beyond
the call. The reason is that a function might return a pointer that
is passed in, such as:
char *check_string(char *);
If this is called as
do_something(check_string(ffi.new(...)))
then the returned pointer will not keep the original pointer live.
How would the system know that the returned pointer matches the input
pointer? If you know it does, you can just store it locally. Wouldn't
you have the same problem in C? If you don't store a newly allocated
pointer before the function call, how would you be able to manage the
memory after the function call? Because then you'd have to assume that
check_string also "frees" the memory if it returns NULL?
Florian Weimer
2011-01-22 12:23:07 UTC
Post by CrazyButcher
Post by Florian Weimer
I think the lifetime should extend to full expression, that is, beyond
the call. The reason is that a function might return a pointer that
is passed in, such as:
char *check_string(char *);
If this is called as
do_something(check_string(ffi.new(...)))
then the returned pointer will not keep the original pointer live.
How would the system know that the returned pointer matches the input
pointer, if you know, you can just store it locally.
It could keep a weak-value table from C pointers to cdata objects.
But this is not necessary if freeing is deferred until the full
expression has been evaluated.
Post by CrazyButcher
Wouldn't you have the same problem in C, if you don't store a new
allocated pointer before the function call, how'd you be able to
manage the memory after the function call?
C doesn't have the problem because it doesn't have destructors.
Mike Pall
2011-01-22 18:51:41 UTC
Post by Florian Weimer
I think it could be useful if you could create a cdata/userdata
combination. This would allow access to the blob in a userdata as a C
structure, and you could still provide an object-specific metatable.
The FFI treats userdata like a 'void *' pointing to the payload.
So you can assign or cast it to a pointer to a struct and then
access its fields:

local ffi = require("ffi")

ffi.cdef[[
typedef struct { int x; } foo_t;
]]
local tostruct = ffi.typeof("foo_t *")
local function inc_x(ud)
  -- ud is a userdata whose payload is laid out like foo_t.
  local s = tostruct(ud)
  s.x = s.x + 1
end
Post by Florian Weimer
| Objects which are passed as an argument to an external C function
| are kept alive until the call returns.
I think the lifetime should extend to full expression, that is, beyond
the call. The reason is that a function might return a pointer that
char *check_string(char *);
If this is called as
do_something(check_string(ffi.new(...)))
then the returned pointer will not keep the original pointer live.
There's no way to do that, since the bytecode doesn't have any
concept of sequence points. And the FFI knows nothing about the
bytecode, too.

In general, the VM makes no attempt to infer the lifetime of
pointers to allocated objects. That's pretty hopeless, anyway.

The FFI has a strict 'no hand-holding' policy. If you want to keep
some object alive, then assign it to a local variable:

do
  local a = ffi.new(...)
  do_something(check_string(a))
  ...
end -- 'a' is not GC'ed before here
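
The same ownership rule can be sketched on the C side. This is only an illustration: the body of check_string() is invented here (the thread gives just its prototype), and checked_length() is a hypothetical caller. The point is that the returned pointer aliases the argument and owns nothing, so the original allocation has to stay alive until the last use of the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical body: returns its argument unchanged if it is a
   non-empty string, otherwise NULL. The result aliases the input. */
static char *check_string(char *s)
{
    return (s != NULL && s[0] != '\0') ? s : NULL;
}

/* Hypothetical caller: 'a' plays the role of the Lua local that
   keeps the allocation alive across the nested call. */
static size_t checked_length(const char *text)
{
    size_t n = strlen(text) + 1;
    char *a = malloc(n);            /* owner of the allocation */
    if (a == NULL)
        return 0;
    memcpy(a, text, n);

    char *p = check_string(a);      /* p aliases a; it owns nothing */
    size_t len = (p != NULL) ? strlen(p) : 0;

    free(a);                        /* released only after the last use of p */
    return len;
}
```

Releasing 'a' before the last use of 'p' would be the C equivalent of letting the cdata object be collected while the returned pointer is still in use.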

--Mike
Florian Weimer
2011-01-22 19:35:33 UTC
Permalink
Post by Mike Pall
Post by Florian Weimer
I think it could be useful if you could create a cdata/userdata
combination. This would allow access to the blob in a userdata as a C
structure, and you could still provide an object-specific metatable.
The FFI treats userdata like a 'void *' pointing to the payload.
So you can assign or cast it to a pointer to a struct and then
ffi.cdef[[
typedef struct { int x; } foo_t;
]]
local tostruct = ffi.typeof("foo_t *")
local function inc_x(ud)
local s = tostruct(ud)
s.x = s.x + 1
end
Ah, very nice. You have to check the metatable just as in C, but
making this explicit clearly simplifies the FFI model.
Post by Mike Pall
Post by Florian Weimer
| Objects which are passed as an argument to an external C function
| are kept alive until the call returns.
I think the lifetime should extend to full expression, that is, beyond
the call. The reason is that a function might return a pointer that
There's no way to do that, since the bytecode doesn't have any
concept of sequence points. And the FFI knows nothing about the
bytecode, too.
You could keep all intermediate values alive until you reach the end
of the full expression, but this likely interferes with tail calls and
has performance issues.
Post by Mike Pall
The FFI has a strict 'no hand-holding' policy. If you want to keep
do
local a = ffi.new(...)
do_something(check_string(a))
...
end -- 'a' is not GC'ed before here
I think we have to wait and see if this lack of reliable functional
composition causes trouble in practice. To be honest, I've seen
obscure problems even with the C++ full expression rule. So it might
very well be better to be very explicit about guaranteed minimum
lifetimes.

More questions:

Is there a way to mark FFI calls which may trigger callbacks (via C
extensions)? What's the general story about callbacks?

Do you plan to make popular POSIX types (off_t, time_t, struct
timeval, struct timespec), constants (those for errno, for instance)
and errno itself available without meddling with header files? It's
great if you can get this data from header files, but reading them has
downsides (performance, availability at run time, compatibility issues
caused by new GCC extensions).
Mike Pall
2011-01-22 20:52:56 UTC
Permalink
Post by Florian Weimer
Is there a way to mark FFI calls which may trigger callbacks (via C
extensions)? What's the general story about callbacks?
Right now the call chain 'FFI -> C -> lua_*()' with the same
lua_State is a big no-no. I'm planning to add an extra GCC-like
attribute that could be used to mark C functions which do that.
So you can still call these functions from Lua, but it's only done
from the interpreter (e.g. for the entry into a GUI main loop).

[You may however call luaL_newstate() from the FFI to start up an
_independent_ Lua state and play with it. Works fine.]

Converting a Lua function into a callback for a C function is a
different issue (the canonical example would be qsort()). This is
quite tricky and has some inherent issues, e.g. dynamically
generated trampolines cannot be deallocated. So I've postponed
this for now.
Post by Florian Weimer
Do you plan to make popular POSIX types (off_t, time_t, struct
timeval, struct timespec), constants (those for errno, for instance)
and errno itself available without meddling with header files? It's
great if you can get this data from header files, but reading them has
downsides (performance, availability at run time, compatibility issues
caused by new GCC extensions).
Once I add an internal pre-processor to the C parser, you could
just ffi.cinclude("sys/types.h") (don't try this, NYI). Though
that doesn't really help for deployment, as not every installation
provides header files.

One possibility would be to convert headers into Lua files while
LuaJIT is built, like h2ph does for Perl. But that has its own
share of problems. And should it really convert all headers from
the developer's system? I don't think so.

Another option would be to add a ffi.osdef convenience module,
which has all of the generic POSIX types, or the LPCTSTR mess for
Windows. The latter wouldn't be too bad, because the Windows types
are cast in stone. But the former requires tricky pre-processing
to get the needed OS-specific types without including lots of
extra cruft.

And errno is an entirely different story: all modern libc's are
multithreaded and provide their own (non-standard) mechanism for
getting errno. The only real spec is the errno macro, which
contains C code of course. Figuring this out for all OS and libc
combinations is going to be difficult. And even if you do, it's
probably not safe to simulate this from Lua at a higher level.
Some memory allocation or a debug hook could get in between and
clear the errno value.

So I've thought about adding an __errno attribute. If you declare
a C function with this attribute, you get the errno as an extra
return value (local ok, errno = ffi.C.mkdir(...)). This is fetched
by low-level code immediately after the function returns. Alas,
this means I have to figure out all of the variants for getting
the errno value myself and teach them to the JIT compiler. :-/
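
A minimal C sketch of the "fetch errno immediately after the call" idea, assuming a POSIX system. The struct and the wrapper name mkdir_with_errno are hypothetical and are not part of LuaJIT or any libc; the sketch only shows why the value is captured before any other code gets a chance to run.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical result pair: the return value plus the errno seen
   directly after the call (0 if the call did not fail). */
struct call_result {
    int ret;
    int err;
};

/* Hypothetical wrapper around mkdir(): errno is read right after the
   call returns, before an allocation or hook could overwrite it. */
static struct call_result mkdir_with_errno(const char *path, mode_t mode)
{
    struct call_result r;
    r.ret = mkdir(path, mode);
    r.err = (r.ret == -1) ? errno : 0;
    return r;
}
```

A caller then gets both values in one step, which mirrors the proposed "local ok, errno = ffi.C.mkdir(...)" form.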

--Mike
CrazyButcher
2011-01-22 21:05:51 UTC
Permalink
Post by Mike Pall
Right now the call chain 'FFI -> C -> lua_*()' with the same
lua_State is a big no-no.
So for this use case, old-style Lua bindings would still be the way to go?
lua -> classic lua binding -> C -> lua function
as invoking a classic Lua-bound function forces the interpreter to
set up the proper environment?
Mike Pall
2011-01-22 21:09:41 UTC
Permalink
Post by CrazyButcher
Post by Mike Pall
Right now the call chain 'FFI -> C -> lua_*()' with the same
lua_State is a big no-no.
So for this use case, old-style Lua bindings would still be the way to go?
lua -> classic lua binding -> C -> lua function
as invoking a classic Lua-bound function forces the interpreter to
set up the proper environment?
Well, maybe. Or selectively turn off JIT compilation for the Lua
function that performs the FFI call (jit.off(func)).

--Mike
Florian Weimer
2011-01-22 21:37:56 UTC
Permalink
Post by Mike Pall
Post by Florian Weimer
Is there a way to mark FFI calls which may trigger callbacks (via C
extensions)? What's the general story about callbacks?
Right now the call chain 'FFI -> C -> lua_*()' with the same
lua_State is a big no-no. I'm planning to add an extra GCC-like
attribute that could be used to mark C functions which do that.
So you can still call these functions from Lua, but it's only done
from the interpreter (e.g. for the entry into a GUI main loop).
Would the jit.off(func) approach work right now?

Could you bail out to the interpreter only if a callback actually
happens, or would that overhead apply in all cases?
Post by Mike Pall
[You may however call luaL_newstate() from the FFI to start up an
_independent_ Lua state and play with it. Works fine.]
Interesting idea. However, for things which are as intertwined as
SQLite or Curl, this wouldn't work. 8-(
Post by Mike Pall
Converting a Lua function into a callback for a C function is a
different issue (the canonical example would be qsort()).
There is qsort_r now (with non-portable argument ordering, however).
Post by Mike Pall
This is quite tricky and has some inherent issues, e.g. dynamically
generated trampolines cannot be deallocated. So I've postponed this
for now.
The usual void * parameter makes the per-closure trampoline
superfluous. Would it be possible to put the auxiliary data into some
special cdata object, and require that the programmer keeps the object
live while the callback can be called?
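
For reference, the "usual void * parameter" convention looks roughly like this in C. Everything here is a sketch: sort_ints() is a hypothetical stand-in for qsort_r() (whose argument order is not portable), and struct order is invented auxiliary data. One plain function pointer serves any number of closures because the per-callback state travels through ctx, and the caller must keep that state alive for as long as the callback can be invoked.

```c
#include <stddef.h>

/* Comparison callback with a trailing 'void *' context parameter. */
typedef int (*cmp_fn)(const void *a, const void *b, void *ctx);

/* Hypothetical sort routine (insertion sort) that forwards ctx
   unchanged to every callback invocation. */
static void sort_ints(int *v, size_t n, cmp_fn cmp, void *ctx)
{
    for (size_t i = 1; i < n; i++) {
        int key = v[i];
        size_t j = i;
        while (j > 0 && cmp(&v[j - 1], &key, ctx) > 0) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}

/* Auxiliary data owned by the caller for the duration of the sort. */
struct order {
    int descending;
    unsigned calls;
};

/* One static function; its behaviour depends only on the ctx it receives. */
static int cmp_with_order(const void *a, const void *b, void *ctx)
{
    struct order *o = ctx;
    int x = *(const int *)a;
    int y = *(const int *)b;
    int r = (x > y) - (x < y);
    o->calls++;
    return o->descending ? -r : r;
}
```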
Post by Mike Pall
Another option would be to add a ffi.osdef convenience module,
which has all of the generic POSIX types, or the LPCTSTR mess for
Windows. The latter wouldn't be too bad, because the Windows types
are cast in stone. But the former requires tricky pre-processing
to get the needed OS-specific types without including lots of
extra cruft.
I have done this to generate Ada bindings. Currently, if you know
that something is an integral type, you only need to check the value
of sizeof and the signedness of the type. This gives you enough
information to reconstruct an equivalent typedef. Errno constants can
be extracted using brittle GCC hacks such as "gcc -E -dN -x c
/usr/include/errno.h | grep '^#define E'", and generating a C program
from that which prints their values.
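
A rough C sketch of the sizeof-and-signedness check, under the assumption that the type is integral and that a fixed-width type of the same size exists. The names IS_SIGNED, DESCRIBE and describe_integral are made up for this illustration, and the output format is just one possible choice.

```c
#include <limits.h>
#include <stdio.h>

/* Nonzero if the integral type T is signed: (T)-1 stays negative
   only for signed types. */
#define IS_SIGNED(T) ((T)-1 < (T)0)

/* Format "typedef [u]intN_t name;" for an integral type into buf. */
static int describe_integral(char *buf, size_t len, const char *name,
                             size_t size, int is_signed)
{
    return snprintf(buf, len, "typedef %sint%u_t %s;",
                    is_signed ? "" : "u",
                    (unsigned)(size * CHAR_BIT), name);
}

/* Convenience macro: derives the name, size and signedness from T.
   T must be a single identifier such as off_t or time_t. */
#define DESCRIBE(buf, len, T) \
    describe_integral((buf), (len), #T, sizeof(T), IS_SIGNED(T))
```

Called as DESCRIBE(buf, sizeof buf, off_t) with <sys/types.h> included, this would typically yield a line that can be pasted into an ffi.cdef block. It only captures size and signedness, not the distinction between same-sized types.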
Post by Mike Pall
So I've thought about adding an __errno attribute. If you declare
a C function with this attribute, you get the errno as an extra
return value (local ok, errno = ffi.C.mkdir(...)). This is fetched
by low-level code immediately after the function returns. Alas,
this means I have to figure out all of the variants for getting
the errno value myself and teach them to the JIT compiler. :-/
For my Ada stuff, I called a get_errno function like this one:

#include <errno.h>
int
get_errno(void)
{
return errno;
}

(Apparently, Oracle still ships code with "extern int errno;" in it,
which makes things much easier, but puts the affected software firmly
into the past.)
Mike Pall
2011-01-23 12:12:29 UTC
Permalink
Post by Florian Weimer
Post by Mike Pall
Right now the call chain 'FFI -> C -> lua_*()' with the same
lua_State is a big no-no. I'm planning to add an extra GCC-like
attribute that could be used to mark C functions which do that.
So you can still call these functions from Lua, but it's only done
from the interpreter (e.g. for the entry into a GUI main loop).
Would the jit.off(func) approach work right now?
Yes.
Post by Florian Weimer
Could you bail out to the interpreter only if a callback actually
happens, or would that overhead apply in all cases?
Nope, that's too late. The full state cannot be easily restored
without, well, always restoring it before calling into C. This is
too expensive and pointless most of the time.
Post by Florian Weimer
Post by Mike Pall
This is quite tricky and has some inherent issues, e.g. dynamically
generated trampolines cannot be deallocated. So I've postponed this
for now.
The usual void * parameter makes the per-closure trampoline
superfluous.
There's no way the FFI could infer that.
Post by Florian Weimer
Would it be possible to put the auxiliary data into some
special cdata object, and require that the programmer keeps the object
live while the callback can be called?
That would imply explicit callback management:

local cb = ffi.callback(function() ... end) -- hypothetical
saved[cb] = true
somegui.onclick(cb)

I find that rather tedious and I don't want to inflict this on
developers. I'm sure the first thing they'll try is this:

somegui.onclick(ffi.callback(function() ... end)) -- WRONG!

;-)
Post by Florian Weimer
Post by Mike Pall
Another option would be to add a ffi.osdef convenience module,
which has all of the generic POSIX types, or the LPCTSTR mess for
Windows. The latter wouldn't be too bad, because the Windows types
are cast in stone. But the former requires tricky pre-processing
to get the needed OS-specific types without including lots of
extra cruft.
I have done this to generate Ada bindings. Currently, if you know
that something is an integral type, you only need to check the value
of sizeof and the signedness of the type. This gives you enough
information to reconstruct an equivalent typedef.
Gets tricky for structs. Not that POSIX defines that many.

And it's going to mess up C++ name mangling, too. Because 'long'
is considered a separate type from 'int', even when they are the
same size. AFAIR there are inconsistencies wrt. POSIX types and
'long', depending on when the OS/libc got a 64 bit variant or how
they deal with Y2K38 issues.
Post by Florian Weimer
#include <errno.h>
int
get_errno(void)
{
return errno;
}
Good idea. But the errno value needs to be fetched every time,
just in case it's checked later on. That would slow down things
quite a bit. One could define __errno_if_minus_1 or such, to fetch
it on demand. Hmm.
Post by Florian Weimer
(Apparently, Oracle still ships code with "extern int errno;" in it,
which makes things much easier, but puts the affected software firmly
into the past.)
Someone thought about that. It turns into a dummy declaration. :-)

Here's the result for GLIBC after pre-processing:

extern int (*__errno_location ()); // was: extern int errno;
...
printf("%d\n", (*__errno_location ())); // was: printf("%d\n", errno);

--Mike
Florian Weimer
2011-01-23 20:26:21 UTC
Permalink
Post by Mike Pall
The FFI treats userdata like a 'void *' pointing to the payload.
So you can assign or cast it to a pointer to a struct and then
ffi.cdef[[
typedef struct { int x; } foo_t;
]]
local tostruct = ffi.typeof("foo_t *")
local function inc_x(ud)
local s = tostruct(ud)
s.x = s.x + 1
end
However, it seems that there is no way to create a userdata object
which is not zero-sized. Am I missing something?
Mike Pall
2011-01-23 20:50:51 UTC
Permalink
Post by Florian Weimer
However, it seems that there is no way to create a userdata object
which is not zero-sized. Am I missing something?
Sure, I could expose a function to create arbitrary userdata
objects from Lua code. Previously the userdata payload was
inaccessible to Lua code, so this was a bit pointless. The initial
motivation for the automatic userdata -> void * conversion was to
support existing code.

I realize userdata is the only way to integrate cdata into the
current GC. However I'm not so sure that mixing userdata and cdata
is the way to go forward. But __gc for cdata will have to wait for
LuaJIT 2.1. And user-defined GC object traversal may be difficult
to implement efficiently. Hmm ...

--Mike
Alex Bradbury
2011-01-24 13:52:40 UTC
Permalink
Post by Mike Pall
I hope the docs clear up many of your questions. If not, then
please ask here. Feedback welcome!
Just to check I haven't missed anything - am I correct to think that
if I create a uint8_t by e.g. ui8 = ffi.new("uint8_t") then there's no
way of assigning to it directly from Lua. Instead I want to do ui8 =
ffi.new("uint8_t[1]") and then do ui8[0] = whatever?

Alex
Mike Pall
2011-01-24 15:10:26 UTC
Permalink
Post by Alex Bradbury
Just to check I haven't missed anything - am I correct to think that
if I create a uint8_t by e.g. ui8 = ffi.new("uint8_t") then there's no
way of assigning to it directly from Lua. Instead I want to do ui8 =
ffi.new("uint8_t[1]") and then do ui8[0] = whatever?
Scalar cdata is immutable after creation. You'll rarely need them,
except to force a specific type for vararg arguments.

Either use ffi.new("uint8_t", x) for one-time initialization or
use a one-element array, if you need a mutable type.

The latter is also the preferred way to convert C code which uses
'&' to simulate multiple return values:

C code:
  int buflen = n;
  int status = fillbuf(buf, &buflen);
  if (status == 0) return -1;
  return buflen;

LuaJIT FFI code:
  local buflen = ffi.new("int[1]", n)
  local status = fillbuf(buf, buflen)
  if status == 0 then return -1 end
  return buflen[0]
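
To make the in/out parameter concrete, here is a hedged C sketch of what a function like fillbuf() might look like. Its body and the fixed message are invented for illustration (the thread only shows the call site), and fill_and_measure() simply wraps the C snippet above in a function. The int passed by address carries the capacity in and the number of bytes written out, which is why the Lua side needs a mutable one-element array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fillbuf(): copies up to *buflen bytes of a fixed
   message into buf, stores the number of bytes written back through
   buflen, and returns 0 on failure, 1 on success. */
static int fillbuf(char *buf, int *buflen)
{
    static const char msg[] = "hello";
    int n = (int)sizeof(msg) - 1;

    if (buf == NULL || buflen == NULL || *buflen <= 0)
        return 0;
    if (n > *buflen)
        n = *buflen;
    memcpy(buf, msg, (size_t)n);
    *buflen = n;
    return 1;
}

/* The caller from the post, wrapped in a function. */
static int fill_and_measure(char *buf, int n)
{
    int buflen = n;
    int status = fillbuf(buf, &buflen);
    if (status == 0)
        return -1;
    return buflen;
}
```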

--Mike
CrazyButcher
2011-01-15 21:54:19 UTC
Permalink
Mike Pall
2011-01-15 22:54:15 UTC
Permalink
Post by CrazyButcher
looking at the source I also noticed that all individual enums become global
ffi.cdef "enum Type {TA,TB,TC};"
will mean there is
ffi.C.TA (returning 0)
ffi.C.TB (returning 1) and so on
You can create scoped enums (actually a C++ feature). They are
still global in C mode (they have to be), but would be local in a
future C++ mode. Scoped static const values are indeed local:

ffi.cdef[[
struct foo {
enum { NO, YES, MAYBE };
static const int MAGIC = 42;
int a, b; // Or whatever members this struct needs.
};
]]

local s = ffi.new("struct foo")
print(s.YES) --> 1
print(s.MAGIC) --> 42

print(ffi.C.YES) --> 1
print(ffi.C.MAGIC) --> error, MAGIC is not a global symbol

s.a = s.MAYBE
if s.b == s.YES then ... end
Post by CrazyButcher
this is certainly C style as it just becomes an int. But for
type-safety it would be kind of cool if one cannot pass a number to a
function "someFunc(Type)" but has to use the real enum. Although, well
there is also cases where I hate c++ for not taking an int ;)
Well, I don't want to copy _all_ of the joys of C++ in Lua. :-)
So LuaJIT simply allows all of these:

ffi.cdef[[
typedef enum { RED, GREEN, BLUE } PrimaryColor;
int bar(PrimaryColor x);
]]

ffi.C.bar(2)
ffi.C.bar(ffi.C.BLUE)
ffi.C.bar("BLUE") -- Strings are (type-safe!) converted to enums.
Post by CrazyButcher
I also was able to create a cdata object of "Type", but I guess the
main use case for that is if a pointer to Type was required as
function argument? Otherwise using numbers would be fine, right? I didn't
try with an enum function argument yet, but I'd expect it to not take
an int*, but insist on Type*
Scalar cdata objects are treated like their contents, but are only
useful in certain circumstances (e.g. varargs). The type conversion
rules are the same as for C, with small changes to allow better
interoperability with Lua types. So enums are treated like their
base type and are fully interchangeable with integers. The rules
for pointer compatibility are mostly the same as for C, too.
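
The C behaviour that these rules mirror can be shown in a few lines. The body of bar() is invented for illustration (the thread only declares it), and same_call() is a hypothetical helper. In C, an int converts implicitly to the enum type, so a plain number and the enumerator are interchangeable as arguments.

```c
typedef enum { RED, GREEN, BLUE } PrimaryColor;

/* Hypothetical body for the bar() declared in the post. */
static int bar(PrimaryColor x)
{
    return (int)x * 10;
}

/* Passing the integer 2 and passing BLUE reach bar() with the same value. */
static int same_call(void)
{
    return bar(2) == bar(BLUE);
}
```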
Post by CrazyButcher
in a c++ mode one could think of using
ffi.CPP.Type.TA
Not sure about the namespace issues, yet. But see above: you can
use an instance of a type right now.

BTW: Another C++ feature the FFI already supports are references
('int &x'), which are quite useful for struct fields.
Post by CrazyButcher
design wise a ffi.CPP and ffi.cppdef with namespaces and such would
probably be the cleaner way to go about and rather keep the C mode
like C works.
Dang! You discovered my secret plan ...
Post by CrazyButcher
Regarding the callbacks, the only way to get this for now is having a
regular C function, which then calls the lua runtime to push
arguments, and retrieves returns afterwards?
Yes, that would work (more or less) as an interim solution.
Post by CrazyButcher
But how would I be able to convert the cdata args and returns of a
function safely to what the C function must deal with.
For that to work, you would need to expose your cdata conversion
handling via LuaJIT's C API somehow, right?
Well, you'd need to convert to plain Lua objects then. There's no
support for creating/accessing cdata objects in the old Lua/C API.
This will not be needed once I implement automatic callback
generation.
Post by CrazyButcher
This has probably been answered before, but luajit2 is thread-safe, right? A
brief look at luaL_newstate shows per-lua_State thread-safe allocators.
Yes. But due to the separate allocators you should try to pool or
recycle states (they are not bound to any specific thread).

--Mike
CrazyButcher
2011-01-15 23:23:20 UTC
Permalink
Post by Mike Pall
Well, you'd need to convert to plain Lua objects then. There's no
support for creating/accessing cdata objects in the old Lua/C API.
This will not be needed once I implement automatic callback
generation.
independently of you fixing this, I also saw that
ffi.cast("Type*", luaUserData / luaLightUserData)
works, in fact ffi.cast also automatically casts strings to enums or
const char, as well as numbers and so on... nice!

That way one can pass pointers as light user data into Lua from C, and
have the lua function do the proper conversion.
As for returns I saw that "lua_topointer" works well with cdata. Which
means communication works and no need to expose native cdata api.
Cool!
Leonardo Palozzi
2011-01-26 07:02:59 UTC
Permalink
Post by Mike Pall
Well, I don't want to copy _all_ of the joys of C++ in Lua. :-)
ffi.cdef[[
typedef enum { RED, GREEN, BLUE } PrimaryColor;
int bar(PrimaryColor x);
]]
ffi.C.bar(2)
ffi.C.bar(ffi.C.BLUE)
ffi.C.bar("BLUE") -- Strings are (type-safe!) converted to enums.
Are enum function return types supported?

I get "attempt to compare 'enum 95' with 'double'" error message when I
try something like this:

ffi.cdef[[PrimaryColor foo();]]
if foo() == ffi.C.BLUE then ...

I would like to avoid using tonumber():

if tonumber(foo()) == ffi.C.BLUE then ...

Thanks.
Mike Pall
2011-01-26 13:23:47 UTC
Permalink
Post by Leonardo Palozzi
Are enum function return types supported?
I get "attempt to compare 'enum 95' with 'double'" error message when I
Sorry, that was an oversight. Fixed in git HEAD.
Post by Leonardo Palozzi
if foo() == ffi.C.BLUE then ...
This should work now. Enum return values are treated like their
underlying type (uint32_t or int32_t), which is converted to a Lua
number as usual.

--Mike
