Discussion:
Heap memory issues for Embedded Systems
g***@onyxengr.com
2005-07-03 01:00:47 UTC
Permalink
Hi everyone

I have recently incorporated lua into a cable modem settop box. The effort
was very successful in demonstrating the usefulness of lua as a test and
integration environment. Also, lua applications have the ability of being
downloaded from the cable head end.

The main problem with the lua environment was the dynamic memory management.
Dynamic memory for the lua environment was taken from a fixed private heap of
around 100k - 200k bytes. The private heap which was used was taken from the
back of the K&R C book. Each memory block had a 16 byte overhead.

Before I looked into redesigning the memory interface and possibly lua's
interface to dynamic memory, I wondered if there were any better ways all
ready out there or better ways incorporated into up comming version of lua.

The problems can be summarized as follows:

1. The garbage collector thresholds did not function correctly in a fixed
heap size. The script writers had to manually force garbage collection.

2. The utilization of the heap was around 50%. This implied that there were
a significantly small objects being allocated??? This is a problem with C++
and special heaps are designed to solve this problem.

3. For soft real time problems, the lua scripts were a successful solutions.
Being greedy, I would like to extend the lua scripts into more real time
areas such as 60Hz type applications. Web postings seem to indicate that the
game guys are possible interested in the same area. Are there any garbage
collectors now or currently planned that will allow lua to operate in that
area?

Glenn Edgar
David Given
2005-07-05 20:51:14 UTC
Permalink
On Sunday 03 July 2005 02:00, glenn-***@onyxengr.com wrote:
[...]
Post by g***@onyxengr.com
The private heap which was used
was taken from the back of the K&R C book. Each memory block had a 16 byte
overhead.
Unless those 16 bytes are mostly used for debugging, you're being ripped off,
I'm afraid. For an allocated block, the overhead should be exactly four bytes
--- the size of the block, and depending on your application you might not
need that (if the app keeps track of the size of blocks itself, for example).
I don't know the K&R algorithm but you can probably do better.

[...]
Post by g***@onyxengr.com
2. The utilization of the heap was around 50%. This implied that there
were a significantly small objects being allocated??? This is a problem
with C++ and special heaps are designed to solve this problem.
Having a better allocator might help here, as it definitely sounds like you
have a memory fragmentation problem. I'd suggest doing a brief search of the
literature, and then using BSD's libc allocator --- the license should be
acceptable and the algorithm should be good.

[...]
Post by g***@onyxengr.com
Are
there any garbage collectors now or currently planned that will allow lua
to operate in that area?
IIRC (and I possibly don't) you *cannot* run a hard realtime system with a
garbage collector. The problem is that there is no (short) upper limit on how
long it takes to do a garbage collection and since you can potentially run
out of memory at any point, your worst-case latencies are going to suck.

That said, there are tricks you can do: partial collecting, incremental
collecting, background collecting, all that kind of thing. However, they do
not *solve* the problem, only ameliorate it. If you have a fast enough
processor, you can get away with it... most of the time. If you require
*truly* hard real-time, with low latencies, I suspect you're doomed if you
use Lua.
--
"Curses! Foiled by the chilled dairy treats of righteousness!" --- Earthworm
Jim (evil)
Klaus Ripke
2005-07-05 21:49:48 UTC
Permalink
Post by David Given
IIRC (and I possibly don't) you *cannot* run a hard realtime system with a
garbage collector. The problem is that there is no (short) upper limit on how
long it takes to do a garbage collection and since you can potentially run
out of memory at any point, your worst-case latencies are going to suck.
hum, guess GC need not be that unpredictable.
As "real time" always is about requests/events,
you clearly must not leave garbage from one request
lying yround to spoil the next one.
But when doing a full GC "after" every request (i.e. at the
end of the request, as far as total timing is concerned),
you should find a perfectly predictable environment for
every event, including the time it takes to clean up, no?
Post by David Given
*truly* hard real-time, with low latencies, I suspect you're doomed if you
use Lua.
recently I saw "realtime for Java" mentioned.
This is a joke, since java can not really control
it's GC due to MT issues.
But I guess Lua can.


hand
Ben Sunshine-Hill
2005-07-05 22:07:42 UTC
Permalink
Post by Klaus Ripke
recently I saw "realtime for Java" mentioned.
This is a joke, since java can not really control
it's GC due to MT issues.
But I guess Lua can.
The hard-realtime approaches to Java I've seen do some very
interesting things to maintain usability in the face of full GC not
being possible. Basically, you provide the VM with certain invariants
regarding the lifetime and accessibility of an object, and create
objects within pools, references between which are subject to very
restrictive semantics.

I don't think these same approaches would work well with Lua (or
perhaps I just don't want to see Lua uglified with them ;-).

Ben
Luiz Henrique de Figueiredo
2005-07-06 00:42:51 UTC
Permalink
Post by David Given
For an allocated block, the overhead should be exactly four bytes
--- the size of the block, and depending on your application you might not
need that (if the app keeps track of the size of blocks itself, for example).
Lua does keep track of block sizes.
--lhf
Glenn Maynard
2005-07-08 00:37:00 UTC
Permalink
Post by Luiz Henrique de Figueiredo
Post by David Given
For an allocated block, the overhead should be exactly four bytes
--- the size of the block, and depending on your application you might not
need that (if the app keeps track of the size of blocks itself, for example).
Lua does keep track of block sizes.
I think most C++ allocators take advantage of that; it's one reason C++
allocators can be much more efficient than C malloc/free. They don't
support realloc, though.

I wonder if it'd be a win, for allocation speed and memory fragmentation,
if Lua could be convinced to hook into C++-style allocation through callbacks.
I havn't profiled memory use or allocation counts recently in my code to see
if it's relevant, and I don't know enough about Lua to know if it makes a lot
of small allocations.
--
Glenn Maynard
Luiz Henrique de Figueiredo
2005-07-08 00:43:59 UTC
Permalink
Post by Glenn Maynard
I wonder if it'd be a win, for allocation speed and memory fragmentation,
if Lua could be convinced to hook into C++-style allocation through callbacks.
I'm not sure what you mean, but memory allocation in 5.1 is through callbacks.
In 5.0 you can compile Lua to use your own allocation routines, which use
the same interface (ie, get the old size).
--lhf
Mark Hamburg
2005-07-08 01:11:15 UTC
Permalink
Has anyone instrumented Lua to see what block sizes are common? Or is this
likely to vary from project to project? I'm thinking that a sub-allocator
for the common cases (or at least a free list) could be a significant win.

Mark
Post by Luiz Henrique de Figueiredo
Post by Glenn Maynard
I wonder if it'd be a win, for allocation speed and memory fragmentation,
if Lua could be convinced to hook into C++-style allocation through callbacks.
I'm not sure what you mean, but memory allocation in 5.1 is through callbacks.
In 5.0 you can compile Lua to use your own allocation routines, which use
the same interface (ie, get the old size).
--lhf
Luiz Henrique de Figueiredo
2005-07-08 10:25:49 UTC
Permalink
Post by Mark Hamburg
Has anyone instrumented Lua to see what block sizes are common? Or is this
likely to vary from project to project? I'm thinking that a sub-allocator
for the common cases (or at least a free list) could be a significant win.
I think it's likely to depend on the application because string and closure
structs are allocated with a varying tail. However, simple value structs are
likely to be the most frequent.

Here is the result of "lua /dev/null" for Lua 5.1w6. The first column is the
number of requests for a block of the size given in the second column. The
alloc function (to replace the one in lauxlib.c) is given at the end.
Running "lua life.lua" shows a different pattern, although the top 10 are
the same (in a slightly different order).

169 20
39 22
38 21
31 23
27 32
23 24
16 28
14 25
12 56
10 224
10 112
9 19
7 27
6 448
4 26
3 896
2 31
2 29
2 16
2 12
1 91
1 88
1 76
1 72
1 540
1 512
1 348
1 34
1 256
1 192
1 18
1 1792
1 17
1 128
1 1024

static void *l_alloc (void *ud, void *ptr, size_t osize, size_t nsize) {
static size_t total=0;
static int time=0;
void *optr=ptr;
(void)ud;
(void)osize;
++time;
if (nsize == 0) {
free(ptr);
total-=osize;
printf(">>> %p %p %8d - T=%d O=%d N=%d\n",ptr,optr,time,total,osize,nsize);
return NULL;
}
else {
total+=nsize-osize;
ptr=realloc(ptr, nsize);
printf(">>> %p %p %8d %c T=%d O=%d N=%d\n",ptr,optr,time,osize==0?'+':'=',total,osize,nsize);
return ptr;
}
}

--lhf
Adam D. Moss
2005-07-05 22:40:52 UTC
Permalink
Post by g***@onyxengr.com
1. The garbage collector thresholds did not function correctly in a fixed
heap size. The script writers had to manually force garbage collection.
2. The utilization of the heap was around 50%. This implied that there were
a significantly small objects being allocated??? This is a problem with C++
and special heaps are designed to solve this problem.
3. For soft real time problems, the lua scripts were a successful solutions.
Being greedy, I would like to extend the lua scripts into more real time
areas such as 60Hz type applications. Web postings seem to indicate that the
game guys are possible interested in the same area. Are there any garbage
collectors now or currently planned that will allow lua to operate in that
area?
You don't mention which version of Lua you investigated. Lua 5.1's
incremental GC has some very interesting tuneables.

Lua isn't going to be more than 'soft' real time, if that, without
a rewrite/redesign I believe (and a hard realtime platform, natch).
But I think it's realistically quite good enough for ~60Hz
applications on modern hardware, 5.0 with some care, and 5.1 with
some minor tuning. YMMV.

--adam
--
Adam D. Moss - ***@gimp.org
Alex Evans
2005-07-08 10:39:18 UTC
Permalink
That's interesting! Given that I'm more interested in speed than memory
tightness, I guess there would be mileage in running out and writing a
super simplistic free-list-allocator-thing of 32 byte blocks used for
any lua allocs between 20 and 32 bytes...
A

-----Original Message-----
From: lua-***@bazar2.conectiva.com.br
[mailto:lua-***@bazar2.conectiva.com.br] On Behalf Of Luiz Henrique
de Figueiredo
Sent: 08 July 2005 11:26
To: Lua list
Subject: Re: Heap memory issues for Embedded Systems
Post by Mark Hamburg
Has anyone instrumented Lua to see what block sizes are common? Or is
this
Post by Mark Hamburg
likely to vary from project to project? I'm thinking that a
sub-allocator
Post by Mark Hamburg
for the common cases (or at least a free list) could be a significant
win.

I think it's likely to depend on the application because string and
closure
structs are allocated with a varying tail. However, simple value structs
are
likely to be the most frequent.

Here is the result of "lua /dev/null" for Lua 5.1w6. The first column is
the
number of requests for a block of the size given in the second column.
The
alloc function (to replace the one in lauxlib.c) is given at the end.
Running "lua life.lua" shows a different pattern, although the top 10
are
the same (in a slightly different order).

169 20
39 22
38 21
31 23
27 32
23 24
16 28
14 25
12 56
10 224
10 112
9 19
7 27
6 448
4 26
3 896
2 31
2 29
2 16
2 12
1 91
1 88
1 76
1 72
1 540
1 512
1 348
1 34
1 256
1 192
1 18
1 1792
1 17
1 128
1 1024

static void *l_alloc (void *ud, void *ptr, size_t osize, size_t nsize) {
static size_t total=0;
static int time=0;
void *optr=ptr;
(void)ud;
(void)osize;
++time;
if (nsize == 0) {
free(ptr);
total-=osize;
printf(">>> %p %p %8d - T=%d O=%d
N=%d\n",ptr,optr,time,total,osize,nsize);
return NULL;
}
else {
total+=nsize-osize;
ptr=realloc(ptr, nsize);
printf(">>> %p %p %8d %c T=%d O=%d
N=%d\n",ptr,optr,time,osize==0?'+':'=',total,osize,nsize);
return ptr;
}
}

--lhf
Luiz Henrique de Figueiredo
2005-07-08 10:46:19 UTC
Permalink
I guess there would be mileage in running out and writing a super
simplistic free-list-allocator-thing of 32 byte blocks used for any
lua allocs between 20 and 32 bytes...
Perhaps, but you'd have to instrument your application. Don't take
the numbers I posted as valid for every application; they were for a
do-nothing run. Also, the numbers did not show the history of allocations.
I mean, there's no point in doing custom allocation for the initialization
phase of your application; you should concentrate on its main phase.
--lhf

Continue reading on narkive:
Loading...