Discussion:
Weird memory usage when using jemalloc
Carlos Abalde
2018-10-14 17:08:07 UTC
Hi all,

I've found an apparently weird situation when using Lua + jemalloc. After a lot of debugging I've been able to reduce the problem to a toy C program uploaded to https://gist.github.com/carlosabalde/bd6cb2b17aa71fd4c53864037b89f36b (I've tested this on CentOS 7 using Lua 5.1): 20k threads are created, all of them sharing a single Lua engine (protected by a mutex, obviously), and each of them executes lua_pcall() every second. That call runs a dummy function containing a local table.

Everything works as expected when using the system allocator (i.e. gcc test.c -lpthread -llua): constant resident memory consumption (~ 170 MB). However, when using jemalloc (i.e. gcc test.c -lpthread -llua -ljemalloc) memory consumption is much higher (~ 3.6 GB) and it keeps increasing slowly.

It seems to be related to the local table defined inside the function, but I don't know whether that's the expected behavior or how to avoid it (not using jemalloc is not an option). Btw, this also happens in Lua 5.3, but not in LuaJIT (I guess because LuaJIT uses its own allocator).

Any explanation for this behavior?

Thanks a lot! :)

--
Carlos Abalde
云风 Cloud Wu
2018-10-15 02:38:41 UTC
You may try to turn off the tcache of jemalloc. I guess tcache may use huge
memory since you create 20k threads.
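For reference, a minimal sketch of one way to disable the thread cache from C. It assumes a jemalloc build without a symbol prefix (so the option variable is named malloc_conf rather than je_malloc_conf) and that this definition lives in the main executable, since jemalloc reads it during its own initialisation. The same option string can also be passed through the MALLOC_CONF environment variable.

```c
/*
 * jemalloc looks for a global named `malloc_conf` at start-up and parses it
 * as an option string. "tcache:false" turns the per-thread cache off.
 *
 * Assumptions: jemalloc was built without a symbol prefix, and this
 * definition is linked into the executable itself (a library loaded later
 * would be too late to influence initialisation).
 *
 * Equivalent from the shell:  MALLOC_CONF=tcache:false ./a.out
 */
const char *malloc_conf = "tcache:false";
```

With 20k threads, each thread that touches the allocator would otherwise get its own cache, which is what this suggestion is aiming at.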
Carlos Abalde
2018-10-15 13:43:44 UTC
Hi,

Turning off tcache is not an option (this code runs as a Varnish module, i.e. a shared library, inside Varnish, which uses jemalloc), but following your hint I tested two approaches:

1. Using lua_newstate() in order to provide a custom allocator implementation based on jemalloc's mallocx() and rallocx(), using a single specific arena (i.e. MALLOCX_ARENA()).

2. Calling 'thread.tcache.flush' (through mallctl()) after every script execution.
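A rough sketch of what the first approach could look like. This is not the code from the gist: the names arena_alloc_cfg and arena_lua_alloc are made up for illustration, and the jemalloc functions are declared as weak symbols (a GCC/Clang extension on ELF platforms) so that the sketch still builds, and falls back to the libc allocator, when jemalloc is not linked in. The flags value is assumed to be computed by the caller with MALLOCX_ARENA(index) from <jemalloc/jemalloc.h>.

```c
#include <stddef.h>
#include <stdlib.h>

/* jemalloc's non-standard API, declared weak so that the addresses are NULL
 * when jemalloc is not present in the process (assumes unprefixed symbols). */
extern void *mallocx(size_t size, int flags) __attribute__((weak));
extern void *rallocx(void *ptr, size_t size, int flags) __attribute__((weak));
extern void dallocx(void *ptr, int flags) __attribute__((weak));

/* Hypothetical user data handed to lua_newstate(). `flags` is expected to
 * hold MALLOCX_ARENA(index) for the arena reserved for the Lua state. */
struct arena_alloc_cfg {
    int flags;
};

/* Matches the lua_Alloc signature: nsize == 0 frees the block and returns
 * NULL, ptr == NULL allocates, anything else resizes. */
void *arena_lua_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    const struct arena_alloc_cfg *cfg = ud;
    int have_je = (mallocx != NULL && rallocx != NULL && dallocx != NULL);

    (void)osize; /* not needed by either backend */

    if (nsize == 0) {
        if (ptr != NULL) {
            if (have_je)
                dallocx(ptr, cfg->flags);
            else
                free(ptr);
        }
        return NULL;
    }
    if (!have_je)
        return realloc(ptr, nsize);   /* fallback: plain libc */
    if (ptr == NULL)
        return mallocx(nsize, cfg->flags);
    return rallocx(ptr, nsize, cfg->flags);
}
```

The state would then be created with something like lua_newstate(arena_lua_alloc, &cfg), where cfg must outlive the Lua state. How the arena index itself is obtained depends on the jemalloc version and is left out here.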

Both work in the toy C program, but only the second one works when benchmarking the real implementation. I don't know why yet; I guess it's due to jemalloc or Varnish Cache internals that I still need to investigate.
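For anyone finding this later, the second approach boils down to a single mallctl() call made on the thread that just ran the script. A minimal sketch, assuming unprefixed jemalloc symbols; the helper name flush_thread_tcache is made up, and mallctl is declared weak (a GCC/Clang extension) so the code also builds and runs where jemalloc is absent:

```c
#include <stddef.h>

/* jemalloc's control interface. Weak, so the address is NULL when jemalloc
 * is not loaded into the process. */
extern int mallctl(const char *name, void *oldp, size_t *oldlenp,
                   void *newp, size_t newlen) __attribute__((weak));

/* Flushes the calling thread's tcache. Returns 0 on success, -1 when
 * jemalloc is not available, or the error code reported by mallctl(). */
int flush_thread_tcache(void)
{
    if (mallctl == NULL)
        return -1;
    return mallctl("thread.tcache.flush", NULL, NULL, NULL, 0);
}
```

The call only affects the cache of the thread that makes it, so it has to run on each worker thread right after its lua_pcall() returns.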

Thank you very much!

Best,

--
Carlos Abalde
