Egor Skriptunoff
2018-10-06 21:44:37 UTC
Hi!
I wrote a module with SHA2 functions in pure Lua.
The code tries to use specific benefits of your system to make calculations
as fast as possible.
For every Lua version the module contains separate implementation branch
optimized for this version.
The project is here:
https://github.com/Egor-Skriptunoff/pure_lua_SHA2
Bitwise calculations in Lua have painful history of improvements,
every such improvement brings new incompatibility between Lua versions :-)
This is the timeline of SHA256 performance improvements from Lua 5.1 to Lua
5.4:
*Lua 5.1 -> Lua 5.2*
Introducing of "bit32" library improved SHA256 performance in about 5
times.
*Lua 5.2 -> Lua 5.3*
Replacing "bit32" library with native bitwise operators improved SHA256
performance in about 3 times.
*Lua 5.3 -> Lua 5.4*
Optimization of Lua VM improved SHA256 performance in about 1.5 times.
Benchmark details are on this picture:
Loading Image...
The vertical axis is SHA2 calculation speed in bytes per second.
The scale is logarithmic: one pixel upwards means 1.009 times faster.
For example, 77 pixels upwards means 2 times faster (1.009^77 == 2)
Each item is connected to minimum and maximum speed values obtained across
different compilers Lua was built.
"Unreachable Ideal" corresponds to the speed of "sha256sum"/"sha512sum"
utilities, measured with the following commands:
time dd bs=1M count=1K if=/dev/zero | sha256sum -
time dd bs=1M count=1K if=/dev/zero | sha512sum -
Some interesting observations from "sha2.lua" benchmark:
1)
SHA512 on 64-bit systems is faster than SHA256.
So, replacing SHA256 with SHA512/256 in your projects would increase both
performance and safety.
2)
If we compare x86 Lua 5.4 having 32-bit integers (LUA_INT_TYPE=LUA_INT_INT)
with "normal" x86 Lua 5.4 having 64-bit integers:
- When calculating SHA256, the former is faster.
- When calculating SHA512, the latter is faster.
Explanation:
Implementing "int64" datatype on 32-bit architecture is costly,
and the profit of introducing "int64" is negative when application doesn't
need 64-bit integers
(application is compelled to use expensive "int64" datatype to work with
cheap 32-bit values).
3)
How SHA2 performance depends on the compiler Lua was built:
Performance was measured on Windows (Visual Studio and MinGW) and Debian.
- For x86 architecture, Visual Studio usually gives better result.
- For x64 architecture, gcc is usually better.
The difference might be very big.
Example: comparing MinGW vs. Visual Studio:
- Lua 5.1 x86: MinGW is 20% faster than Visual Studio;
- Lua 5.3 x86: Visual Studio is 3 times faster than MinGW.
4)
LuaJIT's black magic is very unpredictable, especially on x86.
For example, any minor change in the source .lua file (such as relocation
of function's definition) might change LuaJIT performance up to 4 times.
5)
How performance depends on CPU:
Vanilla Lua on Intel i7 CPU is twice faster as on Intel i5 CPU with the
same CPU frequency.
Probably, i7 was designed to greatly improve execution speed of typical
C-compiled code (or this is just because i7 L3 cache is twice larger?).
LuaJIT has the same performance on both i5 and i7 (LuaJIT generates
optimized machine code directly and doesn't get benefits from this i7
improvement).
6)
LuaJIT 2.1 is way better than LuaJIT 2.0:
- On 32-bit calculations LuaJIT 2.1 appears to be up to 1.5 times faster.
- On 64-bit calculations "LuaJIT 2.1 with FFI" is much faster than "LuaJIT
2.0 with FFI" due to significant improvement in "bit" library: in LuaJIT
2.1 bitwise functions can accept "int64_t" arguments, while in LuaJIT 2.0
only 32-bit numbers are allowed.
7)
How fast is Fengari:
Fengari in FireFox is 10-15 times slower than Lua 5.3 x86 built with
LUA_INT_TYPE=LUA_INT_INT
(Fengari implements "int32" integers instead of "int64")
8)
How fast is Ravi:
I downloaded "ravi-distro-0.4-windows-64bit-ravi".
I was interested in testing performance of SHA2 under Ravi JIT.
I didn't use any Ravi extensions, my intention was to use Ravi as
"JIT-powered Lua 5.3" for executing usual (non-Ravi) Lua scripts.
In interpreter mode Ravi runs 5-8% faster than Lua 5.3, but significantly
slower than Lua 5.4.
I tried to turn JIT on with "ravi.auto(true)", but Ravi crashed
with warning "*constant 4294967296 is so big it is long long*"
and error "*for initial value must be a number*"
(although there was no "for" loop at the line mentioned in the error
message).
The error could be reproduced by running
ravi.exe -e"ravi.auto(true)" sha2.lua
I'm waiting for next Ravi release to see how fast SHA2 is under Ravi JIT.
-- Egor
I wrote a module with SHA2 functions in pure Lua.
The code tries to use specific benefits of your system to make calculations
as fast as possible.
For every Lua version the module contains separate implementation branch
optimized for this version.
The project is here:
https://github.com/Egor-Skriptunoff/pure_lua_SHA2
Bitwise calculations in Lua have painful history of improvements,
every such improvement brings new incompatibility between Lua versions :-)
This is the timeline of SHA256 performance improvements from Lua 5.1 to Lua
5.4:
*Lua 5.1 -> Lua 5.2*
Introducing of "bit32" library improved SHA256 performance in about 5
times.
*Lua 5.2 -> Lua 5.3*
Replacing "bit32" library with native bitwise operators improved SHA256
performance in about 3 times.
*Lua 5.3 -> Lua 5.4*
Optimization of Lua VM improved SHA256 performance in about 1.5 times.
Benchmark details are on this picture:
Loading Image...
The vertical axis is SHA2 calculation speed in bytes per second.
The scale is logarithmic: one pixel upwards means 1.009 times faster.
For example, 77 pixels upwards means 2 times faster (1.009^77 == 2)
Each item is connected to minimum and maximum speed values obtained across
different compilers Lua was built.
"Unreachable Ideal" corresponds to the speed of "sha256sum"/"sha512sum"
utilities, measured with the following commands:
time dd bs=1M count=1K if=/dev/zero | sha256sum -
time dd bs=1M count=1K if=/dev/zero | sha512sum -
Some interesting observations from "sha2.lua" benchmark:
1)
SHA512 on 64-bit systems is faster than SHA256.
So, replacing SHA256 with SHA512/256 in your projects would increase both
performance and safety.
2)
If we compare x86 Lua 5.4 having 32-bit integers (LUA_INT_TYPE=LUA_INT_INT)
with "normal" x86 Lua 5.4 having 64-bit integers:
- When calculating SHA256, the former is faster.
- When calculating SHA512, the latter is faster.
Explanation:
Implementing "int64" datatype on 32-bit architecture is costly,
and the profit of introducing "int64" is negative when application doesn't
need 64-bit integers
(application is compelled to use expensive "int64" datatype to work with
cheap 32-bit values).
3)
How SHA2 performance depends on the compiler Lua was built:
Performance was measured on Windows (Visual Studio and MinGW) and Debian.
- For x86 architecture, Visual Studio usually gives better result.
- For x64 architecture, gcc is usually better.
The difference might be very big.
Example: comparing MinGW vs. Visual Studio:
- Lua 5.1 x86: MinGW is 20% faster than Visual Studio;
- Lua 5.3 x86: Visual Studio is 3 times faster than MinGW.
4)
LuaJIT's black magic is very unpredictable, especially on x86.
For example, any minor change in the source .lua file (such as relocation
of function's definition) might change LuaJIT performance up to 4 times.
5)
How performance depends on CPU:
Vanilla Lua on Intel i7 CPU is twice faster as on Intel i5 CPU with the
same CPU frequency.
Probably, i7 was designed to greatly improve execution speed of typical
C-compiled code (or this is just because i7 L3 cache is twice larger?).
LuaJIT has the same performance on both i5 and i7 (LuaJIT generates
optimized machine code directly and doesn't get benefits from this i7
improvement).
6)
LuaJIT 2.1 is way better than LuaJIT 2.0:
- On 32-bit calculations LuaJIT 2.1 appears to be up to 1.5 times faster.
- On 64-bit calculations "LuaJIT 2.1 with FFI" is much faster than "LuaJIT
2.0 with FFI" due to significant improvement in "bit" library: in LuaJIT
2.1 bitwise functions can accept "int64_t" arguments, while in LuaJIT 2.0
only 32-bit numbers are allowed.
7)
How fast is Fengari:
Fengari in FireFox is 10-15 times slower than Lua 5.3 x86 built with
LUA_INT_TYPE=LUA_INT_INT
(Fengari implements "int32" integers instead of "int64")
8)
How fast is Ravi:
I downloaded "ravi-distro-0.4-windows-64bit-ravi".
I was interested in testing performance of SHA2 under Ravi JIT.
I didn't use any Ravi extensions, my intention was to use Ravi as
"JIT-powered Lua 5.3" for executing usual (non-Ravi) Lua scripts.
In interpreter mode Ravi runs 5-8% faster than Lua 5.3, but significantly
slower than Lua 5.4.
I tried to turn JIT on with "ravi.auto(true)", but Ravi crashed
with warning "*constant 4294967296 is so big it is long long*"
and error "*for initial value must be a number*"
(although there was no "for" loop at the line mentioned in the error
message).
The error could be reproduced by running
ravi.exe -e"ravi.auto(true)" sha2.lua
I'm waiting for next Ravi release to see how fast SHA2 is under Ravi JIT.
-- Egor