Tiny Core Linux
Tiny Core Extensions => TCE Bugs => Topic started by: jazzbiker on September 15, 2022, 11:17:11 AM
-
Greetings!
I'm currently on TC13 x86. With lua.tcz I get:
tc@box:~$ time lua -e 'for i=1,1000000 do end'
real 0m 1.22s
user 0m 1.21s
sys 0m 0.00s
while a separately built Lua 5.4.4 shows:
tc@box:~$ time /home/tc/lua/bin/lua -e 'for i=1,1000000 do end'
real 0m 0.07s
user 0m 0.03s
sys 0m 0.00s
On TC13 x86_64, lua-5.3.tcz runs the same script faster (something around 0.05s on a CPU clocked at almost the same speed).
Current lua.tcz is inherited from TC12 x86 and works in the same way (slowly). In TC10 x86 lua.tcz is 5.3.5 and is slow too.
The most surprising is that I've built Lua on TC13 x86 following http://tinycorelinux.net/12.x/x86/tcz/src/lua/compile_lua recipe and it works normally (fast).
What's wrong with x86 lua.tcz?
Regards!
-
Can it be compiled both without and with thread support?
SMP :)
-
> Can it be compiled both without and with thread support?
> SMP :)
Lua is single-threaded by design :)
-
Just for my own information (and general interest):
https://www.lua.org/about.html
-
@gadget42
Thanks for the link.
I don't know anything about Lua. The only thing I know is that some embedded systems use it.
OpenWrt comes to mind.
But maybe there are some differences between the compilations.
Like using -O3, or maybe clang... or maybe more sophisticated options so that gcc compiles the code to use SIMD instructions.
There are lots of things we don't see as (app) users/programmers, behind the curtain.
How our C and C++ code is compiled to asm instructions.
And how those asm instructions are handled in the processor, with cache alignment and so on.
:)
-
Hi jazzbiker
> ... Current lua.tcz is inherited from TC12 x86 and works in the same way (slowly). In TC10 x86 lua.tcz is 5.3.5 and is slow too.
> The most surprising is that I've built Lua on TC13 x86 following http://tinycorelinux.net/12.x/x86/tcz/src/lua/compile_lua recipe and it works normally (fast). ...
Run ldd on the slower versions of lua:
tc@E310:~$ ldd /usr/local/bin/lua
linux-gate.so.1 (0xb7f08000)
libm.so.6 => /lib/libm.so.6 (0xb7e38000)
libdl.so.2 => /lib/libdl.so.2 (0xb7e33000)
libreadline.so.7 => /usr/local/lib/libreadline.so.7 (0xb7df9000)
libc.so.6 => /lib/libc.so.6 (0xb7ccc000)
/lib/ld-linux.so.2 (0xb7f09000)
libncursesw.so.6 => /usr/local/lib/libncursesw.so.6 (0xb7c87000)
tc@E310:~$
Then run ldd on the fast version you compiled:
ldd /home/tc/lua/bin/lua
See if your version shows any extra dependencies (liblua.so maybe?) that account for the speed increase.
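The comparison suggested above can be made less error-prone by stripping the load addresses, which change on every run, before diffing. A minimal sketch, assuming the two paths used in this thread; the helper names strip_addrs and deps and the /tmp file names are made up for illustration:

```shell
# Remove the "(0x...)" load address and leading blanks from ldd output
# read on stdin, then sort so two listings can be compared with diff.
strip_addrs() {
    sed -e 's/[[:space:]]*(0x[0-9a-fA-F]*)[[:space:]]*$//' \
        -e 's/^[[:space:]]*//' | sort
}

# Print the normalized dependency list of one binary.
deps() {
    ldd "$1" | strip_addrs
}

# Paths as used in this thread; adjust to your system.
slow=/usr/local/bin/lua
fast=/home/tc/lua/bin/lua
if [ -x "$slow" ] && [ -x "$fast" ]; then
    deps "$slow" > /tmp/deps.slow
    deps "$fast" > /tmp/deps.fast
    diff /tmp/deps.slow /tmp/deps.fast && echo "same dependencies"
fi
```

If diff prints nothing and the final message appears, both binaries pull in the same shared libraries and the difference must be in the code itself.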
-
Hi jazzbiker
I notice one other thing.
The compiler flags for http://tinycorelinux.net/12.x/x86/tcz/src/lua/compile_lua are:
CC= gcc -flto -march=i486 -mtune=i686 -Os -pipe -std=gnu99
The compiler flags for http://tinycorelinux.net/9.x/x86/tcz/src/lua/compile_lua (TC 10 version) are:
CC= gcc -mtune=generic -Os -pipe -std=gnu99
CFLAGS= -fPIC -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_COMPAT_5_1 $(SYSCFLAGS) $(MYCFLAGS)
I suspect the lack of a -march=i486 might mean it defaulted to the processor of the compiling machine.
I think -mtune=generic tries to avoid code that would be slow on some AMD CPUs and avoid other
code that would be slow on some Intel CPUs, producing code that is not tuned for any one CPU.
The -DLUA_COMPAT_5_1 and -DLUA_COMPAT_5_2 defines enable backward compatibility with older Lua versions, which may or may not
impact performance.
Or maybe the compiler version you used generated faster code.
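One way to check the -march suspicion rather than guess: gcc can report the target settings it would actually use. As far as I know, without -march gcc falls back to the default it was configured with rather than probing the build machine, but the output below settles it for a given toolchain. A small sketch; the function name show_target is made up, and the exact output layout may differ between gcc versions:

```shell
# Print the -march= and -mtune= values gcc would use with the given flags.
# Only lines that start with the option name are kept, so the
# "Known valid arguments for -march= option" list is skipped.
show_target() {
    gcc -Q --help=target "$@" 2>/dev/null |
        grep -E -e '^[[:space:]]*-m(arch|tune)=' ||
        echo "(no result: gcc missing or flags rejected)"
}

if command -v gcc >/dev/null 2>&1; then
    echo "no -march given:";    show_target
    echo "TC12 recipe flags:";  show_target -march=i486 -mtune=i686
    echo "TC10 recipe flags:";  show_target -mtune=generic
fi
```

Running this on both the TC10 and the TC12 build systems would show whether the two recipes really target different instruction sets.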
-
Hi Rich,
ldd was the first thing I checked, and the dependencies are the same. That was no surprise, because the lua binary is linked with liblua.a:
gcc -flto -march=i486 -mtune=i686 -Os -pipe -std=gnu99 -o lua lua.o liblua.a -lm -Wl,-E -ldl -lreadline -lncursesw
The fast build of lua-5.3.6 on TC13 x86 was compiled following the recipe literally, using the same -march=i486 and -mtune=i686.
But!!!
I did what I should have done before: compiled the same Lua source using the same recipe, but under TC12 x86. And this lua binary is a slowpoke! The only difference is the gcc version.
I've compiled Lua extensions for my needs on TC13 x86 using -fno-unwind-tables and -fno-asynchronous-unwind-tables, as you proposed earlier for pure C code, and binary sizes decreased significantly.
Should we dig further? It may be useful, but I have no idea in which direction. It may be useful for packaging other extensions too.
Thanks!
-
> But maybe there are some differences between the compilations.
> Like using -O3, or maybe clang... or maybe more sophisticated options so that gcc compiles the code to use SIMD instructions.
> There are lots of things we don't see as (app) users/programmers, behind the curtain.
> How our C and C++ code is compiled to asm instructions.
> And how those asm instructions are handled in the processor, with cache alignment and so on.
> :)
Well, well, well... See previous post :)
-
Hi jazzbiker
> ... The only difference is the gcc version. ...
And older versions of dependencies.
> ... I did what I should have done before: compiled the same Lua source using the same recipe, but under TC12 x86. And this lua binary is a slowpoke! ...
If you copy that to TC13 does it run faster?
-
> And older versions of dependencies.
But the only non-system dependency is readline, and it is certainly not involved in executing the test chunk, which is just an empty loop.
I've compiled vanilla lua-5.3.6 on TC12 x86 without any patches or edits, simply
make linux
and the result is just as slow.
-
I've built lua-5.3.6 in TC12 x86 with tcc. The changes made in src/Makefile:
CC= tcc -std=gnu99
linux:
$(MAKE) $(ALL) SYSCFLAGS="-DLUA_USE_LINUX" SYSLIBS="-ldl -lreadline"
And
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 0.12s
user 0m 0.11s
sys 0m 0.00s
tcc produced the fast code )
-
Looks like something in the Lua sources drives gcc 10 and earlier insane, while gcc 11 keeps its sanity )
-
TC12 x86 lua-5.3.6 gcc, no optimization.
src/Makefile
CFLAGS= -Wall -Wextra -DLUA_COMPAT_5_2 $(SYSCFLAGS) $(MYCFLAGS)
Testing:
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 0.12s
user 0m 0.11s
sys 0m 0.00s
fast, still slower than gcc 11 in TC13 x86 with -Os.
-
TC12 x86 gcc 10.2 lua-5.3.6
tcc
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r-- 1 tc staff 359332 Sep 16 22:00 liblua.a
-rwxr-xr-x 1 tc staff 243008 Sep 16 22:00 lua
-rwxr-xr-x 1 tc staff 170488 Sep 16 22:00 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 0.14s
user 0m 0.13s
sys 0m 0.00s
gcc 10 no optimizations
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r-- 1 tc staff 342388 Sep 16 22:03 liblua.a
-rwxr-xr-x 1 tc staff 270888 Sep 16 22:03 lua
-rwxr-xr-x 1 tc staff 186408 Sep 16 22:03 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 0.12s
user 0m 0.11s
sys 0m 0.00s
gcc 10 -O2
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r-- 1 tc staff 359420 Sep 16 22:06 liblua.a
-rwxr-xr-x 1 tc staff 284032 Sep 16 22:06 lua
-rwxr-xr-x 1 tc staff 181220 Sep 16 22:06 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 1.20s
user 0m 1.19s
sys 0m 0.00s
gcc 10 -Os
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r-- 1 tc staff 269664 Sep 16 22:09 liblua.a
-rwxr-xr-x 1 tc staff 200160 Sep 16 22:09 lua
-rwxr-xr-x 1 tc staff 129348 Sep 16 22:09 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 1.20s
user 0m 1.20s
sys 0m 0.00s
-
@Rich
Hello, isn't the -Os option on the gcc line for small size?
Have you tried -Ofast?
-
gcc 10 -O3
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r-- 1 tc staff 401796 Sep 16 22:14 liblua.a
-rwxr-xr-x 1 tc staff 319660 Sep 16 22:14 lua
-rwxr-xr-x 1 tc staff 209232 Sep 16 22:14 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 1.18s
user 0m 1.18s
sys 0m 0.00s
-
gcc 10 -Ofast
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r-- 1 tc staff 401020 Sep 16 22:18 liblua.a
-rwxr-xr-x 1 tc staff 319112 Sep 16 22:18 lua
-rwxr-xr-x 1 tc staff 209196 Sep 16 22:18 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 1.20s
user 0m 1.19s
sys 0m 0.00s
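The manual runs above could be collected in one pass with a loop like the following. This is only a sketch under assumptions: it is run from the top of an unpacked lua-5.3.6 tree, MYCFLAGS ends up after the Makefile's own -O2 (as the build lines in this thread suggest), and gcc honours the last -O option it sees. The function name build_variants and the lua-O* file names are made up:

```shell
# Rebuild Lua once per optimization level and keep a copy of each binary
# as lua-O0, lua-O1, ... in the current directory.
build_variants() {
    for opt in -O0 -O1 -O2 -Os -O3 -Ofast; do
        make clean >/dev/null &&
        make linux MYCFLAGS="$opt" >/dev/null &&
        cp src/lua "lua$opt" || return 1
    done
}

# Only run inside a Lua source tree.
if [ -f src/lua.c ]; then
    build_variants &&
    for bin in lua-O*; do
        echo "== $bin"
        ls -l "$bin"
        # "time" is a keyword in bash and BusyBox ash; other shells may differ.
        time "./$bin" -e 'for i=1,1000000 do end'
    done
fi
```

Keeping every binary around also makes it easy to copy them to another TC version and rerun the same timing there.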
-
@jazzbiker
*lol*
You should put all your values into some kind of spreadsheet to get a grip on all the variants.
Have you tried clang?
I don't know if anyone has made a clang extension.
But I have seen a YouTuber struggle with compiling clang,
building LLVM and so on.
He makes his own Linux distribution, https://t2sde.org/, for many processor architectures.
I like it a lot.
Here's the links:
https://www.youtube.com/user/renerebe/videos
https://www.youtube.com/c/MoreReneRebe/videos
-
@patrikg
And the title of the spreadsheet will be "Don't use gcc 10 optimizations" :) :) :)
-
Hi jazzbiker
> TC12 x86 lua-5.3.6 gcc, no optimization.
> src/Makefile
> CFLAGS= -Wall -Wextra -DLUA_COMPAT_5_2 $(SYSCFLAGS) $(MYCFLAGS)
> ...
> fast, still slower than gcc 11 in TC13 x86 with -Os.
Since you show -Os and -O2 slow down the code, it makes me wonder what values SYSCFLAGS
and MYCFLAGS are set to.
-
> Since you show -Os and -O2 slow down the code, it makes me wonder what values SYSCFLAGS
> and MYCFLAGS are set to.
@Rich
Here are lines copied from the build output:
gcc -std=gnu99 -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_LINUX -c -o lbaselib.o lbaselib.c
gcc -std=gnu99 -o lua lua.o liblua.a -lm -Wl,-E -ldl -lreadline
just from vanilla make linux
-
> Have you tried clang?
No, only gcc and tcc (which I like). I'm planning to try https://github.com/michaelforney/cproc, a frontend for https://c9x.me/compile/
> I don't know if anyone has made a clang extension.
I see clang.tcz in the repo, don't I?
> He makes his own Linux distribution, https://t2sde.org/, for many processor architectures.
Thanks for the link! One more link in reply: https://github.com/oasislinux/oasis.
-
@jazzbiker sorry, but I'm using Arch for my desktop.
I'm not using TC on my desktop, so I don't know much about x86_64 and i386.
But I have used TC on my RPi in the past.
-
TC12 x86 gcc 10 lua-5.3.6
Applying -fno-unwind-tables and -fno-asynchronous-unwind-tables reverts the strange increase in binary size when -O2 is used (see above)
gcc -O2
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r-- 1 tc staff 359420 Sep 17 01:27 liblua.a
-rwxr-xr-x 1 tc staff 288128 Sep 17 01:27 lua
-rwxr-xr-x 1 tc staff 189412 Sep 17 01:27 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 1.19s
user 0m 1.16s
sys 0m 0.01s
gcc -O2 no unwind tables
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r-- 1 tc staff 280756 Sep 17 01:30 liblua.a
-rwxr-xr-x 1 tc staff 210304 Sep 17 01:30 lua
-rwxr-xr-x 1 tc staff 144356 Sep 17 01:30 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 1.17s
user 0m 1.16s
sys 0m 0.00s
gcc -O0 no unwind tables
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r-- 1 tc staff 302524 Sep 17 01:31 liblua.a
-rwxr-xr-x 1 tc staff 238120 Sep 17 01:31 lua
-rwxr-xr-x 1 tc staff 170024 Sep 17 01:31 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 0.11s
user 0m 0.10s
sys 0m 0.00s
tcc
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r-- 1 tc staff 359332 Sep 17 01:34 liblua.a
-rwxr-xr-x 1 tc staff 243008 Sep 17 01:34 lua
-rwxr-xr-x 1 tc staff 170488 Sep 17 01:34 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real 0m 0.11s
user 0m 0.11s
sys 0m 0.00s
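To see how much of each binary the unwind tables actually occupy, the section sizes can be inspected directly. A rough sketch, assuming binutils' size is installed and that the unwind data lives in the .eh_frame and .eh_frame_hdr sections, which is the usual layout on Linux; the function name sum_eh_frame is made up:

```shell
# Sum the sizes of all .eh_frame* sections from "size -A" output on stdin.
sum_eh_frame() {
    awk '$1 ~ /^\.eh_frame/ { sum += $2 } END { print sum + 0 }'
}

# Report unwind-table bytes next to the total file size for each binary.
for bin in src/lua src/luac; do
    if [ -f "$bin" ] && command -v size >/dev/null 2>&1; then
        echo "$bin: $(size -A "$bin" | sum_eh_frame) bytes in .eh_frame*," \
             "$(wc -c < "$bin") bytes total"
    fi
done
```

Comparing the numbers for a build with and without the two -fno-...unwind-tables flags shows whether the size difference comes from these sections alone.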
-
TC12 x86 gcc 10 lua-5.4.4
vanilla "make linux-readline" with -O2
tc@box:/tmp/lua-5.4.4$ time src/lua -e 'for i=1,1000000 do end'
real 0m 0.05s
user 0m 0.05s
sys 0m 0.00s
tc@box:/tmp/lua-5.4.4$ ls -l src/liblua.a src/lua src/luac
-rw-r--r-- 1 tc staff 406946 Sep 17 10:22 src/liblua.a
-rwxr-xr-x 1 tc staff 331708 Sep 17 10:22 src/lua
-rwxr-xr-x 1 tc staff 228748 Sep 17 10:22 src/luac
MYCFLAGS= -fno-unwind-tables -fno-asynchronous-unwind-tables
tc@box:/tmp/lua-5.4.4$ time src/lua -e 'for i=1,1000000 do end'
real 0m 0.07s
user 0m 0.06s
sys 0m 0.00s
tc@box:/tmp/lua-5.4.4$ ls -l src/liblua.a src/lua src/luac
-rw-r--r-- 1 tc staff 319930 Sep 17 10:29 src/liblua.a
-rwxr-xr-x 1 tc staff 245692 Sep 17 10:29 src/lua
-rwxr-xr-x 1 tc staff 179596 Sep 17 10:29 src/luac
CFLAGS= -Os -Wall -Wextra -DLUA_COMPAT_5_3 $(SYSCFLAGS) $(MYCFLAGS)
MYCFLAGS= -fno-unwind-tables -fno-asynchronous-unwind-tables
tc@box:/tmp/lua-5.4.4$ time src/lua -e 'for i=1,1000000 do end'
real 0m 0.07s
user 0m 0.07s
sys 0m 0.00s
tc@box:/tmp/lua-5.4.4$ ls -l src/liblua.a src/lua src/luac
-rw-r--r-- 1 tc staff 265170 Sep 17 10:31 src/liblua.a
-rwxr-xr-x 1 tc staff 203164 Sep 17 10:31 src/lua
-rwxr-xr-x 1 tc staff 144480 Sep 17 10:31 src/luac
As we can see everything is all right in Lua 5.4.
tc@box:/tmp/lua-5.4.4$ src/luac -l -
for i=1,1000000 do end
main <stdin:0,0> (7 instructions at 0x9510490)
0+ params, 4 slots, 1 upvalue, 4 locals, 1 constant, 0 functions
1 [1] VARARGPREP 0
2 [1] LOADI 0 1
3 [1] LOADK 1 0 ; 1000000
4 [1] LOADI 2 1
5 [1] FORPREP 0 0 ; exit to 7
6 [1] FORLOOP 0 1 ; to 6
7 [1] RETURN 0 1 1 ; 0 out
In Lua 5.3, the implementation of the FORPREP or FORLOOP bytecodes apparently contains something that drives the gcc 10 optimizations insane. This could be explored to find out what exactly causes the trouble. But gcc 11 works better, at least under the circumstances described here.
@Rich, what's your opinion, should we dive deeper? Would it be useful?
-
Hi jazzbiker
I played around a bit with this under TC10 x86, gcc 8.2, lua 5.3.6. I created a build script from:
http://tinycorelinux.net/12.x/x86/tcz/src/lua/compile_lua
I attached a copy to this post. The script has comments added.
After it builds lua, it shows its size before and after running sstrip lua. Then it runs your benchmark:
-rwxr-xr-x 1 tc staff 213056 Sep 17 10:33 lua
-rwxr-xr-x 1 tc staff 180252 Sep 17 10:33 lua
real 0m 0.02s
user 0m 0.01s
sys 0m 0.00s
tc@E310:~/BuildLua$
This result was with optimization set to -O1. Settings of -O0 and -Og show similar results.
To figure out what's happening would probably require figuring out which optimizations -O2, -O3, and -Os share
that are not present in -O0, -O1, and -Og. Then try disabling them one at a time to see what changes.
Personally, I have doubts it's worth the effort.
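For anyone who does want to try, gcc can list which optimizer flags each -O level enables, so the candidate set can be computed instead of read off the manual. A sketch only: the function names and /tmp file names are made up, the "[enabled]" column is what recent gcc versions print, and comm may need the coreutils extension if BusyBox lacks it:

```shell
# Print the names of optimizer flags marked "[enabled]" in the output of
# "gcc -Q --help=optimizers" (read from stdin), sorted for comm.
enabled_flags() {
    awk '$2 == "[enabled]" { print $1 }' | sort
}

# Flags enabled at level $2 but not at level $1, e.g.: newly_enabled -O1 -O2
newly_enabled() {
    gcc -Q "$1" --help=optimizers | enabled_flags > /tmp/opts.lo
    gcc -Q "$2" --help=optimizers | enabled_flags > /tmp/opts.hi
    comm -13 /tmp/opts.lo /tmp/opts.hi   # lines unique to the second file
}

if command -v gcc >/dev/null 2>&1; then
    echo "added by -O2 over -O1:"; newly_enabled -O1 -O2
    echo "added by -Os over -O1:"; newly_enabled -O1 -Os
fi
```

Flags that appear in both lists are the first suspects, since both -O2 and -Os produce the slow binary.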
-
Hi Rich,
Thank You for the handy script :)
> To figure out what's happening would probably require figuring out which optimizations -O2, -O3, and -Os share that are not present in -O0, -O1, and -Og. Then try disabling them one at a time to see what changes.
OK, this will work nicely, if the trouble is caused by a single optimization and not by some combination of them :) . I see 40+ additional optimization options being turned on beyond -O1 :( .
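The one-at-a-time test could be automated roughly as follows. It starts from -O2 and switches a single flag off per build, because gcc ignores most individual -f optimization flags when no -O level is active, so adding them one by one on top of -O0 would show nothing. This is a sketch under assumptions: it runs from the top of a lua-5.3.6 tree, the function names are made up, and not every listed flag necessarily has a -fno- form. As noted, it will not catch a problem that needs a combination of flags.

```shell
# Turn "-ffoo" into "-fno-foo".
negate() {
    printf '%s\n' "$1" | sed 's/^-f/-fno-/'
}

# Read one flag per line; rebuild at -O2 with that flag switched off; time it.
try_each() {
    while read -r flag; do
        no=$(negate "$flag")
        make clean >/dev/null </dev/null
        if make linux MYCFLAGS="-O2 $no" >/dev/null 2>&1 </dev/null; then
            echo "== -O2 $no"
            # "time" is a keyword in bash and BusyBox ash.
            time src/lua -e 'for i=1,1000000 do end' </dev/null
        else
            echo "== -O2 $no: build failed, skipped"
        fi
    done
}

# Example with two flags that -O2 normally enables.
if [ -f src/lua.c ]; then
    printf '%s\n' -fgcse -ftree-vrp | try_each
fi
```

A build whose time drops from about 1.2s back to about 0.1s would point at the flag that was switched off.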
> Personally, I have doubts it's worth the effort.
Probably, yes. The problem is not in the Lua code. It is written in absolutely plain C. Lua even has a "c89" build target. No pragmas, no __attribute__ or other stuff like that. It means that absolutely plain C code can make gcc go crazy, and we don't know in what way :( . Not a very comfortable thing to know, especially considering that nearly all extensions in TinyCore may be affected.
Anyway, I want to ask You to mark the topic as solved.
-
Hi jazzbiker
> Thank You for the handy script :) ...
You are quite welcome. Just be aware that, since I was only looking at building the interpreter, I
placed an exit command before it builds lua.so.
> ... Anyway, I want to ask You to mark the topic as solved.
Done. :)