Tiny Core Linux

Tiny Core Extensions => TCE Bugs => Topic started by: jazzbiker on September 15, 2022, 11:17:11 AM

Title: [Solved] Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 15, 2022, 11:17:11 AM
Greetings!

I'm currently on TC13 x86. With lua.tcz I get:
Code: [Select]
tc@box:~$ time lua -e 'for i=1,1000000 do end'
real    0m 1.22s
user    0m 1.21s
sys     0m 0.00s
while a separately built Lua 5.4.4 shows:
Code: [Select]
tc@box:~$ time /home/tc/lua/bin/lua  -e 'for i=1,1000000 do end'
real    0m 0.07s
user    0m 0.03s
sys     0m 0.00s
In TC13 x86_64, lua-5.3.tcz runs the same script much faster (something around 0.05s on a CPU clocked at almost the same speed).

The current lua.tcz is inherited from TC12 x86 and behaves the same way (slowly). In TC10 x86, lua.tcz is 5.3.5 and is slow too.

The most surprising part is that I built Lua on TC13 x86 following the http://tinycorelinux.net/12.x/x86/tcz/src/lua/compile_lua recipe, and it works normally (fast).

What's wrong with x86 lua.tcz?

Regards!
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: patrikg on September 15, 2022, 11:22:55 AM
Could it have been compiled with or without thread support?

SMP :)
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 15, 2022, 11:32:41 AM
Quote
Could it have been compiled with or without thread support?

SMP :)

Lua is single-threaded by design :)
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: gadget42 on September 16, 2022, 01:30:27 AM
Just for self-education (and general interest):
https://www.lua.org/about.html
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: patrikg on September 16, 2022, 11:06:21 AM
@gadget42

Thanks for the link.
I don't know anything about Lua. The only thing I know is that some embedded systems use it;
I'm thinking of OpenWrt.

But maybe there are some differences between the builds,
like using -O3, maybe using clang... and maybe using more sophisticated options so that gcc compiles the code to use SIMD instructions.

There are lots of things we don't see as (app) users/programmers, behind the curtain:
how our C and C++ code is compiled to asm instructions,
and how those instructions are handled by the processor, with cache alignment and so on.
:)
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: Rich on September 16, 2022, 11:53:39 AM
Hi jazzbiker
Quote
... Current lua.tcz is inherited from TC12 x86 and works in the same way (slowly). In TC10 x86 lua.tcz is 5.3.5 and is slow too.

The most surprising is that I've built Lua on TC13 x86 following http://tinycorelinux.net/12.x/x86/tcz/src/lua/compile_lua recipe and it works normally (fast). ...
Run  ldd  on the slower versions of  lua:
Code: [Select]
tc@E310:~$ ldd /usr/local/bin/lua
        linux-gate.so.1 (0xb7f08000)
        libm.so.6 => /lib/libm.so.6 (0xb7e38000)
        libdl.so.2 => /lib/libdl.so.2 (0xb7e33000)
        libreadline.so.7 => /usr/local/lib/libreadline.so.7 (0xb7df9000)
        libc.so.6 => /lib/libc.so.6 (0xb7ccc000)
        /lib/ld-linux.so.2 (0xb7f09000)
        libncursesw.so.6 => /usr/local/lib/libncursesw.so.6 (0xb7c87000)
tc@E310:~$

Then run  ldd  on the fast version you compiled:
Code: [Select]
ldd /home/tc/lua/bin/lua
See if your version shows any extra dependencies (liblua.so maybe?) that account for the speed increase.
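The ldd comparison Rich suggests can be scripted. This is a minimal sketch, assuming a POSIX shell with sed, sort, diff and mktemp (BusyBox provides all of them); the names normalize_ldd and compare_deps are made up for illustration. The load addresses in parentheses change from run to run, so they are stripped before the two listings are diffed.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the dependencies of two lua binaries.

# Read ldd output on stdin; drop the "(0x...)" load addresses and leading
# whitespace, then sort so two listings can be compared line by line.
normalize_ldd() {
    sed -e 's/ *(0x[0-9a-f]*)//' -e 's/^[[:space:]]*//' | LC_ALL=C sort
}

# Usage: compare_deps /usr/local/bin/lua /home/tc/lua/bin/lua
# Prints the differing lines, or "same dependencies" if there are none.
compare_deps() {
    a=$(mktemp) && b=$(mktemp) || return 1
    ldd "$1" | normalize_ldd > "$a"
    ldd "$2" | normalize_ldd > "$b"
    if diff "$a" "$b"; then
        echo "same dependencies"
    fi
    rm -f "$a" "$b"
}
```

On the same system, matching lists point at the code inside the lua binary itself rather than at a shared library.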
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: Rich on September 16, 2022, 12:52:43 PM
Hi jazzbiker
I notice one other thing.
The compiler flags for http://tinycorelinux.net/12.x/x86/tcz/src/lua/compile_lua are:
Code: [Select]
CC= gcc -flto -march=i486 -mtune=i686 -Os -pipe -std=gnu99
The compiler flags for http://tinycorelinux.net/9.x/x86/tcz/src/lua/compile_lua (TC 10 version) are:
Code: [Select]
CC= gcc -mtune=generic -Os -pipe -std=gnu99
CFLAGS= -fPIC -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_COMPAT_5_1 $(SYSCFLAGS) $(MYCFLAGS)
I suspect the lack of a  -march=i486  might mean it defaulted to the processor of the compiling machine.

I think  -mtune=generic  tries to avoid code that would be slow on some AMD CPUs and avoid other
code that would be slow on some Intel CPUs, producing code that is not optimized for any CPU.
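One way to test that suspicion is to ask gcc which -march and -mtune it actually uses. A minimal sketch, assuming the usual two-column layout (option, value) of `gcc -Q --help=target`; arch_and_tune is a hypothetical helper name. As far as I know, gcc's default comes from how the compiler itself was configured rather than from the CPU of the build machine (that is what -march=native is for), so running this with the TC12 and TC13 compilers would show what each one assumes.

```shell
#!/bin/sh
# Read `gcc -Q --help=target` output on stdin and print only the
# effective -march= and -mtune= values, e.g. "-march=i686".
arch_and_tune() {
    awk '$1 == "-march=" || $1 == "-mtune=" { print $1 $2 }'
}

# Usage: compare the compiler default with the recipe's explicit flags.
#   gcc -Q --help=target | arch_and_tune
#   gcc -Q --help=target -march=i486 -mtune=i686 | arch_and_tune
```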

The  -DLUA_COMPAT_5_x  defines allow for backward compatibility with older versions, which may or may not
impact performance.

Or maybe the compiler version you used generated faster code.
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 01:28:37 PM
Hi Rich,

ldd was the first thing I checked, and the dependencies are the same. That was no surprise, because the lua binary is linked statically with liblua.a:
Code: [Select]
gcc -flto -march=i486 -mtune=i686 -Os -pipe  -std=gnu99 -o lua   lua.o liblua.a -lm -Wl,-E -ldl -lreadline -lncursesw

The fast build of lua-5.3.6 in TC13 x86 was compiled following the recipe literally, using the same -march=i486 and -mtune=i686.

But!!!

I did what I should have done earlier: compiled the same Lua source using the same recipe, but under TC12 x86. And this lua binary is a slowpoke! The only difference is the gcc version.
I've compiled Lua extensions for my needs in TC13 x86 using -fno-unwind-tables and -fno-asynchronous-unwind-tables, as You proposed earlier for pure C code, and binary sizes decreased significantly.

Should we dig further? It may be useful, but I have no idea in which direction. It may also be useful when packaging other extensions.

Thanks!
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 01:30:09 PM

Quote
But maybe there are some differences between the builds,
like using -O3, maybe using clang... and maybe using more sophisticated options so that gcc compiles the code to use SIMD instructions.

There are lots of things we don't see as (app) users/programmers, behind the curtain:
how our C and C++ code is compiled to asm instructions,
and how those instructions are handled by the processor, with cache alignment and so on.
:)

Well, well, well... See previous post :)
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: Rich on September 16, 2022, 01:36:02 PM
Hi jazzbiker
Quote
... The only difference is gcc version. ...
And older versions of dependencies.

Quote
... I've made the thing I was to do before - compiled the same lua source using the same recipe, but under TC12 x86. And this lua binary is slowpoke! ...
If you copy that to TC13 does it run faster?
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 02:23:45 PM

Quote
And older versions of dependencies.


But the only non-system dependency is readline, and it is certainly not involved in executing the test chunk, which is just an empty loop.

I've compiled vanilla lua-5.3.6 in TC12 x86 without any patches or edits, simply
Code: [Select]
make linux
and the result is just as slow.
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 02:31:33 PM
I've built lua-5.3.6 in TC12 x86 with tcc. The changes made in src/Makefile:
Code: [Select]
CC= tcc -std=gnu99
linux:
        $(MAKE) $(ALL) SYSCFLAGS="-DLUA_USE_LINUX" SYSLIBS="-ldl -lreadline"

And

Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 0.12s
user    0m 0.11s
sys     0m 0.00s

tcc produced fast code :)
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 02:39:13 PM
It looks like something in the Lua sources drives gcc 10 and earlier insane, while gcc 11 keeps its sanity :)
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 02:50:11 PM
TC12 x86 lua-5.3.6 gcc, no optimization.
src/Makefile
Code: [Select]
CFLAGS= -Wall -Wextra -DLUA_COMPAT_5_2 $(SYSCFLAGS) $(MYCFLAGS)

Testing:
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 0.12s
user    0m 0.11s
sys     0m 0.00s
Fast, though still slower than gcc 11 in TC13 x86 with -Os.
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 03:11:25 PM
TC12 x86 gcc 10.2 lua-5.3.6

tcc
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r--    1 tc       staff       359332 Sep 16 22:00 liblua.a
-rwxr-xr-x    1 tc       staff       243008 Sep 16 22:00 lua
-rwxr-xr-x    1 tc       staff       170488 Sep 16 22:00 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 0.14s
user    0m 0.13s
sys     0m 0.00s

gcc 10 no optimizations
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r--    1 tc       staff       342388 Sep 16 22:03 liblua.a
-rwxr-xr-x    1 tc       staff       270888 Sep 16 22:03 lua
-rwxr-xr-x    1 tc       staff       186408 Sep 16 22:03 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 0.12s
user    0m 0.11s
sys     0m 0.00s

gcc 10 -O2
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r--    1 tc       staff       359420 Sep 16 22:06 liblua.a
-rwxr-xr-x    1 tc       staff       284032 Sep 16 22:06 lua
-rwxr-xr-x    1 tc       staff       181220 Sep 16 22:06 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 1.20s
user    0m 1.19s
sys     0m 0.00s

gcc 10 -Os
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r--    1 tc       staff       269664 Sep 16 22:09 liblua.a
-rwxr-xr-x    1 tc       staff       200160 Sep 16 22:09 lua
-rwxr-xr-x    1 tc       staff       129348 Sep 16 22:09 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 1.20s
user    0m 1.20s
sys     0m 0.00s
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: patrikg on September 16, 2022, 03:14:59 PM
@Rich

Hello, isn't the -Os option in the gcc line for optimizing size?
Have you tried -Ofast?
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 03:16:31 PM
gcc 10 -O3
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r--    1 tc       staff       401796 Sep 16 22:14 liblua.a
-rwxr-xr-x    1 tc       staff       319660 Sep 16 22:14 lua
-rwxr-xr-x    1 tc       staff       209232 Sep 16 22:14 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 1.18s
user    0m 1.18s
sys     0m 0.00s
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 03:19:48 PM
gcc 10 -Ofast
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r--    1 tc       staff       401020 Sep 16 22:18 liblua.a
-rwxr-xr-x    1 tc       staff       319112 Sep 16 22:18 lua
-rwxr-xr-x    1 tc       staff       209196 Sep 16 22:18 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 1.20s
user    0m 1.19s
sys     0m 0.00s
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: patrikg on September 16, 2022, 03:20:30 PM
@jazzbiker
*lol*

You should put all your values into some kind of spreadsheet to get a grip on all the variants.
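Following the spreadsheet suggestion, the `time` output pasted in this thread can be turned into CSV rows. A minimal sketch, assuming the BusyBox format seen here (`real    0m 1.20s`); bash's `time` keyword prints a different format (`0m1.200s`) that this does not parse. time_to_csv and the label in the usage line are hypothetical.

```shell
#!/bin/sh
# $1 = label for the build; reads BusyBox `time` output on stdin and prints
# one CSV row: label,real,user,sys (seconds, two decimals).
time_to_csv() {
    awk -v label="$1" '
        $1 == "real" || $1 == "user" || $1 == "sys" {
            m = $2; sub(/m$/, "", m)    # "0m" becomes 0 (minutes)
            s = $3; sub(/s$/, "", s)    # "1.20s" becomes 1.20 (seconds)
            t[$1] = m * 60 + s
        }
        END { printf "%s,%.2f,%.2f,%.2f\n", label, t["real"], t["user"], t["sys"] }'
}

# Usage (BusyBox time writes to stderr, hence the redirection):
#   { time ./lua -e 'for i=1,1000000 do end'; } 2>&1 | time_to_csv "gcc10 -O2"
```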

Have you tried clang?
I don't know if anyone has made a clang extension.

But I have seen a YouTuber struggle with compiling clang,
building LLVM and so on.
He makes his own Linux distribution, https://t2sde.org/, for many processor architectures.
I like it a lot.

Here's the links:
https://www.youtube.com/user/renerebe/videos
https://www.youtube.com/c/MoreReneRebe/videos
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 03:25:12 PM
@patrikg
And the title of the spreadsheet will be "Don't use gcc 10 optimizations" :) :) :)
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: Rich on September 16, 2022, 03:38:16 PM
Hi jazzbiker
Quote
TC12 x86 lua-5.3.6 gcc, no optimization.
src/Makefile
Code: [Select]
CFLAGS= -Wall -Wextra -DLUA_COMPAT_5_2 $(SYSCFLAGS) $(MYCFLAGS) ...
fast, still slower than gcc 11 in TC13 x86 with -Os.
Since you show  -Os  and  -O2  slow down the code, it makes me wonder what values  SYSCFLAGS
and  MYCFLAGS  are set to.
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 04:47:45 PM
Quote
Since you show  -Os  and  -O2  slow down the code, it makes me wonder what values  SYSCFLAGS
and  MYCFLAGS  are set to.
@Rich
Here are lines copied from the build output:
Code: [Select]
gcc -std=gnu99 -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_LINUX    -c -o lbaselib.o lbaselib.c
gcc -std=gnu99 -o lua   lua.o liblua.a -lm -Wl,-E -ldl -lreadline
That's just from a vanilla  make linux.
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 05:10:24 PM
Quote
Have you tried clang?
No, just gcc and tcc (I like tcc). I'm planning to try https://github.com/michaelforney/cproc, a frontend for https://c9x.me/compile/
Quote
I don't know if anyone has made a clang extension.
I see clang.tcz in the repo. Isn't that it?
Quote
He makes his own Linux distribution, https://t2sde.org/, for many processor architectures.
Thanks for the link! One more link in return: https://github.com/oasislinux/oasis.
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: patrikg on September 16, 2022, 05:31:20 PM
@jazzbiker Sorry, but I'm using Arch on my desktop.

I don't use TC on my desktop, so I don't know much about x86_64 and i386.

But I have used TC on my RPi in the past.
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 16, 2022, 06:49:28 PM
TC12 x86 gcc 10 lua-5.3.6

Applying -fno-unwind-tables and -fno-asynchronous-unwind-tables reverses the strange increase in binary size when -O2 is used (see above).
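The size difference comes from unwind information, which gcc places in the .eh_frame section. A minimal sketch for measuring it after each build, assuming binutils' `size -A` output (section name and decimal size in the first two columns); eh_frame_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# Read `size -A` output on stdin; print the size in bytes of the .eh_frame
# section, or 0 if the binary has none. (.eh_frame_hdr is not counted.)
eh_frame_bytes() {
    awk '$1 == ".eh_frame" { n = $2 } END { print n + 0 }'
}

# Usage, in lua-5.3.6/src after each build:
#   size -A lua | eh_frame_bytes
```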

gcc -O2
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r--    1 tc       staff       359420 Sep 17 01:27 liblua.a
-rwxr-xr-x    1 tc       staff       288128 Sep 17 01:27 lua
-rwxr-xr-x    1 tc       staff       189412 Sep 17 01:27 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 1.19s
user    0m 1.16s
sys     0m 0.01s

gcc -O2 no unwind tables
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r--    1 tc       staff       280756 Sep 17 01:30 liblua.a
-rwxr-xr-x    1 tc       staff       210304 Sep 17 01:30 lua
-rwxr-xr-x    1 tc       staff       144356 Sep 17 01:30 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 1.17s
user    0m 1.16s
sys     0m 0.00s

gcc -O0 no unwind tables
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r--    1 tc       staff       302524 Sep 17 01:31 liblua.a
-rwxr-xr-x    1 tc       staff       238120 Sep 17 01:31 lua
-rwxr-xr-x    1 tc       staff       170024 Sep 17 01:31 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 0.11s
user    0m 0.10s
sys     0m 0.00s

tcc
Code: [Select]
tc@box:/tmp/lua-5.3.6/src$ ls -l liblua.a lua luac
-rw-r--r--    1 tc       staff       359332 Sep 17 01:34 liblua.a
-rwxr-xr-x    1 tc       staff       243008 Sep 17 01:34 lua
-rwxr-xr-x    1 tc       staff       170488 Sep 17 01:34 luac
tc@box:/tmp/lua-5.3.6/src$ time ./lua -e 'for i=1,1000000 do end'
real    0m 0.11s
user    0m 0.11s
sys     0m 0.00s
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 17, 2022, 03:48:25 AM
TC12 x86 gcc 10 lua-5.4.4

vanilla "make linux-readline" with -O2
Code: [Select]
tc@box:/tmp/lua-5.4.4$ time src/lua -e 'for i=1,1000000 do end'
real    0m 0.05s
user    0m 0.05s
sys     0m 0.00s
tc@box:/tmp/lua-5.4.4$ ls -l src/liblua.a src/lua src/luac
-rw-r--r--    1 tc       staff       406946 Sep 17 10:22 src/liblua.a
-rwxr-xr-x    1 tc       staff       331708 Sep 17 10:22 src/lua
-rwxr-xr-x    1 tc       staff       228748 Sep 17 10:22 src/luac

MYCFLAGS= -fno-unwind-tables -fno-asynchronous-unwind-tables
Code: [Select]
tc@box:/tmp/lua-5.4.4$ time src/lua -e 'for i=1,1000000 do end'
real    0m 0.07s
user    0m 0.06s
sys     0m 0.00s
tc@box:/tmp/lua-5.4.4$ ls -l src/liblua.a src/lua src/luac
-rw-r--r--    1 tc       staff       319930 Sep 17 10:29 src/liblua.a
-rwxr-xr-x    1 tc       staff       245692 Sep 17 10:29 src/lua
-rwxr-xr-x    1 tc       staff       179596 Sep 17 10:29 src/luac

CFLAGS= -Os -Wall -Wextra -DLUA_COMPAT_5_3 $(SYSCFLAGS) $(MYCFLAGS)
MYCFLAGS= -fno-unwind-tables -fno-asynchronous-unwind-tables
Code: [Select]
tc@box:/tmp/lua-5.4.4$ time src/lua -e 'for i=1,1000000 do end'
real    0m 0.07s
user    0m 0.07s
sys     0m 0.00s
tc@box:/tmp/lua-5.4.4$ ls -l src/liblua.a src/lua src/luac
-rw-r--r--    1 tc       staff       265170 Sep 17 10:31 src/liblua.a
-rwxr-xr-x    1 tc       staff       203164 Sep 17 10:31 src/lua
-rwxr-xr-x    1 tc       staff       144480 Sep 17 10:31 src/luac

As we can see, everything is fine in Lua 5.4.

Code: [Select]
tc@box:/tmp/lua-5.4.4$ src/luac -l -
for i=1,1000000 do end

main <stdin:0,0> (7 instructions at 0x9510490)
0+ params, 4 slots, 1 upvalue, 4 locals, 1 constant, 0 functions
        1       [1]     VARARGPREP      0
        2       [1]     LOADI           0 1
        3       [1]     LOADK           1 0     ; 1000000
        4       [1]     LOADI           2 1
        5       [1]     FORPREP         0 0     ; exit to 7
        6       [1]     FORLOOP         0 1     ; to 6
        7       [1]     RETURN          0 1 1   ; 0 out

In Lua 5.3, the implementation of the FORPREP or FORLOOP opcodes must include something that drives gcc 10's optimizations insane. Maybe this can be explored to find out what exactly causes the trouble. But gcc 11 works better, at least under the circumstances described.

@Rich, what's Your opinion: should we dive deeper? Would it be useful?
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: Rich on September 17, 2022, 11:01:16 AM
Hi jazzbiker
I played around a bit with this under TC10 x86, gcc 8.2, lua 5.3.6. I created a build script from:
http://tinycorelinux.net/12.x/x86/tcz/src/lua/compile_lua
I attached a copy to this post. The script has comments added.
After it builds lua, it shows its size before and after running  sstrip lua. Then it runs your benchmark:
Code: [Select]
-rwxr-xr-x 1 tc staff 213056 Sep 17 10:33 lua
-rwxr-xr-x 1 tc staff 180252 Sep 17 10:33 lua
real    0m 0.02s
user    0m 0.01s
sys     0m 0.00s
tc@E310:~/BuildLua$
This result was with optimization set to  -O1. Settings of -O0 and -Og show similar results.

To figure out what's happening would probably require figuring out which optimizations -O2, -O3, and -Os share
that are not present in -O0, -O1, and -Og. Then try disabling them one at a time to see what changes.
Personally, I have doubts it's worth the effort.
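The first step can be automated with `gcc -Q -O<level> --help=optimizers`, which lists each optimization option as [enabled] or [disabled] for that level. A minimal sketch, assuming that output format; enabled_opts, suspects_from_dir and suspect_opts are hypothetical names. It prints the options enabled at all of -O2, -O3 and -Os but at none of -O0, -O1 and -Og. Keep in mind that, per the gcc manual, not every optimization is controlled by its own flag, so the culprit may not appear in this list at all.

```shell
#!/bin/sh
# Read `gcc -Q --help=optimizers` output on stdin; print the names of the
# options marked [enabled], sorted in the C locale for use with comm.
enabled_opts() {
    awk '$NF == "[enabled]" { print $1 }' | LC_ALL=C sort -u
}

# $1 = directory holding sorted lists named O0 O1 Og O2 O3 Os.
# Print options present in O2, O3 and Os but absent from O0, O1 and Og.
suspects_from_dir() {
    LC_ALL=C comm -12 "$1/O2" "$1/O3" > "$1/slow23"   # in O2 and O3
    LC_ALL=C comm -12 "$1/slow23" "$1/Os" > "$1/slow" # ...and in Os
    LC_ALL=C sort -u "$1/O0" "$1/O1" "$1/Og" > "$1/fast"
    LC_ALL=C comm -23 "$1/slow" "$1/fast"              # slow minus fast
}

# Usage: suspect_opts > suspects.txt
suspect_opts() {
    dir=$(mktemp -d) || return 1
    for lvl in O0 O1 Og O2 O3 Os; do
        gcc -Q -"$lvl" --help=optimizers | enabled_opts > "$dir/$lvl"
    done
    suspects_from_dir "$dir"
    rm -rf "$dir"
}
```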
Title: Re: Strange sluggishness of Lua in x86 TinyCore
Post by: jazzbiker on September 17, 2022, 01:00:27 PM
Hi Rich,

Thank You for the handy script :)

Quote
To figure out what's happening would probably require figuring out which optimizations -O2, -O3, and -Os share that are not present in -O0, -O1, and -Og. Then try disabling them one at a time to see what changes.

OK, this will work nicely if the trouble is caused by a single optimization and not by some combination of them :) . I see 40+ various optimization options being turned on beyond -O1 :( .
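Once there is a list of suspect options, each one can be switched off in turn. A minimal sketch, assuming it is run from the top of the lua-5.3.6 tree and that src/Makefile still reads `CFLAGS= -O2 ... $(MYCFLAGS)`, so a `-fno-...` option passed in MYCFLAGS comes after -O2 and overrides it. negate_flag, try_each_flag and the file name suspects.txt are hypothetical. Note that this only reveals an option whose removal alone restores the speed: if two different options can each cause the slowdown independently, every single-option run will still be slow.

```shell
#!/bin/sh
# Turn "-fgcse" into "-fno-gcse"; a trailing "=" (value-taking options)
# is dropped so the result is a valid switch.
negate_flag() {
    name=${1#-f}
    name=${name%=}
    printf '%s\n' "-fno-$name"
}

# $1 = file with one -f option per line, e.g. suspects.txt.
# Rebuilds Lua at -O2 with one option disabled, then times the empty loop.
try_each_flag() {
    while IFS= read -r opt; do
        [ -n "$opt" ] || continue
        off=$(negate_flag "$opt")
        make clean >/dev/null 2>&1 </dev/null
        if ! make linux MYCFLAGS="$off" >/dev/null 2>&1 </dev/null; then
            echo "$off: build failed"
            continue
        fi
        echo "== $off"
        time ./src/lua -e 'for i=1,1000000 do end' </dev/null
    done < "$1"
}
```

Usage: `try_each_flag suspects.txt`, then look for the one run whose time drops from about 1.2s to about 0.1s.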

Quote
Personally, I have doubts it's worth the effort.

Probably, yes. The problem is not in the Lua code: it is written in absolutely plain C. Lua even has a "c89" build target. No pragmas, no __attribute__, nothing of the sort. That means absolutely plain C code can drive gcc crazy, and we don't know how :( . It's not comfortable to know such things, especially considering that nearly all extensions in TinyCore are potentially affected.

Anyway, I want to ask You to mark the topic as solved.
Title: Re: [Solved] Strange sluggishness of Lua in x86 TinyCore
Post by: Rich on September 17, 2022, 05:23:15 PM
Hi jazzbiker
Quote
Thank You for the handy script :) ...
You are quite welcome. Just be aware that, since I was only looking at building the interpreter,
I placed an exit command before the point where it builds lua.so.

Quote
... Anyway, I want to ask You to mark the topic as solved.
Done.  :)