TC12 x86 gcc 10 lua-5.4.4
vanilla "make linux-readline" with -O2
tc@box:/tmp/lua-5.4.4$ time src/lua -e 'for i=1,1000000 do end'
real 0m 0.05s
user 0m 0.05s
sys 0m 0.00s
tc@box:/tmp/lua-5.4.4$ ls -l src/liblua.a src/lua src/luac
-rw-r--r-- 1 tc staff 406946 Sep 17 10:22 src/liblua.a
-rwxr-xr-x 1 tc staff 331708 Sep 17 10:22 src/lua
-rwxr-xr-x 1 tc staff 228748 Sep 17 10:22 src/luac
MYCFLAGS= -fno-unwind-tables -fno-asynchronous-unwind-tables
tc@box:/tmp/lua-5.4.4$ time src/lua -e 'for i=1,1000000 do end'
real 0m 0.07s
user 0m 0.06s
sys 0m 0.00s
tc@box:/tmp/lua-5.4.4$ ls -l src/liblua.a src/lua src/luac
-rw-r--r-- 1 tc staff 319930 Sep 17 10:29 src/liblua.a
-rwxr-xr-x 1 tc staff 245692 Sep 17 10:29 src/lua
-rwxr-xr-x 1 tc staff 179596 Sep 17 10:29 src/luac
CFLAGS= -Os -Wall -Wextra -DLUA_COMPAT_5_3 $(SYSCFLAGS) $(MYCFLAGS)
MYCFLAGS= -fno-unwind-tables -fno-asynchronous-unwind-tables
tc@box:/tmp/lua-5.4.4$ time src/lua -e 'for i=1,1000000 do end'
real 0m 0.07s
user 0m 0.07s
sys 0m 0.00s
tc@box:/tmp/lua-5.4.4$ ls -l src/liblua.a src/lua src/luac
-rw-r--r-- 1 tc staff 265170 Sep 17 10:31 src/liblua.a
-rwxr-xr-x 1 tc staff 203164 Sep 17 10:31 src/lua
-rwxr-xr-x 1 tc staff 144480 Sep 17 10:31 src/luac
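To put numbers on the size savings, here is a small sketch that turns the byte counts from the listings above into percentages. The helper name shrink_pct is made up for this illustration, and it assumes an awk is available (busybox awk should do).

```shell
# shrink_pct OLD NEW: print how much smaller NEW is than OLD, in percent.
# (hypothetical helper, not part of Lua or Tiny Core)
shrink_pct() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) * 100 / old }'
}

# src/lua sizes copied from the ls -l output above.
shrink_pct 331708 245692   # -O2 vs. -O2 without unwind tables
shrink_pct 331708 203164   # -O2 vs. -Os without unwind tables
```

The same call works for the src/liblua.a and src/luac sizes.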
As we can see, everything is fine in Lua 5.4: the loop time stays within 0.05-0.07s in all three builds, while the binaries get noticeably smaller.
tc@box:/tmp/lua-5.4.4$ src/luac -l -
for i=1,1000000 do end
main <stdin:0,0> (7 instructions at 0x9510490)
0+ params, 4 slots, 1 upvalue, 4 locals, 1 constant, 0 functions
1 [1] VARARGPREP 0
2 [1] LOADI 0 1
3 [1] LOADK 1 0 ; 1000000
4 [1] LOADI 2 1
5 [1] FORPREP 0 0 ; exit to 7
6 [1] FORLOOP 0 1 ; to 6
7 [1] RETURN 0 1 1 ; 0 out
In Lua 5.3, the implementation of the FORPREP or FORLOOP bytecode includes something that makes the gcc 10 optimizations go wrong. This could be explored to find out what exactly causes the trouble. But gcc 11 works better, at least in the circumstances described.
@Rich, what's your opinion: should we dive deeper? Would it be useful?