Faster packages

Vaguiner:

--- Quote from: nick65go on July 18, 2025, 07:37:07 AM ---When SPEED is your goal, why would you use TC if you have a "modern" machine?

--- End quote ---
Because:

Speed is my goal in this thread, which is about speed. It is not my goal for the system in general.


--- Quote from: nick65go on July 18, 2025, 11:51:18 AM ---...do you think about MEASURED speed increase...

--- End quote ---

Unfortunately I don't have any synthetic numbers to offer you at the moment. The only thing I know is that the gain, for me, is VISUALLY noticeable in the responsiveness of the programs.

Vaguiner:
It's also worth remembering that

http://tinycorelinux.net/dCore/x86_64/import/src/kernel-4.8.17/README

dCore was compiled correctly, with -O2, which allows -mtune=generic to be used properly.

gcc -Os == tcc


Meanwhile, at http://tinycorelinux.net/16.x/x86_64/release/src/toolchain/compile_tc16_x86_64

the “-mtune=generic -Os” sequence is repeated over and over.

CNK:

--- Quote from: Vaguiner on July 18, 2025, 03:51:02 PM ---Unfortunately I don't have any synthetic numbers to offer you at the moment. The only thing I know is that the gain, for me, is VISUALLY noticeable in the responsiveness of the programs.

--- End quote ---

I see you've built MicroPython. There are benchmark scripts like this one you could try with it.

It's not a problem for me personally; I like that TCL keeps things as small as possible. Then again, I still run TCL on an Intel Core 2, nothing nearly as new as Intel Alder Lake.

Vaguiner:

--- Quote from: CNK on July 19, 2025, 03:56:32 AM ---you could try with it...

--- End quote ---

For unrealistic synthetic tests I definitely wouldn't use MicroPython. Here is a more interesting battery of tests instead.


--- Code: ---gcc -o test test.c -Os -mtune=generic -fopt-info-vec -lm ; du test ; time ./test
20.0K   test
Test               | Instructions     | Cycles          | IPC
--------------------+------------------+------------------+--------
Vector Add         |         60000073 |         28784599 | 2.08
Vector Multiply    |         60000070 |         29766303 | 2.02
FMA Operation      |         70000072 |         34251292 | 2.04
Multi Operations   |         90000071 |         34479365 | 2.61
Polynomial         |        150000072 |         41241200 | 3.64
Linear Interp      |        100000073 |         34798249 | 2.87
Sqrt Approx        |        180000193 |        112369036 | 1.60
Scale Offset       |         70000070 |         25323119 | 2.76
Horizontal Add     |         60000071 |         31063022 | 1.93
Vector Min         |         70000071 |         31168168 | 2.25
Mask Blend         |        135010074 |         47303963 | 2.85
Memory Bandwidth   |         80000080 |        120310011 | 0.66
Conditional Sum    |         74990073 |         41280248 | 1.82
Loop Fusion        |        120000075 |         54537656 | 2.20
Prefix Sum         |         60000074 |        120051800 | 0.50
Mul Add Const      |         70000070 |         24822415 | 2.82
Linear Blend       |        100000072 |         34598789 | 2.89
--------------------+------------------+------------------+--------
TOTAL              |       1550001354 |        846149235 | 1.83
real    0m 1.42s
user    0m 1.36s
sys     0m 0.02s
--- End code ---
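
For anyone reading the tables: the IPC column appears to be simply instructions divided by cycles. A minimal Python sketch that re-derives it from a few rows copied from the -Os table above (the helper name `ipc` is made up for illustration, not taken from test.c):

```python
# Rows copied from the -Os table above: (test, instructions, cycles, reported IPC).
rows = [
    ("Vector Add", 60_000_073, 28_784_599, 2.08),
    ("Polynomial", 150_000_072, 41_241_200, 3.64),
    ("Prefix Sum", 60_000_074, 120_051_800, 0.50),
    ("TOTAL", 1_550_001_354, 846_149_235, 1.83),
]


def ipc(instructions, cycles):
    """Instructions per cycle (hypothetical helper, not from test.c)."""
    return instructions / cycles


for name, instructions, cycles, reported in rows:
    computed = ipc(instructions, cycles)
    print(f"{name:12s} computed {computed:.2f}  reported {reported:.2f}")
```

Keep in mind that IPC alone does not say which build is faster: a vectorized build retires fewer instructions that each do more work, so its IPC can fall while the cycle count also falls.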


--- Code: ---gcc -o test test.c -O2 -s -mtune=generic -fopt-info-vec -lm ; du test ; time ./test
test.c:73:1: optimized: loop vectorized using 16 byte vectors
test.c:79:1: optimized: loop vectorized using 16 byte vectors
test.c:85:1: optimized: loop vectorized using 16 byte vectors
test.c:91:1: optimized: loop vectorized using 16 byte vectors
test.c:97:1: optimized: loop vectorized using 16 byte vectors
test.c:104:1: optimized: loop vectorized using 16 byte vectors
test.c:116:1: optimized: loop vectorized using 16 byte vectors
test.c:122:1: optimized: loop vectorized using 16 byte vectors
test.c:130:1: optimized: loop vectorized using 16 byte vectors
test.c:136:1: optimized: loop vectorized using 16 byte vectors
test.c:167:1: optimized: loop vectorized using 16 byte vectors
test.c:167:1: optimized: loop vectorized using 16 byte vectors
test.c:185:1: optimized: loop vectorized using 16 byte vectors
test.c:213:23: optimized: loop vectorized using 16 byte vectors
20.0K   test
Test               | Instructions     | Cycles          | IPC
--------------------+------------------+------------------+--------
Vector Add         |         15000074 |         25512282 | 0.59
Vector Multiply    |         15000072 |         26700252 | 0.56
FMA Operation      |         17500072 |         32290420 | 0.54
Multi Operations   |         22500073 |         32629001 | 0.69
Polynomial         |         37500075 |         21941139 | 1.71
Linear Interp      |         25000074 |         32907018 | 0.76
Sqrt Approx        |        120000078 |        110427209 | 1.09
Scale Offset       |         17500075 |         21367514 | 0.82
Horizontal Add     |         35000073 |         30905302 | 1.13
Vector Min         |         15000072 |         26653637 | 0.56
Mask Blend         |         30000078 |         32798693 | 0.91
Memory Bandwidth   |         80000083 |        120064720 | 0.67
Conditional Sum    |         70000073 |         32663454 | 2.14
Loop Fusion        |         30000079 |         47302674 | 0.63
Prefix Sum         |         50000067 |         31552173 | 1.58
Mul Add Const      |         17500072 |         21308674 | 0.82
Linear Blend       |         25000074 |         32729043 | 0.76
--------------------+------------------+------------------+--------
TOTAL              |        622501264 |        679753205 | 0.92
real    0m 1.07s
user    0m 1.04s
sys     0m 0.01s
--- End code ---
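
The drop in instruction count between the two runs above lines up with the reported vector width. A rough Python check, assuming the arrays in test.c hold 4-byte elements (an assumption, since the source is not posted in this thread):

```python
# Instruction counts for "Vector Add" copied from the two tables above.
instr_os = 60_000_073  # -Os build, no vectorization reported
instr_o2 = 15_000_074  # -O2 build, "loop vectorized using 16 byte vectors"

VECTOR_BYTES = 16   # taken from the -fopt-info-vec output
ELEMENT_BYTES = 4   # ASSUMPTION: 4-byte elements; test.c is not shown

lanes = VECTOR_BYTES // ELEMENT_BYTES  # elements handled per vector instruction
ratio = instr_os / instr_o2            # how many times fewer instructions at -O2

print(f"lanes per vector: {lanes}, instruction ratio: {ratio:.2f}")
```

If that assumption holds, each vector instruction covers four elements, which would account for roughly a quarter as many instructions on this row. The cycle count for the same row moves far less, so the loop is probably limited by something other than instruction count.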


--- Code: ---gcc -o test test.c -Ofast -march=native -fopt-info-vec-optimized -fmerge-all-constants -fno-semantic-interposition -ftree-vectorize -fipa-pta -funroll-loops -floop-nest-optimize -lm ; du test ; time ./test
test.c:104:1: optimized: loop vectorized using 32 byte vectors
test.c:185:1: optimized: loop vectorized using 32 byte vectors
test.c:167:1: optimized: loop vectorized using 32 byte vectors
test.c:136:1: optimized: loop vectorized using 32 byte vectors
test.c:130:1: optimized: loop vectorized using 32 byte vectors
test.c:122:1: optimized: loop vectorized using 32 byte vectors
test.c:116:1: optimized: loop vectorized using 32 byte vectors
test.c:104:1: optimized: loop vectorized using 32 byte vectors
test.c:97:1: optimized: loop vectorized using 32 byte vectors
test.c:91:1: optimized: loop vectorized using 32 byte vectors
test.c:85:1: optimized: loop vectorized using 32 byte vectors
test.c:79:1: optimized: loop vectorized using 32 byte vectors
test.c:73:1: optimized: loop vectorized using 32 byte vectors
test.c:110:1: optimized: loop vectorized using 32 byte vectors
test.c:47:12: optimized: basic block part vectorized using 8 byte vectors
test.c:213:23: optimized: loop vectorized using 32 byte vectors
28.0K   test
Test               | Instructions     | Cycles          | IPC
--------------------+------------------+------------------+--------
Vector Add         |          4218816 |         26519003 | 0.16
Vector Multiply    |          4218812 |         27630546 | 0.15
FMA Operation      |          5468814 |         33934769 | 0.16
Multi Operations   |          4218812 |         27554213 | 0.15
Polynomial         |          6718814 |         21462602 | 0.31
Linear Interp      |          6718815 |         33417605 | 0.20
Sqrt Approx        |          9218813 |         21445641 | 0.43
Scale Offset       |          4218813 |         21344567 | 0.20
Horizontal Add     |          4218818 |         13024590 | 0.32
Vector Min         |          4218812 |         27421033 | 0.15
Mask Blend         |          5468815 |         33027008 | 0.17
Memory Bandwidth   |         53750070 |        120068496 | 0.45
Conditional Sum    |         49380064 |         34571463 | 1.43
Loop Fusion        |          6875064 |         41591792 | 0.17
Prefix Sum         |         23333394 |         31672422 | 0.74
Mul Add Const      |          4218812 |         21295353 | 0.20
Linear Blend       |          6718815 |         33483167 | 0.20
--------------------+------------------+------------------+--------
TOTAL              |        203183173 |        569464270 | 0.36
real    0m 0.89s
user    0m 0.85s
sys     0m 0.01s
--- End code ---

Between -Os and -O2 -s, du reports the same binary size (20.0K), and since both builds use -mtune=generic the set of compatible CPUs is unchanged. Even so, total cycles dropped from 846149235 to 679753205 (about 20% fewer) and wall time went from 1.42s to 1.07s.
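
To compare the three runs side by side, here is a small Python sketch that works out the reduction in total cycles and wall time relative to the -Os build. The figures are copied from the TOTAL and "real" lines above; each comes from a single run, so treat the percentages as rough. The helper name `reduction` is made up for illustration.

```python
# Totals copied from the three runs above: (total cycles, "real" seconds).
runs = {
    "-Os": (846_149_235, 1.42),
    "-O2 -s": (679_753_205, 1.07),
    "-Ofast -march=native": (569_464_270, 0.89),
}


def reduction(baseline, value):
    """Fractional reduction of value relative to baseline (hypothetical helper)."""
    return 1.0 - value / baseline


base_cycles, base_real = runs["-Os"]
for flags, (cycles, real) in runs.items():
    print(
        f"{flags:22s} "
        f"{reduction(base_cycles, cycles):.1%} fewer cycles, "
        f"{reduction(base_real, real):.1%} less wall time"
    )
```

Note that the -Ofast -march=native build is not a like-for-like comparison: it is larger (28.0K) and targets only the build machine's CPU.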

Rich:
Hi Vaguiner

Please use Code Tags when posting commands and responses seen in a terminal. To use Code Tags, click on the # icon above the reply box and paste your text between the Code Tags as shown in this example:


--- Quote ---[code][   36.176529] pcm512x 1-004d: Failed to get supply 'AVDD': -517
[   36.176536] pcm512x 1-004d: Failed to get supplies: -517
[   36.191753] pcm512x 1-004d: Failed to get supply 'AVDD': -517[/code]
--- End quote ---

It will appear like this in your post:

--- Code: ---[   36.176529] pcm512x 1-004d: Failed to get supply 'AVDD': -517
[   36.176536] pcm512x 1-004d: Failed to get supplies: -517
[   36.191753] pcm512x 1-004d: Failed to get supply 'AVDD': -517
--- End code ---

Code Tags serve as visual markers between what you are trying to say and the information you are posting. They also preserve spacing so column-aligned data displays properly. Code Tags also automatically add horizontal and/or vertical scrollbars to accommodate long lines and listings.
