export CFLAGS="-mtune=generic -Os -pipe"
export CXXFLAGS="-mtune=generic -Os -pipe"
Missing those advanced instructions, if you are right, is a bottleneck for any normal/modern CPU made in the last 5-10 years (assuming you buy a new machine at least every 10 years, because capitalism intentionally plans for obsolescence / oblivion).
When SPEED is your goal, why would you use TC if you have a "modern" machine? Because:
...do you think about MEASURED speed increase...
Unfortunately I don't have any synthetic numbers to offer you at the moment. The only thing I know is that the gain, for me, is VISUALLY noticeable in the responsiveness of the programs.
you could try with it...
gcc -o test test.c -Os -mtune=generic -fopt-info-vec -lm ; du test ; time ./test
20.0K test
Test | Instructions | Cycles | IPC
--------------------+------------------+------------------+--------
Vector Add | 60000073 | 28784599 | 2.08
Vector Multiply | 60000070 | 29766303 | 2.02
FMA Operation | 70000072 | 34251292 | 2.04
Multi Operations | 90000071 | 34479365 | 2.61
Polynomial | 150000072 | 41241200 | 3.64
Linear Interp | 100000073 | 34798249 | 2.87
Sqrt Approx | 180000193 | 112369036 | 1.60
Scale Offset | 70000070 | 25323119 | 2.76
Horizontal Add | 60000071 | 31063022 | 1.93
Vector Min | 70000071 | 31168168 | 2.25
Mask Blend | 135010074 | 47303963 | 2.85
Memory Bandwidth | 80000080 | 120310011 | 0.66
Conditional Sum | 74990073 | 41280248 | 1.82
Loop Fusion | 120000075 | 54537656 | 2.20
Prefix Sum | 60000074 | 120051800 | 0.50
Mul Add Const | 70000070 | 24822415 | 2.82
Linear Blend | 100000072 | 34598789 | 2.89
--------------------+------------------+------------------+--------
TOTAL | 1550001354 | 846149235 | 1.83
real 0m 1.42s
user 0m 1.36s
sys 0m 0.02s
gcc -o test test.c -O2 -s -mtune=generic -fopt-info-vec -lm ; du test ; time ./test
test.c:73:1: optimized: loop vectorized using 16 byte vectors
test.c:79:1: optimized: loop vectorized using 16 byte vectors
test.c:85:1: optimized: loop vectorized using 16 byte vectors
test.c:91:1: optimized: loop vectorized using 16 byte vectors
test.c:97:1: optimized: loop vectorized using 16 byte vectors
test.c:104:1: optimized: loop vectorized using 16 byte vectors
test.c:116:1: optimized: loop vectorized using 16 byte vectors
test.c:122:1: optimized: loop vectorized using 16 byte vectors
test.c:130:1: optimized: loop vectorized using 16 byte vectors
test.c:136:1: optimized: loop vectorized using 16 byte vectors
test.c:167:1: optimized: loop vectorized using 16 byte vectors
test.c:167:1: optimized: loop vectorized using 16 byte vectors
test.c:185:1: optimized: loop vectorized using 16 byte vectors
test.c:213:23: optimized: loop vectorized using 16 byte vectors
20.0K test
Test | Instructions | Cycles | IPC
--------------------+------------------+------------------+--------
Vector Add | 15000074 | 25512282 | 0.59
Vector Multiply | 15000072 | 26700252 | 0.56
FMA Operation | 17500072 | 32290420 | 0.54
Multi Operations | 22500073 | 32629001 | 0.69
Polynomial | 37500075 | 21941139 | 1.71
Linear Interp | 25000074 | 32907018 | 0.76
Sqrt Approx | 120000078 | 110427209 | 1.09
Scale Offset | 17500075 | 21367514 | 0.82
Horizontal Add | 35000073 | 30905302 | 1.13
Vector Min | 15000072 | 26653637 | 0.56
Mask Blend | 30000078 | 32798693 | 0.91
Memory Bandwidth | 80000083 | 120064720 | 0.67
Conditional Sum | 70000073 | 32663454 | 2.14
Loop Fusion | 30000079 | 47302674 | 0.63
Prefix Sum | 50000067 | 31552173 | 1.58
Mul Add Const | 17500072 | 21308674 | 0.82
Linear Blend | 25000074 | 32729043 | 0.76
--------------------+------------------+------------------+--------
TOTAL | 622501264 | 679753205 | 0.92
real 0m 1.07s
user 0m 1.04s
sys 0m 0.01s
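For anyone wondering what "loop vectorized using 16 byte vectors" means in practice: a 16-byte register holds four 4-byte values, which fits the Vector Add instruction count dropping from 60000073 at -Os to 15000074 at -O2, almost exactly a factor of four. The test.c source is not shown here, so the kernel below is only a hypothetical stand-in (the name vec_add and the use of float are my assumptions) for the kind of loop the vectorizer reports on.

```c
#include <stddef.h>

/* Hypothetical stand-in for a "Vector Add" style kernel; the real test.c
 * is not shown in this thread. The restrict qualifiers promise the compiler
 * that the three arrays do not overlap, which makes it easier for the
 * auto-vectorizer to process several elements per instruction. */
void vec_add(const float *restrict a, const float *restrict b,
             float *restrict out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Compiling a file like this with -fopt-info-vec, as in the commands above, is a quick way to see whether a given set of flags vectorizes the loop at all.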
gcc -o test test.c -Ofast -march=native -fopt-info-vec-optimized -fmerge-all-constants -fno-semantic-interposition -ftree-vectorize -fipa-pta -funroll-loops -floop-nest-optimize -lm ; du test ; time ./test
test.c:104:1: optimized: loop vectorized using 32 byte vectors
test.c:185:1: optimized: loop vectorized using 32 byte vectors
test.c:167:1: optimized: loop vectorized using 32 byte vectors
test.c:136:1: optimized: loop vectorized using 32 byte vectors
test.c:130:1: optimized: loop vectorized using 32 byte vectors
test.c:122:1: optimized: loop vectorized using 32 byte vectors
test.c:116:1: optimized: loop vectorized using 32 byte vectors
test.c:104:1: optimized: loop vectorized using 32 byte vectors
test.c:97:1: optimized: loop vectorized using 32 byte vectors
test.c:91:1: optimized: loop vectorized using 32 byte vectors
test.c:85:1: optimized: loop vectorized using 32 byte vectors
test.c:79:1: optimized: loop vectorized using 32 byte vectors
test.c:73:1: optimized: loop vectorized using 32 byte vectors
test.c:110:1: optimized: loop vectorized using 32 byte vectors
test.c:47:12: optimized: basic block part vectorized using 8 byte vectors
test.c:213:23: optimized: loop vectorized using 32 byte vectors
28.0K test
Test | Instructions | Cycles | IPC
--------------------+------------------+------------------+--------
Vector Add | 4218816 | 26519003 | 0.16
Vector Multiply | 4218812 | 27630546 | 0.15
FMA Operation | 5468814 | 33934769 | 0.16
Multi Operations | 4218812 | 27554213 | 0.15
Polynomial | 6718814 | 21462602 | 0.31
Linear Interp | 6718815 | 33417605 | 0.20
Sqrt Approx | 9218813 | 21445641 | 0.43
Scale Offset | 4218813 | 21344567 | 0.20
Horizontal Add | 4218818 | 13024590 | 0.32
Vector Min | 4218812 | 27421033 | 0.15
Mask Blend | 5468815 | 33027008 | 0.17
Memory Bandwidth | 53750070 | 120068496 | 0.45
Conditional Sum | 49380064 | 34571463 | 1.43
Loop Fusion | 6875064 | 41591792 | 0.17
Prefix Sum | 23333394 | 31672422 | 0.74
Mul Add Const | 4218812 | 21295353 | 0.20
Linear Blend | 6718815 | 33483167 | 0.20
--------------------+------------------+------------------+--------
TOTAL | 203183173 | 569464270 | 0.36
real 0m 0.89s
user 0m 0.85s
sys 0m 0.01s
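A note on reading the IPC column: it falls from 1.83 to 0.36 across the three builds even though the program gets faster, because each vector instruction does the work of several scalar ones. Cycles and wall time are the numbers to compare. A minimal sketch of the arithmetic, with hypothetical helper names (ipc and cycle_speedup are mine, not part of test.c):

```c
#include <stdint.h>

/* Instructions per cycle, as shown in the IPC column.
 * Returns 0.0 if cycles is 0 to avoid dividing by zero. */
double ipc(uint64_t instructions, uint64_t cycles)
{
    return cycles ? (double)instructions / (double)cycles : 0.0;
}

/* How many times fewer cycles the optimized build needed
 * compared with the baseline build. Returns 0.0 if opt_cycles is 0. */
double cycle_speedup(uint64_t base_cycles, uint64_t opt_cycles)
{
    return opt_cycles ? (double)base_cycles / (double)opt_cycles : 0.0;
}
```

Fed with the TOTAL rows above, ipc(1550001354, 846149235) comes out at roughly 1.83, and cycle_speedup(846149235, 569464270) at roughly 1.49, which is in the same ballpark as the drop in real time from 1.42s to 0.89s.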
[code][ 36.176529] pcm512x 1-004d: Failed to get supply 'AVDD': -517
[ 36.176536] pcm512x 1-004d: Failed to get supplies: -517
[ 36.191753] pcm512x 1-004d: Failed to get supply 'AVDD': -517[/code]