GCC compiler and linker options
Rich:
This continues the discussion of executable sizes produced by GCC from this thread:
http://forum.tinycorelinux.net/index.php/topic,23134.msg144872.html#msg144872
The linker (ld) uses a script to determine how it will create the output file. Those scripts are in:
--- Code: ---/usr/local/lib/ldscripts/
--- End code ---
If you don't specify a linker script, ld uses its built-in default. To determine which
script is built in, first find the script's description string:
--- Code: ---tc@E310:~$ ld --verbose | grep -F "/* Script for "
/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
--- End code ---
A search for that string in the linker scripts directory tells us it defaults to the .xce script:
--- Code: ---tc@E310:~$ grep -F "/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */" /usr/local/lib/ldscripts/*
/usr/local/lib/ldscripts/elf32_x86_64.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
/usr/local/lib/ldscripts/elf_i386.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
/usr/local/lib/ldscripts/elf_iamcu.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
/usr/local/lib/ldscripts/elf_k1om.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
/usr/local/lib/ldscripts/elf_l1om.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
/usr/local/lib/ldscripts/elf_x86_64.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
tc@E310:~$
--- End code ---
The "descriptions" of the available scripts are:
--- Code: ---tc@E310:~$ grep -F "/* Script for" /usr/local/lib/ldscripts/elf_i386.* | cut -d'/' -f6- | column -s: -t
elf_i386.xbn /* Script for -N mix text and data on same page; don't align data */
elf_i386.xc /* Script for -z combreloc combine and sort reloc sections */
elf_i386.xce /* Script for -z combreloc -z separate-code combine and sort reloc sections with separate code segment */
elf_i386.xd /* Script for ld -pie link position independent executable */
elf_i386.xdc /* Script for -pie -z combreloc position independent executable, combine & sort relocs */
elf_i386.xdce /* Script for -pie -z combreloc -z separate-code position independent executable, combine & sort relocs with separate code segment */
elf_i386.xde /* Script for ld -pie -z separate-code link position independent executable with separate code segment */
elf_i386.xdw /* Script for -pie -z combreloc -z now -z relro position independent executable, combine & sort relocs */
elf_i386.xdwe /* Script for -pie -z combreloc -z now -z relro -z separate-code position independent executable, combine & sort relocs with separate code segment */
elf_i386.xe /* Script for -z separate-code generate normal executables with separate code segment */
elf_i386.xn /* Script for -n mix text and data on same page */
elf_i386.xr /* Script for ld -r link without relocation */
elf_i386.xs /* Script for ld --shared link shared library */
elf_i386.xsc /* Script for --shared -z combreloc shared library, combine & sort relocs */
elf_i386.xsce /* Script for --shared -z combreloc -z separate-code shared library, combine & sort relocs with separate code segment */
elf_i386.xse /* Script for ld --shared -z separate-code link shared library with separate code segment */
elf_i386.xsw /* Script for --shared -z combreloc -z now -z relro shared library, combine & sort relocs */
elf_i386.xswe /* Script for --shared -z combreloc -z now -z relro -z separate-code shared library, combine & sort relocs with separate code segment */
elf_i386.xu /* Script for ld -Ur link w/out relocation, do create constructors */
elf_i386.xw /* Script for -z combreloc -z now -z relro combine and sort reloc sections */
elf_i386.xwe /* Script for -z combreloc -z now -z relro -z separate-code combine and sort reloc sections with separate code segment */
tc@E310:~$
--- End code ---
I couldn't find anything that explained when and why you might want to use one over another.
What I think I know is:
1. elf_i386.xce This is the current default for the linker in TC10. The "with separate code segment"
part of the description probably accounts for the larger executables.
2. elf_i386.xc This is the version the linker used in TC4. It produced smaller executables. The most
dramatic difference is when compiling programs that are small to begin with.
3. elf_i386.xbn This version appears to produce the smallest executables. The description reminds me of the
old DOS .com programs that were limited to 64K because they shared a code and data segment.
The "don't align data" part of the description may indicate it's not the best choice
for high performance applications.
I ran a few test compilations with the 3 scripts. The columns show the unstripped and sstripped sizes, in bytes,
for a plain build, a build with -fno-plt added, and a build with both -fno-plt and -flto added.
This is a very small program that generates the rotating dash for Tinycore scripts:
--- Code: ---gcc -flto -march=i486 -mtune=i686 -Os -pipe -Wall -fno-plt rotdash.c -o rotdash -Wl,-T/usr/local/lib/ldscripts/elf_i386.xbn

script              rotdash  sstripped   -fno-plt  sstripped      -flto  sstripped
elf_i386.xce          15324      12332      15304      12312      15276      12312
elf_i386.xc            7132       4140       7112       4120       7084       4120
elf_i386.xbn           4704       1656       4640       1592       4612       1592
--- End code ---
This is a small program for hiding/moving the mouse cursor. It's also linked to some X libraries:
--- Code: ---gcc -flto -fuse-linker-plugin -march=i486 -mtune=i686 -Os -g -pipe -Wall -Wextra -fno-plt -c HideMouse.c
gcc -Wl,-T/usr/local/lib/ldscripts/elf_i386.xbn -I. -L. HideMouse.o -o HideMouse -lX11 -lXfixes

script            HideMouse  sstripped   -fno-plt  sstripped      -flto  sstripped
elf_i386.xce          23660      12376      23608      12312      21540      12312
elf_i386.xc           15468       4184      15416       4120      13348       4120
elf_i386.xbn          14560       3284      14352       3064      12272       3048
--- End code ---
This is a modestly sized screen grabber program. The -fwhole-program option doesn't leave much for -flto to do:
--- Code: ---gcc -flto -march=i486 -mtune=i686 -Os -g -pipe -Wall -Wextra -fno-plt -fwhole-program -c grabber.c
gcc -Wl,-T/usr/local/lib/ldscripts/elf_i386.xbn -I. -L. grabber.o -o grabber -lImlib2 -lX11 -lXfixes

script              grabber  sstripped   -fno-plt  sstripped      -flto  sstripped
elf_i386.xce          48192      16624      47984      16408      40052      16408
elf_i386.xc           44096      12528      43888      12312      35956      12312
elf_i386.xbn          42184      10620      41392       9820      33444       9804
--- End code ---
TapDrill: given a screw size and depth of thread, it returns a chart of the closest drill sizes and the depth
of thread each would produce. It includes a large data structure of fractional, number, and letter drill sizes:
--- Code: ---gcc -flto -march=i486 -mtune=i686 -Os -g -pipe -Wall -Wextra -fno-plt TapDrill.c -o TapDrill -Wl,-T/usr/local/lib/ldscripts/elf_i386.xbn

script             TapDrill  sstripped   -fno-plt  sstripped      -flto  sstripped
elf_i386.xce          31360      18592      31344      18560      31920      18528
elf_i386.xc           27264      14496      27248      14464      27824      14432
elf_i386.xbn          26872      14048      26792      13952      27368      13920
--- End code ---
gpicview is described as "A Simple and Fast GTK2 based Image Viewer". It is from the LXDE desktop package.
--- Code: ---CFLAGS=" -flto -march=i486 -mtune=i686 -Os -pipe -fno-plt"
LDFLAGS="-Wl,-T/usr/local/lib/ldscripts/elf_i386.xbn"

script             gpicview  sstripped   -fno-plt  sstripped      -flto  sstripped
elf_i386.xce          93856      72278      88480      66902      82136      62452
elf_i386.xc           89760      68182      84384      62806      78040      58356
elf_i386.xbn          87416      65782      83256      61622      75728      55988
--- End code ---
[EDIT]: Added gpicview table. Rich
Rich:
I decided to rerun the tests using CorePure64-10.1 to see how these options fare in a 64 bit environment.
--- Code: ---gcc -flto -mtune=generic -Os -pipe -Wall -fno-plt rotdash.c -o rotdash -Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn

script              rotdash  sstripped   -fno-plt  sstripped      -flto  sstripped
elf_x86_64.xce        16656      12368      16432      12328      16400      12328
elf_x86_64.xc          8464       4176       8240       4136       8208       4136
elf_x86_64.xbn         6952       2576       6576       2384       6544       2384
--- End code ---
--- Code: ---gcc -flto -mtune=generic -Os -g -pipe -Wall -Wextra -fno-plt -c HideMouse.c
gcc -Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn -I. -L. HideMouse.o -o HideMouse -lX11 -lXfixes

script            HideMouse  sstripped   -fno-plt  sstripped      -flto  sstripped
elf_x86_64.xce        26056      12456      25752      12328      23056      12328
elf_x86_64.xc         21960       8360      21656       8232      18960       8232
elf_x86_64.xbn        18264       4672      17712       4296      15008       4288
--- End code ---
--- Code: ---gcc -flto -mtune=generic -Os -g -pipe -Wall -Wextra -fno-plt -fwhole-program -c grabber.c
gcc -Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn -I. -L. grabber.o -o grabber -lImlib2 -lX11 -lXfixes

script              grabber  sstripped   -fno-plt  sstripped      -flto  sstripped
elf_x86_64.xce        57584      21296      56968      20848      46360      20520
elf_x86_64.xc         53488      17200      52872      16752      42264      16424
elf_x86_64.xbn        49832      13552      48800      12688      38488      12656
--- End code ---
--- Code: ---gcc -flto -mtune=generic -Os -g -pipe -Wall -Wextra -fno-plt TapDrill.c -o TapDrill -Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn

script             TapDrill  sstripped   -fno-plt  sstripped      -flto  sstripped
elf_x86_64.xce        41136      25840      40896      25776      41680      25712
elf_x86_64.xc         37040      21744      36800      21680      37584      21616
elf_x86_64.xbn        33576      18192      33112      17904      33928      17872
--- End code ---
I didn't feel like running all the combinations for gpicview, so here are the highlights, sstripped numbers only:
--- Code: ---CFLAGS=" -mtune=generic -Os -pipe "
LDFLAGS="-Wl,-O1"
sstripped size: 90530

CFLAGS=" -mtune=generic -Os -pipe "
LDFLAGS="-Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn"
sstripped size: 83362

CFLAGS=" -flto -mtune=generic -Os -pipe -fno-plt"
LDFLAGS="-Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn"
sstripped size: 70096
--- End code ---
The improvements were very similar to the 32-bit tests.
I would say using the .xbn linker script and sstrip provided by far the largest reduction in executable size.
For the four small programs, adding -fno-plt to that saved between 64 and 800 bytes for 32 bits and between 192 and 864 bytes for 64 bits.
Adding -flto to the mix saved only 0 to 32 additional bytes for both 32 and 64 bits.
[EDIT]: Added results for gpicview. Rich
andyj:
Have you done any speed tests?
How about with larger executable images?
Does space saving scale or flatten out?
How is compile time affected?
Have you tried this on the kernel?
It would be interesting to see how Xorg and DRM/3D would be affected. I could probably get around to trying it out on some databases or web servers, but I'd need to find some benchmark suites first.
Juanito:
In testing a few years back, I found -flto made large apps 5-10% smaller (except for static libs, which it makes 10x bigger).
Removing "-g -O2", which autotools puts in most Makefiles, also makes things smaller.
I see librsvg now needs rustc and is 3x larger...
Rich:
Hi andyj
--- Quote from: andyj on March 17, 2020, 07:44:10 AM ---Have you done any speed tests?
--- End quote ---
No.
--- Quote ---How about with larger executable images?
--- End quote ---
Not yet. I'm going to try with lshw and gtk-lshw. The current executables are about 944K and 1228K respectively and were compiled
with the -flto option.
On a side note, I read on Stack Overflow that -flto alone can be used in place of -flto -fuse-linker-plugin, because GCC now
enables -fuse-linker-plugin automatically. I tested it, and adding -fuse-linker-plugin to -flto made no difference.
--- Quote ---Does space saving scale or flatten out?
--- End quote ---
The last 2 programs (grabber and TapDrill) are similar in size and complexity. One difference is that TapDrill contains an array of
386 16-byte structures (6176 bytes) preloaded with text and floating point values. I suspect there's not much that can be
optimized in that part of the program.
--- Quote ---How is compile time affected?
--- End quote ---
I didn't time it.
I wouldn't expect the linker script to make any difference. It just provides guidance on gaps between sections, location of sections,
section alignment, etc. It doesn't deal with code optimization.
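The -fno-plt option removes the PLT stub code; calls to external functions load the target address from the GOT at the call site instead.
The end effect is (from the GCC manual) "all external symbols are resolved at load time". That means load time may be slower.
Whether that's noticeable is another question. It also frees up a register on 32-bit x86, where the PLT stubs expect the GOT pointer in %ebx.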
--- Quote ---Have you tried this on the kernel?
--- End quote ---
The only kernel tinkering I've done is through make menuconfig.
--- Quote ---It would be interesting to see how Xorg and DRM/3D would be affected. I could probably get around to trying it out on some databases or web servers, but I'd need to find some benchmark suites first.
--- End quote ---
Remember what I said about the .xbn script:
--- Quote from: Rich on March 16, 2020, 01:44:22 AM ---3. elf_i386.xbn This version appears to produce the smallest executables. The description reminds me of the
old DOS .com programs that were limited to 64K because they shared a code and data segment.
The "don't align data" part of the description may indicate it's not the best choice
for high performance applications.
--- End quote ---
If there is a serious performance hit, go with the .xc script.