
Author Topic: GCC compiler and linker options  (Read 271 times)

Offline Rich

  • TinyCore Moderator
  • Hero Member
  • *****
  • Posts: 7328
GCC compiler and linker options
« on: March 15, 2020, 10:44:22 PM »
This is a continuation of the subject of executable sizes produced by GCC discussed here:
http://forum.tinycorelinux.net/index.php/topic,23134.msg144872.html#msg144872

The linker (ld) uses a script to determine how it will create the output file. Those scripts are in:
Code: [Select]
/usr/local/lib/ldscripts/
If you don't specify which linker script to use, it defaults to the built-in script. To determine
which script is built in, first find the script's description string:
Code: [Select]
tc@E310:~$ ld --verbose | grep -F "/* Script for "
/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */

A search for that string in the linker scripts directory tells us it defaults to the  .xce  script:
Code: [Select]
tc@E310:~$ grep -F "/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */" /usr/local/lib/ldscripts/*
/usr/local/lib/ldscripts/elf32_x86_64.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
/usr/local/lib/ldscripts/elf_i386.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
/usr/local/lib/ldscripts/elf_iamcu.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
/usr/local/lib/ldscripts/elf_k1om.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
/usr/local/lib/ldscripts/elf_l1om.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
/usr/local/lib/ldscripts/elf_x86_64.xce:/* Script for -z combreloc -z separate-code: combine and sort reloc sections with separate code segment */
tc@E310:~$
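
The two lookups above can be folded into one helper. This is only a sketch: the function name  find_ldscript  and its optional emulation argument are made up for illustration, and it assumes the description line is unique per script type, as it is in the listing above.

```shell
#!/bin/sh
# Sketch: list the linker scripts in a directory whose description comment
# matches a given string. The function name and arguments are hypothetical.

# find_ldscript DESCRIPTION DIRECTORY [EMULATION]
#   DESCRIPTION  the "/* Script for ... */" line reported by ld --verbose
#   DIRECTORY    where the scripts live, e.g. /usr/local/lib/ldscripts
#   EMULATION    optional file name prefix such as elf_i386
find_ldscript() {
    desc=$1
    dir=$2
    emul=${3:-}
    # -l prints only matching file names, -F treats the description
    # as a fixed string rather than a regular expression.
    grep -lF -- "$desc" "$dir"/"$emul"* 2>/dev/null
}

# Typical use (needs GNU ld installed):
#   find_ldscript "$(ld --verbose | grep -F '/* Script for ')" \
#       /usr/local/lib/ldscripts elf_i386
```

Passing the emulation prefix narrows the six matches shown above down to the one for the target you are building.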

The "descriptions" of the available scripts are:
Code: [Select]
tc@E310:~$ grep -F "/* Script for" /usr/local/lib/ldscripts/elf_i386.* | cut -d'/' -f6- | column -s: -t
elf_i386.xbn   /* Script for -N                                                       mix text and data on same page; don't align data */
elf_i386.xc    /* Script for -z combreloc                                             combine and sort reloc sections */
elf_i386.xce   /* Script for -z combreloc -z separate-code                            combine and sort reloc sections with separate code segment */
elf_i386.xd    /* Script for ld -pie                                                  link position independent executable */
elf_i386.xdc   /* Script for -pie -z combreloc                                        position independent executable, combine & sort relocs */
elf_i386.xdce  /* Script for -pie -z combreloc -z separate-code                       position independent executable, combine & sort relocs with separate code segment */
elf_i386.xde   /* Script for ld -pie -z separate-code                                 link position independent executable with separate code segment */
elf_i386.xdw   /* Script for -pie -z combreloc -z now -z relro                        position independent executable, combine & sort relocs */
elf_i386.xdwe  /* Script for -pie -z combreloc -z now -z relro -z separate-code       position independent executable, combine & sort relocs with separate code segment */
elf_i386.xe    /* Script for -z separate-code                                         generate normal executables with separate code segment */
elf_i386.xn    /* Script for -n                                                       mix text and data on same page */
elf_i386.xr    /* Script for ld -r                                                    link without relocation */
elf_i386.xs    /* Script for ld --shared                                              link shared library */
elf_i386.xsc   /* Script for --shared -z combreloc                                    shared library, combine & sort relocs */
elf_i386.xsce  /* Script for --shared -z combreloc -z separate-code                   shared library, combine & sort relocs with separate code segment */
elf_i386.xse   /* Script for ld --shared -z separate-code                             link shared library with separate code segment */
elf_i386.xsw   /* Script for --shared -z combreloc -z now -z relro                    shared library, combine & sort relocs */
elf_i386.xswe  /* Script for --shared -z combreloc -z now -z relro -z separate-code   shared library, combine & sort relocs with separate code segment */
elf_i386.xu    /* Script for ld -Ur                                                   link w/out relocation, do create constructors */
elf_i386.xw    /* Script for -z combreloc -z now -z relro                             combine and sort reloc sections */
elf_i386.xwe   /* Script for -z combreloc -z now -z relro -z separate-code            combine and sort reloc sections with separate code segment */
tc@E310:~$

I couldn't find anything that explained when and why you might want to use one over another.

What I think I know is:
1. elf_i386.xce  This is the current default for the linker in TC10. The  "with separate code segment"
                 part of the description probably accounts for the larger executables.

2. elf_i386.xc   This is the version the linker used in TC4. It produced smaller executables. The most
                 dramatic difference is when compiling programs that are small to begin with.

3. elf_i386.xbn  This version appears to produce the smallest executables. The description reminds me of the
                 old DOS  .com  programs that were limited to 64K because they shared a code and data segment.
                 The  "don't align data"  part of the description may indicate it's not the best choice
                 for high performance applications.


I ran a few test compilations with the 3 scripts. The columns show the unstripped and sstripped sizes
for a plain build, a build with  -fno-plt  added, and a build with both  -fno-plt  and  -flto  added.

This is a very small program that generates the rotating dash for Tinycore scripts:
Code: [Select]
gcc -flto -march=i486 -mtune=i686 -Os -pipe -Wall -fno-plt rotdash.c -o rotdash -Wl,-T/usr/local/lib/ldscripts/elf_i386.xbn

script          rotdash  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_i386.xce      15324      12332     15304      12312    15276      12312
elf_i386.xc        7132       4140      7112       4120     7084       4120
elf_i386.xbn       4704       1656      4640       1592     4612       1592
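
A comparison like the one in this table can be scripted. The sketch below only prints the three gcc command lines, one per linker script, so they can be inspected before running. The function name  print_build_cmds,  the output file naming (rotdash.xce, rotdash.xc, rotdash.xbn) and the choice of "plain" flags are assumptions for illustration, not the exact commands used for the table.

```shell
#!/bin/sh
# Sketch: print one gcc command per linker script so the resulting
# executables can be compared. All names here are hypothetical.

# print_build_cmds SOURCE OUTPUT [SCRIPT_DIR] [EMULATION]
print_build_cmds() {
    src=$1
    out=$2
    dir=${3:-/usr/local/lib/ldscripts}
    emul=${4:-elf_i386}
    for ext in xce xc xbn; do
        # Each build gets its own output file, named after the script used.
        printf 'gcc -march=i486 -mtune=i686 -Os -pipe -Wall %s -o %s.%s -Wl,-T%s/%s.%s\n' \
            "$src" "$out" "$ext" "$dir" "$emul" "$ext"
    done
}

# Inspect the commands:   print_build_cmds rotdash.c rotdash
# Run them:               print_build_cmds rotdash.c rotdash | sh
# Compare the results:    wc -c rotdash.xce rotdash.xc rotdash.xbn
```

Running  sstrip  on copies of the three outputs and repeating the  wc -c  would give the sstripped column.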

This is a small program for hiding/moving the mouse cursor. It's also linked to some X libraries:
Code: [Select]
gcc -flto -fuse-linker-plugin -march=i486 -mtune=i686 -Os -g -pipe -Wall -Wextra -fno-plt -c HideMouse.c
gcc -Wl,-T/usr/local/lib/ldscripts/elf_i386.xbn -I. -L. HideMouse.o -o HideMouse -lX11 -lXfixes

script        HideMouse  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_i386.xce      23660      12376     23608      12312    21540      12312
elf_i386.xc       15468       4184     15416       4120    13348       4120
elf_i386.xbn      14560       3284     14352       3064    12272       3048

This is a modestly sized screen grabber program. The  -fwhole-program  doesn't leave much for  -flto  to do:
Code: [Select]
gcc -flto -march=i486 -mtune=i686 -Os -g -pipe -Wall -Wextra -fno-plt -fwhole-program -c grabber.c
gcc -Wl,-T/usr/local/lib/ldscripts/elf_i386.xbn -I. -L. grabber.o -o grabber -lImlib2 -lX11 -lXfixes

script          grabber  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_i386.xce      48192      16624     47984      16408    40052      16408
elf_i386.xc       44096      12528     43888      12312    35956      12312
elf_i386.xbn      42184      10620     41392       9820    33444       9804

Given a screw size and depth of thread, returns a chart containing the closest sized drills and the depth
of thread they would produce. It includes a large data structure of fractional, number, and letter drill sizes:
Code: [Select]
gcc -flto -march=i486 -mtune=i686 -Os -g -pipe -Wall -Wextra -fno-plt TapDrill.c -o TapDrill -Wl,-T/usr/local/lib/ldscripts/elf_i386.xbn

script         TapDrill  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_i386.xce      31360      18592     31344      18560    31920      18528
elf_i386.xc       27264      14496     27248      14464    27824      14432
elf_i386.xbn      26872      14048     26792      13952    27368      13920

gpicview is described as:
A Simple and Fast GTK2 based Image Viewer
It is from the LXDE desktop package.
Code: [Select]
CFLAGS=" -flto -march=i486 -mtune=i686 -Os -pipe -fno-plt"
LDFLAGS="-Wl,-T/usr/local/lib/ldscripts/elf_i386.xbn"

script         gpicview  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_i386.xce      93856      72278     88480      66902    82136      62452
elf_i386.xc       89760      68182     84384      62806    78040      58356
elf_i386.xbn      87416      65782     83256      61622    75728      55988

    [EDIT]: Added gpicview table.  Rich
« Last Edit: March 28, 2020, 07:44:32 AM by Rich »

Offline Rich

Re: GCC compiler and linker options
« Reply #1 on: March 16, 2020, 09:16:09 PM »
I decided to rerun the tests using CorePure64-10.1 to see how these options fare in a 64 bit environment.

Code: [Select]
gcc -flto -mtune=generic -Os -pipe -Wall -fno-plt rotdash.c -o rotdash -Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn

script            rotdash  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_x86_64.xce      16656      12368     16432      12328    16400      12328
elf_x86_64.xc        8464       4176      8240       4136     8208       4136
elf_x86_64.xbn       6952       2576      6576       2384     6544       2384

Code: [Select]
gcc -flto -mtune=generic -Os -g -pipe -Wall -Wextra -fno-plt -c HideMouse.c
gcc -Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn -I. -L. HideMouse.o -o HideMouse -lX11 -lXfixes

script          HideMouse  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_x86_64.xce      26056      12456     25752      12328    23056      12328
elf_x86_64.xc       21960       8360     21656       8232    18960       8232
elf_x86_64.xbn      18264       4672     17712       4296    15008       4288

Code: [Select]
gcc -flto -mtune=generic -Os -g -pipe -Wall -Wextra -fno-plt -fwhole-program -c grabber.c
gcc -Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn -I. -L. grabber.o -o grabber -lImlib2 -lX11 -lXfixes

script            grabber  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_x86_64.xce      57584      21296     56968      20848    46360      20520
elf_x86_64.xc       53488      17200     52872      16752    42264      16424
elf_x86_64.xbn      49832      13552     48800      12688    38488      12656

Code: [Select]
gcc -flto -mtune=generic -Os -g -pipe -Wall -Wextra -fno-plt TapDrill.c -o TapDrill -Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn

script           TapDrill  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_x86_64.xce      41136      25840     40896      25776    41680      25712
elf_x86_64.xc       37040      21744     36800      21680    37584      21616
elf_x86_64.xbn      33576      18192     33112      17904    33928      17872

I didn't feel like running all the combinations for gpicview, so here are the highlights, sstripped numbers only:
Code: [Select]
CFLAGS=" -mtune=generic -Os -pipe "
LDFLAGS="-Wl,-O1"
90530

CFLAGS=" -mtune=generic -Os -pipe "
LDFLAGS="-Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn"
83362

CFLAGS=" -flto -mtune=generic -Os -pipe -fno-plt"
LDFLAGS="-Wl,-T/usr/local/lib/ldscripts/elf_x86_64.xbn"
70096

The improvements were very similar to the 32 bit tests.

I would say using the  .xbn  linker script and  sstrip  provided by far the largest reduction in executable size.
For the four small test programs, adding  -fno-plt  to that saved between 64 and 800 bytes for 32 bits and between 192 and 864 bytes for 64 bits.
Adding  -flto  to the mix only saved between 0 and 32 additional bytes for both 32 and 64 bits.
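
The byte savings quoted here come from subtracting neighbouring sstripped columns in the tables. A minimal sketch of that arithmetic, assuming the seven-column layout used in this thread (the function name  plt_lto_savings  is made up):

```shell
#!/bin/sh
# Sketch: for each table row, print the bytes saved by adding -fno-plt
# (column 3 minus column 5) and the extra bytes saved by adding -flto
# (column 5 minus column 7), using the sstripped sizes only.
plt_lto_savings() {
    awk '$1 != "script" && NF == 7 {
        printf "%s %d %d\n", $1, $3 - $5, $5 - $7
    }'
}

plt_lto_savings <<'EOF'
script          grabber  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_i386.xce      48192      16624     47984      16408    40052      16408
elf_i386.xc       44096      12528     43888      12312    35956      12312
elf_i386.xbn      42184      10620     41392       9820    33444       9804
EOF
# prints:
#   elf_i386.xce 216 0
#   elf_i386.xc 216 0
#   elf_i386.xbn 800 16
```

The  .xbn  row gives the 800 bytes mentioned above for  -fno-plt  on 32 bits.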

    [EDIT]: Added results for gpicview.  Rich
« Last Edit: March 28, 2020, 08:05:56 AM by Rich »

Offline andyj

  • Hero Member
  • *****
  • Posts: 789
Re: GCC compiler and linker options
« Reply #2 on: March 17, 2020, 04:44:10 AM »
Have you done any speed tests?
How about with larger executable images?
Does space saving scale or flatten out?
How is compile time affected?
Have you tried this on the kernel?
It would be interesting to see how Xorg and DRM/3D would be affected. I could probably get around to trying it out on some databases or web servers, but I'd need to find some benchmark suites first.

Offline Juanito

  • Administrator
  • Hero Member
  • *****
  • Posts: 11471
Re: GCC compiler and linker options
« Reply #3 on: March 17, 2020, 07:10:45 AM »
In testing a few years back, I found -flto made large apps 5-10% smaller (except for static libs, which it makes 10x bigger).

Removing "-g -O2", which autotools puts in most Makefiles, also makes things smaller.

I see librsvg now needs rustc and is 3x larger...

Offline Rich

Re: GCC compiler and linker options
« Reply #4 on: March 17, 2020, 07:53:52 AM »
Hi andyj
Quote
Have you done any speed tests?
No.

Quote
How about with larger executable images?
Not yet. I'm going to try with  lshw  and  gtk-lshw.  The current executables are about 944K and 1228K respectively and were compiled
with the  -flto  option.

On a side note, I read on StackOverflow that  -flto  could be used in place of  -flto -fuse-linker-plugin  because it now causes
-fuse-linker-plugin  to be used automatically.  I tested it and adding  -fuse-linker-plugin  to  -flto  made no difference.
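
A check like that can be reduced to comparing the sizes of two builds. The helper below is a sketch only: the name  same_size  and the file names in the usage comment are hypothetical, and it compares byte counts, not contents.

```shell
#!/bin/sh
# Sketch: succeed when two files have the same size in bytes.
# Returns 2 if either file cannot be read.
same_size() {
    a=$(wc -c < "$1") || return 2
    b=$(wc -c < "$2") || return 2
    [ "$a" -eq "$b" ]
}

# Hypothetical usage:
#   gcc -flto -Os rotdash.c -o rotdash.lto
#   gcc -flto -fuse-linker-plugin -Os rotdash.c -o rotdash.plugin
#   same_size rotdash.lto rotdash.plugin && echo "no difference in size"
```

Use  cmp -s  in place of the size test if you want to know whether the two files are byte-for-byte identical.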


Quote
Does space saving scale or flatten out?
The last 2 programs (grabber and TapDrill) are similar in size and complexity. One difference is TapDrill contains an array of
386  16-byte structures (6176 bytes) preloaded with text and floating point values. I suspect there's not much that can be
optimized in that part of the program.

Quote
How is compile time affected?
I didn't time it.

I wouldn't expect the linker script to make any difference. It just provides guidance on gaps between sections, location of sections,
section alignment, etc. It doesn't deal with code optimization.

The  -fno-plt  option just strips out some stub code. The end effect is (from gcc.gnu.org) "all external symbols are resolved at load time". That
means load time may be slower. Whether that's noticeable is another question. It also frees up a register in 32 bit x86.

Quote
Have you tried this on the kernel?
The only kernel tinkering I've done is through  make menuconfig.

Quote
It would be interesting to see how Xorg and DRM/3D would be affected. I could probably get around to trying it out on some databases or web servers, but I'd need to find some benchmark suites first.
Remember what I said about the  .xbn  script:
3. elf_i386.xbn  This version appears to produce the smallest executables. The description reminds me of the
                 old DOS  .com  programs that were limited to 64K because they shared a code and data segment.
                 The  "don't align data"  part of the description may indicate it's not the best choice
                 for high performance applications.
If there is a serious performance hit, go with the  .xc  script.

Offline Rich

Re: GCC compiler and linker options
« Reply #5 on: March 28, 2020, 08:40:43 AM »
Hi andyj
Quote
How about with larger executable images?
Not yet. I'm going to try with  lshw  and  gtk-lshw.  The current executables are about 944K and 1228K respectively and were compiled
with the  -flto  option.
Turns out the  lshw  package ignores any  CFLAGS  and  CXXFLAGS  values you set, and I don't feel like breaking the  Makefile.

I did do some tests on  gpicview.  Results are appended to the first 2 posts. There was 1 interesting surprise.

Previous examples had this type of result:

script          grabber  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_i386.xce      48192      16624     47984      16408    40052      16408
elf_i386.xc       44096      12528     43888      12312    35956      12312
elf_i386.xbn      42184      10620     41392       9820    33444       9804

Big gain going from  .xce (16624)  to  .xbn (10620).  Adding  -fno-plt  and  -flto  shrank it a tiny bit more (9804).

The  gpicview  program responded differently:

script         gpicview  sstripped  -fno-plt  sstripped    -flto  sstripped
elf_i386.xce      93856      72278     88480      66902    82136      62452
elf_i386.xc       89760      68182     84384      62806    78040      58356
elf_i386.xbn      87416      65782     83256      61622    75728      55988

Modest gain going from  .xce (72278)  to  .xbn (65782).  Adding  -fno-plt  and  -flto  shrank it quite a bit more (55988).

The 64 bit version of  gpicview  had similar results:
Modest gain going from  .xce (90530)  to  .xbn (83362).  Adding  -fno-plt  and  -flto  shrank it quite a bit more (70096).
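
To put "big" and "modest" on the same scale, the sstripped sizes can be turned into percentages. A small sketch (the function name  pct_saved  is made up):

```shell
#!/bin/sh
# Sketch: print the percentage saved going from BEFORE bytes to AFTER bytes,
# rounded to one decimal place.
pct_saved() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        printf "%.1f\n", (before - after) * 100 / before
    }'
}

pct_saved 16624 10620   # grabber,  .xce -> .xbn               prints 36.1
pct_saved 72278 65782   # gpicview, .xce -> .xbn               prints 9.0
pct_saved 10620 9804    # grabber,  .xbn plus -fno-plt -flto   prints 7.7
pct_saved 65782 55988   # gpicview, .xbn plus -fno-plt -flto   prints 14.9
```

So for grabber the linker script accounts for most of the saving, while for gpicview the compiler options save more than the script does.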