Tiny Core Linux

Tiny Core Extensions => TCE Talk => Topic started by: GNUser on May 16, 2024, 02:41:04 PM

Title: upx for large executables?
Post by: GNUser on May 16, 2024, 02:41:04 PM
neonix mentioned compressing with upx -9 in a different thread. I decided to investigate.

The largest extension on my system is brave-browser.tcz at 180 MiB.

If I unsquash brave-browser.tcz, compress just the main brave executable using upx -9, and recreate brave-browser.tcz, size of the extension drops to 152 MiB. I'm testing the shrunken extension now and it's working fine. Importantly, upx.tcz does not need to be loaded for the shrunken brave-browser.tcz to work.

Should I be using the upx (or similar) compression tool to reduce the size of the large extensions that I maintain (e.g., brave-browser.tcz)? Are there any downsides? I'm very interested in hearing the opinion of the TCL developers on this.
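For reference, the unsquash/compress/resquash procedure described above can be sketched roughly like this. The path to the brave binary inside the extension is an assumption; check it with `unsquashfs -l brave-browser.tcz` first:

```shell
# Rough sketch of the repack procedure (requires squashfs-tools and upx).
# The binary path inside the extension is an assumption -- verify it first.
unsquashfs -d squashfs-root brave-browser.tcz
upx -9 squashfs-root/usr/local/brave/brave
mksquashfs squashfs-root brave-browser-upx.tcz -b 4k -no-xattrs -noappend
```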
Title: Re: upx for large executables?
Post by: GNUser on May 16, 2024, 02:45:16 PM
In the real world there's no such thing as a free lunch, so there must be a downside somewhere...
Title: Re: upx for large executables?
Post by: hiro on May 16, 2024, 05:09:19 PM
generally, most higher compression methods also take much longer to decompress, especially when the implementation isn't multithreaded...
personally i don't like to go harder than lz4. this algorithm is efficient enough that i don't waste too many cpu cycles, but it compresses most stuff well enough for my needs.
Title: Re: upx for large executables?
Post by: Vaguiner on May 16, 2024, 05:22:52 PM
Perhaps it might be allowed to change the block size, maybe even the compression to zstd, for specific extensions that would benefit from this.

Brave-browser with 64kb blocks (the default is 4kb) comes to 155MB.

Title: Re: upx for large executables?
Post by: GNUser on May 16, 2024, 05:36:33 PM
Hi hiro. Decompression overhead is a good point. I saw your post in the other thread. You're right--having upx inside a gzip-compressed squashfs sounds overcomplicated.

It seems the best compression algorithm for tcz has been discussed at length over the years, including in the "Into the Core" book (chapter 18). I'm sorry for beating this dead horse some more ;D

Hi CardealRusso. Tweaking block size has been discussed before but not pursued due to added complexity (keeping track of which block size to use on which architecture).

I'm going to continue to keep it simple with the extensions that I create and avoid being too clever.
Title: Re: upx for large executables?
Post by: Rich on May 16, 2024, 05:39:30 PM
Hi GNUser
I think an executable in a squash file system can load
only the code it needs as it's executing, because the
driver decompresses and fetches the data for the
operating system transparently.

I suspect UPX basically creates a self extracting compressed
file with a loader. This suggests the entire file gets loaded
immediately into RAM even if all sections of code are not
yet (or ever) needed.
Title: Re: upx for large executables?
Post by: GNUser on May 16, 2024, 05:46:56 PM
I suspect UPX basically creates a self extracting compressed
file with a loader.
Hi Rich. Based on my cursory research that sounds about right.

This suggests the entire file gets loaded
immediately into RAM even if all sections of code are not
yet (or ever) needed.
Yikes. We definitely don't want that.

It's so easy to just focus on the size of the extension and get carried away into all kinds of trouble!
Title: Re: upx for large executables?
Post by: Vaguiner on May 16, 2024, 05:54:45 PM
...Tweaking block size has been discussed before...
I may be mistaken but the discussion (which I took part in) was about changing the default block size, not specific cases.
I hope to hear from an administrator, as I'm also interested.
Title: Re: upx for large executables?
Post by: hiro on May 16, 2024, 08:28:57 PM
increasing the block size will lead to more stuff being loaded into ram unnecessarily. there's a good reason it's at 4k right now.
more useful would be to figure out how to minimize data getting cached twice (once in compressed form and once uncompressed).
but there's diminishing returns in all of this. our current system is plenty good already, maybe find something else to hack on :P
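hiro's RAM point can be made concrete with a bit of shell arithmetic: touching a single 4 KiB page still forces the whole compressed block to be decompressed, so the worst-case read amplification grows linearly with block size:

```shell
# Worst-case read amplification: one 4 KiB page touched per squashfs block.
# Pure arithmetic, no squashfs tools needed.
page=4096
for bs in 4096 65536 131072; do
  echo "block=${bs} amplification=$((bs / page))x"
done
```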
Title: Re: upx for large executables?
Post by: curaga on May 17, 2024, 04:01:03 AM
I may be mistaken but the discussion (which I took part in) was about changing the default block size, not specific cases.
I hope to hear from an administrator, as I'm also interested.

The block size was fixed at 4k for the 32bit platforms, but larger sizes are acceptable on the 64-bit ones. The RAM hit is quite different on a 64mb system vs a 4gb one. That is, if you're maintaining some big extension for x86-64 or arm64, and it significantly benefits, it's fine to send it with a larger block size.
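For a 64-bit extension, the larger block size is just a mksquashfs flag, and you can confirm afterwards what was actually recorded in the superblock. Directory and file names here are placeholders:

```shell
# Build an extension with a 64 KiB block size (names are placeholders).
mksquashfs pkg/ myapp.tcz -b 64k -noappend
# Confirm the block size recorded in the superblock:
unsquashfs -s myapp.tcz | grep -i "block size"
```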
Title: Re: upx for large executables?
Post by: neonix on May 17, 2024, 10:34:27 AM
upx -9 is gzip compression. All x86_64 CPUs are so powerful that you won't see a difference with gzip.

Try upx --ultra-brute and tell us the result after you use the browser for many hours. How much RAM do you have in your PC? How much RAM does a typical 64-bit TC user have?

On the old wiki there was a recommendation to use upx when you create new tczs. But that was in the upx -9 era; now we are in the lzma era.

If you want to know for sure how upx works, you need to read the documentation or the upx source code.
As far as I remember, a compressed (upxed) program is decompressed in RAM on the fly (I'm not sure about lzma) and it doesn't impact RAM in a negative way. More pros than cons.

There's no way to change anything in tcz block compression; there are more cons than pros.
Title: Re: upx for large executables?
Post by: hiro on May 17, 2024, 10:38:49 AM
can you back it up with real numbers?
i still didn't see any meaningful benchmarks.
i assume it's because you didn't try it out at all and you're just running your ideas by us. but next time, if you don't intend to run the experiment, please at least say so in a disclaimer and don't pretend you already know the outcome of the experiment.
Title: Re: upx for large executables?
Post by: GNUser on May 17, 2024, 10:49:45 AM
Hi CardealRusso. I can reproduce your experiment: If I squash brave-browser without specifying block size (i.e., using the 4K default), extension size is 180 MiB. If I squash using 64K block size, extension size is 155 MiB.

Hi curaga. I will follow your advice. For large extensions being submitted for x86_64 repo, I will check if there is significant benefit to using a block size larger than the 4K default. Is there a block size you would recommend as a sweet spot for x86_64 architecture? 64K? 128K?
Title: Re: upx for large executables?
Post by: curaga on May 17, 2024, 11:07:44 AM
It depends on the data being compressed, can't really give a number.
Title: Re: upx for large executables?
Post by: GNUser on May 17, 2024, 11:15:21 AM
Fair enough. Is there at least a range of sizes that you would consider to be reasonable?

EDIT: I just need a range to guide experimentation. I don't want to go above a block size that would be considered ridiculous ;D
Title: Re: upx for large executables?
Post by: curaga on May 17, 2024, 11:18:27 AM
The mksquashfs program supports up to 1mb, and if 1mb gives a good benefit over 512k, it's fine to use. The entire range is ok for 64-bit really.
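One way to find the sweet spot within that range is a quick sweep, a sketch assuming squashfs-tools and an unsquashed extension tree in pkg/:

```shell
# Compare resulting extension size across the block sizes curaga mentions.
for bs in 4k 64k 128k 256k 512k 1M; do
  mksquashfs pkg/ /tmp/test-$bs.tcz -b $bs -noappend >/dev/null
  printf '%-5s %s\n' "$bs" "$(du -h /tmp/test-$bs.tcz | cut -f1)"
done
```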
Title: Re: upx for large executables?
Post by: GNUser on May 17, 2024, 11:24:41 AM
Duly noted. Thanks!
Title: Re: upx for large executables?
Post by: hiro on May 17, 2024, 11:55:28 AM
it depends on how the data is going to be accessed. as an example, if there's a huge video file embedded in the binary that plays an intro every time you open the application, in raw video format, it will be accessed sequentially in a big batch at startup and would benefit from an efficient high blocksize form of compression. even more, it would benefit from a lossy video codec.

these are 2 extremes.

a third extreme is if a binary is full of pre-computed lookup tables or something like that, and only a few entries are ever looked at. then it would make sense to keep the blocksize small enough that most blocks with actual runnable instructions contain mostly just those, and not whatever other never-accessed data.

or, a program is extremely bloated and contains everything and their grandmother: then you might be unlucky&lucky and most stuff is never accessed, or you might be just unlucky&unlucky and you'll page in everything because every second half-block has to be used!
Title: Re: upx for large executables?
Post by: Vaguiner on May 17, 2024, 03:52:21 PM
Fair enough. Is there at least a range of sizes that you would consider to be reasonable?
a hint

Comp          Size    RAM     Decomp (real)
zstd (128k)   119MB   391MB   0.37s
lz4 (128k)    181MB   381MB   0.35s
gzip (128k)   131MB   379MB   0.32s
zstd (4k)     154MB   385MB   1.12s
gzip (4k)     160MB   374MB   1.15s
Title: Re: upx for large executables?
Post by: GNUser on May 17, 2024, 07:35:39 PM
Interesting, but that's only two block sizes (not enough to find a sweet spot). Still, this data suggests a larger block size results in a smaller extension and faster decompression, with no significant difference in RAM usage.
Title: Re: upx for large executables?
Post by: hiro on May 17, 2024, 07:37:43 PM
it's not clear what kind of "ram usage" this is measuring.
but i presume it's most likely during bulk extraction of the whole file (which we do not do), and not the one we actually care about (ram overhead due to block overhead).

though now that i think about it, i don't even know whether either of these ram usages would actually be significant enough for us to even begin bothering about, versus for example performance effects (due to large-block access overhead/inefficiencies in highly random, smaller partial-block access scenarios).

less third-party random numbers, more first-party testing please. and explain your numbers.
Title: Re: upx for large executables?
Post by: GNUser on May 17, 2024, 07:43:36 PM
it's not clear what kind of "ram usage" this is measuring.
Good point, hiro.

At the end of the day, the amount of effort in benchmarking and defining terms might be much greater than the technical gains.
Title: Re: upx for large executables?
Post by: hiro on May 17, 2024, 07:47:57 PM
if you try it and the benefit can be felt even without careful measurement, then it's likely worth measuring to optimize further down the same route.
in other words: if you find hints, it's good to use that newly found knowledge and investigate further and try to understand the full situation ;)
Title: Re: upx for large executables?
Post by: Vaguiner on May 17, 2024, 08:15:37 PM
it's not clear what kind of "ram usage" this is measuring.
This is the total RAM usage of the system, with different algorithms and block sizes for ungoogled-chromium. It's the memory usage at a fresh start of the system, without running any program other than top.
It's not very accurate, but it definitely gives us some idea.
Title: Re: upx for large executables?
Post by: hiro on May 18, 2024, 02:51:03 AM
tested on tinycorelinux?
Title: Re: upx for large executables?
Post by: Vaguiner on May 18, 2024, 05:53:48 AM
tested on tinycorelinux?
yes
Title: Re: upx for large executables?
Post by: hiro on May 18, 2024, 08:12:12 AM
ah, you did your homework a year ago already :O

trying to analyze the results from your table from your example extension

1) going from gzip at 4k to zstd at 128k potentially reduces the size by 1/4th, at the cost of a bigger change for all existing extensions, giving up the former consistency.
if size isn't super important, i would say in this instance the change isn't worth it at all.
if size is super important you should probably make a dedicated single file with high blocksize and strong compression, for all your extensions and mydata. for this, you could put a script in your exittc process or so...

2) going from 4k to 128k seems to generally halve the time needed for decompression, regardless of compression codec

3) ram usage stays mostly the same for all tested combinations, so that one is highly irrelevant.

bonus: on the one hand i'm surprised to see no direct benefit for lz4, on the other hand, i think it's not super realistic to load applications and not use them.

i would propose a complete end-to-end test, where you measure the boot time plus loading that webbrowser extension plus how long it takes to also automatically navigate a webbrowser to some bloated website...
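hiro's proposed end-to-end test could start as simply as timing the load and the first launch. tce-load is the standard Tiny Core loader; the brave binary name here is an assumption:

```shell
# Rough end-to-end timing sketch (binary name is an assumption).
time tce-load -i brave-browser   # mount/load the extension
time brave --version             # first run pays the decompression cost
```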
Title: Re: upx for large executables?
Post by: neonix on May 20, 2024, 05:56:25 AM
Can upx compress dll files?
Title: Re: upx for large executables?
Post by: neonix on May 24, 2024, 06:50:01 AM
It looks like upx uses its own UCL algorithm, not gzip.

Not many people know that upx can also compress shared libraries and static libraries.

I tried to compress libfltk.so.1.3 with the new version of upx -9 and got about a 50% reduction. After compression with squashfs, the difference in the new tcz is not great, about 50kb.

upx --lzma doesn't work with libfltk.so.1.3 because it results in a Trace/breakpoint trap. I also tried to compress libraries in /lib and /usr/lib with the --android-shlib argument but got a PT_Note error.

I also discovered that Linux kernel supports ko.xz compression.
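When experimenting with upx on shared libraries, it is worth keeping a backup and using upx's built-in self-test (upx -t), since, as the Trace/breakpoint trap above shows, not every library survives compression:

```shell
# Back up, compress, and self-test a shared library (file name from the post).
cp libfltk.so.1.3 libfltk.so.1.3.bak
upx -9 libfltk.so.1.3
upx -t libfltk.so.1.3 || cp libfltk.so.1.3.bak libfltk.so.1.3  # restore on failure
```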
Title: Re: upx for large executables?
Post by: nick65go on May 24, 2024, 04:02:55 PM
IMHO, we should first define the "environment" where we want to use the "optimizations", like UPX compression, which was intended to shrink executables (initially it was used on Windows for pseudo-protected/obfuscated executables, to hide their resources and code from disassembling/debugging).

If we want "common" scripts etc for 32 bits and 64 bits (admin hiro style), then will be AVERAGE (not optimum) for a specific environment.

1. if we use a powerful environment, like SMP -- multithreading, multicore etc., 64-bit CPU, with a fast SSD (not HDD), lots of RAM (over 4GB) -- then UPX, zstd, gzip, etc. do NOT matter too much. The differences are not worth the effort. The time spent testing/benchmarking, re-building TCZs, etc. will never be recovered, even when used by 100 users and 100 applications. If you do not believe me, then let's try to estimate/agree on how much you hypothetically want to GAIN in speed/size/RAM usage etc. (be it relative %, or absolute values), versus how much you want to INVEST to succeed in time/money/pride etc. So basically, define the target and the budget.

2. if we use a 32-bit, slow 486 CPU (tiny core user compatible), no SMP, no multithreading, with slow media like HDD/CDROM, then maybe UPX can be discussed with better success. In this environment, the small/minuscule gains should matter for some particularly BIG applications. For already small tczs, the algorithm or block size does not matter too much anyway.

PS: I hope I did not offend anyone with my comment. For me it is about efficiency: small effort for small reward, or big effort for big reward, but not big effort for small reward. YMMV.
Title: Re: upx for large executables?
Post by: neonix on December 30, 2024, 02:38:44 PM
I tested upx with vivaldi, firefox, and opera, each with a 100 MB compressed executable. It looks like when I use lzma compression it uses twice the RAM compared to no compression at all, which is impractical. Probably upx is efficient only when you use the ucl algorithm. I don't know how to check this because TC has its own complicated RAM-caching mechanism.