Tiny Core Linux
General TC => General TC Talk => Topic started by: CentralWare on April 17, 2025, 02:42:35 AM
-
Good morning everyone!
I need a little brain-storming to create a function whose job is to determine how to extract an archive file based on the filename.
However, of course, there's a million archive formats out there, so we need to build a laundry list of file extensions along with the command to extract with.
For example:
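case "$1" in
  *.tar)     CMD="tar -xf" ;;
  *.tar.gz)  CMD="tar -zxf" ;;
  *.tar.bz)  CMD="tar -jxf" ;;
  *.tar.bz2) CMD="tar -jxf" ;;
  *.7z)      CMD="7z x" ;;        # not -so, which writes file contents to stdout instead of to disk
  *.rar)     CMD="unrar x -r" ;;
  *.zip)     CMD="unzip" ;;       # not -a, which converts line endings in text files
esac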
The goal here is to have a working list of commands of supported software, create extensions if necessary for software we don't already have in the repository and have this function detect whether or not said extensions are currently installed. Please list all of the file extensions you can think of along with their associated command to EXTRACT into the CURRENT directory. The above examples were "off the top of my head" so they may not be completely accurate.
The "ultimate" goal is to have the fewest number of extensions needing to be installed in order to support the largest number of archive formats.
This function will also support ISO, RPM, DEB, CPIO and other containers. We probably won't need to support formats like CAB for obvious reasons, but there are archives out there such as ACE, ALZ, LZH, etc. which aren't overly common today but are worthwhile to implement if they don't require closed-source software.
Any file extensions you can think of that you know how to extract... please feel free to contribute!
LOL - if five people have already listed *.xz) CMD="tar -xf" ;; please refrain from adding a sixth! :)
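A minimal sketch of the kind of function described above, assuming a POSIX shell such as busybox ash. The helper names (extract_cmd, tool_available, extract) are made up for illustration, and the extension list is only a starting point, not the finished table:

```shell
#!/bin/sh
# Sketch only: extract_cmd, tool_available and extract are hypothetical names.

# Print the command (without the filename) that extracts $1 into the
# current directory. Returns 1 when the extension is not on the list.
extract_cmd() {
    case "$1" in
        *.tar)                     echo "tar -xf" ;;
        *.tar.gz|*.tgz)            echo "tar -xzf" ;;
        *.tar.bz|*.tar.bz2|*.tbz2) echo "tar -xjf" ;;
        *.tar.xz|*.txz)            echo "tar -xJf" ;;
        *.7z)                      echo "7z x" ;;
        *.rar)                     echo "unrar x" ;;
        *.zip)                     echo "unzip" ;;
        *)                         return 1 ;;
    esac
}

# Succeed if the program named by the first word of "$1" is on PATH,
# i.e. the extension providing it appears to be installed.
tool_available() {
    command -v "${1%% *}" >/dev/null 2>&1
}

# Extract archive $1 into the current directory.
# Returns 1 for an unsupported extension, 2 for a missing tool.
extract() {
    cmd=$(extract_cmd "$1") || { echo "unsupported: $1" >&2; return 1; }
    tool_available "$cmd" || { echo "missing tool: ${cmd%% *}" >&2; return 2; }
    # Word splitting of $cmd is intentional: it holds a program plus flags.
    $cmd "$1"
}
```

Usage would be something like: extract source-1.0.tar.gz. Keeping the lookup (extract_cmd) separate from the installed-tool check makes it easy to list which extensions are still missing without extracting anything.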
-
My first comment on this is that you can use the pipe character to reduce repetition in your case statement; it works as an "OR", like this:
*.tar.bz|*.tar.bz2) CMD="tar -jxf" ;;
-
I use wimextract to extract some fonts from the iso,
using a computer that has an M$ license ;)
I installed wimlib on my Arch Linux system to get the wimextract program.
So the wim and cab formats can be useful, even if "we" don't want to support the M$ Corp.
wimextract install.wim 1 /Windows/{Fonts/"*".{ttf,ttc},System32/Licenses/neutral/"*"/"*"/license.rtf} --dest-dir /home/patrik/fonts
-
My first comment on this is that you can use the pipe character to reduce repetition in your case statement; it works as an "OR", like this:
*.tar.bz|*.tar.bz2) CMD="tar -jxf" ;;
Yes, but for sake of being able to quickly "see" what packages we have on the list vs. what needs to be completed, I currently have two lists going; one for *.tar.gz and other fully listed archives, and a separate list for hybrid/combined extensions such as .tgz - they're the same commands, I'm just trying to cover all bases at the start and we'll combine them once we're ready to push the changes upstream. Or not... We'll see where things land in the end! :)
When combining similar archives, LZH and LHA would go together, but the initial list will be sorted alphabetically to ensure we get all of the necessary packages in here so "OR"ing the list is a little premature just yet.
As for M$, I'm doubtful they'll be putting out too many 'nix based source code packages (which is what this archive project is focused on - extracting people's source code and support packages) so if I (ahem) accidentally leave out a couple which weren't intended for our side of the fence, I'm pretty certain they won't be missed. :)
-
IMHO, I think 7-zip (ver 24.09) linux version, covers "everything". https://www.7-zip.org/download.html
Please read its help for confirmation that any format can be manipulated from the command line.
Supported formats
Format     Creation  Filename Extensions
7z         X         7z
BZIP2      X         bz2 bzip2 tbz2 tbz
GZIP       X         gz gzip tgz
TAR        X         tar
WIM        X         wim swm esd
XZ         X         xz txz
ZIP        X         zip zipx jar xpi odt ods docx xlsx epub
APFS                 apfs
APM                  apm
AR                   ar a deb lib
ARJ                  arj
Base64               b64
CAB                  cab
CHM                  chm chw chi chq
COMPOUND             msi msp doc xls ppt
CPIO                 cpio
CramFS               cramfs
DMG                  dmg
Ext                  ext ext2 ext3 ext4 img
FAT                  fat img
HFS                  hfs hfsx
HXS                  hxs hxi hxr hxq hxw lit
iHEX                 ihex
ISO                  iso img
LZH                  lzh lha
LZMA                 lzma
MBR                  mbr
MsLZ                 mslz
Mub                  mub
NSIS                 nsis
NTFS                 ntfs img
RAR                  rar r00
RPM                  rpm
PPMD                 ppmd
QCOW2                qcow qcow2 qcow2c
SPLIT                001 002 ...
SquashFS             squashfs
UDF                  udf iso img
UEFIc                scap
UEFIs                uefif
VDI                  vdi
VHD                  vhd
VHDX                 vhdx
VMDK                 vmdk
XAR                  xar pkg
Z                    z taz
ZSTD                 zst tzst
FYI: it has minimal dependencies and its size is 1.9MB on x86, lovely :)
https://pkgs.alpinelinux.org/package/edge/main/x86/7zip
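If a single 7-Zip binary really does cover the formats listed above, the extraction function could shrink to something like the following sketch. It assumes the binary is installed as 7zz or 7z (the names used by the packages mentioned in this thread); find_7z and extract_with_7z are hypothetical helper names. Compressed tarballs are two layers for 7-Zip (the compressed stream, then the tar inside it), so the sketch pipes the first layer into a second 7-Zip process using the -so, -si and -ttar switches:

```shell
#!/bin/sh
# Sketch only: find_7z and extract_with_7z are hypothetical names.

# Print the name of the first 7-Zip binary found on PATH.
find_7z() {
    for bin in 7zz 7z; do
        if command -v "$bin" >/dev/null 2>&1; then
            echo "$bin"
            return 0
        fi
    done
    return 1
}

# Extract archive $1 into directory $2 (default: current directory).
# Returns 2 when no 7-Zip binary is installed.
extract_with_7z() {
    sz=$(find_7z) || { echo "no 7-Zip binary found" >&2; return 2; }
    dest=${2:-.}
    case "$1" in
        # Compressed tarball: decompress to stdout (-so) and let a second
        # 7-Zip process read the tar stream from stdin (-si -ttar).
        *.tar.gz|*.tgz|*.tar.bz2|*.tbz2|*.tar.xz|*.txz|*.tar.zst|*.tzst)
            "$sz" x -so "$1" | "$sz" x -si -ttar -o"$dest" -y ;;
        # Everything else: a plain extract with full paths.
        *)
            "$sz" x -o"$dest" -y "$1" ;;
    esac
}
```

Usage would be something like: extract_with_7z package.rpm /tmp/work. Note that container formats such as RPM or DEB may themselves unpack to an inner archive that needs a second pass.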
-
IMHO, I think 7-zip (ver 24.09) linux version, covers "everything". https://www.7-zip.org/download.html
I already have a completed builder for 7zip but honestly never looked at the final file size ???
Version: 24.09 Stable
Patched...
...compiling... (ding!)
Stripped (x86_64) looks to be about 3.5MB - but even so... might be worth packing into the builder system!?
-
UPX that and you may make it even smaller :)
-
UPX that and you may make it even smaller :)
Last I knew, UPX isn't lossless. Any problems experienced in its use with 7Z and/or 7Z compiled with ASM injections?
Yes, I get about a 64% reduction in physical file size using -9 but I'm curious at what cost, if any.
Appreciate the ongoing communications!
-
Hi CentralWare
Their github page is titled:
UPX: the Ultimate Packer for eXecutables
If that's true, the compression must be lossless, otherwise the
executables would never execute again.
-
Stripped (x86_64) looks to be about 3.5MB - but even so... might be worth packing into the builder system!?
I do not know how "others" compile/link/optimize (gcc 15?, llvm?), but their binaries are smaller than your 3.5MB (and not UPX-ed):
[root@HP17 Arch]# du -h /usr/lib/7zip/7z
644K /usr/lib/7zip/7z
[root@HP17 Arch]# ls -al /usr/lib/7zip/7z
-rwxr-xr-x 1 root root 657704 Dec 25 16:53 /usr/lib/7zip/7z
[root@HP17 Arch]# ldd /usr/lib/7zip/7z
linux-vdso.so.1 (0x00007ffce31ec000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x000077a44aa00000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x000077a44ae26000)
libc.so.6 => /usr/lib/libc.so.6 (0x000077a44a80e000)
libm.so.6 => /usr/lib/libm.so.6 (0x000077a44ad2e000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x000077a44af13000)
HP17 [/home/Alpine]# du -h /usr/bin/7zz
1.7M /usr/bin/7zz
HP17 [/home/Alpine]# ls -al /usr/bin/7zz
-rwxr-xr-x 1 root root 1780088 Dec 25 20:02 /usr/bin/7zz
HP17 [/home/Alpine]# ldd /usr/bin/7zz
/lib/ld-musl-x86_64.so.1 (0x7ac413c8b000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7ac413800000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7ac4137d4000)
libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7ac413c8b000)
-
HP17 [/home/Alpine]# 7z i
7-Zip (z) 24.09 (x64) : Copyright (c) 1999-2024 Igor Pavlov : 2024-11-29
64-bit locale=en_US.UTF-8 Threads:12 OPEN_MAX:1024
Formats:
C...F..........c.a.m+.. 7z 7z 7 z BC AF ' 1C
...................... APFS apfs img offset=32 N X S B 00
...................... APM apm E R
...................... Ar ar a deb udeb lib ! < a r c h > 0A
...................... Arj arj ` EA
K.....O.....X......... Base64 b64
......O............... COFF obj
...F.................. Cab cab M S C F 00 00 00 00
...................... Chm chm chi chq chw I T S F 03 00 00 00 ` 00 00 00
...................... Compound msi msp doc xls ppt D0 CF 11 E0 A1 B1 1A E1
....M................. Cpio cpio 0 7 0 7 0 || C7 q || q C7
...................... CramFS cramfs offset=16 C o m p r e s s e d 20 R O M F S
.....G..B............. Dmg dmg k o l y 00 00 00 04 00 00 02 00
.........E............ ELF elf E L F
...................... Ext ext ext2 ext3 ext4 img offset=1080 S EF
...................... FAT fat img offset=510 U AA
...................... FLV flv F L V 01
...................... GPT gpt mbr offset=512 E F I 20 P A R T 00 00 01 00
....M................. HFS hfs hfsx offset=1024 B D || H + 00 04 || H X 00 05
...F.................. Hxs hxs hxi hxr hxq hxw lit I T O L I T L S 01 00 00 00 ( 00 00 00
......O............... IHex ihex
...................... Iso iso img offset=32769 C D 0 0 1
...................... LP lpimg img offset=4096 g D l a 4 00 00 00
...................... Lzh lzh lha offset=2 - l h
.......P.............. MBR mbr
....M....E............ MachO macho CE FA ED FE || CF FA ED FE || FE ED FA CE || FE ED FA CF
...................... MsLZ mslz S Z D D 88 F0 ' 3 A
....M................. Mub mub CA FE BA BE 00 00 00 || B9 FA F1 0E
...................... NTFS ntfs img offset=3 N T F S 20 20 20 20 00
...F.G................ Nsis nsis offset=4 EF BE AD DE N u l l s o f t I n s t
.........E............ PE exe dll sys M Z
...................... Ppmd pmd 8F AF AC 84
...................... QCOW qcow qcow2 qcow2c Q F I FB 00 00 00
...................... Rpm rpm ED AB EE DB
K..................... SWF swf F W S
....M................. SWFc swf (~.swf) C W S || Z W S
...................... Sparse simg img : FF & ED 01 00
...................... Split 001
....M................. SquashFS squashfs h s q s || s q s h || s h s q || q s h s
.........E............ TE te V Z
...FM................. UEFIc scap BD 86 f ; v 0D 0 @ B7 0E B5 Q 9E / C5 A0 || 8B A6 < J # w FB H 80 = W 8C C1 FE C4 M || B9 82 91 S B5 AB 91 C B6 9A E3 A9 C F7 / CC
...FM................. UEFIf uefif offset=16 D9 T 93 z h 04 J D 81 CE 0B F6 17 D8 90 DF || x E5 8C 8C = 8A 1C O 99 5 89 a 85 C3 - D3
....M.O............... Udf udf iso img offset=32768 00 B E A 0 1 01 00 || 01 C D 0 0 1
...................... VDI vdi offset=64 10 DA BE
.....G................ VHD vhd c o n e c t i x 00 00
...................... VHDX vhdx avhdx v h d x f i l e
...................... VMDK vmdk K D M V
...................... Xar xar pkg xip x a r ! 00
...................... Z z taz (.tar) 1F 9D
CK..................... bzip2 bz2 bzip2 tbz2 (.tar) tbz (.tar) B Z h
CK.................m+.. gzip gz gzip tgz (.tar) tpz (.tar) apk (.tar) 1F 8B 08
K.....O............... lzma lzma
K..................... lzma86 lzma86
C......O...LH......m+.. tar tar ova offset=257 u s t a r
C.SN.......LH..c.a.m+.. wim wim swm esd ppkg M S W I M 00 00 00
CK..................... xz xz txz (.tar) FD 7 z X Z 00
C...FMG........c.a.m+.. zip zip z01 zipx jar xpi odt ods docx xlsx epub ipa apk appx P K 03 04 || P K 05 06 || P K 06 06 || P K 07 08 P K || P K 0 0 P K
K..................... zstd zst tzst (.tar) ( B5 / FD
CK.....O.....XC........ Hash sha256 sha512 sha384 sha224 sha3-256 sha1 sha md5 blake2sp xxh64 crc32 crc64 asc cksum
-
@nick65go: The only thing I disabled in the compilation was RAR+ (DISABLE_RAR_COMPRESSION), everything else is retained.
TCL14x64, GCC 12.2.0
7zz 3,485,048 Bytes
UPX'ed 1,088,756 Bytes
7-Zip (z) 24.09 (x64) : Copyright (c) 1999-2024 Igor Pavlov : 2024-11-29
64-bit locale=C UTF8=- Threads:40 OPEN_MAX:1024
Formats:
C...F..........c.a.m+.. 7z 7z 7 z BC AF ' 1C
...................... APFS apfs img offset=32 N X S B 00
...................... APM apm E R
...................... Ar ar a deb udeb lib ! < a r c h > 0A
...................... Arj arj ` EA
K.....O.....X......... Base64 b64
......O............... COFF obj
...F.................. Cab cab M S C F 00 00 00 00
...................... Chm chm chi chq chw I T S F 03 00 00 00 ` 00 00 00
...................... Compound msi msp doc xls ppt D0 CF 11 E0 A1 B1 1A E1
....M................. Cpio cpio 0 7 0 7 0 || C7 q || q C7
...................... CramFS cramfs offset=16 C o m p r e s s e d 20 R O M F S
.....G..B............. Dmg dmg k o l y 00 00 00 04 00 00 02 00
.........E............ ELF elf E L F
...................... Ext ext ext2 ext3 ext4 img offset=1080 S EF
...................... FAT fat img offset=510 U AA
...................... FLV flv F L V 01
...................... GPT gpt mbr offset=512 E F I 20 P A R T 00 00 01 00
....M................. HFS hfs hfsx offset=1024 B D || H + 00 04 || H X 00 05
...F.................. Hxs hxs hxi hxr hxq hxw lit I T O L I T L S 01 00 00 00 ( 00 00 00
......O............... IHex ihex
...................... Iso iso img offset=32769 C D 0 0 1
...................... LP lpimg img offset=4096 g D l a 4 00 00 00
...................... Lzh lzh lha offset=2 - l h
.......P.............. MBR mbr
....M....E............ MachO macho CE FA ED FE || CF FA ED FE || FE ED FA CE || FE ED FA CF
...................... MsLZ mslz S Z D D 88 F0 ' 3 A
....M................. Mub mub CA FE BA BE 00 00 00 || B9 FA F1 0E
...................... NTFS ntfs img offset=3 N T F S 20 20 20 20 00
...F.G................ Nsis nsis offset=4 EF BE AD DE N u l l s o f t I n s t
.........E............ PE exe dll sys M Z
...................... Ppmd pmd 8F AF AC 84
...................... QCOW qcow qcow2 qcow2c Q F I FB 00 00 00
...F.................. Rar rar r00 R a r ! 1A 07 00
...F.................. Rar5 rar r00 R a r ! 1A 07 01 00
...................... Rpm rpm ED AB EE DB
K..................... SWF swf F W S
....M................. SWFc swf (~.swf) C W S || Z W S
...................... Sparse simg img : FF & ED 01 00
...................... Split 001
....M................. SquashFS squashfs h s q s || s q s h || s h s q || q s h s
.........E............ TE te V Z
...FM................. UEFIc scap BD 86 f ; v 0D 0 @ B7 0E B5 Q 9E / C5 A0 || 8B A6 < J # w FB H 80 = W 8C C1 FE C4 M || B9 82 91 S B5 AB 91 C B6 9A E3 A9 C F7 / CC
...FM................. UEFIf uefif offset=16 D9 T 93 z h 04 J D 81 CE 0B F6 17 D8 90 DF || x E5 8C 8C = 8A 1C O 99 5 89 a 85 C3 - D3
....M.O............... Udf udf iso img offset=32768 00 B E A 0 1 01 00 || 01 C D 0 0 1
...................... VDI vdi offset=64 10 DA BE
.....G................ VHD vhd c o n e c t i x 00 00
...................... VHDX vhdx avhdx v h d x f i l e
...................... VMDK vmdk K D M V
...................... Xar xar pkg xip x a r ! 00
...................... Z z taz (.tar) 1F 9D
CK..................... bzip2 bz2 bzip2 tbz2 (.tar) tbz (.tar) B Z h
CK.................m+.. gzip gz gzip tgz (.tar) tpz (.tar) apk (.tar) 1F 8B 08
K.....O............... lzma lzma
K..................... lzma86 lzma86
C......O...LH......m+.. tar tar ova offset=257 u s t a r
C.SN.......LH..c.a.m+.. wim wim swm esd ppkg M S W I M 00 00 00
CK..................... xz xz txz (.tar) FD 7 z X Z 00
C...FMG........c.a.m+.. zip zip z01 zipx jar xpi odt ods docx xlsx epub ipa apk appx P K 03 04 || P K 05 06 || P K 06 06 || P K 07 08 P K || P K 0 0 P K
K..................... zstd zst tzst (.tar) ( B5 / FD
CK.....O.....XC........ Hash sha256 sha512 sha384 sha224 sha3-256 sha1 sha md5 blake2sp xxh64 crc32 crc64 asc cksum
Codecs:
4ED 303011B BCJ2
EDF 3030103 BCJ
EDF 3030205 PPC
EDF 3030401 IA64
EDF 3030501 ARM
EDF 3030701 ARMT
EDF 3030805 SPARC
EDF A ARM64
EDF B RISCV
EDF 20302 Swap2
EDF 20304 Swap4
ED 40202 BZip2
ED 0 Copy
ED 40109 Deflate64
ED 40108 Deflate
EDF 3 Delta
ED 21 LZMA2
ED 30101 LZMA
ED 30401 PPMD
EDF 6F10701 7zAES
EDF 6F00181 AES256CBC
Hashers:
4 1 CRC32
16 208 MD5
20 201 SHA1
32 A SHA256
256 231 SHA3-256
48 222 SHA384
64 223 SHA512
8 211 XXH64
8 4 CRC64
32 202 BLAKE2sp
@Rich: =my= definition of lossless is byte for byte. If an audio codec/container claims lossless, it must be able to replicate the original exactly and when converted back out of that container, the resulting file must match the original. File compression... same concept - what goes in, must come out - identical. (My understanding of UPX is it's a one-way street where a file that goes in... can never come out the same as it started. At least not that I've ever read about... but then again, I'm not "on top" of all topics at the moment... I wasn't even aware until a couple hours ago of the Raspberry Pi Pico 2 release - which launched back in August. :) )
-
Hi @CentralWare, you can check this yourself with upx:
1. Take the md5 checksum of a .so, ELF, or even PE (Windows exe) file.
2. Compress the file with upx.
3. Decompress the compressed file (yes, upx can also decompress files).
4. Take the md5 checksum of the decompressed file; you will see the same checksum.
One downside of upx, I think, is that running upx-compressed files needs a little more memory.
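The steps above can be sketched as a small script. This assumes upx and md5sum are on the PATH; same_checksum and upx_roundtrip are hypothetical helper names, and the script works on copies in a temporary directory so the original binary is never modified:

```shell
#!/bin/sh
# Sketch only: same_checksum and upx_roundtrip are hypothetical names.

# Succeed if files $1 and $2 have the same MD5 checksum.
same_checksum() {
    a=$(md5sum < "$1") || return 2
    b=$(md5sum < "$2") || return 2
    [ "$a" = "$b" ]
}

# Compress a copy of binary $1 with UPX, decompress that copy again,
# and compare the restored file against the untouched original.
# Returns 0 when the round trip is byte-identical, 2 if upx is missing.
upx_roundtrip() {
    command -v upx >/dev/null 2>&1 || { echo "upx not installed" >&2; return 2; }
    work=$(mktemp -d) || return 2
    upx -9 -q -o "$work/packed" "$1" >/dev/null &&
        upx -d -q -o "$work/restored" "$work/packed" >/dev/null &&
        same_checksum "$1" "$work/restored"
    rc=$?
    rm -rf "$work"
    return $rc
}
```

Usage would be something like: upx_roundtrip /usr/bin/7zz && echo "round trip is lossless". A non-zero result can also mean that upx refused to pack the file, so it is worth running upx by hand on any binary that fails.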
-
3. Decompress the compressed file (yes, upx can also decompress files).
It's been quite a while since I looked at UPX and I don't remember the -d flag way back when, but good to know!
One downside of upx, I think, is that running upx-compressed files needs a little more memory.
I would have figured as much... as with any compressed system, you need the means to decompress/translate.
The machines this would be used on start at 64GB (12~18 Cores) and work their way up to 256GB (20~36 Cores) so we should be covered there.
In my mind, memory spend has to be roughly the original file size that was compressed along with the stub size to support the extraction.
Okay, my friends, I think we're going to run with this direction! (7zz + UPX)
We'll have it rip through all of the Arch/Alpine extensions and see if it chokes up on anything - if not, it looks like we have a winner!
-
@CentralWare: I am glad for the path forward (7zz).
If you use this 7zz as a BUILDER, on a machine with 12+ cores AND 64GB, then IMHO it is a shame not to compile 7zz STATICALLY LINKED and with -march=native -mtune=native. The "server" machine is not the "user/destination" machine. BTW: I also use CachyOS; have you read about it (+5%..15% speed)?
On the final machine (where you deploy the packages) you can use a 7zz suitable for that machine (libc, gcc12, 486 arch, etc). But a 7zz compiled for a "modern" CPU (x86-64-v3) has better CPU operations (SSSE3, AVX), so better speed! I read the 7z author's changelog.
BTW: the UPX stub (the header that decompresses the ELF) is small, and UPX uses in-place decompression, so it does not need both the compressed and decompressed sizes in RAM. Anyway, you should not need UPX on a 12-core + 64GB RAM machine :)
-
Good afternoon, @nick65go!
The machines in question are "21st Century" - so they go both ways!
As needed, they're booted into an x86 kernel and/or a virtualized one. There are also hopes to build these up into DistCC nodes for cross-compiling, but that depends on whether it's justified.
Here's the process:
There are a number (currently 7) of Dell and HP "mini" workstations (4 and 8 core units, 16 and 32GB each) intended to handle x86 "on demand."
The 12+ core machines (currently 7 also) are Dell Precision Multi-Xeon v4 and above, intended to manage the x86_64's "on demand."
There's also an array of RasPi and similar SOCs which are lined up for the ARM's Race. Since they're much less powerful, more units were dedicated for the job.
In the event the x86 queue were falling behind, the Precision machines can be tasked to reboot into x86 mode (active) or launch a virtual x86 (still being built.)
The 4/8 Cores are set up for the same - if x64 fell behind, the workstations can boot into x86_64 accordingly.
If we utilize 7zz, it has to be compiled for all of the above platforms and the copy that WE use can be tuned Native... but the builders' job is to compile kernels/extensions that are more "General" population and since each extension has its own list of build dependencies, I'd probably create 7-Zip as a normal extension so the builders can just load it in along with the compiler. (Instead of having a "special" seven and then a "public" one.)
Each extension that's being built forcefully unloads all existing extensions (save for kernel modules, etc. which would be actively in use) to ensure as close to a "clean slate" as possible without rebooting. Considering the Precision servers take almost 45 seconds just in P.O.S.T., rebooting isn't desirable if we can avoid it.
UPX vs. 12+ core machine... no, RAM and Cores aren't an issue for x64 builds so the couple megs difference isn't concerning - and even the "baby boxes" (x86 mini's) as I've come to call them have enough RAM to where we don't have to conserve TOO much even if we had to boot native x86 with a 3.x GB ceiling, but this entire thread has me pondering a few UPX related experiments! (If UPX is what you've noted, why not UPX-compress virtually every binary in every TCL extension? Due to how SquashFS works, to my understanding, a smaller binary footprint would technically allow for a smaller memory spend??)
In any event, thanks for the feedback and take care!
-
Talking about upx, the big binaries produced from Go-compiled code are very compressible.
I tried that out in school when I compiled the Go application gitea,
with very good results.