
Author Topic: Archive List  (Read 609 times)

Offline CentralWare

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 843
Archive List
« on: April 17, 2025, 02:42:35 AM »
Good morning everyone!
I need a little brain-storming to create a function whose job is to determine how to extract an archive file based on the filename.
However, of course, there's a million archive formats out there, so we need to build a laundry list of file extensions along with the command to extract with.

For example:
Code: [Select]
case "$1" in
   *.tar)       CMD="tar -xf" ;;
   *.tar.gz)    CMD="tar -zxf" ;;
   *.tar.bz)    CMD="tar -jxf" ;;
   *.tar.bz2)   CMD="tar -jxf" ;;
   *.7z)        CMD="7z x" ;;        # not -so, which writes file contents to stdout
   *.rar)       CMD="unrar x -r" ;;
   *.zip)       CMD="unzip" ;;       # not -a, which converts text-file line endings
esac

The goal here is to have a working list of commands of supported software, create extensions if necessary for software we don't already have in the repository and have this function detect whether or not said extensions are currently installed.  Please list all of the file extensions you can think of along with their associated command to EXTRACT into the CURRENT directory.  The above examples were "off the top of my head" so they may not be completely accurate.
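
A minimal sketch of the "detect whether the tool is installed" half of this, assuming plain POSIX sh. The function names (pick_extractor, tool_available, extract_here) are made up for illustration, and the commands are just the common ones for extracting into the current directory, so treat it as a starting point rather than the final list:

```shell
#!/bin/sh
# Hypothetical helper: print the extraction command for a filename, or fail.
pick_extractor() {
    case "$1" in
        *.tar)      echo "tar -xf" ;;
        *.tar.gz)   echo "tar -zxf" ;;
        *.tar.bz2)  echo "tar -jxf" ;;
        *.7z)       echo "7z x" ;;
        *.rar)      echo "unrar x" ;;
        *.zip)      echo "unzip" ;;
        *)          return 1 ;;
    esac
}

# Hypothetical helper: true when the first word of the command is on PATH.
tool_available() {
    command -v "${1%% *}" >/dev/null 2>&1
}

# Usage: extract_here FILE   (extracts into the current directory)
extract_here() {
    cmd=$(pick_extractor "$1") || { echo "unknown archive type: $1" >&2; return 2; }
    tool_available "$cmd" || { echo "missing tool: ${cmd%% *}" >&2; return 3; }
    # $cmd is deliberately unquoted so its flags split into separate words.
    $cmd "$1"
}
```

Returning distinct codes for "unknown type" and "missing tool" lets a caller decide whether to install an extension or skip the file.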

The "ultimate" goal is to have the fewest number of extensions needing to be installed in order to support the largest number of archive formats.

This function will also support ISO, RPM, DEB, CPIO and other containers.  We probably won't need to support extensions like CAB for obvious reasons, but there are archives out there such as ACE, ALZ, LZH, etc. which aren't overly common today but are worthwhile to implement if they don't require closed source software.

Any file extensions you can think of that you know how to extract...  please feel free to contribute!
LOL - if five people have already listed *.xz)   CMD="tar -xf" ;; please refrain from adding a sixth! :)

Offline patrikg

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 751
Re: Archive List
« Reply #1 on: April 17, 2025, 03:50:32 AM »
My first comment on this is that you can use the pipe character, which works as an "OR", to reduce repetition in your case statement, like this:

Code: [Select]
   *.tar.bz|*.tar.bz2)      CMD="tar -jxf" ;;

Offline patrikg

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 751
Re: Archive List
« Reply #2 on: April 17, 2025, 04:33:29 AM »
I use wimextract to extract some fonts from the ISO.
Using a computer that has an M$ license ;)

I installed wimlib on my Arch Linux system to get the wimextract program.

So the WIM and CAB formats can be useful, even if "we" don't want to support the M$ Corp.

Code: [Select]
wimextract install.wim 1 /Windows/{Fonts/"*".{ttf,ttc},System32/Licenses/neutral/"*"/"*"/license.rtf} --dest-dir /home/patrik/fonts
« Last Edit: April 17, 2025, 04:44:50 AM by patrikg »

Offline CentralWare

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 843
Re: Archive List
« Reply #3 on: April 17, 2025, 05:18:16 AM »
My first comment on this is that you can use the pipe character, which works as an "OR", to reduce repetition in your case statement, like this:
Code: [Select]
   *.tar.bz|*.tar.bz2)      CMD="tar -jxf" ;;

Yes, but for the sake of being able to quickly "see" what packages we have on the list vs. what needs to be completed, I currently have two lists going: one for *.tar.gz and other fully spelled-out extensions, and a separate list for hybrid/combined extensions such as .tgz.  They're the same commands; I'm just trying to cover all bases at the start, and we'll combine them once we're ready to push the changes upstream.  Or not...  We'll see where things land in the end! :)

When combining similar archives, LZH and LHA would go together, but the initial list will be sorted alphabetically to ensure we get all of the necessary packages in here so "OR"ing the list is a little premature just yet.
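
For whenever the lists do get merged, a rough sketch of what the combined patterns could look like. The function name archive_family and the family labels are invented for illustration; only the "|" alternation itself is standard shell:

```shell
#!/bin/sh
# Hypothetical helper: map a filename to a format family using "|" alternation,
# so full and hybrid extensions share one branch.
archive_family() {
    case "$1" in
        *.tar.gz|*.tgz)                   echo gzip-tar ;;
        *.tar.bz|*.tar.bz2|*.tbz|*.tbz2)  echo bzip2-tar ;;
        *.tar.xz|*.txz)                   echo xz-tar ;;
        *.lzh|*.lha)                      echo lzh ;;
        *)                                echo unknown ;;
    esac
}
```

Keeping one branch per family means the command only has to be written (and fixed) in one place.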

As for M$, I'm doubtful they'll be putting out too many 'nix based source code packages (which is what this archive project is focused on - extracting people's source code and support packages) so if I (ahem) accidentally leave out a couple which weren't intended for our side of the fence, I'm pretty certain they won't be missed. :)

Offline nick65go

  • Hero Member
  • *****
  • Posts: 862
Re: Archive List
« Reply #4 on: April 17, 2025, 06:48:33 AM »
IMHO, I think 7-zip (ver 24.09) linux version, covers "everything". https://www.7-zip.org/download.html
Please read its help for confirmation that any format can be manipulated from command line.
Supported formats
Format    Creation  Filename Extensions
7z        X         7z
BZIP2     X         bz2 bzip2 tbz2 tbz
GZIP      X         gz gzip tgz
TAR       X         tar
WIM       X         wim swm esd
XZ        X         xz txz
ZIP       X         zip zipx jar xpi odt ods docx xlsx epub
APFS                apfs
APM                 apm
AR                  ar a deb lib
ARJ                 arj
Base64              b64
CAB                 cab
CHM                 chm chw chi chq
COMPOUND            msi msp doc xls ppt
CPIO                cpio
CramFS              cramfs
DMG                 dmg
Ext                 ext ext2 ext3 ext4 img
FAT                 fat img
HFS                 hfs hfsx
HXS                 hxs hxi hxr hxq hxw lit
iHEX                ihex
ISO                 iso img
LZH                 lzh lha
LZMA                lzma
MBR                 mbr
MsLZ                mslz
Mub                 mub
NSIS                nsis
NTFS                ntfs img
RAR                 rar r00
RPM                 rpm
PPMD                ppmd
QCOW2               qcow qcow2 qcow2c
SPLIT               001 002 ...
SquashFS            squashfs
UDF                 udf iso img
UEFIc               scap
UEFIs               uefif
VDI                 vdi
VHD                 vhd
VHDX                vhdx
VMDK                vmdk
XAR                 xar pkg
Z                   z taz
ZSTD                zst tzst

FYI: the Alpine package has minimal dependencies and its size is about 1.9MB on x86, lovely :)
https://pkgs.alpinelinux.org/package/edge/main/x86/7zip
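
If a single 7-Zip binary ends up doing the work, the dispatcher gets much shorter. A rough sketch, with two caveats I am fairly sure of: the binary may be called 7zz, 7z or 7za depending on the package, and "x" on a compressed tarball only unpacks the outer layer and leaves a .tar behind, so those need a second pass. The function names here are invented for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print whichever 7-Zip binary name is installed.
find_7zip() {
    for bin in 7zz 7z 7za; do
        command -v "$bin" >/dev/null 2>&1 && { echo "$bin"; return 0; }
    done
    return 1
}

# Hypothetical helper: true for compressed tarballs that need two passes.
is_compressed_tar() {
    case "$1" in
        *.tar.gz|*.tgz|*.tar.bz2|*.tbz2|*.tbz|*.tar.xz|*.txz|*.tar.zst|*.tzst) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: extract_with_7zip FILE   (extracts into the current directory)
extract_with_7zip() {
    z=$(find_7zip) || { echo "no 7-Zip binary found" >&2; return 3; }
    if is_compressed_tar "$1"; then
        # First pass writes the inner tar to stdout, second pass unpacks it.
        "$z" x -so "$1" | "$z" x -si -ttar
    else
        "$z" x "$1"
    fi
}
```

The pipe avoids writing the intermediate .tar to disk, which matters for large source tarballs.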


« Last Edit: April 17, 2025, 07:04:37 AM by nick65go »

Offline CentralWare

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 843
Re: Archive List
« Reply #5 on: April 17, 2025, 11:25:46 AM »
IMHO, I think 7-zip (ver 24.09) linux version, covers "everything". https://www.7-zip.org/download.html
I already have a completed builder for 7zip but honestly never looked at the final file size  ???
Version: 24.09 Stable
Patched...
...compiling...  (ding!)

Stripped (x86_64) looks to be about 3.5MB - but even so...  might be worth packing into the builder system!?

Offline patrikg

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 751
Re: Archive List
« Reply #6 on: April 17, 2025, 12:33:49 PM »
UPX that and you may make it even smaller :)

Offline CentralWare

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 843
Re: Archive List
« Reply #7 on: April 17, 2025, 03:20:43 PM »
UPX that and you may make it even smaller :)
Last I knew, UPX isn't lossless.  Any problems experienced in its use with 7Z and/or 7Z compiled with ASM injections?
Yes, I get about a 64% reduction in physical file size using -9 but I'm curious at what cost, if any.
Appreciate the ongoing communications!

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12096
Re: Archive List
« Reply #8 on: April 17, 2025, 03:28:08 PM »
Hi CentralWare
Their github page is titled:
Quote
UPX: the Ultimate Packer for eXecutables

If that's true, the compression must be lossless, otherwise the
executables would never execute again.

Offline nick65go

  • Hero Member
  • *****
  • Posts: 862
Re: Archive List
« Reply #9 on: April 17, 2025, 05:27:04 PM »
Stripped (x86_64) looks to be about 3.5MB - but even so...  might be worth packing into the builder system!?
I do not know how "others" compile/link/optimize (gcc 15?, llvm?), but their binaries are smaller than your 3.5MB one (and not UPX-ed)
Code: [Select]
[root@HP17 Arch]# du -h /usr/lib/7zip/7z
644K    /usr/lib/7zip/7z
[root@HP17 Arch]# ls -al /usr/lib/7zip/7z
-rwxr-xr-x 1 root root 657704 Dec 25 16:53 /usr/lib/7zip/7z
[root@HP17 Arch]# ldd /usr/lib/7zip/7z
        linux-vdso.so.1 (0x00007ffce31ec000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x000077a44aa00000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x000077a44ae26000)
        libc.so.6 => /usr/lib/libc.so.6 (0x000077a44a80e000)
        libm.so.6 => /usr/lib/libm.so.6 (0x000077a44ad2e000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x000077a44af13000)
Code: [Select]
HP17 [/home/Alpine]# du -h /usr/bin/7zz
1.7M    /usr/bin/7zz
HP17 [/home/Alpine]# ls -al /usr/bin/7zz
-rwxr-xr-x    1 root     root       1780088 Dec 25 20:02 /usr/bin/7zz
HP17 [/home/Alpine]# ldd /usr/bin/7zz
        /lib/ld-musl-x86_64.so.1 (0x7ac413c8b000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7ac413800000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7ac4137d4000)
        libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7ac413c8b000)
« Last Edit: April 17, 2025, 05:32:28 PM by nick65go »

Offline nick65go

  • Hero Member
  • *****
  • Posts: 862
Re: Archive List
« Reply #10 on: April 17, 2025, 05:33:55 PM »
Code: [Select]
HP17 [/home/Alpine]# 7z i
7-Zip (z) 24.09 (x64) : Copyright (c) 1999-2024 Igor Pavlov : 2024-11-29
 64-bit locale=en_US.UTF-8 Threads:12 OPEN_MAX:1024
Formats:
   C...F..........c.a.m+..  7z       7z            7 z BC AF ' 1C
    ......................  APFS     apfs img      offset=32 N X S B 00
    ......................  APM      apm           E R
    ......................  Ar       ar a deb udeb lib ! < a r c h > 0A
    ......................  Arj      arj           ` EA
    K.....O.....X.........  Base64   b64
    ......O...............  COFF     obj
    ...F..................  Cab      cab           M S C F 00 00 00 00
    ......................  Chm      chm chi chq chw I T S F 03 00 00 00 ` 00 00 00
    ......................  Compound msi msp doc xls ppt D0 CF 11 E0 A1 B1 1A E1
    ....M.................  Cpio     cpio          0 7 0 7 0  ||  C7 q  ||  q C7
    ......................  CramFS   cramfs        offset=16 C o m p r e s s e d 20 R O M F S
    .....G..B.............  Dmg      dmg           k o l y 00 00 00 04 00 00 02 00
    .........E............  ELF      elf            E L F
    ......................  Ext      ext ext2 ext3 ext4 img offset=1080 S EF
    ......................  FAT      fat img       offset=510 U AA
    ......................  FLV      flv           F L V 01
    ......................  GPT      gpt mbr       offset=512 E F I 20 P A R T 00 00 01 00
    ....M.................  HFS      hfs hfsx      offset=1024 B D  ||  H + 00 04  ||  H X 00 05
    ...F..................  Hxs      hxs hxi hxr hxq hxw lit I T O L I T L S 01 00 00 00 ( 00 00 00
    ......O...............  IHex     ihex
    ......................  Iso      iso img       offset=32769 C D 0 0 1
    ......................  LP       lpimg img     offset=4096 g D l a 4 00 00 00
    ......................  Lzh      lzh lha       offset=2 - l h
    .......P..............  MBR      mbr
    ....M....E............  MachO    macho         CE FA ED FE  ||  CF FA ED FE  ||  FE ED FA CE  ||  FE ED FA CF
    ......................  MsLZ     mslz          S Z D D 88 F0 ' 3 A
    ....M.................  Mub      mub           CA FE BA BE 00 00 00  ||  B9 FA F1 0E
    ......................  NTFS     ntfs img      offset=3 N T F S 20 20 20 20 00
    ...F.G................  Nsis     nsis          offset=4 EF BE AD DE N u l l s o f t I n s t
    .........E............  PE       exe dll sys   M Z
    ......................  Ppmd     pmd           8F AF AC 84
    ......................  QCOW     qcow qcow2 qcow2c Q F I FB 00 00 00
    ......................  Rpm      rpm           ED AB EE DB
    K.....................  SWF      swf           F W S
    ....M.................  SWFc     swf (~.swf)   C W S  ||  Z W S
    ......................  Sparse   simg img      : FF & ED 01 00
    ......................  Split    001
    ....M.................  SquashFS squashfs      h s q s  ||  s q s h  ||  s h s q  ||  q s h s
    .........E............  TE       te            V Z
    ...FM.................  UEFIc    scap          BD 86 f ; v 0D 0 @ B7 0E B5 Q 9E / C5 A0  ||  8B A6 < J # w FB H 80 = W 8C C1 FE C4 M  ||  B9 82 91 S B5 AB 91 C B6 9A E3 A9 C F7 / CC
    ...FM.................  UEFIf    uefif         offset=16 D9 T 93 z h 04 J D 81 CE 0B F6 17 D8 90 DF  ||  x E5 8C 8C = 8A 1C O 99 5 89 a 85 C3 - D3
    ....M.O...............  Udf      udf iso img   offset=32768 00 B E A 0 1 01 00  ||  01 C D 0 0 1
    ......................  VDI      vdi           offset=64  10 DA BE
    .....G................  VHD      vhd           c o n e c t i x 00 00
    ......................  VHDX     vhdx avhdx    v h d x f i l e
    ......................  VMDK     vmdk          K D M V
    ......................  Xar      xar pkg xip   x a r ! 00
    ......................  Z        z taz (.tar)  1F 9D
   CK.....................  bzip2    bz2 bzip2 tbz2 (.tar) tbz (.tar) B Z h
   CK.................m+..  gzip     gz gzip tgz (.tar) tpz (.tar) apk (.tar) 1F 8B 08
    K.....O...............  lzma     lzma
    K.....................  lzma86   lzma86
   C......O...LH......m+..  tar      tar ova       offset=257 u s t a r
   C.SN.......LH..c.a.m+..  wim      wim swm esd ppkg M S W I M 00 00 00
   CK.....................  xz       xz txz (.tar) FD 7 z X Z 00
   C...FMG........c.a.m+..  zip      zip z01 zipx jar xpi odt ods docx xlsx epub ipa apk appx P K 03 04  ||  P K 05 06  ||  P K 06 06  ||  P K 07 08 P K  ||  P K 0 0 P K
    K.....................  zstd     zst tzst (.tar) ( B5 / FD
   CK.....O.....XC........  Hash     sha256 sha512 sha384 sha224 sha3-256 sha1 sha md5 blake2sp xxh64 crc32 crc64 asc cksum
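
Since builds differ in which formats they include, a listing like the one above can be checked from a script. A small sketch, assuming the layout shown (flags in the first column, format name in the second, everything under a "Formats:" heading); format_listed is a made-up name and it only reads text on stdin, so it does not care what the binary is called:

```shell
#!/bin/sh
# Hypothetical helper: read "7z i" style output on stdin and succeed if the
# named format appears in the second column of the "Formats:" section.
format_listed() {
    awk -v want="$1" '
        /^Formats:/           { in_formats = 1; next }
        /^(Codecs|Hashers):/  { in_formats = 0 }
        in_formats && tolower($2) == tolower(want) { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Example use (binary name is an assumption): 7zz i | format_listed Rar || echo "this build does not list RAR"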

Offline CentralWare

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 843
Re: Archive List
« Reply #11 on: April 18, 2025, 02:39:16 AM »
@nick65go: The only thing I disabled in the compilation was RAR+ (DISABLE_RAR_COMPRESSION), everything else is retained.
TCL14x64, GCC 12.2.0
7zz       3,485,048 Bytes
UPX'ed  1,088,756 Bytes
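
For reference, the saving implied by those two byte counts can be worked out with integer shell arithmetic. The function name percent_saved is just for illustration:

```shell
#!/bin/sh
# Percentage saved, rounded down: (original - packed) * 100 / original
percent_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

percent_saved 3485048 1088756   # prints 68
```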

Code: [Select]
7-Zip (z) 24.09 (x64) : Copyright (c) 1999-2024 Igor Pavlov : 2024-11-29
 64-bit locale=C UTF8=- Threads:40 OPEN_MAX:1024

Formats:
   C...F..........c.a.m+..  7z       7z            7 z BC AF ' 1C
    ......................  APFS     apfs img      offset=32 N X S B 00
    ......................  APM      apm           E R
    ......................  Ar       ar a deb udeb lib ! < a r c h > 0A
    ......................  Arj      arj           ` EA
    K.....O.....X.........  Base64   b64
    ......O...............  COFF     obj
    ...F..................  Cab      cab           M S C F 00 00 00 00
    ......................  Chm      chm chi chq chw I T S F 03 00 00 00 ` 00 00 00
    ......................  Compound msi msp doc xls ppt D0 CF 11 E0 A1 B1 1A E1
    ....M.................  Cpio     cpio          0 7 0 7 0  ||  C7 q  ||  q C7
    ......................  CramFS   cramfs        offset=16 C o m p r e s s e d 20 R O M F S
    .....G..B.............  Dmg      dmg           k o l y 00 00 00 04 00 00 02 00
    .........E............  ELF      elf           E L F
    ......................  Ext      ext ext2 ext3 ext4 img offset=1080 S EF
    ......................  FAT      fat img       offset=510 U AA
    ......................  FLV      flv           F L V 01
    ......................  GPT      gpt mbr       offset=512 E F I 20 P A R T 00 00 01 00
    ....M.................  HFS      hfs hfsx      offset=1024 B D  ||  H + 00 04  ||  H X 00 05
    ...F..................  Hxs      hxs hxi hxr hxq hxw lit I T O L I T L S 01 00 00 00 ( 00 00 00
    ......O...............  IHex     ihex
    ......................  Iso      iso img       offset=32769 C D 0 0 1
    ......................  LP       lpimg img     offset=4096 g D l a 4 00 00 00
    ......................  Lzh      lzh lha       offset=2 - l h
    .......P..............  MBR      mbr
    ....M....E............  MachO    macho         CE FA ED FE  ||  CF FA ED FE  ||  FE ED FA CE  ||  FE ED FA CF
    ......................  MsLZ     mslz          S Z D D 88 F0 ' 3 A
    ....M.................  Mub      mub           CA FE BA BE 00 00 00  ||  B9 FA F1 0E
    ......................  NTFS     ntfs img      offset=3 N T F S 20 20 20 20 00
    ...F.G................  Nsis     nsis          offset=4 EF BE AD DE N u l l s o f t I n s t
    .........E............  PE       exe dll sys   M Z
    ......................  Ppmd     pmd           8F AF AC 84
    ......................  QCOW     qcow qcow2 qcow2c Q F I FB 00 00 00
    ...F..................  Rar      rar r00       R a r ! 1A 07 00
    ...F..................  Rar5     rar r00       R a r ! 1A 07 01 00
    ......................  Rpm      rpm           ED AB EE DB
    K.....................  SWF      swf           F W S
    ....M.................  SWFc     swf (~.swf)   C W S  ||  Z W S
    ......................  Sparse   simg img      : FF & ED 01 00
    ......................  Split    001
    ....M.................  SquashFS squashfs      h s q s  ||  s q s h  ||  s h s q  ||  q s h s
    .........E............  TE       te            V Z
    ...FM.................  UEFIc    scap          BD 86 f ; v 0D 0 @ B7 0E B5 Q 9E / C5 A0  ||  8B A6 < J # w FB H 80 = W 8C C1 FE C4 M  ||  B9 82 91 S B5 AB 91 C B6 9A E3 A9 C F7 / CC
    ...FM.................  UEFIf    uefif         offset=16 D9 T 93 z h 04 J D 81 CE 0B F6 17 D8 90 DF  ||  x E5 8C 8C = 8A 1C O 99 5 89 a 85 C3 - D3
    ....M.O...............  Udf      udf iso img   offset=32768 00 B E A 0 1 01 00  ||  01 C D 0 0 1
    ......................  VDI      vdi           offset=64 10 DA BE
    .....G................  VHD      vhd           c o n e c t i x 00 00
    ......................  VHDX     vhdx avhdx    v h d x f i l e
    ......................  VMDK     vmdk          K D M V
    ......................  Xar      xar pkg xip   x a r ! 00
    ......................  Z        z taz (.tar)  1F 9D
   CK.....................  bzip2    bz2 bzip2 tbz2 (.tar) tbz (.tar) B Z h
   CK.................m+..  gzip     gz gzip tgz (.tar) tpz (.tar) apk (.tar) 1F 8B 08
    K.....O...............  lzma     lzma
    K.....................  lzma86   lzma86
   C......O...LH......m+..  tar      tar ova       offset=257 u s t a r
   C.SN.......LH..c.a.m+..  wim      wim swm esd ppkg M S W I M 00 00 00
   CK.....................  xz       xz txz (.tar) FD 7 z X Z 00
   C...FMG........c.a.m+..  zip      zip z01 zipx jar xpi odt ods docx xlsx epub ipa apk appx P K 03 04  ||  P K 05 06  ||  P K 06 06  ||  P K 07 08 P K  ||  P K 0 0 P K
    K.....................  zstd     zst tzst (.tar) ( B5 / FD
   CK.....O.....XC........  Hash     sha256 sha512 sha384 sha224 sha3-256 sha1 sha md5 blake2sp xxh64 crc32 crc64 asc cksum

Codecs:
   4ED   303011B BCJ2
    EDF  3030103 BCJ
    EDF  3030205 PPC
    EDF  3030401 IA64
    EDF  3030501 ARM
    EDF  3030701 ARMT
    EDF  3030805 SPARC
    EDF        A ARM64
    EDF        B RISCV
    EDF    20302 Swap2
    EDF    20304 Swap4
    ED     40202 BZip2
    ED         0 Copy
    ED     40109 Deflate64
    ED     40108 Deflate
    EDF        3 Delta
    ED        21 LZMA2
    ED     30101 LZMA
    ED     30401 PPMD
    EDF  6F10701 7zAES
    EDF  6F00181 AES256CBC

Hashers:
      4        1 CRC32
     16      208 MD5
     20      201 SHA1
     32        A SHA256
    256      231 SHA3-256
     48      222 SHA384
     64      223 SHA512
      8      211 XXH64
      8        4 CRC64
     32      202 BLAKE2sp

@Rich: =my= definition of lossless is byte for byte.  If an audio codec/container claims lossless, it must be able to replicate the original exactly and when converted back out of that container, the resulting file must match the original.  File compression...  same concept - what goes in, must come out - identical.  (My understanding of UPX is it's a one-way street where a file that goes in...  can never come out the same as it started. At least not that I've ever read about... but then again, I'm not "on top" of all topics at the moment...  I wasn't even aware until a couple hours ago of the Raspberry Pi Pico 2 release - which launched back in August. :) )

Offline patrikg

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 751
Re: Archive List
« Reply #12 on: April 18, 2025, 02:48:39 AM »
Hi @CentralWare, you can check this yourself with upx.

1. Take the md5 of an .so, ELF or even PE (Windows exe) file.
2. Compress the file with upx.
3. Decompress the compressed file (yes, upx can also decompress files).
4. Take the md5 of the decompressed file, and you will see the same checksum as in step 1.
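
The four steps can be scripted roughly like this. It is only a sketch: the function names are invented, it assumes md5sum and upx are on the PATH, and it works on a scratch copy so the original binary is left alone. The -q (quiet) and -d (decompress) options are standard upx switches:

```shell
#!/bin/sh
# Hypothetical helper: succeed when two files have the same MD5 checksum.
same_md5() {
    a=$(md5sum < "$1") && b=$(md5sum < "$2") && [ "$a" = "$b" ]
}

# Usage: upx_roundtrip BINARY
# Packs and unpacks a scratch copy, then compares it with the original.
upx_roundtrip() {
    command -v upx >/dev/null 2>&1 || { echo "upx not installed" >&2; return 3; }
    work=$(mktemp -d) || return 1
    cp "$1" "$work/packed" &&
        upx -q "$work/packed" >/dev/null &&      # step 2: compress in place
        upx -q -d "$work/packed" >/dev/null &&   # step 3: decompress in place
        same_md5 "$1" "$work/packed"             # steps 1 and 4: compare checksums
    status=$?
    rm -rf "$work"
    return $status
}
```

A zero exit status from upx_roundtrip means the unpacked copy matched the original byte for byte, at least as far as MD5 can tell.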

The downside of upx, I think, is that packed files need a little more memory when they run.
« Last Edit: April 18, 2025, 02:55:17 AM by patrikg »

Offline CentralWare

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 843
Re: Archive List
« Reply #13 on: April 18, 2025, 03:38:24 AM »
3. Decompress the compressed file (yes, upx can also decompress files).
It's been quite a while since I looked at UPX and I don't remember the -d flag way back when, but good to know!

The downside of upx, I think, is that packed files need a little more memory when they run.
I would have figured as much...  as with any compressed system, you need the means to decompress/translate.
The machines this would be used on start at 64GB (12~18 Cores) and work their way up to 256GB (20~36 Cores) so we should be covered there.
In my mind, the memory cost has to be roughly the original (uncompressed) file size plus the size of the stub that does the extraction.

Okay, my friends, I think we're going to run with this direction!  (7zz + UPX)
We'll have it rip through all of the Arch/Alpine extensions and see if it chokes up on anything - if not, it looks like we have a winner!

Offline nick65go

  • Hero Member
  • *****
  • Posts: 862
Re: Archive List
« Reply #14 on: April 18, 2025, 11:56:57 AM »
@CentralWare: I am glad about the path forward (7zz).
If you use this 7zz as a BUILDER on a machine with 12+ cores AND 64GB, then IMHO it is a shame not to compile 7zz STATICALLY LINKED and with -march=native -mtune=native. The "server" machine is not the "user/destination" machine. BTW: I also use CachyOS, have you read about it (+5%..15% speed)?

On the final machine (where you deploy the packages) you can use a 7zz suitable for that machine (libc, gcc12, 486 arch, etc). But a 7zz compiled for a "modern" CPU (x86-64-v3) can use better CPU instructions (SSSE3, AVX), so better speed! I read the 7z author's change log.

BTW: the UPX stub (the header that decompresses the ELF) is small, and UPX uses in-place decompression, so it does not hold both the compressed and the decompressed size in RAM. Anyway, you should not need UPX on a 12_core + 64_GB RAM machine :)
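
To pick between a generic and an optimized 7zz on a given machine, the CPU flags can be checked first. A rough sketch for x86 Linux only, where the kernel lists features on the "flags" line of /proc/cpuinfo; the function names are invented, and ssse3/avx are simply the two features mentioned above, not a full x86-64-v3 test:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every named flag is in the flag list.
# Usage: has_cpu_flags "FLAG LIST" flag1 [flag2 ...]
has_cpu_flags() {
    flags=" $1 "
    shift
    for want in "$@"; do
        case "$flags" in
            *" $want "*) ;;        # whole-word match, so "avx" does not match "avx2"
            *) return 1 ;;
        esac
    done
    return 0
}

# Print the feature list of the first CPU (x86 Linux layout assumed).
cpu_flags() {
    sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1
}
```

Example use: has_cpu_flags "$(cpu_flags)" ssse3 avx && echo "the optimized build should run here"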
« Last Edit: April 18, 2025, 12:01:09 PM by nick65go »