
Author Topic: [Solved]: unsquash + squash changes md5sum  (Read 1790 times)

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1433
[Solved]: unsquash + squash changes md5sum
« on: March 18, 2023, 07:05:16 AM »
I was recently surprised when I noticed that unsquashing and re-squashing a squashfs (e.g., a TCL extension) causes its md5sum to change, even if no changes have been made to the squashfs. Give it a try if you don't believe me ;)

Given how much our favorite distro relies on squashfs and md5sums, I decided to investigate.

It turns out that mksquashfs generates some timestamp metadata, which causes the md5sum to change. Given that metadata usually does not have an impact on md5sum (e.g., touching a file and changing its ownership and/or permissions does not cause a change in md5sum), mksquashfs's behavior is counterintuitive and feels like a bug in its design.
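The timestamp in question can be inspected directly. In the squashfs 4.x on-disk format, the superblock begins with a 4-byte magic, a 4-byte inode count, and then a 4-byte little-endian creation time (mkfs_time) at byte offset 8. A minimal sketch, assuming that layout and a POSIX shell with od; sqfs_mkfs_time is a hypothetical helper name, not part of squashfs-tools:

```shell
#!/bin/sh
# sqfs_mkfs_time FILE
# Print the creation time (seconds since the epoch) stored in a squashfs 4.x
# superblock. Assumes the 32-bit little-endian mkfs_time field sits at byte
# offset 8, right after the magic and the inode count.
sqfs_mkfs_time() {
    # Read the four bytes as unsigned decimals so the result does not depend
    # on the byte order of the machine running the script.
    set -- $(od -An -tu1 -j8 -N4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}
```

Running the helper on an extension before and after an unsquash + re-squash cycle should print two different values if the creation time is what changed.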

Some people have created a patch to work around this (see here, for example). I'm going to start using a patched version of mksquashfs that behaves as expected.

Would the TCL developers like me to update the squashfs-tools.tcz extension with a patched version of mksquashfs, where leaving the md5sum unchanged (i.e., the -no-date option) is the default behavior unless -date is specified?
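For comparison with the patched -no-date behavior: squashfs-tools 4.4 and later can pin the creation time without a patch, through the -mkfs-time option or the SOURCE_DATE_EPOCH environment variable. A hedged sketch, assuming such a version is installed and that the same user and the same compression options are used for both builds; the file names and the timestamp value are arbitrary placeholders:

```shell
#!/bin/sh
# Build an image, unpack it, and rebuild it with the creation time pinned.
# Assumes squashfs-tools >= 4.4 (for -mkfs-time); anything else is reported
# as "skipped" rather than treated as a failure.
result=skipped
work=$(mktemp -d)
stamp=1679122800   # arbitrary fixed epoch value used for both builds

if command -v mksquashfs >/dev/null 2>&1 && command -v unsquashfs >/dev/null 2>&1; then
    mkdir "$work/src"
    echo "hello" > "$work/src/file.txt"
    if mksquashfs "$work/src" "$work/a.sqfs" -noappend -mkfs-time "$stamp" >/dev/null 2>&1 &&
       unsquashfs -d "$work/unpacked" "$work/a.sqfs" >/dev/null 2>&1 &&
       mksquashfs "$work/unpacked" "$work/b.sqfs" -noappend -mkfs-time "$stamp" >/dev/null 2>&1
    then
        sum_a=$(md5sum < "$work/a.sqfs")
        sum_b=$(md5sum < "$work/b.sqfs")
        if [ "$sum_a" = "$sum_b" ]; then result=same; else result=different; fi
    fi
fi
echo "re-squash with pinned creation time: $result"
rm -rf "$work"
```

Note that unpacking as a non-root user can change file ownership, which would alter the image regardless of the timestamp; the sketch avoids that by creating and unpacking everything as the same user.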

« Last Edit: March 18, 2023, 02:34:18 PM by Rich »

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 11489
Re: unsquash + squash changes md5sum
« Reply #1 on: March 18, 2023, 10:20:26 AM »
Hi GNUser
I think the behavior is correct. The md5sum is used to verify the contents of
the file (container) have not changed. External characteristics of the file such
as name, timestamps, owner, permissions, etc. have no effect because they
do not affect the contents. They do affect the contents of the drive (container)
that it resides on (in):
Code:
tc@E310:~$ md5sum /dev/sda7
f6a5521c1a9121fbb75773794b3e3fb9  /dev/sda7
tc@E310:~$ ls -l /mnt/sda7/home/tc/.local/bin/startNFS
-rwxr-xr-x 1 tc staff 99 Mar 18 10:05 /mnt/sda7/home/tc/.local/bin/startNFS
tc@E310:~$ md5sum /dev/sda7
f6a5521c1a9121fbb75773794b3e3fb9  /dev/sda7
tc@E310:~$ touch /mnt/sda7/home/tc/.local/bin/startNFS
tc@E310:~$ md5sum /dev/sda7
657be8e63e265bc8f7cd27412f871f4e  /dev/sda7
tc@E310:~$ cat /mnt/sda7/home/tc/.local/bin/startNFS > /dev/null
tc@E310:~$ md5sum /dev/sda7
53250db15baeed958304f7dc36cc5729  /dev/sda7
tc@E310:~$

The same holds true for a squashfs. Like a drive, the external characteristics
of the files it contains would and should change the md5sum. The purpose
of the md5sum is not to determine what changed, only that something changed.

Here's my 2 cents. If you unpack a squashfs just to examine contents, don't
re-squash it. The original will still be present, keep it.

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1433
Re: unsquash + squash changes md5sum
« Reply #2 on: March 18, 2023, 11:10:27 AM »
Hi Rich. Thank you for your thoughtful response, as always. If you think there's value in the timestamp being included in squashfs's md5sum, I'll leave this alone. It was just not what I was expecting.

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1433
Re: unsquash + squash changes md5sum
« Reply #3 on: March 18, 2023, 02:32:42 PM »
Hi Rich. Thank you for the shell commands. They prove that a change in a file's metadata (e.g., mtime) causes a change in md5sum of the filesystem that contains the file, even though it does not cause a change in the md5sum of the file itself. It makes sense, come to think of it. So I guess what I reported in OP is expected, correct behavior, after all.
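The same distinction can be reproduced without touching a block device. A small sketch in which a tar archive stands in for the containing filesystem (an assumption made for convenience, since tar headers also record each file's mtime); --format=ustar assumes GNU tar or bsdtar:

```shell
#!/bin/sh
# Compare the checksum of a file with the checksum of an archive containing
# it, before and after a metadata-only change (mtime).
# tar stands in for squashfs here; --format=ustar assumes GNU tar or bsdtar.
work=$(mktemp -d)
echo "same contents" > "$work/file.txt"
touch -d '2023-03-18T10:05:00' "$work/file.txt"

file_before=$(md5sum < "$work/file.txt")
tar_before=$(tar --format=ustar -cf - -C "$work" file.txt | md5sum)

# Metadata-only change: the bytes inside file.txt stay the same.
touch -d '2023-03-18T14:32:00' "$work/file.txt"

file_after=$(md5sum < "$work/file.txt")
tar_after=$(tar --format=ustar -cf - -C "$work" file.txt | md5sum)

if [ "$file_before" = "$file_after" ]; then echo "file md5sum: unchanged"; fi
if [ "$tar_before" != "$tar_after" ]; then echo "archive md5sum: changed"; fi
rm -rf "$work"
```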

Topic may be marked as solved  :)

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 11489
Re: [Solved]: unsquash + squash changes md5sum
« Reply #4 on: March 18, 2023, 02:43:44 PM »
Hi GNUser
Quote
... Topic may be marked as solved  :)
Done.

Just remember:
Quote
... Here's my 2 cents. If you unpack a squashfs just to examine contents, don't
re-squash it. The original will still be present, keep it.
If you haven't made any changes, there's no point in repackaging it.

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1433
Re: [Solved]: unsquash + squash changes md5sum
« Reply #5 on: March 18, 2023, 03:57:44 PM »
Yes, that's an excellent tip. Thank you.

Offline mocore

  • Hero Member
  • *****
  • Posts: 566
  • ~.~
Re: [Solved]: unsquash + squash changes md5sum
« Reply #6 on: March 19, 2023, 04:04:03 AM »
Quote
It turns out that mksquashfs generates some timestamp metadata, which causes the md5sum to change. Given that metadata usually does not have an impact on md5sum (e.g., touching a file and changing its ownership and/or permissions does not cause a change in md5sum), mksquashfs's behavior is counterintuitive and feels like a bug in its design.

Some people have created a patch to work around this (see here, for example).

Thanks for the interesting topic and link.
Reading through it, I recalled that both Nix and Guix have chosen, for one reason or another, a somewhat different method of keeping embedded date information from changing the hashes of their packages (and dependency trees).

modification date of /gnu/store files is 1970-01-01 - https://www.mail-archive.com/help-guix@gnu.org/msg10274.html
Quote
     what is wrong with the file system entries of /gnu/store. All entries there have a timestamp of UNIX epoch 0.


Well spotted! This is intentional. The Guix daemon[0] changes all timestamps of all files added to the store to a known value. Same for other metadata like ownership and some permission bits.


If it did not, certain software could (and does) behave differently between two different machines with otherwise identical stores.
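The effect of that normalization can be sketched with ordinary tools. In the example below a tar archive stands in for the store (an assumption; Guix does not work this way internally), and only the mtime is reset, while the quote notes that ownership and some permission bits are normalized as well. Epoch 0 follows the quote above; archive_sum is a hypothetical helper:

```shell
#!/bin/sh
# Two trees with identical contents but different mtimes hash differently
# once archived; resetting every mtime to one fixed value removes the
# difference. Assumes GNU tar or bsdtar (for --format=ustar).
work=$(mktemp -d)
for d in a b; do
    mkdir "$work/$d"
    echo "same contents" > "$work/$d/file.txt"
    chmod 644 "$work/$d/file.txt"
done
touch -d '2023-03-18T07:05:16' "$work/a/file.txt"
touch -d '2023-03-19T04:04:03' "$work/b/file.txt"

# Hash an archive of the single file inside directory $1.
archive_sum() {
    tar --format=ustar -cf - -C "$1" file.txt | md5sum
}

raw_a=$(archive_sum "$work/a")
raw_b=$(archive_sum "$work/b")

# Normalize: set every file's mtime to the Unix epoch (UTC).
find "$work/a" "$work/b" -type f -exec touch -d '1970-01-01T00:00:00Z' {} +

norm_a=$(archive_sum "$work/a")
norm_b=$(archive_sum "$work/b")

if [ "$raw_a" != "$raw_b" ]; then echo "before normalizing: checksums differ"; fi
if [ "$norm_a" = "$norm_b" ]; then echo "after normalizing: checksums match"; fi
rm -rf "$work"
```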

Change default date of nix store to increase compatibility https://github.com/NixOS/nixpkgs/issues/60446
The above link mentions that Nix uses the (Bash) script set-source-date-epoch-to-latest.sh, and that the default had to be changed to accommodate Python's zipfile.py.
https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/setup-hooks/set-source-date-epoch-to-latest.sh
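As the script's name suggests, the idea is to set SOURCE_DATE_EPOCH to the newest modification time found in the source tree. The following is a rough sketch of that idea only, not the actual Nix hook; latest_mtime is a hypothetical helper, and stat -c %Y assumes GNU coreutils (or BusyBox) stat:

```shell
#!/bin/sh
# latest_mtime DIR
# Print the newest modification time (seconds since the epoch) among the
# regular files under DIR. Assumes a stat that supports -c %Y.
latest_mtime() {
    find "$1" -type f -exec stat -c %Y {} + | sort -n | tail -n 1
}

# Example: export the value for build tools that honor SOURCE_DATE_EPOCH.
# src=./my-source-tree   # hypothetical path
# SOURCE_DATE_EPOCH=$(latest_mtime "$src")
# export SOURCE_DATE_EPOCH
```

Using the newest source mtime rather than a fixed epoch keeps dates plausible for tools that reject very old timestamps.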
 :o