Topic: boot time SW corruption check

medvedm
boot time SW corruption check
« on: August 31, 2010, 10:48:12 AM »
Hi - This post may end up being a general Linux/GRUB question, but since I'm probably going to use TC for my project, I thought I'd post it here and see if anyone can point me in the right direction.

I'm planning on using Linux (TC specifically) on a NASA project I'm working on that is going to go on the International Space Station. One interesting side effect of the environment is that radiation hits can cause bit flips and other errors in data stored on HDDs or non-volatile RAM.

So, because TC is so small (I figure a working copy with my application will be around 30MB at most), what I'd like to do is keep redundant copies of the kernel and tinycore.gz on the hard disk.

Is there a way to check at bootup whether any of the files are corrupt (say, via a checksum) and, if so, switch to a different bzImage/tinycore.gz pair? Does GRUB have anything built in for this?
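A minimal sketch of the checksum half of this question, written in Python purely for illustration (the boot-time environment would need something much smaller): record a digest for every file of each copy when the disk is prepared, then recompute and compare later. The helper names, the choice of SHA-256, and the manifest layout are assumptions; the layout mirrors the "digest, two spaces, path" lines that sha256sum prints. Note that the manifest sits on the same disk, so it would need redundant copies as well.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large images need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(paths, manifest_path):
    """Write one '<hex digest>  <path>' line per file (sha256sum-style layout)."""
    lines = [f"{file_digest(p)}  {p}\n" for p in paths]
    Path(manifest_path).write_text("".join(lines))


def verify_manifest(manifest_path):
    """Return the paths whose current digest differs from the recorded one.

    A file that can no longer be read is reported as corrupt as well.
    """
    bad = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        digest, path = line.split("  ", 1)
        try:
            if file_digest(path) != digest:
                bad.append(path)
        except OSError:
            bad.append(path)
    return bad
```

write_manifest would be run once per copy (for example over hypothetical copy-A and copy-B directories, each holding a bzImage and a tinycore.gz); an empty list from verify_manifest means every file in that copy still matches.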

Thanks for any info.
M

gerald_clark
« Reply #1 on: August 31, 2010, 11:33:59 AM »
And if GRUB got hit?
Google for radiation-hardened storage.

medvedm
« Reply #2 on: August 31, 2010, 01:00:26 PM »
Yeah, that is certainly a risk. Rad-hard is out of the question: cost, cost, cost. Hard disks fare better than most other electronics, but I'm just trying to figure out whether there is anything I can do at all.

If GRUB gets hit, get a replacement disk.  At least we would be limiting the risk.

M

tinypoodle
« Reply #3 on: August 31, 2010, 01:58:56 PM »
A possible approach I could imagine would be booting a minimal FreeDOS install, running md5sum on the kernels and initrds until the first correct match, and then having linld.com boot the matching bzImage/tinycore.gz pair.

Of course, what gerald_clark said about GRUB applies in this case as well (as it would to any software, bootloader or otherwise, that runs before TC boots).
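The "run md5sum until the first correct match" step could look like the following. It is sketched in Python only to show the logic; under FreeDOS it would be a batch file or a small program driving md5sum and linld.com, neither of which is shown. The candidate list layout, the function names, and the assumption that the expected digests are stored somewhere trustworthy are all hypothetical.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Hex MD5 of a file (the same digest md5sum prints)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def first_good_pair(candidates):
    """Return (kernel, initrd) for the first copy whose files both match.

    candidates: ordered list of (kernel_path, kernel_md5, initrd_path, initrd_md5),
    most preferred copy first. Returns None if every copy is damaged.
    """
    for kernel, kernel_md5, initrd, initrd_md5 in candidates:
        try:
            if md5_hex(kernel) == kernel_md5 and md5_hex(initrd) == initrd_md5:
                return kernel, initrd
        except OSError:
            pass  # an unreadable file disqualifies this copy; try the next one
    return None
```

The returned pair would then be handed to the bootloader. If None comes back, no copy is usable, which leaves the "get a replacement disk" option medvedm mentions.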
"Software gets slower faster than hardware gets faster." Niklaus Wirth - A Plea for Lean Software (1995)

robc
« Reply #4 on: August 31, 2010, 05:13:50 PM »
GRUB does have a fallback option that can be used. Some info: http://www.gnu.org/software/grub/manual/legacy/Booting-fallback-systems.html

It doesn't address protecting GRUB itself, though... multiple boot disks?
"Never give up! Never surrender!" - Commander Peter Quincy Taggart

"Make it so." - Captain Picard

tinypoodle
« Reply #5 on: September 01, 2010, 03:44:17 AM »
As far as I understand, this wouldn't protect the integrity of the initrd either.

medvedm
« Reply #6 on: September 01, 2010, 02:25:38 PM »
Yeah, I'm not going for foolproof here. I understand that GRUB (or whatever bootloader is used) could still get zapped. However, that would leave only one small piece of software on the system vulnerable, as opposed to all of it. I think most projects don't do anything to protect against this (if you get hit, you just put in another HDD).

I'm trying to see if there is a relatively easy way of protecting the vast majority of the SW loaded on the disk. Since disk space isn't an issue and TC is so small, redundant copies would work well; the problem is just how to figure out when a different copy is needed.

The fallback option is interesting, but not exactly what I'm looking for.  There could be a problem in an executable that doesn't prevent the kernel from booting correctly but would still make the system useless.  A boot time check would be best because it would check all of it.

robc
« Reply #7 on: September 01, 2010, 03:22:44 PM »
Quote
Yeah, I'm not going for foolproof here. I understand that GRUB (or whatever bootloader is used) could still get zapped. However, that would leave only one small piece of software on the system vulnerable, as opposed to all of it. I think most projects don't do anything to protect against this (if you get hit, you just put in another HDD).
Your BIOS can also get zapped.

Quote
The fallback option is interesting, but not exactly what I'm looking for.  There could be a problem in an executable that doesn't prevent the kernel from booting correctly but would still make the system useless.  A boot time check would be best because it would check all of it.
You could use a bzip2-compressed image instead of gzip. bzip2 has a built-in checksum (CRC32), so the integrity of the image would be checked on every boot. Also, once you have a running system, it would be good practice to schedule integrity checks of all system images at boot and at regular intervals.

tinypoodle
« Reply #8 on: September 01, 2010, 04:34:46 PM »
Quote
Yeah, I'm not going for foolproof here. I understand that GRUB (or whatever bootloader is used) could still get zapped. However, that would leave only one small piece of software on the system vulnerable, as opposed to all of it. I think most projects don't do anything to protect against this (if you get hit, you just put in another HDD).

I'm trying to see if there is a relatively easy way of protecting the vast majority of the SW loaded on the disk. Since disk space isn't an issue and TC is so small, redundant copies would work well; the problem is just how to figure out when a different copy is needed.

The fallback option is interesting, but not exactly what I'm looking for.  There could be a problem in an executable that doesn't prevent the kernel from booting correctly but would still make the system useless.  A boot time check would be best because it would check all of it.

Hence my suggestion:
The total size of the software needed to run a checksum and a Linux bootloader under DOS could be kept very small, and the approach would be effective as long as only the kernel and/or initrd got corrupted. It's not foolproof, but if you want to rely on software to check the integrity of the kernel and initrd, then it comes down to probabilities.
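To make the probability argument concrete, under assumptions that are not from the thread: each of n copies of the kernel/initrd pair is corrupted independently with probability p over some period, the checksum always detects a damaged copy, and the checking software (DOS, md5sum and linld.com in this proposal) is itself hit with probability q, in which case booting fails outright. The chance of ending up without a bootable system is then

```latex
P_{\text{fail}} = q + (1 - q)\,p^{n}
```

For example, with p = 0.01 and n = 3 copies, p^n = 0.01^3 = 10^-6, so P_fail is about q + 10^-6. The unprotected checker, not the redundant copies, dominates, which is gerald_clark's point about GRUB.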

Quote
You could use a bzip2-compressed image instead of gzip. bzip2 has a built-in checksum (CRC32), so the integrity of the image would be checked on every boot.

What would the consequences be if the CRC32 check failed in such a case?
Can initrds compressed with bzip2 be booted without any further changes, just as if they were gzipped?

ixbrian
« Reply #9 on: September 01, 2010, 06:08:52 PM »
Would optical media (CD-ROM) be more reliable than flash or magnetic storage for booting the OS in this kind of environment? If you booted the system from optical media, maybe you could set up a RAID 1 mirror (hardware or software) on flash drives or hard drives to store the data you need to save.

Brian

maro
« Reply #10 on: September 01, 2010, 10:16:06 PM »
I just did some fairly quick tests (with TC 3.1rc2), as I found the whole question rather interesting. Instead of "real" HW, I used QEMU (v0.12.5 running on an XP host) with the -kernel bzImage -initrd ... parameters:

  • I created a 'tinycore.bz2' initrd (with gzip -cd tinycore.gz | bzip2 -c > tinycore.bz2) and tried to boot it, which ended in: 'Kernel panic - not syncing: ...'. Looking back in the output (after choosing boot code 'vga=6' to get more lines on the screen) I saw RAMDISK: bzip2 image found at block 0 and RAMDISK: bzip2 decompressor not configured!
    So I guess that using this format would require a kernel recompile (probably with 'CONFIG_RD_BZIP2=y').

  • I also tried to see what happens when 'tinycore.gz' gets "damaged". I chose to "flip" just one bit in 'tinycore.gz' (at offset 0x300) with a hex editor. I had mixed results when trying to decompress this 'tinycore_faulty.gz':

    • The BusyBox 'gzip' failed to un-compress the file when I tried gzip -d tinycore_faulty.gz (with gzip: crc error and gzip: error inflating)

    • But it produced an output when I did gzip -cd tinycore_faulty.gz > tinycore_faulty.cpio (again spitting out the same warnings as before)

    • Likewise on XP I found most tools I tested refused to un-compress, when they spotted the CRC error. But some produced a file (albeit "under protest").

    • RFC 1952 (in '2.3.1.2. Compliance') does not spell out what the decompressor should do in the case of a CRC error. It merely requires a compressor to produce a correct CRC32.

    Anyway, back to the question of how the system coped with the faulty initrd when booting: well, it seemed to do just fine. Further analysis showed that the "1-bit damage" in the '.gz' file resulted in "3-byte damage" in the '.cpio' file; in this case, one byte each in three files in '/var/cache/fontconfig'. But that was clearly just lucky, and I could not say how such a fault might manifest itself on a running system.

What I've learned (again) is that things are sometimes not as simple as they appear at first.  ;D

gerald_clark
« Reply #11 on: September 01, 2010, 10:43:11 PM »
Look for storage with built-in ECC.
Use redundant copies of all files, and have a background process constantly running checksums.
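A minimal sketch of such a background process, assuming three or more replicas of each file and majority voting to decide which replica is good. The voting rule is an assumption, not something stated above; with only two copies a mismatch cannot be resolved without a separately stored checksum. scrub and scrub_forever are hypothetical names, and the in-place rewrite is not atomic, so a real implementation should write to a temporary file and rename it.

```python
import hashlib
import time
from collections import Counter
from pathlib import Path


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def scrub(replicas):
    """Compare replicas of one file and rewrite any that disagree with the majority.

    Returns the list of repaired paths. Raises RuntimeError when no version is
    held by a strict majority of replicas, since the good copy is then unknown.
    """
    contents = {}
    for path in replicas:
        try:
            contents[path] = Path(path).read_bytes()
        except OSError:
            contents[path] = None  # unreadable replica: always out-voted
    votes = Counter(_digest(c) for c in contents.values() if c is not None)
    if not votes:
        raise RuntimeError("no readable replica")
    good_digest, count = votes.most_common(1)[0]
    if count * 2 <= len(replicas):
        raise RuntimeError("no majority among replicas")
    good = next(c for c in contents.values()
                if c is not None and _digest(c) == good_digest)
    repaired = []
    for path, data in contents.items():
        if data is None or _digest(data) != good_digest:
            # Not atomic: a crash mid-write leaves this replica damaged.
            Path(path).write_bytes(good)
            repaired.append(path)
    return repaired


def scrub_forever(groups, interval_s=3600):
    """Scrub every replica group, then sleep; intended to run in the background."""
    while True:
        for replicas in groups:
            try:
                scrub(replicas)
            except RuntimeError as exc:
                print(f"scrub failed for {replicas[0]}: {exc}")
        time.sleep(interval_s)
```

Reading whole files into memory is acceptable for a system of around 30MB; larger files would call for chunked hashing.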

medvedm
« Reply #12 on: September 02, 2010, 08:20:32 AM »
Quote
Would optical media (CD-ROM) be more reliable than flash or magnetic storage for booting the OS in this kind of environment? If you booted the system from optical media, maybe you could set up a RAID 1 mirror (hardware or software) on flash drives or hard drives to store the data you need to save.

Brian

Yeah, it probably would be, but size is a limiting factor there: I only have room in my package for one storage medium, the HDD.

medvedm
« Reply #13 on: September 02, 2010, 08:23:16 AM »
Quote
Your BIOS can also get zapped.

True, but if I remember correctly, the chips the BIOS resides on are much less prone to radiation problems. We've had boards operating in space for years without BIOS issues, but we have had occasional hard drive and flash issues.

curaga
« Reply #14 on: September 02, 2010, 08:30:28 AM »
Supposedly ZFS can detect and correct bit-level errors. Then again, I don't think anything can boot Linux from ZFS. TC can read and write ZFS using the zfs-fuse extension, though.
The only barriers that can stop you are the ones you create yourself.