Tiny Core Linux

General TC => General TC Talk => Topic started by: medvedm on August 31, 2010, 07:48:12 AM

Title: boot time SW corruption check
Post by: medvedm on August 31, 2010, 07:48:12 AM
Hi - This post may end up being a general Linux/GRUB question, but given that I'm probably going to use TC for my project, I thought I'd put it here and see if anyone can point me in the right direction.

I'm planning on using Linux (TC specifically) on a NASA project I'm working on that is going to go on the International Space Station. One interesting side effect of the environment is that radiation hits can cause bit flips and other errors in data stored on HDDs or non-volatile RAM.

So what I'd like to do, because TC is so small (I figure I can have a working copy with my application at around 30MB max), is keep redundant copies of the kernel and tinycore.gz on the hard disk.

Is there a way to check on bootup whether any of the files are corrupt (say, via a checksum) and switch to a different bzImage/tinycore.gz pair? Does GRUB have anything built in for this?

Thanks for any info.
M
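Any boot-time checksum check needs a list of known-good digests recorded when the disk is built. A minimal sketch, assuming Python is available on the build machine (not on the flight hardware) and a hypothetical layout such as `copy1/bzImage` and `copy1/tinycore.gz`. It writes lines in the `<md5>  <path>` form that `md5sum -c` reads, so the check itself could later be done with an ordinary `md5sum -c` (assuming the BusyBox build on the target includes `-c` support).

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_hex(path: Path) -> str:
    """Hex MD5 of a file, read in 64 KiB chunks so large images need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, rel_paths: list[str], manifest: Path) -> None:
    """Write '<md5>  <relative path>' lines, the format `md5sum -c` expects."""
    lines = [f"{md5_hex(root / rel)}  {rel}\n" for rel in rel_paths]
    manifest.write_text("".join(lines))


def read_manifest(manifest: Path) -> dict[str, str]:
    """Parse the manifest back into {relative path: expected md5}."""
    expected = {}
    for line in manifest.read_text().splitlines():
        digest, rel = line.split("  ", 1)
        expected[rel] = digest
    return expected
```

Running `md5sum -c manifest.md5` from the directory holding the copies would then report each file as OK or FAILED.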
Title: Re: boot time SW corruption check
Post by: gerald_clark on August 31, 2010, 08:33:59 AM
And if GRUB got hit?
Google for radiation hardened storage.
Title: Re: boot time SW corruption check
Post by: medvedm on August 31, 2010, 10:00:26 AM
Yeah, that is certainly a risk. Rad-hard is out of the question: cost, cost, cost. Hard disks do better than most other electronics, but I'm just trying to figure out whether there is anything I can do at all.

If GRUB gets hit, get a replacement disk.  At least we would be limiting the risk.

M
Title: Re: boot time SW corruption check
Post by: tinypoodle on August 31, 2010, 10:58:56 AM
A possible approach I could imagine would be booting to a minimal FreeDOS install, running md5sum on the kernels and initrds until the first correct match, and having linld.com boot the matching bzImage/tinycore.gz pair.

Of course what gerald_clark said about GRUB applies in this case as well (and would for any software - bootloader or other - to be run before booting TC).
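The "first correct match" selection described above can be sketched as follows. This only illustrates the selection step, written in Python for readability; in the FreeDOS proposal it would be an `md5sum` loop whose result is handed to `linld.com`, which this sketch does not model. The `copyN/` paths and the `expected` digest mapping are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Iterable, Mapping, Optional, Tuple


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def pick_boot_pair(
    pairs: Iterable[Tuple[Path, Path]],
    expected: Mapping[Path, str],
) -> Optional[Tuple[Path, Path]]:
    """Return the first (kernel, initrd) pair whose files both match their
    expected digests, or None if every copy is damaged or missing."""
    for kernel, initrd in pairs:
        try:
            if all(md5_of(p) == expected[p] for p in (kernel, initrd)):
                return kernel, initrd
        except (OSError, KeyError):
            # Unreadable/missing file or no recorded digest: treat the copy as bad.
            continue
    return None
```

Checking the kernel and initrd together means a good kernel is never paired with a damaged initrd from the same copy.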
Title: Re: boot time SW corruption check
Post by: robc on August 31, 2010, 02:13:50 PM
grub does have a fallback option that can be used. Some info: http://www.gnu.org/software/grub/manual/legacy/Booting-fallback-systems.html

doesn't address protecting grub though...multiple boot disks?
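For reference, a GRUB Legacy `menu.lst` using the fallback mechanism for several redundant copies might look like the output of this sketch. It is a hedged example: the `/boot/copyN/` paths and the `(hd0,0)` root are assumptions about the disk layout. Per the linked manual, `fallback` only takes over when GRUB itself hits an error on an entry; it won't help once GRUB has successfully loaded a kernel, so it does not verify the initrd's contents.

```python
from __future__ import annotations


def grub_legacy_menu(copies: list[tuple[str, str]], root: str = "(hd0,0)",
                     timeout: int = 5) -> str:
    """Build a GRUB Legacy menu.lst: entry 0 is the default and every other
    entry is listed, in order, on the `fallback` line."""
    lines = ["default 0", f"timeout {timeout}"]
    if len(copies) > 1:
        lines.append("fallback " + " ".join(str(i) for i in range(1, len(copies))))
    for i, (kernel, initrd) in enumerate(copies):
        lines += [
            "",
            f"title Tiny Core (copy {i + 1})",
            f"root {root}",
            f"kernel {kernel}",
            f"initrd {initrd}",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical layout with three redundant copies.
print(grub_legacy_menu([(f"/boot/copy{n}/bzImage", f"/boot/copy{n}/tinycore.gz")
                        for n in (1, 2, 3)]))
```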
Title: Re: boot time SW corruption check
Post by: tinypoodle on September 01, 2010, 12:44:17 AM
As far as I understand, this wouldn't address protecting the integrity of the initrd either.
Title: Re: boot time SW corruption check
Post by: medvedm on September 01, 2010, 11:25:38 AM
Yeah, I'm not going for foolproof here. I understand that GRUB (or whatever bootloader is used) could still get zapped. However, that would leave only one small piece of software vulnerable on the system as opposed to all of it. I think most projects don't do anything to protect against this (if you get hit, just put in another HDD).

I'm trying to see if there is a relatively easy way of protecting the vast majority of the SW loaded on the disk. Since disk space isn't an issue and TC is so small, redundant copies would work well; the problem is just figuring out when a different copy is needed.

The fallback option is interesting, but not exactly what I'm looking for.  There could be a problem in an executable that doesn't prevent the kernel from booting correctly but would still make the system useless.  A boot time check would be best because it would check all of it.
Title: Re: boot time SW corruption check
Post by: robc on September 01, 2010, 12:22:44 PM
Quote
Yeah, I'm not going for foolproof here. I understand that GRUB (or whatever bootloader is used) could still get zapped. However, that would leave only one small piece of software vulnerable on the system as opposed to all of it. I think most projects don't do anything to protect against this (if you get hit, just put in another HDD).
Your BIOS can also get zapped.

Quote
The fallback option is interesting, but not exactly what I'm looking for.  There could be a problem in an executable that doesn't prevent the kernel from booting correctly but would still make the system useless.  A boot time check would be best because it would check all of it.
You could use a bzip2-compressed image instead of gzip. bzip2 has a built-in checksum (CRC32), so the integrity of the image's file system would be checked on every boot. Also, once you have a running system, it would be good practice to schedule integrity checks of all system images at boot and at regular intervals.
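One way to see the built-in checksums at work is to flip a single bit in a compressed image and try to decompress it. The sketch below uses Python's standard `gzip` and `bz2` modules on a random stand-in payload (an assumption, not a real initrd). Worth noting: the gzip format (RFC 1952) also stores a CRC-32 of the uncompressed data in its trailer, so a gzipped image carries a checksum too; bzip2 checks a CRC per block plus one for the whole stream.

```python
import bz2
import gzip
import random
import zlib


def flip_bit(blob: bytes, bit_index: int) -> bytes:
    """Return a copy of blob with a single bit inverted."""
    buf = bytearray(blob)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def rejects_bit_flip(compress, decompress, payload: bytes) -> bool:
    """Compress payload, flip one bit in the middle of the compressed data,
    and report whether the decompressor refuses the damaged copy."""
    packed = compress(payload)
    damaged = flip_bit(packed, (len(packed) // 2) * 8)
    try:
        decompress(damaged)
    except (OSError, EOFError, ValueError, zlib.error):
        return True   # CRC mismatch or malformed stream was reported
    return False      # decompressor returned data without complaint


rng = random.Random(2010)
payload = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
for name, comp, decomp in [("gzip", gzip.compress, gzip.decompress),
                           ("bzip2", bz2.compress, bz2.decompress)]:
    print(name, "rejects flipped bit:", rejects_bit_flip(comp, decomp, payload))
```

Whether the kernel acts on such an error while unpacking an initrd at boot is a separate question that this sketch does not answer.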
Title: Re: boot time SW corruption check
Post by: tinypoodle on September 01, 2010, 01:34:46 PM
Quote
Yeah, I'm not going for foolproof here. I understand that GRUB (or whatever bootloader is used) could still get zapped. However, that would leave only one small piece of software vulnerable on the system as opposed to all of it. I think most projects don't do anything to protect against this (if you get hit, just put in another HDD).

I'm trying to see if there is a relatively easy way of protecting the vast majority of the SW loaded on the disk. Since disk space isn't an issue and TC is so small, redundant copies would work well; the problem is just figuring out when a different copy is needed.

The fallback option is interesting, but not exactly what I'm looking for. There could be a problem in an executable that doesn't prevent the kernel from booting correctly but would still make the system useless. A boot time check would be best because it would check all of it.

Hence my suggestion:
The total size of the software needed to run a checksum and a Linux bootloader under DOS could be kept very small, and it would be effective in case only the kernel and/or initrd got corrupted. It's not foolproof, but if you want to rely on software to check the integrity of the kernel and initrd, then it comes down to probabilities.
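To put rough numbers on that, here is a back-of-the-envelope model, not a measured result. Assume (hypothetically) that each of n redundant kernel/initrd copies is corrupted independently with probability p, and that the checker/bootloader itself is hit with probability q. Booting fails only if the checker is hit, or if it survives but every copy is bad:

```latex
P_{\text{fail}} = q + (1 - q)\,p^{n}
```

For example, with p = 0.01 and n = 3, p^n = 10^{-6}, so P_fail is at most q + 10^{-6}: the unprotected checker quickly dominates, while the redundant copies shrink everything else.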

Quote
You could use a bzip2-compressed image instead of gzip. bzip2 has a built-in checksum (CRC32), so the integrity of the image's file system would be checked on every boot.

What would be the expected consequences if the CRC32 check failed in such a case?
Can initrds compressed with bzip2 be booted without any further changes, just as if they were gzipped?
Title: Re: boot time SW corruption check
Post by: ixbrian on September 01, 2010, 03:08:52 PM
Would optical media (CDROM) be more reliable than flash or magnetic storage in this kind of environment to boot the OS? If you used optical to boot the system, maybe you could have a RAID 1 mirror (hardware or software) set up on flash drives or hard drives to store the data you need to save.

Brian
Title: Re: boot time SW corruption check
Post by: maro on September 01, 2010, 07:16:06 PM
I just did some fairly quick tests (with TC 3.1rc2) as I found the whole question rather interesting. I was using QEMU (v0.12.5 running on a XP host) with the -kernel bzImage -initrd ... parameters instead of "real" HW:


What I've learned (again) is that things are sometimes not as simple as they appear on first thought.  ;D
Title: Re: boot time SW corruption check
Post by: gerald_clark on September 01, 2010, 07:43:11 PM
Look for storage with built-in ECC.
Use redundant copies of all files, and have a background process constantly running checksums.
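A background check along those lines could look roughly like this sketch: verify every redundant copy of a protected file against a known SHA-256 digest and rewrite any bad copy from a good one. The file layout, where the digests come from, and the schedule are all assumptions; this is a software stand-in for what ECC or a self-healing filesystem does, not a replacement for it.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_good(path: Path, expected: str) -> bool:
    try:
        return sha256_of(path) == expected
    except OSError:  # a missing or unreadable copy counts as bad
        return False


def scrub(copies: list[Path], expected: str) -> list[Path]:
    """Verify redundant copies of one file and repair bad ones from the first
    good copy. Returns the repaired paths; raises if no copy is intact."""
    good = next((p for p in copies if copy_is_good(p, expected)), None)
    if good is None:
        raise RuntimeError(f"no intact copy among {copies}")
    repaired = []
    for path in copies:
        if path == good or copy_is_good(path, expected):
            continue
        tmp = path.with_name(path.name + ".scrub-tmp")
        shutil.copyfile(good, tmp)
        if not copy_is_good(tmp, expected):  # re-check before replacing
            tmp.unlink()
            raise RuntimeError(f"repair of {path} did not verify")
        tmp.replace(path)  # rename over the bad copy (atomic on POSIX, same filesystem)
        repaired.append(path)
    return repaired
```

It could be called from a periodic job or from a loop with `time.sleep()` between passes; a real deployment would also want to sync the repaired files to disk.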
Title: Re: boot time SW corruption check
Post by: medvedm on September 02, 2010, 05:20:32 AM
Quote
Would optical media (CDROM) be more reliable than flash or magnetic storage in this kind of environment to boot the OS? If you used optical to boot the system, maybe you could have a RAID 1 mirror (hardware or software) set up on flash drives or hard drives to store the data you need to save.

Brian

Yeah, it probably would be, but size is a limiting factor there: I've only got room in my package for one storage medium, the HDD.
Title: Re: boot time SW corruption check
Post by: medvedm on September 02, 2010, 05:23:16 AM
Quote
Your BIOS can also get zapped.

True, but if I remember correctly, the chips the BIOS resides on are much less prone to radiation problems. We've had boards operating in space for years without BIOS issues, but we have had occasional hard drive and flash issues.
Title: Re: boot time SW corruption check
Post by: curaga on September 02, 2010, 05:30:28 AM
Supposedly ZFS can detect and correct bit-level errors. Then again, I don't think anything can boot Linux from ZFS. TC can read & write using the zfs-fuse extension, though.
Title: Re: boot time SW corruption check
Post by: tinypoodle on September 02, 2010, 08:57:33 AM
Quote
I just did some fairly quick tests (with TC 3.1rc2) as I found the whole question rather interesting.

Rather interesting indeed... Thanks for sharing results of your tests.

Quote
I also tried to see what happens when 'tinycore.gz' gets "damaged". I therefore chose to "flip" just one bit in 'tinycore.gz' (offset: 0x300) with the help of a hex editor. I had mixed results when trying to un-compress this 'tinycore_faulty.gz':

  • The BusyBox 'gzip' failed to un-compress the file when I tried gzip -d tinycore_faulty.gz (with gzip: crc error and gzip: error inflating)
  • But it produced an output when I did gzip -cd tinycore_faulty.gz > tinycore_faulty.cpio (again spitting out the same warnings as before)

Makes me wonder what would happen with:

Code:
gzip -df tinycore_faulty.gz
or
Code:
gunzip -f tinycore_faulty.gz
in case you still have the bit-flipped initrd.
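The `gzip -cd` behaviour in the quoted test is consistent with how gzip is laid out: the CRC-32 sits in the trailer after the compressed data, so a streaming decompressor can only check it at the very end, after it has already written out (possibly damaged) data. The sketch below imitates that with Python's `zlib` in gzip mode, feeding the file in chunks and flipping one bit at offset 0x300 as in the hex-editor test. The random payload, the uncompressed (level 0) encoding, and the chunk size are assumptions chosen to keep the example predictable; it does not model BusyBox's implementation.

```python
import gzip
import random
import zlib


def stream_gunzip(blob: bytes, chunk_size: int = 1024):
    """Decompress gzip data chunk by chunk, the way a pipe consumer sees it.
    Returns (bytes already emitted, error message or None)."""
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
    emitted = []
    try:
        for start in range(0, len(blob), chunk_size):
            emitted.append(d.decompress(blob[start:start + chunk_size]))
        emitted.append(d.flush())
    except zlib.error as exc:  # e.g. the trailer CRC does not match the data
        return b"".join(emitted), str(exc)
    return b"".join(emitted), None


rng = random.Random(31)
payload = bytes(rng.getrandbits(8) for _ in range(4096))
packed = bytearray(gzip.compress(payload, compresslevel=0))  # stored (uncompressed) blocks
packed[0x300] ^= 0x01  # flip one bit, like the hex-editor test
out, err = stream_gunzip(bytes(packed))
print(f"{len(out)} bytes emitted before the error: {err}")
```

Most of the damaged data is handed on before the error is reported, which is why a tool writing to stdout can produce a file "under protest".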

Quote
  • Likewise on XP I found most tools I tested refused to un-compress, when they spotted the CRC error. But some produced a file (albeit "under protest").

FWIW, I remember having some corrupted rar archives under DOS that would only decompress with an added switch.
Title: Re: boot time SW corruption check
Post by: gadget42 on March 30, 2022, 02:15:15 AM
Quote
Supposedly ZFS can detect and correct bit-level errors. Then again, I don't think anything can boot Linux from ZFS. TC can read & write using the zfs-fuse extension, though.

Concerning ZFS and booting Linux:
http://forum.tinycorelinux.net/index.php/topic,25280.msg164654.html#msg164654