
Topic: Kernel Khaos :)

CentralWare

Kernel Khaos :)
« on: January 02, 2015, 09:27:57 AM »
I was thinking about this last night and figured I should ask, since someone out there has probably already tried it, if only out of curiosity.
"What would be the down-side to building a kernel with virtually everything turned on (included or at least modularized)?"

I know the kernel would grow in size if we included everything - that is a given.
There are also likely areas where one mod would clash with another - this would have to be sorted out.

BUT... let's just (for the sake of theory) take a three-point-something kernel as the example, configure it to build "everything," and then take a vacation while it compiles. Even if everything is set to MODULE, which in my opinion would be the best scenario given TC's methodology, are there any foreseeable caveats? In my mind, I could start a machine compiling tonight, come back whenever it finishes and voilà... a couple hundred modules are sitting there waiting for something to do. Package them up and life is grand, with a large portion of hardware and software options that haven't been implemented yet because doing so is so time-consuming that nobody wants to spend the time. Can anyone think of problems with this theory?

The second piece of this inquiry revolves around hardware detection. My scenario is based on a Cisco CE server I have here, which uses a network controller whose driver is a module in 3.0.21, so it's obviously not included in the core image for 4.x. When starting, the machine hangs for a while complaining about the missing Tigon (sp?) module before it is allowed to continue booting.

Let's say we boot a "master" kernel such as the one described above, where everything is tagged as a module except the defaults needed to keep TC "as is." Each driver/module has to be matched to its hardware through some kind of central database (actually, more than one, I imagine), so a vendor ID and hardware ID have to be available when the kernel goes looking for the associated software. Wouldn't it be possible to revamp the kernel a little so that, instead of complaining about a missing module and hanging while trying to initialize it, it saves the IDs in memory or to a log file and quietly (and quickly?) moves on to the next device in the list? In my mind, this would let us use that list to update the system automatically after the kernel and core have both loaded. Granted, dependencies might be a bit more complicated, as we'd likely have to install the missing module(s), re-trigger the hardware layer, wait for any further missing modules to be detected, etc., but in my mind that's part of the installation/boot process, depending on what type of media we're working with.
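
For what it's worth, part of this already exists outside the kernel: on a running system, sysfs exposes each PCI device's vendor and device IDs, and a device that a driver has claimed gets a `driver` link. A rough Python sketch of the "log the unclaimed IDs and move on" idea, assuming the usual `/sys/bus/pci/devices/*/{vendor,device,driver}` layout; the log format and function names are made up for illustration:

```python
from pathlib import Path
from typing import List, Tuple

def unclaimed_pci_devices(sysfs_root: str = "/sys") -> List[Tuple[str, str, str]]:
    """Return (slot, vendor, device) for every PCI device with no bound driver."""
    base = Path(sysfs_root) / "bus" / "pci" / "devices"
    missing: List[Tuple[str, str, str]] = []
    if not base.is_dir():
        return missing
    for dev in sorted(base.iterdir()):
        vendor = (dev / "vendor").read_text().strip()   # e.g. "0x8086"
        device = (dev / "device").read_text().strip()
        # A device claimed by a driver has a "driver" symlink in its directory.
        if not (dev / "driver").exists():
            missing.append((dev.name, vendor, device))
    return missing

def write_missing_log(entries, log_path: str) -> None:
    """Hypothetical log format: one "slot vendor device" line per device."""
    with open(log_path, "w") as fh:
        for slot, vendor, device in entries:
            fh.write(f"{slot} {vendor} {device}\n")
```

Run after udev settles, `write_missing_log(unclaimed_pci_devices(), "/tmp/missing.log")` would leave behind exactly the kind of list the rest of this post wants to act on.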

Example: Let's say we're booting from CD or USB.  Both media types have a known storage capacity and based on the media type, we know where to look for mounts.  When using USB, I tend to use SD cards which allow me to lock writing, so for argument's sake let's assume the media type is r/o.

I do not know how large a complete compilation of modules would be - but let's pretend for a moment that both TC and ALL of the compiled modules fit onto a CDR.

The media boots...  we trigger the hardware daemon for the first time and move on...  during the remainder of our boot process the daemon is running in the background digging up addresses and IDs for different hardware and/or support software needed for a given task.

By the time we reach the end of our rcS script, right before we launch bootsync, if the wait4dev boot code (or similar) is given, we sit back for a minute and wait for hardware detection to finish its first full run. Once it's done and the detection cache is clean, we look at the log file to see whether there are any devices we have no support for. If any are found:

1) IF /mnt/boot_device contains the directory ./kernel (or similar) look within to see if firmware_VENDORid_HARDWAREid.tcz exists - if so, load it up and probe accordingly.

2) IF ./kernel does NOT exist, jump online (if available) to see if firmware_VENDORid_HARDWAREid.tcz exists - if so, load it up and probe accordingly.

3) Empty the log file, Rinse, Lather, Repeat until the log file is empty after going through the trigger/detect loop.

4) If this isn't enabled through a boot code, a command-line version should be available to do it manually.
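
Steps 1 to 3 could be sketched in Python roughly as follows. Everything here is hypothetical: the `firmware_VENDORid_HARDWAREid.tcz` naming comes from the proposal itself rather than an existing TC convention, and `detect`, `fetch_online` and `load` stand in for whatever would re-trigger udev and read the log, download from a mirror, and load an extension. A "tried" set and a round cap keep the loop from spinning on a device that no extension fixes.

```python
from pathlib import Path
from typing import Callable, Optional, Set, Tuple

DeviceId = Tuple[str, str]  # (vendor id, hardware id)

def ext_name(vendor: str, device: str) -> str:
    """Hypothetical extension name from the proposal."""
    return f"firmware_{vendor}_{device}.tcz"

def resolve_missing(detect: Callable[[], Set[DeviceId]],
                    boot_device: str,
                    fetch_online: Callable[[str], Optional[Path]],
                    load: Callable[[Path], None],
                    max_rounds: int = 10) -> Set[DeviceId]:
    """Repeat trigger/detect/load until the log stops changing.

    Returns the IDs still reported as missing at the end.
    """
    kernel_dir = Path(boot_device) / "kernel"
    tried: Set[DeviceId] = set()
    for _ in range(max_rounds):
        pending = detect() - tried          # only IDs we haven't attempted yet
        if not pending:
            break
        for vendor, device in sorted(pending):
            tried.add((vendor, device))
            name = ext_name(vendor, device)
            if kernel_dir.is_dir():          # step 1: look on the boot media
                local = kernel_dir / name
                candidate = local if local.is_file() else None
            else:                            # step 2: no ./kernel, try online
                candidate = fetch_online(name)
            if candidate is not None:
                load(candidate)              # loading may expose new devices
    return detect()                          # step 3: whatever is left over
```

Step 4 would simply be the same function called from a shell command instead of from rcS.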

IN MY MIND, this would save a great deal of the time and manual research spent on hardware devices (and in many cases software devices), since not everyone knows exactly what is inside a given machine, let alone on the motherboard: onboard graphics chips, buses, networking, etc. (I had no clue the Pro-1000 chips were listed under vendor "A" in Windows and vendor "B" in Linux, and without Google and some poor sap who had already gone through this, I wouldn't have known what to do at all; and that's just one machine type.)

Q: How does this help the general TC population?
A: If a CD or USB device could be packed with modules in this fashion, the media itself would be as truly "universal" as it gets. It doesn't cost any more in memory/resources, as the modules would be dormant unless they're needed. This would also take a good 90% of the hunting and research out of the picture for chips that already have kernel support, and odds are this new conglomeration of modules could be sorted by age and/or technology so that we wouldn't need "them all," depending on what types of systems we expect to work with. (i.e., there's no sense having ISA modules on a boot device if we don't work with anything that has ISA components.)

Q: How do we handle module dependencies?
A: I'm guessing we'd have to create tcz.dep files by firing up this new "master" kernel and reading the output of lsmod in reverse. lsmod tells us each module's name and which modules use it ("Used by"), so we'd just have to invert that, and it looks as though it can be automated. I didn't dig deep enough to know whether dependencies are listed in modconfig or similar locations. I also imagine that, testing things this way, we'll find out the hard way which modules dislike which others! :)
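
Inverting lsmod is straightforward; a sketch follows, assuming the usual `Module  Size  Used by` columns and one hypothetical `.tcz` per kernel module (TC actually groups modules into larger extensions). Worth noting: lsmod only shows what is currently loaded, whereas depmod already writes every module's dependencies to `modules.dep` under `/lib/modules/<version>/`, which may be the better source.

```python
from typing import Dict, List, Set

def deps_from_lsmod(text: str) -> Dict[str, Set[str]]:
    """Invert lsmod's "Used by" column into: module -> modules it depends on."""
    deps: Dict[str, Set[str]] = {}
    for line in text.splitlines()[1:]:            # skip the header row
        parts = line.split()
        if not parts:
            continue
        name = parts[0]
        deps.setdefault(name, set())
        if len(parts) >= 4:                       # name, size, count, users
            for user in parts[3].split(","):
                if user and not user.startswith("["):   # skip "[permanent]"
                    # "user" uses "name", so "user" depends on "name".
                    deps.setdefault(user, set()).add(name)
    return deps

def dep_file_lines(module: str, deps: Dict[str, Set[str]]) -> List[str]:
    """Lines for a hypothetical <module>.tcz.dep, one extension per dependency."""
    return sorted(f"{d}.tcz" for d in deps.get(module, set()))
```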

Q: We're trying to keep this distro small - this defeats this concept!
A: True and false. Let's say I put TC onto a CD-R... the disc holds 640MB to 800MB depending on type/format. If TC and a few extensions take up 50MB, that's a LOT of wasted space, and the distro still doesn't cover every possible hardware and/or software need behind the scenes. I don't think I've heard (on the forum) of anyone exceeding a few hundred megs (desktop+office+etc.), so at most we're using half of a standard CD-R; why not put the remaining space to work? The same goes for flash drives... my local distributor doesn't even CARRY 1GB SD/flash cards anymore; 4GB is the smallest. If you dedicated just 1GB to TC and a couple hundred modules, that still leaves 3GB for persistence, more than the average TC'er will likely use.

Q: What about frugal (HDD) installs?
A: In my opinion this would be a blessing. We still have to boot from some kind of media to do a frugal install, which is where these modules would come into play. Once the install is complete, I'd recommend an option (radio button) in tc-install asking whether to copy over USB and similar drivers for future use while skipping PCMCIA/PCI/etc., since someone plugging in a new PCI/PCIe card after install is less likely than someone connecting a new USB device. The "log file" approach would help here too, as the shell could prompt "New hardware detected. Please insert your CD/USB/etc. and press ENTER" when the module(s) in question aren't on the local boot drive... copy the needed pieces over and voilà. (Or use the internet accordingly.)
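
The "copy USB drivers only" option could start out as a simple path filter over the module tree. A minimal sketch, assuming in-tree module paths like those listed in `modules.dep`; treating any `usb` directory component as "USB-related" is a heuristic, not an official classification:

```python
from pathlib import PurePosixPath
from typing import Iterable, List

def usb_modules(module_paths: Iterable[str]) -> List[str]:
    """Keep modules whose directory path contains a "usb" component (heuristic)."""
    keep = []
    for p in module_paths:
        dirs = PurePosixPath(p).parts[:-1]     # drop the .ko file name itself
        if "usb" in dirs:
            keep.append(p)
    return keep
```

This catches USB storage, USB network adapters and USB audio, which live in different subtrees, while skipping PCI and PCMCIA drivers.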

Q: What about hardware which isn't kernel supported?
A: I've been giving this some thought as well (specifically thinking of Adaptec and Silicon Image), and there's going to be someone out there who needs a hand making a given piece of hardware work under Linux; at the very least, on a driver level, this is where the forum could easily add to the power of TC. I have an Adaptec xxxx-xx SCSI adapter which isn't directly supported under 'nix... there's a good chance someone SOMEWHERE has already crossed this path with a different flavor of Linux, and odds are pretty good that if a solution was found, it can be implemented in our "Master Kernel" with little or no effort. "By adding this distinct item to our collective, we achieve perfection." :) Mind you, that's only if the vendor has no problem with us maintaining it, and in some cases they may already have kernel-specific drivers which can be adapted and made available to everyone. This would allow us to grow in strength and compatibility while still maintaining our figure! (Or at least, in theory!)

I personally use TC4... not because it's the latest and greatest, but because it's the most complete (IMO) when comparing extensions and support, so I have my own 4.x mirror, kernel, core images, etc. to speed things along and to make sure I'm not hammering TC mirrors a couple dozen machines at a time. There are two main flaws I've been considering doing something about (the above being one), so I figure it's best to throw this out there and see what kind of responses come in; if enough positive feedback comes back, it would be worthwhile moving in those directions.

The second "flaw" (so to speak) is going to be the limitation of the repo itself.  The mainstream repos (take RH for example) have tens of thousands of software apps, utilities, etc. at their disposal...  I've been working on a way to take advantage of these resources for the betterment of TC.

For example, take IceWM from TC 4.x...  it's old and antiquated.  The authors aren't exactly running to our door with new releases and the TC crew is going to be limited in numbers and man-hours since we're not exactly on a payroll, but my "update" system for IceWM was virtually painless so now I have an extension for icewm_1.2.30 which works as smooth as glass under TC4.  The same "update" system has also been built for firefox, chrome, java (JRE), flash player and quite a few others, which led me to ask myself "...what about on a global scale?"  The update system grabs the latest releases, but has no way to test the releases so there's a good amount of assumption involved when it comes to functionality.  I haven't implemented version control yet, so it's "latest or nothing."  I still have some further research to do with the mainstream distros but if things continue on this path, TC could technically compete with the big boys in many areas, with the exception of obesity.

Feedback is key.  This post is long-winded, this I know, but I need some serious thinking by the community in order to know which direction to take from here, not just for my own projects...  but for all of the TC users in general.

core-user

« Reply #1 on: January 02, 2015, 11:28:04 AM »
It is possible to compile (nearly) everything into a kernel or module, but doing so would leave you open to possible exploits.

It used to be quite common to only include what is needed for a specific machine, therefore creating a small efficient kernel.

However, you have to understand what every bit of kernel code does to tailor a kernel.

Most distros supply kernels with the most common drivers included or as modules which then get loaded on demand at boot up.

Edit: VENDORid & HARDWAREid exist in files used by the kernel to load the necessary drivers.
« Last Edit: January 02, 2015, 11:36:20 AM by core-user »
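
Those files are, on a typical system, `modules.alias` (generated by depmod under `/lib/modules/<version>/`) and the per-device `modalias` strings in sysfs; modprobe resolves a device's modalias against the alias patterns using shell-style wildcards. A simplified Python sketch of that lookup (the real kmod logic handles more cases), where an empty result corresponds to "no module ships for this hardware":

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

def parse_aliases(text: str) -> List[Tuple[str, str]]:
    """Parse "alias <pattern> <module>" lines, ignoring comments and other lines."""
    table = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "alias":
            table.append((parts[1], parts[2]))
    return table

def modules_for(modalias: str, table: List[Tuple[str, str]]) -> List[str]:
    """Return every module whose alias pattern matches the device's modalias."""
    return sorted({mod for pattern, mod in table if fnmatchcase(modalias, pattern)})
```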

CentralWare

« Reply #2 on: January 03, 2015, 07:19:14 AM »
@core-user: Thanks for your feedback!

Exploits: There are currently ~250 Level 7 exploits/vulnerabilities listed at CVE, the majority of which affect versions older than 3.2.x or software support (such as cifs, pam, etc.). To be honest, though, CVE's database is based on known problems... God knows there will be others that simply haven't surfaced yet, and every release that includes enhancements will have something new that was overlooked, since most programmers can't predict every possible attack pattern that will be thrown at their code. This seems to come with the territory.

The theory, however, is that if we were to module-build the kernel, the kernel itself still remains as safe as possible and only when a given module is loaded does the exploit exist in the first place.  Thus, if we had revisions specific to the modules instead of the kernel itself, only the module would be revised/updated on a client machine if and when a fix is implemented.  This makes it possible for virtually anyone, whether it be the Kernel crew, TC crew or even an end user with a few minutes of free time and enough know-how to decipher the source and the solution to implement fixes, submit patches, which in turn water-fall back up the ladder for everyone's benefit...  or at least "in a perfect world" one would hope!

Quote
Most distros supply kernels with the most common drivers included or as modules which then get loaded on demand at boot up.
I think that was the main point of the post... just taken up a notch. The kernel works from a list of support software/drivers/etc., each with one of three options: Module, Include or Disregard. In [M]odule mode, the kernel is aware of a device, file system, etc. but doesn't know what to do with it and expects to find a module with the same name. Once loaded, it basically passes control of that item over to the module as if to say "...it's your problem now!" and waits for the module to complete its initialization. With [I]nclude, the driver is built into the kernel, which in my opinion has its strengths and weaknesses. (If a built-in driver crashes, the kernel crashes and throws a panic if we're lucky. If an external module crashes, in theory and, I'm guessing, sometimes in practice, the module can be released and we move on, though most of the time it still ends in a panic.) When an external module isn't found, the kernel hunts for it, eventually whines about whatever is missing, and then moves on if it can. If the item is set to neither, the kernel just says to itself "What's this thing???" and basically ignores its existence. In this case, neither the system nor the end user has a clue something was skipped (during boot).
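
For reference, those three states correspond to `=y`, `=m` and `# CONFIG_... is not set` in the kernel's `.config`, and the kernel build already has a target, `make allmodconfig`, that selects "module wherever possible," which is close to the master kernel proposed here. A small Python sketch for summarizing a `.config`; non-tristate values (strings, numbers) are kept but not counted:

```python
import re
from collections import Counter
from typing import Dict

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")

def parse_config(text: str) -> Dict[str, str]:
    """Map each CONFIG_ option to its value; "is not set" becomes "n"."""
    opts: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts

def tristate_summary(opts: Dict[str, str]) -> Counter:
    """Count built-in (y), module (m) and disabled (n) options."""
    return Counter(v for v in opts.values() if v in ("y", "m", "n"))
```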

It's my thinking that with a very minor rewrite of the module_load process, if we redirect output to a log or a space in memory and simply stop trying to initialize the given item, we'll speed up the boot process for unknown hardware AND have the ability to tend to missing items after all of the existing items are tended to.  There are dependencies to handle, but that also should be able to be accomplished post-boot, as long as it's done before control is released.

In some cases, such as file systems, this is easily accomplished.
blkid notes a storage device has been detected which is cifs formatted.  CIFS isn't loaded.  Load it.  (If permitted by the admin/user.)
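
That check can be done in userspace. A sketch, assuming the type string comes from something like `blkid -o value -s TYPE /dev/sdXN`, that the module shares the filesystem's name (true for many, e.g. xfs or vfat, but not all), and a hypothetical admin-approved `allowed` set; `/proc/filesystems` lists the types the running kernel already supports:

```python
from typing import Optional, Set

def registered_filesystems(proc_text: str) -> Set[str]:
    """Parse /proc/filesystems lines such as "nodev\tsysfs" or "\text4"."""
    return {line.split()[-1] for line in proc_text.splitlines() if line.strip()}

def module_needed(fstype: str, proc_text: str, allowed: Set[str]) -> Optional[str]:
    """Return the module to load for fstype, or None if not needed/not permitted."""
    if fstype in registered_filesystems(proc_text):
        return None                      # already built in or already loaded
    # Assumption: module name == filesystem type; only load if the admin allows it.
    return fstype if fstype in allowed else None
```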

In cases such as remote storage (iSCSI, AoE, etc.) the system itself doesn't have a clue such things are needed until an attempted mount takes place.  This would likely fall onto the admin/user to flip a switch saying "I need this support" (such as boot codes like aoe=somewhere:device) which is where tc-config comes into play, but even the AoE module doesn't need to exist within the kernel unless/until called upon.

Scenario 1: I have a Gigabyte motherboard, 12GB RAM and a dozen or so hard drives, and it simply breaks the TC4 Tiny/Plus boot over something foolish. The machine boots, you pick a window manager, wait a few seconds... and blam! Shell prompt and nothing more. Running startx reveals the missing link by indicating that Radeon support is missing. Load firmware_radeon and launch startx... desktop. ATI (AMD) cards are not exactly unheard of, but building ATI support into the kernel itself would be wasteful... so an option during boot to fill in any gaps would likely be a golden fix for the many out there who spend a good amount of time debugging their systems when they have no idea what comes pre-packaged and what doesn't.

Scenario 2: I have an IBM server motherboard, 8GB RAM, a CF-based boot drive and two full-size hard drives... and TC hangs on boot for about 45 seconds. Eventually, the kernel painfully reports a missing (module) driver. This happens twice, as there are two onboard GbE ports, hence the extended wait. Of course, I initially have no clue what a TIGON device is, as the NICs are Intel Pro-1000 according to the specs, so it's not until I realize there's no network connectivity that I figure/assume "Tigon = Pro-1000?" I dig through the forums, eventually figure out where the Tigon driver lives and voilà... it has a heartbeat.

Scenario 3: I have terminal kiosks called "Vision Bank" which are integrated motherboard/audio/monitor/etc. (everything except touch-screen) which I set to PXE boot.  I had to actually tear one apart to determine they're Geode based once they failed horribly to even boot.  (Hung on the kernel - no panic, just sat there.  I didn't remove "quiet" from the boot codes, so I may have been able to figure things out a "little" faster otherwise.)

It's merely my opinion, thus the post, if we educated the kernel by [m]odule-compiling it with most of the hardware drivers available and a large chunk of the filesystems and virtual devices, we're not going to end up with an obese kernel (I can't imagine it growing much in size at all if we're just adding associations to the kernel itself - the bulk would be in the actual modules) and if anything, we would have information from the kernel as to what was detected and can act accordingly long before the user takes over.  Again...  this is the theory which is what I'm asking people to support or debate.  Yes, I know there are holes and vulnerabilities, they're in black and white and only apply to the ones we're even aware of, but those holes exist in the modules themselves...  not necessarily in the kernel core.

Example: AoE support is built into the TC design. Let's pretend for a moment we recompiled with AoE as a module instead. In the kernel, it's now just a "tag" telling the kernel we might support it. I then run modprobe aoe and the kernel goes looking for the AoE module. If it finds it, voilà... we're in business. If not, it complains and stops trying. If the AoE module had a vulnerability, it wouldn't exist on the system until modprobe succeeded. (Any arguments out there on this?) Now let's say the module itself is 64KB. My guess (this example will actually be done in the next couple of days) is that the kernel would shrink by roughly 64KB and the module itself would likely be a little larger than 64KB, as it would also contain additional headers which were otherwise already included and shared throughout the kernel. On a plus note... we might even make up for some of that extra bulk storage-wise, since the module gets re-compressed when it becomes a TCZ, but in memory it's likely to carry the extra bloat.

curaga

« Reply #3 on: January 04, 2015, 04:21:31 PM »
That's a lot to read; I'm trying to respond to what I found to be the main points. If I missed any, please say.

Quote
"What would be the down-side to building a kernel with virtually everything turned on (included or at least modularized)?"

A kernel with everything as modules would boot slowly and use more total RAM (with all loaded, compared to all built-in).

You can easily create an image with everything we ship - remaster your initrd with all module extensions (see the dep file of original-modules) and firmware*. All modules total just a few MB, all firmware is larger.

For your tg3 network module delay, it really would be good to have included the full error message. It's not about a missing module but missing firmware I guess. If you want to hook the firmware-loading process, see firmware.sh under /lib/udev.
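
For reference, the legacy user-space firmware interface that script drives works through sysfs: udev passes `FIRMWARE` and `DEVPATH`, the helper writes `1` to the device's `loading` file, copies the blob into `data`, then writes `0`; writing `-1` aborts the request. A request nobody answers waits out a timeout (60 seconds by default), which may or may not be related to the boot hang described above. A Python sketch of that handshake that also logs missing names, with the search directory and log path as assumptions:

```python
from pathlib import Path
from typing import Iterable, Optional

def handle_firmware_request(firmware: str, devpath: str, sysfs_root: str = "/sys",
                            fw_dirs: Iterable[str] = ("/lib/firmware",),
                            missing_log: Optional[str] = None) -> bool:
    """Answer one firmware request; return True if the blob was supplied."""
    dev = Path(sysfs_root) / devpath.lstrip("/")
    loading = dev / "loading"
    for d in fw_dirs:
        blob = Path(d) / firmware
        if blob.is_file():
            loading.write_text("1\n")                  # start of transfer
            (dev / "data").write_bytes(blob.read_bytes())
            loading.write_text("0\n")                  # transfer complete
            return True
    loading.write_text("-1\n")   # abort right away instead of waiting for the timeout
    if missing_log:
        with open(missing_log, "a") as fh:             # remember it for later
            fh.write(firmware + "\n")
    return False
```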

Modules can only be updated for fixes/vulnerabilities in a limited scale; small fixes only.

Finally, you propose a kernel fork. It does not make sense to me; it is a huge amount of work.