Tiny Core Linux

Tiny Core Base => Micro Core => Topic started by: Santos on November 05, 2021, 07:34:14 PM

Title: [Solved] MC 11.1 x86 sleeps after 30 days uptime
Post by: Santos on November 05, 2021, 07:34:14 PM
Hello

After ~30 days of uptime, microcore automatically goes into some kind of hibernation mode and everything halts. My ssh sessions just hang, and I have to type in my credentials (user/password) on the actual computer before my system "resumes". This is the second time it has happened. I'm not quite sure whether it's due to my setup, the laptop itself, or microcore.

1.
I think it could be the setup. I have a T42 working as a server with microcore 11.1 x86 running on it, with multiple connections over ssh, nfs and sshfs, and several USB devices attached. On top of that I'm running a chroot environment (Alpine Linux) with transmission downloading almost 24/7. There is no swap partition/file, and the SATA drive is connected through a cheap 2.5" SATA-to-IDE adaptor.

2.
IBM T42: all power-saving options are disabled. Every option I can change in the BIOS to prevent the laptop from turning off by itself is already disabled.

3.
As I said, this is the second time it has happened. The first time it just halted for a couple of seconds before resetting on its own. At that time I didn't have any hard disk drive/adaptor plugged in. All I had was a keyboard and the Ethernet cable.

This time I grabbed the dmesg output. Besides some errors from an old USB cable, the only thing I could think of is the following:

Code: [Select]
ata1: lost interrupt (Status 0x58)
ata1: drained 4098 bytes to clear DRQ
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: failed command: READ DMA
ata1.00: cmd c8/00:08:00:f8:4e/00:00:00:00:00/e9 tag 0 dma 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/100
ata1.00: device reported invalid CHS sector 0
sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda] tag#0 Sense Key : 0x5 [current]
sd 0:0:0:0: [sda] tag#0 ASC=0x21 ASCQ=0x4
sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 09 4e f8 00 00 00 08 00
blk_update_request: I/O error, dev sda, sector 156170240 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
ata1: EH complete
ata1: lost interrupt (Status 0x58)
ata1: drained 4098 bytes to clear DRQ
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: failed command: READ DMA
ata1.00: cmd c8/00:08:b0:5c:c4/00:00:00:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/100
ata1.00: device reported invalid CHS sector 0
sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda] tag#0 Sense Key : 0x5 [current]
sd 0:0:0:0: [sda] tag#0 ASC=0x21 ASCQ=0x4
sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 c4 5c b0 00 00 08 00
blk_update_request: I/O error, dev sda, sector 12868784 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
ata1: EH complete
usb 3-1.1: USB disconnect, device number 108
usb 3-1.1: new full-speed USB device number 109 using uhci_hcd
usb 3-1.1: not running at top speed; connect to a high speed hub
cdc_acm 3-1.1:1.1: ttyACM0: USB ACM device
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata1.00: BMDMA stat 0x25
ata1.00: failed command: READ DMA
ata1.00: cmd c8/00:08:00:09:83/00:00:00:00:00/e5 tag 0 dma 4096 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata1: soft resetting link
ata1.00: NODEV after polling detection
ata1.00: revalidation failed (errno=-2)
ata1: soft resetting link
ata1.00: configured for UDMA/100
ata1: EH complete

Uptime:
Code: [Select]
21:18:23 up 34 days, 15:28,  17 users,  load average: 0.34, 0.48, 0.39

Any advice to stop  microcore/my_computer  from hibernating?
Title: Re: MC 11.1 x86 sleeps after 30 days uptime
Post by: Rich on November 05, 2021, 08:53:48 PM
Hi Santos
Looking at that dmesg output, several possibilities come to mind:
1. Hard drive is failing.
2. The  cheap adaptor Sata to IDE 2.5"  is failing or a poorly designed product.
3. Maybe  UDMA/100  is too fast a setting for your setup. Try a slower setting if possible.
4. Cabling issue. Make sure connectors are properly seated. Avoid excessive cable length. Make sure you use the correct
   cable type for UDMA (40 pin with 80 wire ribbon cable).
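
One rough way to look at a log like the one above is to pull out just the sectors that failed. This is a minimal sketch, assuming dmesg lines in the same blk_update_request format as in the first post; failed_sectors is a made-up helper name. As a rule of thumb only, errors that keep returning to the same sectors would lean toward possibility 1, while scattered sectors would lean toward 2 to 4; a short log cannot settle that by itself.

```shell
# failed_sectors: read kernel log text on stdin and print "device sector"
# for every block-layer I/O error line, one unique pair per line.
# (Hypothetical helper, not part of Tiny Core.)
failed_sectors() {
    sed -n 's|.*I/O error, dev \([a-z0-9]*\), sector \([0-9][0-9]*\).*|\1 \2|p' | sort -u
}
```

It could be used as `dmesg | failed_sectors`.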
Title: Re: MC 11.1 x86 sleeps after 30 days uptime
Post by: Santos on November 05, 2021, 10:17:55 PM
Thank you very much Rich. I think the culprit is the cheap adaptor. The hdd is not failing (I ran a  smartctl  and  badblocks  scan). No cables in between the adapter and the HDD. Besides, I did a quick search on the internet, most of the time the messages above will be showing up when there is a "connection" issue like faulty cables.

The best way to go will be to lower the UDMA setting. How to achieve this?

Edit:
I just rebooted my laptop. Let's see if it works for another month, haha. :)
Title: Re: MC 11.1 x86 sleeps after 30 days uptime
Post by: gadget42 on November 06, 2021, 02:01:25 AM
refreshed my memory regarding the T42 by visiting:
https://www.anandtech.com/show/4052/ibms-thinkpad-t42-lcd-a-blast-from-the-past

notes/thoughts to add to Rich's:
the unit has reached the age where board-level components become unreliable and prone to random failures/glitches/etc.
(have experienced this with desktops, laptops, all-in-ones, as well as various peripherals. both PCs and MACs, some dating back to mid-1980s)

and even the newer stuff fails due to prior owner abuse, cold solder connections finally failing intermittently, environmental factors(humidity/temperature/condensation/etc)

sometimes just a good cleaning/vacuuming, some plastic-safe contact-cleaner, and a few unplugging-and-replugging cycles of all possible connections will do the trick.

good luck!
Title: Re: MC 11.1 x86 sleeps after 30 days uptime
Post by: PDP-8 on November 06, 2021, 05:01:01 PM
Hm..  not sure if usbcore is compiled into the TC kernel.

If so, perhaps this kernel parameter could be passed

Code: [Select]
usbcore.autosuspend=-1
I've used this elsewhere on some substandard / aging hardware.  Not absolutely sure usbcore is in TC though...
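
Whether usbcore is present, and what its autosuspend delay currently is, can usually be read from sysfs. A minimal sketch, assuming the standard /sys/module/usbcore/parameters/autosuspend path exists on the TC kernel (not confirmed in this thread); usb_autosuspend is a made-up helper name.

```shell
# usb_autosuspend: print the current usbcore autosuspend delay in seconds,
# or "unavailable" if the parameter file cannot be read.
# The optional argument overrides the sysfs path (assumed default below).
usb_autosuspend() {
    param="${1:-/sys/module/usbcore/parameters/autosuspend}"
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo "unavailable"
    fi
}
```

A value of -1 would mean autosuspend is disabled; the usual kernel default is 2.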
Title: Re: MC 11.1 x86 sleeps after 30 days uptime
Post by: Santos on November 12, 2021, 09:02:42 PM
Thank you for your answers.

So far, after reading your responses, I think microcore does not have any kind of "hibernation" mode that needs to be looked at. That gives me peace of mind, since I don't have to reconfigure it.

Quote from: gadget42 on November 06, 2021, 02:01:25 AM
refreshed my memory regarding the T42 by visiting:
https://www.anandtech.com/show/4052/ibms-thinkpad-t42-lcd-a-blast-from-the-past

notes/thoughts to add to Rich's:
the unit has reached the age where board-level components become unreliable and prone to random failures/glitches/etc.
(have experienced this with desktops, laptops, all-in-ones, as well as various peripherals. both PCs and MACs, some dating back to mid-1980s)

and even the newer stuff fails due to prior owner abuse, cold solder connections finally failing intermittently, environmental factors(humidity/temperature/condensation/etc)

sometimes just a good cleaning/vacuuming, some plastic-safe contact-cleaner, and a few unplugging-and-replugging cycles of all possible connections will do the trick.

good luck!

Thank you.  :)

It's really sad for me to see such a marvelous device go to waste. I think computers are one of the best things humanity has come up with; the power they give us is incredible. But everything has to come to an end. Hopefully some last precautions can extend its working life by a couple of months.

Quote from: PDP-8 on November 06, 2021, 05:01:01 PM
Hm..  not sure if usbcore is compiled into the TC kernel.

If so, perhaps this kernel parameter could be passed

Code: [Select]
usbcore.autosuspend=-1
I've used this elsewhere on some substandard / aging hardware.  Not absolutely sure usbcore is in TC though...


Thank you PDP-8, I'll definitely be checking it out on the next reboot cycle.


Besides this, any advice on how to lower UDMA or some kind of safe throttling?
Title: Re: MC 11.1 x86 sleeps after 30 days uptime
Post by: Rich on November 12, 2021, 10:21:45 PM
Hi Santos
Quote from: Santos on November 12, 2021, 09:02:42 PM
... Besides this, any advice on how to lower UDMA or some kind of safe throttling?

You could try this boot code which turns off all disc DMA:
Code: [Select]
libata.dma=0
Found here:
https://mjmwired.net/kernel/Documentation/kernel-parameters.txt#2017

Or you could try changing the transfer mode:
Code: [Select]
libata.force=udma66
or:
Code: [Select]
libata.force=pio4
Found here:
https://mjmwired.net/kernel/Documentation/kernel-parameters.txt#2033

After the machine boots, verify the drive is running at a slower speed in dmesg.
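
The dmesg check could look something like this. It is a minimal sketch, assuming libata reports the mode in "configured for ..." lines like the ones in the first post; ata_modes is a made-up helper name.

```shell
# ata_modes: read kernel log text on stdin and print "device mode" for
# every libata "configured for" line, e.g. "ata1.00 UDMA/100".
# (Hypothetical helper; uses only basic sed syntax.)
ata_modes() {
    sed -n 's/.*\(ata[0-9][0-9]*\.[0-9][0-9]*\): configured for \(.*\)$/\1 \2/p'
}
```

Run it as `dmesg | ata_modes`. If the boot code took effect, the mode printed should differ from the UDMA/100 seen before.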
Title: Re: MC 11.1 x86 sleeps after 30 days uptime
Post by: Santos on November 12, 2021, 10:56:37 PM
Thank you, I'll be doing some tests with different values and have the computer on for a couple of days. I'll be reporting back in a week or so. Wish me luck. :)
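
When testing one boot code at a time, it may help to confirm which one the running kernel was actually started with. A minimal sketch, assuming the kernel command line is readable at /proc/cmdline; has_bootcode is a made-up helper name.

```shell
# has_bootcode CODE [FILE]: succeed if CODE appears as a whole word on the
# kernel command line. FILE defaults to /proc/cmdline (assumed readable).
has_bootcode() {
    code="$1"
    cmdline="${2:-/proc/cmdline}"
    # One parameter per line, then an exact fixed-string match.
    tr ' ' '\n' < "$cmdline" | grep -qxF -- "$code"
}
```

For example: `has_bootcode libata.dma=0 && echo "DMA disabled at boot"`.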
Title: Re: MC 11.1 x86 sleeps after 30 days uptime
Post by: Rich on November 12, 2021, 11:08:28 PM
Hi Santos
Good luck.
Title: Re: MC 11.1 x86 sleeps after 30 days uptime
Post by: Santos on February 04, 2022, 07:21:11 PM
Hello, after a month I'm coming back to post this update.

Tried:
Code: [Select]
libata.dma=0
libata.force=udma66
libata.force=pio4

The most successful one was libata.dma=0; the other two were too heavy on my cheap adaptor. It seems that turning DMA off completely works better than forcing a particular mode.

With libata.dma=0 I was getting a max write of 3 Mbps and read of 5 Mbps, which was usable and kept my computer from choking with my special setup. With the other two options the read and write speeds were in the Kbps range, painfully slow.

All options worked and successfully kept my little laptop from freezing. I also gave it a deep cleaning: there was some gel-like stuff on the motherboard from a protective coat on the wifi card.

Max uptime was 43 days before I rebooted it again to clean it.

Thank you all.
Title: Re: [Solved] MC 11.1 x86 sleeps after 30 days uptime
Post by: Rich on February 04, 2022, 08:21:36 PM
Hi Santos
Thanks for the update. Glad to hear the boot code helped. I've marked this as solved.