Author Topic: strange system stuck (Read 3777 times)

roeek · « **on:** November 23, 2011, 11:54:03 AM »

I'm using an Atom N270 board to test my system, which uses several USB devices drawing ~600mA each.
My Tiny Core kernel version is 2.6.33.3.
The problem is very strange: after a few hours of work, the main tty screen shows output that looks like this:

[<c01157db>] ? 0xc01157db
[<c014c422>] ? 0xc014c422
[<c014db11>] ? 0xc014db11
[<c0104473>] ? 0xc0104473
[<c010422b>] ? 0xc010422b
[<c0102b9e>] ? 0xc0102b9e
......

(Sorry for not using 'code'. It showed me an error saying that I'm not allowed to use external links.)
Then the system gets completely stuck: no keyboard, no network.
Because I'm running from RAM, I don't have any log after a reset.
I could really use your help to understand what can cause this output.
Thanks.

Rich · « **Reply #1 on:** November 23, 2011, 02:01:11 PM »

Hi roeek
Just guessing, but maybe it's some kind of stack dump. Just for debugging purposes, maybe you
should try setting up a couple of links that point any log files to a USB stick.

roeek · « **Reply #2 on:** November 28, 2011, 12:07:15 PM »

Well, since I couldn't find any useful log to capture, and since I can't copy any log after the system gets stuck, I searched the kernel source for the place where it prints those messages and indeed found it in dumpstack.c, in the kernel's __die function.
I removed the memory and register prints, so when the error occurred again I could see the first error lines on the screen (before the kernel dies).
So now I have this output:
BUG: unable to handle kernel NULL pointer dereference at 000000e0
IP: [<c03bdb4e>] 0xc03bdb4e
*pde = 00000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb1/1-8/1-8.5/1-8.5:1.0/net/eth4/address
EIP: [<c03bdb4e>] SS:ESP 0068:c049be7c
CR2: 00000000000000e0
---[ end trace 0b55e72310c9d136 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G D 2.6.33.3-tinycore #2
Call Trace:

That's it.
On eth4 I had a USB-ETH adapter (Asix chip).
What does "last sysfs file" mean? Does this mean that the problem is somewhere in the eth4 adapter?
If I can't fix it, I'd rather reboot the system when this bug happens. What can I add to the end of the "__die" function (dumpstack.c) in order to reboot safely?
Thanks.
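
For reference, the hexadecimal value after "Oops:" is the x86 page-fault error code, and its low bits can be decoded by hand. The sketch below is a small helper written for this thread (the function name `decode_oops_code` is hypothetical, not a kernel or Tiny Core tool); it only interprets the three lowest bits.

```shell
# Decode the low three bits of an x86 page-fault error code,
# i.e. the hex value printed after "Oops:" in the kernel message.
# decode_oops_code is a hypothetical helper name, not an existing tool.
decode_oops_code() {
    code=$((0x$1))
    # Bit 0: 0 = page not present, 1 = protection fault
    if [ $((code & 1)) -ne 0 ]; then echo "protection fault"; else echo "page not present"; fi
    # Bit 1: 0 = read access, 1 = write access
    if [ $((code & 2)) -ne 0 ]; then echo "write access"; else echo "read access"; fi
    # Bit 2: 0 = kernel mode, 1 = user mode
    if [ $((code & 4)) -ne 0 ]; then echo "user mode"; else echo "kernel mode"; fi
}

decode_oops_code 0002
```

For the code 0002 shown above, this reports a write to a page that is not present, made in kernel mode. That fits a store through a NULL structure pointer at a field offset of 0xe0, which matches the CR2 value.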

Rich · « **Reply #3 on:** November 28, 2011, 02:48:28 PM »

Hi roeek

Quote

What does "last sysfs file" mean?

That line names the last file under /sys that was opened before the oops. It is only a hint, not necessarily the cause.

Quote

Does this mean that the problem is somewhere in the eth4 adapter?

More than likely a driver problem. How many network adapters are you running? If you unplug that
adapter does the problem go away?

Quote

If I can't fix it, I'd rather reboot the system when this bug happens. What can I add to the end of the "__die" function (dumpstack.c) in order to reboot safely?

Now that just has bad idea written all over it. If that code starts getting called from someplace else
due to other errors you'll never know because you are simply rebooting.

Maybe you should elaborate on what you are doing. Are you modifying the kernel or a driver? Trying
to write your own driver?

roeek · « **Reply #4 on:** November 29, 2011, 03:06:58 AM »

I'm not doing anything special, just something unusual.
I have 2 built-in Ethernet interfaces and 2-3 USB-ETH adapters, so I can have eth0-4, for example.
I'm transmitting around 2Mbps of an endless TCP and UDP data stream from each interface, all at the same time, for a few days without any pause.
The USB-ETH adapters I'm using are based on the Asix chip, and the driver already exists in Tiny Core. I just need to plug in the USB adapter and bring up the eth interface.
For rebooting, I found the option to set kernel.panic to a "seconds to reboot" value.
For debugging the problem, how can I trace the shown address back to the faulty function? I'm supposed to have some kind of address map of the kernel, aren't I?
Thanks.
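
One common way to map a raw kernel address to a function is to look it up in the System.map file produced by the kernel build. The sketch below assumes a System.map from exactly the same build as the running kernel, 8-digit lowercase addresses as on a 32-bit kernel, and an address that lies in the core kernel rather than in a loadable module. The helper name `lookup_symbol` is hypothetical.

```shell
# Print the symbol with the highest start address at or below ADDR,
# followed by that start address.
# Usage: lookup_symbol ADDR MAPFILE
# ADDR is 8 hex digits, with or without a leading 0x.
# lookup_symbol is a hypothetical helper name, not an existing tool.
lookup_symbol() {
    addr=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    # Fixed-width hex sorts correctly as plain text in the C locale.
    # Appending "" in awk forces string comparison instead of numeric.
    LC_ALL=C sort "$2" | awk -v a="$addr" '
        ($1 "") <= (a "") { sym = $3; start = $1 }
        ($1 "") >  (a "") { exit }
        END { if (sym) print sym, start }'
}

# Example (the path to System.map is an assumption):
#   lookup_symbol c03bdb4e /path/to/System.map
```

Subtracting the printed start address from the faulting address gives the offset into the function. The bare addresses in the trace suggest the kernel may have been built without CONFIG_KALLSYMS; with it enabled, the kernel prints symbol names itself.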

Rich · « **Reply #5 on:** November 29, 2011, 09:41:53 AM »

Hi roeek
If you don't have any persistent storage attached (hard drive, USB thumb drive, etc.) to the board,
then add some for now. Assuming the attached storage is /mnt/hda1, for example, enter these commands:

Code:

touch /mnt/hda1/kernel.log
sudo rm /proc/kmsg
sudo ln -s /mnt/hda1/kernel.log /proc/kmsg

If the system lets you do that without choking, let it run until it crashes again, then reboot and check
/mnt/hda1/kernel.log for more clues.
Don't forget to mount the drive first if it isn't already mounted.

curaga · « **Reply #6 on:** November 29, 2011, 09:50:20 AM »

Are you sure that's correct? /proc tends not to allow file creation or deletion...

roeek · « **Reply #7 on:** November 29, 2011, 09:53:24 AM »

Cool, I didn't know dmesg is available through /proc/kmsg.
But for now it's not so relevant, since I already caught the error messages.
Still, it's not possible to 'rm /proc/kmsg'; it could have been useful.
Thanks for the tip though.

curaga · « **Reply #8 on:** November 29, 2011, 09:55:46 AM »

Since the network driver is buggy, the obvious answer is to try a newer kernel. You could try TC 4 or build your own for the version of TC you're currently using.

Rich · « **Reply #9 on:** November 29, 2011, 09:55:05 PM »

Hi curaga

Quote

Are you sure that's correct? /proc tends not to allow file creation or deletion...

No, I was not sure it was correct when I wrote it, didn't have time to try it. Now that I've tried it
I'm sure it's not correct.
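
Since /proc/kmsg cannot be replaced with a symlink, one alternative is to read from it and append each line to a file on persistent storage. The sketch below is a minimal version of that idea; the helper name `capture_klog` is hypothetical, and /mnt/hda1/kernel.log is the example path used earlier in the thread. Reading /proc/kmsg requires root and blocks while waiting for new messages, so it would normally run in the background.

```shell
# Append kernel messages from SRC to DEST as they arrive,
# flushing to disk after every line.
# Usage: capture_klog DEST [SRC]   (SRC defaults to /proc/kmsg)
# capture_klog is a hypothetical helper name, not an existing tool.
capture_klog() {
    dest=$1
    src=${2:-/proc/kmsg}
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$dest"
        sync    # push the line to storage before a possible crash
    done < "$src"
}

# Example (as root, with the drive already mounted):
#   capture_klog /mnt/hda1/kernel.log &
```

If klogd is already running, the two readers will compete for messages. A panic raised in interrupt context may also stop the machine before the last lines reach the disk, in which case a serial console or netconsole is the more dependable way to capture the oops.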
