
Author Topic: Hardware or TC stressed to hang  (Read 3468 times)

Offline CentralWare

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 765
Hardware or TC stressed to hang
« on: February 11, 2015, 10:21:20 PM »
I have a pair of embedded VIA C3 motherboards with 1 GB of RAM, hybrid storage (flash boot for TC plus MicroDrives, 16GB x2) and otherwise a very basic setup.  The two boards are used for repo work (4.x/tcz is on Via1, 5.x/tcz is on Via2) as well as a couple of other basic functions (DropBear, DnsMasq, BusyBox httpd and, I believe, NFS-Utils).

If left to operate as basic DHCP/DNS/TFTP/NFS machines, they'll run all day long - though they get very little "beating."
I am now running a process which beats them senseless through bb-httpd by downloading repo files.  I placed a 1-second delay between downloads (the remote machine pulls the .tree, .dep, .info and the tcz itself, pauses, then repeats for the next extension).
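For anyone wanting to reproduce the load, the download loop described above could be sketched roughly as follows. This is only a sketch, not the actual script: the function names, the list file, and the assumption that the .tree/.dep/.info files sit next to each extension as suffixes on its file name are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the four URLs fetched for one extension.
# Assumes <base>/<name> plus the .tree, .dep and .info suffixes.
ext_urls() {
    base=$1
    ext=$2
    for suffix in .tree .dep .info ""; do
        printf '%s/%s%s\n' "$base" "$ext" "$suffix"
    done
}

# Fetch every extension named in a list file (one name per line),
# pausing between extensions. A failed download does not stop the run.
pull_repo() {
    base=$1
    list=$2
    delay=${3:-1}
    while IFS= read -r ext; do
        [ -n "$ext" ] || continue
        ext_urls "$base" "$ext" | while IFS= read -r url; do
            wget -q -O /dev/null "$url" || true
        done
        sleep "$delay"
    done < "$list"
}

# Example (host name and list file are placeholders):
#   pull_repo http://via1/4.x/x86/tcz extensions.lst 1
```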

For no apparent reason, the machine will hang.  (I can't ping it or ssh into it, and when I switch the KVM over to it, I get no response from keypresses - just brain-freeze.)  This applies to both machines, so it's difficult for me to imagine it's a hardware failure.  I have to manually reset the machine (in my case, a cold power-off, as the racks don't have reset buttons) and thereafter things seem normal again...  until I hammer it in the fashion described.

Possible theories:

1) Networking overload hangs the machine - kernel level?  (** The machines USED to have PCI dual-GBE cards installed.  These were removed to ensure it's not Pro-1000 kernel driver related.)  The onboard VIA-Rhine are the ones being used.  Via1 has a dormant RTL-8150 USB T-100 network interface (dormant meaning no cable attached; it's used for firewall script testing.)

2) httpd somehow or another?  I can't fathom how it would hang the machine, though - bog it down maybe...

3) Possible configuration issues?  (ie: a physical swap instead of zswap needed, maybe?)

4) (sarcastic laugh) These "almost fan-less" boards need fans on the CPU instead of/or AND the bridge??!!
    (I think it's my Atom boards that have fans on the N-Bridge rather than the CPU, and the VIA boards that have them on the processor...  can't recall, thus the sarcasm.  Who would build a board with a bridge that runs hotter than the CPU????)

I'm not persisting syslog (yet), so I don't have logs from before the cold boot to work with, but on a fresh boot there's nothing complaining.

"If this were you..."  where would you start investigating?

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 11634
Re: Hardware or TC stressed to hang
« Reply #1 on: February 12, 2015, 12:32:45 AM »
Hi centralware
Using syslog would be a good place to start. I would also recommend putting a timestamp in the filename so it doesn't get overwritten
when you reboot. If using persistent storage is not convenient on these machines, you can send the logs to an IP address and have
another machine save the file. Maybe use the second NIC on these motherboards for that.
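A minimal sketch of both suggestions, assuming BusyBox syslogd with its -O, -R and -L options. The helper name, the /mnt/sda1/logs path and the 192.168.1.10 address are placeholders, not anything from this thread.

```shell
#!/bin/sh
# Build a log path that is unique per boot,
# in the form <dir>/messages-YYYYMMDD-HHMMSS
stamped_log() {
    printf '%s/messages-%s\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Option 1: log to a timestamped file on persistent storage.
# /mnt/sda1/logs is a hypothetical path; substitute your own mount point.
#   syslogd -O "$(stamped_log /mnt/sda1/logs)"
#
# Option 2: send messages to another machine and keep a local copy.
# 192.168.1.10 is a placeholder for the collecting host.
#   syslogd -R 192.168.1.10:514 -L
```

For the second option, the receiving machine needs a syslog daemon that is set up to accept messages from the network.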

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11044
Re: Hardware or TC stressed to hang
« Reply #2 on: February 12, 2015, 03:59:52 AM »
If you can reproduce at will, try connecting a screen and leaving it at the command line (no X). Any kernel errors would show there.

A sleep like that shouldn't be necessary. I've hammered bb-httpd in the past using siege (a web site load testing tool) without issue.

Offline CentralWare

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 765
Re: Hardware or TC stressed to hang
« Reply #3 on: February 12, 2015, 10:58:44 AM »
Thanks guys!

@curaga: Is there any way to disable the console screen saver?  (I leave the machine connected to a KVM, but the screen blanks, so if there's a panic, it's hidden behind nothingness.  :-\ )  There's no X, so that's an easy-to-skip bug test!  siege might come in handy here, too!

@Rich: Yes, persisting logs was on my to-do list.  (/var isn't a built-in persist option, so I'll likely bind-mount /var/log for this purpose in case there are other logs which also need to be saved.  I don't recall busybox's httpd having an option for a separate log file, but I'll be looking through the manual in case I missed it.)  syslog is enabled as a boot code.  Is there a way to redirect where syslog keeps its file(s), by chance?

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 11634
Re: Hardware or TC stressed to hang
« Reply #4 on: February 12, 2015, 11:40:24 AM »
Hi centralware
Code:
tc@box:~$ syslogd --help
BusyBox v1.19.3 (2011-10-30 01:47:29 UTC) multi-call binary.

Usage: syslogd [OPTIONS]

System logging utility
(this version of syslogd ignores /etc/syslog.conf)

        -n              Run in foreground
        -O FILE         Log to FILE (default:/var/log/messages)
        -l N            Log only messages more urgent than prio N (1-8)
        -S              Smaller output
        -s SIZE         Max size (KB) before rotation (default:200KB, 0=off)
        -b N            N rotated logs to keep (default:1, max=99, 0=purge)
        -R HOST[:PORT]  Log to IP or hostname on PORT (default PORT=514/UDP)
        -L              Log locally and via network (default is network only if -R)
        -D              Drop duplicates
        -C[size_kb]     Log to shared mem buffer (use logread to read it)

tc@box:~$
Just start it from bootlocal or bootsync with the options you want.

Offline CentralWare

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 765
Re: Hardware or TC stressed to hang
« Reply #5 on: February 12, 2015, 11:52:57 AM »
@Rich: I have syslog as a boot code on all machines running TC here, so boot messages are captured.  This tends to be extremely important on the Cisco servers (IBM boards), as I often have to trace problems with the TG3 network cards, so if we can preserve syslog at boot that would be awesome.  More importantly, would I have to kill the running syslogd in order to launch it manually (i.e. from bootsync), or would it be possible to have two instances running at the same time?

(I launched a separate process manually and see two instances...  but not sure whether or not to expect problems doing so.)

Thanks!

Offline CentralWare

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 765
Re: Hardware or TC stressed to hang
« Reply #6 on: February 12, 2015, 11:55:33 AM »
@Rich: Disregard the last post.  The second instance "takes over" and the first basically just sits idle, thus I might as well shut down the first instance before running the second.  Thanks!
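
For reference, the "shut down the first instance, then start the second" step might look something like this in bootlocal. It is a sketch only: the function names and the log path are made up for illustration, and the -O, -s and -b options are the ones listed in the syslogd --help output earlier in the thread.

```shell
#!/bin/sh
# Assemble syslogd options: log file, max size in KB before rotation,
# and number of rotated logs to keep. Defaults mirror the --help text.
syslogd_args() {
    printf '%s\n' "-O $1 -s ${2:-200} -b ${3:-1}"
}

# Stop the instance started by the syslog boot code, then start a new one.
restart_syslogd() {
    killall syslogd 2>/dev/null
    sleep 1
    # Intentionally unquoted so the options split into separate words;
    # this means the log path must not contain spaces.
    syslogd $(syslogd_args "$@")
}

# Example for bootlocal.sh (run as root); the path is hypothetical:
#   restart_syslogd /mnt/sda1/logs/messages 1024 5
```

Anything logged before the restart stays in the first instance's file (/var/log/messages by default), so the boot messages can still be copied from there if they need to be kept.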