Hi zbs888
... It seems that when a malfunction occurs, the entire system seems to freeze, with cron not executing, SSH new connections waiting, and existing SSH connections functioning normally.
If the system is still responsive enough to run commands from a terminal, here
are a few things you could try.
This will create 5 snapshots spaced 1 second apart called ps1.txt through ps5.txt:
for i in `seq 1 1 5`; do sleep 1; ps aux > ps"$i".txt; done
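This will display the first snapshot (repeat for ps2.txt through ps5.txt):
grep -Ev ' 0\.0 +0\.0 +0 +0 ' ps1.txt | less
Entries that have %CPU %MEM VSZ RSS all set to zero (mostly kernel threads) will be filtered out.
The pattern allows for the variable column spacing in ps output.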
Check if any process is consistently hogging CPU or MEM.
Processes with high RSS values are using the most RAM.
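If comparing the five snapshots by eye gets tedious, something like the following might help. It is only a rough sketch: cpu_hogs is a made-up helper name, and it assumes the standard ps aux layout (PID in column 2, %CPU in column 3, command name in column 11). Keep in mind that ps reports %CPU averaged over the lifetime of the process, so it is better at catching long-running hogs than short spikes.

```shell
# cpu_hogs THRESHOLD FILE...  (hypothetical helper, not a standard tool)
# Print the PID and command of every process whose %CPU is at or above
# THRESHOLD in every snapshot file given.
cpu_hogs() {
    threshold=$1
    shift
    awk -v t="$threshold" -v n="$#" '
        FNR == 1 { next }                            # skip each header line
        $3 + 0 >= t { hits[$2]++; cmd[$2] = $11 }    # busy in this snapshot
        END {
            for (pid in hits)
                if (hits[pid] == n) print pid, cmd[pid]
        }
    ' "$@"
}
```

For example, cpu_hogs 50 ps1.txt ps2.txt ps3.txt ps4.txt ps5.txt lists processes at 50% CPU or more in all five snapshots.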
This lists the meaning of the codes in the STAT column:
https://askubuntu.com/a/360253
Run:
free -m
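Look at the -/+ buffers/cache: row. If its free column is approaching zero, your
memory requirements are greater than your RAM.
Newer versions of free no longer print that row. In that case look at the
available column of the Mem: row instead.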
Look at the Swap: row and see if you are filling up swap space.
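To put a number on how full swap is, a small filter along these lines could be used. This is a sketch, not a standard tool: swap_used_pct is a made-up name, and it assumes the Swap: row has total in the second column and used in the third, which is how free -m normally prints it.

```shell
# swap_used_pct  (hypothetical helper)
# Read `free -m` output on stdin and print the percentage of swap in use.
swap_used_pct() {
    awk '$1 == "Swap:" {
        if ($2 > 0) printf "%d\n", ($3 * 100) / $2    # used / total
        else        print "no swap configured"
    }'
}
```

Usage would be LC_ALL=C free -m | swap_used_pct. The LC_ALL=C is just a precaution so the row label is not translated.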
Run:
vmstat 1
Look at the si and so columns to see if the system is busy swapping.
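If you would rather capture a fixed number of samples and summarize them, here is one possible way. Treat it as a sketch: swap_activity is a made-up name, and it assumes the default vmstat layout where si is column 7 and so is column 8. Note that the first data row vmstat prints is an average since boot, so it is counted along with the rest here.

```shell
# swap_activity  (hypothetical helper)
# Read vmstat output on stdin and count the samples where si or so is above zero.
swap_activity() {
    awk '
        $1 ~ /^[0-9]+$/ {                    # data rows start with a number, headers do not
            total++
            if ($7 > 0 || $8 > 0) busy++     # column 7 = si, column 8 = so
        }
        END { printf "%d of %d samples showed swapping\n", busy, total }
    '
}
```

For example, vmstat 1 10 | swap_activity takes ten samples one second apart and reports how many of them had swap traffic.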
This link describes the columns displayed by vmstat:
https://phoenixnap.com/kb/vmstat-command#ftoc-heading-3
Create a baseline to compare against by first running the commands
when the system is operating normally.
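One way to keep the baseline and the later problem captures comparable is to collect them with the same small function. This is only a sketch: capture_baseline, the directory names, and the file names inside the directory are all made up for illustration.

```shell
# capture_baseline DIR [SAMPLES]  (hypothetical helper)
# Save ps, free and vmstat output into DIR. SAMPLES is the number of
# one-second vmstat samples to take (default 5).
capture_baseline() {
    dir=$1
    samples=${2:-5}
    mkdir -p "$dir" || return 1
    date > "$dir/captured_at.txt"
    # If a command fails or is missing, note that in its output file and carry on.
    ps aux > "$dir/ps.txt" 2>&1 || echo "ps failed" >> "$dir/ps.txt"
    free -m > "$dir/free.txt" 2>&1 || echo "free failed" >> "$dir/free.txt"
    vmstat 1 "$samples" > "$dir/vmstat.txt" 2>&1 || echo "vmstat failed" >> "$dir/vmstat.txt"
    return 0
}
```

Run capture_baseline baseline while things are healthy and capture_baseline problem during a slowdown, then compare the two with diff -r baseline problem or by eye.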