Hi FlyingDutchman
... I have a health check script scheduled in cron to perform some basic checks every 5 minutes and take action if needed. Mostly this works well, but in some rare occasions this script does more damage than good. That results in a system with too many processes running, probably short on memory and high CPU load. ...
I would seriously try to find out why your script causes this behavior before considering a watchdog timer.
A few things come to mind:
Sometimes your script doesn't exit. Over time multiple copies are left running, possibly fighting each other.
Sometimes your script takes more than 5 minutes to run. A second copy gets started and clashes with the first copy.
A loop in your script reads a system file (/proc, /sys, ...) waiting for something to happen. Without a sleep command to slow down that loop, CPU usage will quickly rise to 100%.
Your script launches a command in the background that doesn't always complete. Over time multiple copies are left running.
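
If one of these turns out to be the cause, a few guards at the top of the script may help. The following is only a rough sketch, not your actual script: the lock path, the time limits, the wait_for_file helper and the df call standing in for a real check are all assumptions on my part. It relies on flock (util-linux) and timeout (GNU coreutils) being installed.

```shell
#!/bin/sh
# Hypothetical lock path; pick one the cron user can write to.
LOCKFILE="${LOCKFILE:-/tmp/healthcheck.lock}"

# Take an exclusive, non-blocking lock on file descriptor 9.
# The lock is released automatically when the script exits, so a
# crashed run cannot leave a stale lock behind.
exec 9>"$LOCKFILE"
if ! flock -n 9; then
    # A previous copy is still running: skip this run instead of piling up.
    echo "previous health check still running, skipping" >&2
    exit 0
fi

# Poll for a path at most $2 times, sleeping one second between
# attempts so the loop cannot spin the CPU at 100%.
wait_for_file() {
    tries=$2
    while [ "$tries" -gt 0 ]; do
        [ -e "$1" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Cap the runtime of any external command that might hang.
# "df -P /" is only a placeholder for a real check; 60 seconds is a guess.
# GNU timeout exits with status 124 when it had to kill the command.
if ! timeout 60 df -P / >/dev/null; then
    echo "disk check failed or timed out" >&2
fi
```

With the lock in place a slow run simply causes the next cron invocation to be skipped, and the skip message on stderr (mailed by cron or sent to your log) tells you how often that happens, which is a useful clue in itself.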