My server has had some problems recently. It ran out of swap three or four times, which resulted in random processes being killed. Twice I had to reboot it because I could no longer log in: the first time the named process had been killed, the second time the ssh daemon.
The cause of the problem is still unknown. The log only says that perl processes are eating my memory, so the suspects are SpamAssassin or a CGI script.
For now I have added a kind of poor man's watchdog script to cron:
#!/bin/sh
# Restart sshd if its status check fails.
if ! /etc/rc.d/sshd status > /dev/null 2>&1; then
    /etc/rc.d/sshd restart
fi
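Since named was killed once as well, the same check could cover more than one service. The following is only a sketch of that idea, not the script I actually run: the RC_DIR variable, the check_service function and the choice of services are assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical generalisation of the watchdog: check several rc.d services per run.
# RC_DIR and the service names are assumptions; adjust them to the local system.
RC_DIR="${RC_DIR:-/etc/rc.d}"

# check_service NAME: restart NAME if its rc.d status check fails.
# Returns 0 if the service was already running or the restart succeeded.
check_service() {
    svc="$RC_DIR/$1"
    if "$svc" status > /dev/null 2>&1; then
        return 0
    fi
    echo "watchdog: $1 not running, restarting"
    "$svc" restart > /dev/null 2>&1
}

# Check every service named on the command line, e.g. "watchdog.sh sshd named".
for name in "$@"; do
    check_service "$name"
done
```

Saved under a hypothetical path such as /root/watchdog.sh, it could be run from root's crontab every five minutes with a line like `*/5 * * * * /root/watchdog.sh sshd named`. Since it only prints something when a restart happens, cron would send mail only in that case.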
I restored a backup from before the problem started (wow, my backups actually seem to work...) and started to experiment with Apache's RLimitMEM directive.
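RLimitMEM only covers processes forked by the Apache children, such as CGI scripts, so it would not catch a perl process started from somewhere else. For those, a similar cap can be tried with the shell's ulimit builtin. This is a rough sketch and not something taken from my setup: the run_limited helper is made up for illustration, and `ulimit -v` is not part of POSIX, although the shells I know of (FreeBSD sh, bash, dash) support it.

```shell
#!/bin/sh
# run_limited KB CMD [ARGS...]: run CMD with its virtual memory capped at KB kilobytes.
# Hypothetical helper for illustration only.
run_limited() {
    limit_kb=$1
    shift
    (
        # The limit is set inside the subshell, so the calling shell keeps its own limit.
        ulimit -v "$limit_kb" || exit 1
        "$@"
    )
}
```

For example, `run_limited 262144 perl some_script.pl` (with some_script.pl as a placeholder) would start the script with roughly 256 MB of address space available. An allocation beyond that fails inside the process instead of pushing the whole machine into swap.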
Since I am going on holiday in September (two weeks without internet access), I hope the problems are solved now.
P.S. turbo23 pointed me to the work by Chris Jones for SoC06, which would have solved a lot of these problems. So I am looking forward to jail resource limits getting into the mainline.