Problem with cron daily/logrotate/mysql [high server load]

MislavO · Oct 30, 2012

---------------------------------------------------------------
PRODUCT, VERSION, MICROUPDATE, OPERATING SYSTEM, ARCHITECTURE
Plesk, 11.0.9, #21, Debian 6.0.6, Quad-Core i7-2600 processor (Hyper-Threading), 32 GB DDR 3 RAM

PROBLEM DESCRIPTION
Over the last couple of days I've noticed high load on the server (load averages of 30-90) at certain times of day. To be more specific, this happens only ONCE per day. After a few days of investigation, I think the problem is caused by the daily cron jobs. Why do I think so? Let's start.

After the problem occurred, I searched the following logs and found nothing specific:
#/var/log/syslog
#/var/log/apache2/error.log
#/var/log/sw-cp-server/error_log
#/var/log/messages
#/usr/local/psa/var/log/maillog

I've also enabled the MySQL slow query log, and I'm watching that file as well.

On the first day this happened I didn't even know what "hit me", because I wasn't logged into the shell, and a client called me to say his website wasn't working. Because of the high load nothing could be reached: websites were down and MySQL was down. After a few minutes I managed to log into the shell and reboot the server. Every day after that I stayed logged into the shell, waiting for it to happen again (assuming it would happen at the very same time). My assumption was correct: the next day it happened again. I started seeing "logrotate" and "mysql" processes in top, which I "killed". After killing the "logrotate" process and stopping "mysql", the server load dropped to 3-5 (once the load dropped, I started mysql again successfully and the pages worked just fine). As I mentioned, this happens once per day, so for the rest of the day (and before it) everything runs smoothly.

Since I was suspicious of this "logrotate", which is most likely run by the daily cron, I rescheduled the daily cron to run 2 hours earlier than usual. As you might already guess, the next day the problem happened again, at the new time. Again I solved it by killing the "logrotate" process and stopping mysql. To make it more obvious, I rescheduled the cron for the following day to run an hour and a half earlier than the day before, and again the problem occurred. So I suspect two things:
1) the daily cron (more obvious)
2) mysql (maybe, though I doubt some client is running a long slow query at exactly the time I rescheduled the cron to run; and even if such a query runs, it is not written to the slow query log, because I stop mysql and the query probably never completes)

Maybe something went wrong because I rebooted the server on the first day, although I didn't know what was happening then; I only wanted to get everything running (production server, more than 300 domains). Maybe this cron is now trying to complete something that didn't finish, or it's running something "heavy" that the server obviously can't handle. I'm not able to resolve this problem by myself, which is why I'm requesting support.

STEPS TO REPRODUCE
The daily cron runs at a specific hour. I guess the cron completes whatever it is doing first, and after that it runs into something that leads to my problem.

ACTUAL RESULT
Plesk panel is unreachable, websites and mysql are down, server load is 30-90, and doing anything in the shell takes a few minutes.

EXPECTED RESULT
Everything should run smoothly and without problems, as it did before.

ANY ADDITIONAL INFORMATION
- access to the "Log Rotation" is disabled for every client
- log rotations are not run by size, but daily
- all websites run as FastCGI, not as an Apache module
- FTP/FastCGI/Apache are tuned in some ways
- when nothing is happening on the server, load is usually 0.3 - 2 (on high load 3-6)
--------------------------------------------------------------

Any suggestions on what to do next?

MislavO · Nov 3, 2012

I managed to solve my problem. Since I assumed something was wrong with this "logrotate", I kept investigating deeper, and I found the problem. The problem really was with the logs. The puzzle now makes sense and it's finished.

I won't retype the whole story of what happened in the first days, but after a few days of killing "logrotate", the logs apparently grew bigger and bigger, and since I was still killing the "logrotate" process every day, the scheduled cron never finished what it was supposed to do. So I was thinking, well... what the... something keeps this cron from ever finishing, and when I checked disk usage on the server, I was like WOW, it had grown past 500GB, and this is where I found my problem. I ran the command:

find . -size +<number_for_size>

(Note: the size takes a unit suffix, k for kilobytes, M for megabytes or G for gigabytes, e.g. find . -size +100M)

to find all files bigger than, e.g., 1GB, and yeah, I found it! One client had 2 error logs: one was 502GB and the other 44GB. So I guess that when the cron reached this log it could process it, but since the log was way too big, this drove the load to 30-90.
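
For anyone repeating this search, here is a minimal sketch of the same idea that also prints each file's size and sorts the biggest first. It assumes GNU find (as shipped with Debian); the function name find_large_files and the example path are only illustrative and are not from the original post.

```shell
# List regular files larger than a threshold under a directory, biggest first.
# Usage: find_large_files DIR SIZE   (SIZE uses GNU find suffixes: k, M, G)
# find_large_files is a hypothetical helper name, not a standard tool.
find_large_files() {
    # -xdev    stay on one filesystem (skips /proc, network mounts, ...)
    # -type f  regular files only
    # -printf  print "size-in-bytes<TAB>path" for each match (GNU find only)
    find "$1" -xdev -type f -size +"$2" -printf '%s\t%p\n' 2>/dev/null | sort -rn
}
```

It could be called as find_large_files /var/www/vhosts 1G, assuming the client sites and their logs live under /var/www/vhosts (adjust the path to your own layout). Note that GNU find rounds sizes up to whole units, so +1G matches files larger than 1 GiB.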

So for all of you out there: if you kill the "logrotate" process, make sure you check afterwards whether some log has grown big, and compress it, remove it, or empty it, whatever you prefer, or you will have the same problem I did.
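
As one possible way to do the "empty it" option, the sketch below saves the last lines of an oversized log to a side file and then truncates the log in place. The function name trim_log and the .tail suffix are made up for illustration. Truncating in place (rather than deleting) is chosen here on the assumption that the web server still has the file open; deleting an open file would not free the disk space until that process closes it.

```shell
# Keep the last N lines of a log in FILE.tail, then empty FILE in place.
# Usage: trim_log FILE [LINES]   (LINES defaults to 1000)
# trim_log is a hypothetical helper name, not a standard tool.
trim_log() {
    if [ ! -f "$1" ]; then
        echo "not a regular file: $1" >&2
        return 1
    fi
    # Save the most recent entries first; stop if that fails (e.g. disk full).
    tail -n "${2:-1000}" "$1" > "$1.tail" || return 1
    # Truncate to zero bytes without replacing the file itself.
    : > "$1"
}
```

For example, trim_log error_log 5000 would leave the newest 5000 lines in error_log.tail and an empty error_log that can be rotated normally afterwards.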
