Issue Server unresponsive during Daily Maintenance job

OlgaKM

Hi,

I was having issues with this earlier, and the problem is persisting (and getting worse!). The server becomes completely unresponsive while the Daily Maintenance script runs. Just now, while I was running it manually (to monitor what was going on), the server was unresponsive for an hour (!). This is completely unacceptable for a production server, which this one is.

I have tried a couple of things, such as increasing the niceness of the script (default is 10) and limiting iops using cgroups, but none of this has helped.

I'm not sure where I would even begin to investigate this. Any suggestions?
 
Alright, I will attempt to do so. Probably tomorrow, as it is late here. Are the tasks on the above page listed in the order in which they normally run? The reason I'm asking is that I know the slowdown starts before updates are installed. However, it looks like InstallUpdates is the 2nd task. Does this mean that the slowdown happens as part of CheckForUpdates?
 
I have traced the issue. It looks like the cause is spamtrain (there may be multiple causes; I haven't finished testing all the tasks individually). I don't even use any anti-spam system on the server! How do I disable spamtrain? I came across the following article, but though it mentions spamtrain in the heading, it only talks about sa-learn and sa-update (which don't seem to be running on my server currently):

High CPU usage for "sa-learn" and "spamtrain" processes
 
Do you have the /usr/bin/sa-learn file on your Plesk server?
If yes, just remove SpamAssassin with

# rpm -e spamassassin psa-spamassassin
 
I uninstalled spamassassin, and this has greatly improved the issue, though not fully resolved it. It is notable that I only get the issue when the Daily Maintenance script is run as a whole. Running the individual tasks one-by-one causes no issue.

I have created a bash script that runs each task one at a time with a 30-second wait between tasks. We will see if this helps.
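
A minimal sketch of that kind of wrapper, assuming a POSIX shell. The function name run_tasks and the RUNNER and PAUSE variables are made up for illustration; the runner path and the task names are the ones quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: run the named Daily Maintenance tasks one at a time,
# pausing between them. RUNNER and PAUSE are illustrative names, not Plesk
# settings; both can be overridden from the environment.
RUNNER="${RUNNER:-/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f}"
PAUSE="${PAUSE:-30}"

run_tasks() {
    first=1
    for task in "$@"; do
        # Sleep between tasks, but not before the first one.
        if [ "$first" -eq 0 ]; then
            sleep "$PAUSE"
        fi
        first=0
        echo "starting $task"
        # $RUNNER is left unquoted on purpose so it splits into command + arguments.
        $RUNNER "$task" || echo "task $task exited with status $?" >&2
    done
}

# Example call (task names as listed in this thread):
# run_tasks CheckForUpdates InstallUpdates UpdateKeys
```

Keeping the task list as arguments makes it easy to run a subset while testing which task causes the slowdown.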
 
The steps in the above article did not help in my case. I ended up using cgroups to finally fix this (at least so far; I am continuing to monitor).

First, I created a shell script called "/root/DailyMaintenanceIndividual.sh" that runs each Daily Maintenance task individually:

Code:
#!/bin/sh

/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f CheckForUpdates
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f InstallUpdates
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpdateKeys
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f PleskUsage
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f Sitebuilder
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f MailUsage
cgexec --sticky -g "cpu,memory,blkio":/throttle /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteStatistics
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ProcessAutoreports
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f OptimizeStatistics
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f AnalyseDomainStatistics
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f AnalyseClientStatistics
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteSpamtrain
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f LoadCustomizations
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpdateApsCache
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpdateApsApplications
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteGlCleaner
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f StoreProtectedConfigs
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f Filesharing
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpgradePanel
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteWebStatistics
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpdateModSecurityRuleSet
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f AutoresponderEndDate
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpdatePhpCurlCertificates
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteQuotacheck
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f BackupRestoreStats
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpgradeExtensions
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ComposerSelfUpdate

Then I added the following code to /etc/cgconfig.conf:

Code:
group throttle {
        cpu {
                cpu.rt_period_us="1000000";
                cpu.rt_runtime_us="0";
                cpu.cfs_period_us="1000000";
                cpu.cfs_quota_us="500000";
                cpu.shares="1024";
        }
        memory {
                memory.memsw.failcnt="0";
                memory.limit_in_bytes="1073741824";
                memory.memsw.max_usage_in_bytes="0";
                memory.move_charge_at_immigrate="0";
                memory.swappiness="60";
                memory.use_hierarchy="0";
                memory.failcnt="0";
                memory.soft_limit_in_bytes="134217728";
                memory.memsw.limit_in_bytes="1073741824";
                memory.max_usage_in_bytes="0";
        }
        blkio {
                blkio.throttle.write_iops_device="8:0   10";
                blkio.throttle.read_iops_device="8:0    10";
                blkio.throttle.write_bps_device="";
                blkio.throttle.read_bps_device="";
                blkio.weight="500";
                blkio.weight_device="";
        }
}

What this does is 1) limit CPU usage to 50% of one core, 2) limit memory usage to 1 GB, and 3) limit disk I/O to 10 read and 10 write operations per second on device 8:0. Needless to say, you will need to enable cgroups on your server if it isn't enabled already. On my server, which runs RHEL 6.7, this is accomplished using:

Code:
service cgconfig start
/sbin/chkconfig --add cgconfig
/sbin/chkconfig --list cgconfig
/sbin/chkconfig cgconfig on
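
For what it's worth, the "8:0" in the blkio lines is the major:minor number of the block device being throttled (8:0 is normally /dev/sda), so it has to match your own disk. Here is a small sketch for looking it up from /proc/partitions. The function name dev_numbers is hypothetical, and the optional second argument exists only so the function can be pointed at a sample file.

```shell
#!/bin/sh
# Hypothetical helper: print "major:minor" for a block device name.
# $1 = device name without /dev/ (e.g. sda)
# $2 = partitions table to read (defaults to /proc/partitions)
dev_numbers() {
    # /proc/partitions columns are: major minor #blocks name
    awk -v dev="$1" '$4 == dev { print $1 ":" $2 }' "${2:-/proc/partitions}"
}

# Example call:
# dev_numbers sda
```

If the function prints nothing, the device name was not found in the table.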

Finally, you have to modify the file /etc/cron.daily/50plesk-daily (make a backup first!). Replace the following line:

Code:
/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php >/dev/null 2>&1

With this one:

Code:
sh /root/DailyMaintenanceIndividual.sh >/dev/null 2>&1
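
One caveat on the backup: run-parts executes every executable file in /etc/cron.daily unless its name ends in one of the suffixes it skips, so a copy left in that directory could get run as well. Below is a rough sketch of the backup-and-replace step, assuming the cron file contains the line exactly as quoted above. The function name swap_daily_job and the backup directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: back up the cron file to a directory OUTSIDE cron.daily,
# then swap the Plesk invocation for the per-task wrapper script.
# $1 = cron file (e.g. /etc/cron.daily/50plesk-daily)
# $2 = backup directory (e.g. /root)
swap_daily_job() {
    cron_file="$1"
    backup="$2/$(basename "$cron_file").bak"
    cp -p "$cron_file" "$backup" || return 1
    # Read from the backup and write into the original file, so the original
    # keeps its owner and mode. Only the command is replaced; the redirection
    # that follows it on the same line is left as it was.
    sed 's|/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script\.php >|sh /root/DailyMaintenanceIndividual.sh >|' \
        "$backup" > "$cron_file"
}

# Example call:
# swap_daily_job /etc/cron.daily/50plesk-daily /root
```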

I'm going to monitor the server for the next few days and see how it does. If I have no problems, I will go ahead and raise the limits that I have set. For reference, my server has 8 CPU cores, 16 GB RAM and the drives are capable of about 75 IOPS. This means that with the current cgroups settings, the ExecuteStatistics task will use at most 1/16 of CPU power, 1/16 of RAM and about 1/8 of disk I/O.
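
Those fractions can be double-checked with a few lines of awk. This is only a sketch of the arithmetic: the hardware figures and limits are the ones quoted in this thread, and the function name throttle_fractions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each cgroup limit as a fraction of the hardware.
throttle_fractions() {
    awk 'BEGIN {
        cores = 8; ram_gb = 16; disk_iops = 75          # hardware figures from the post
        cpu_cores = 500000 / 1000000                    # cpu.cfs_quota_us / cpu.cfs_period_us
        mem_gb = 1073741824 / (1024 * 1024 * 1024)      # memory.limit_in_bytes in GB
        iops = 10                                       # blkio.throttle read/write iops limit
        printf "cpu 1/%.1f\n",  cores / cpu_cores
        printf "ram 1/%.1f\n",  ram_gb / mem_gb
        printf "disk 1/%.1f\n", disk_iops / iops
    }'
}

throttle_fractions
# prints:
# cpu 1/16.0
# ram 1/16.0
# disk 1/7.5
```

The disk figure comes out at 1/7.5, which is where the "about 1/8" comes from.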
 
Well, after a few days where the issue did not recur, it looks like I have solved it. It took the following:

1. The cgroup settings explained above.
2. I inserted "sleep 30" between each line in the file DailyMaintenanceIndividual.sh (described above). This causes the script to pause 30 seconds between each task.
3. I edited the run-parts command (located at /usr/bin/run-parts) to insert a delay at the end of each command run by the script. I added the line "sleep 5m" just before the "done" command of the main loop. This is the script that crontab and anacrontab use to run tasks in the cron directories. The source of mine is posted below. My modifications are near the very bottom.

That's it!

Code:
#!/bin/bash
# run-parts - concept taken from Debian

# keep going when something fails
set +e

if [ $# -lt 1 ]; then
        echo "Usage: run-parts <dir>"
        exit 1
fi

if [ ! -d $1 ]; then
        echo "Not a directory: $1"
        exit 1
fi

# Ignore *~ and *, scripts
for i in $(LC_ALL=C; echo $1/*[^~,]) ; do
        [ -d $i ] && continue
        # Don't run *.{rpmsave,rpmorig,rpmnew,swp,cfsaved} scripts
        [ "${i%.cfsaved}" != "${i}" ] && continue
        [ "${i%.rpmsave}" != "${i}" ] && continue
        [ "${i%.rpmorig}" != "${i}" ] && continue
        [ "${i%.rpmnew}" != "${i}" ] && continue
        [ "${i%.swp}" != "${i}" ] && continue
        [ "${i%,v}" != "${i}" ] && continue

        # jobs.deny prevents specific files from being executed
        # jobs.allow prohibits all non-named jobs from being run.
        # can be used in conjunction but there's no reason to do so.
        if [ -r $1/jobs.deny ]; then
                grep -q "^$(basename $i)$" $1/jobs.deny && continue
        fi
        if [ -r $1/jobs.allow ]; then
                grep -q "^$(basename $i)$" $1/jobs.allow || continue
        fi

        if [ -x $i ]; then
                if [ -r $1/whitelist ]; then
                        grep -q "^$(basename $i)$" $1/whitelist && continue
                fi
                logger -p cron.notice -t "run-parts($1)[$$]" "starting $(basename $i)"
                $i 2>&1 | awk -v "progname=$i" \
                              'progname {
                                   print progname ":\n"
                                   progname="";
                               }
                               { print; }'
                logger -i -p cron.notice -t "run-parts($1)" "finished $(basename $i)"
        fi
        sleep 5m
done

exit 0
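
Note that because the sleep sits in the main loop of run-parts, it applies to every directory that run-parts processes, not only cron.daily. Here is a rough sketch of the same idea as a standalone function. It is a simplified stand-in rather than a replacement for run-parts: it has none of the suffix, jobs.allow or jobs.deny filtering, and the name run_dir_with_delay is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run each executable file in a directory in glob order,
# sleeping after each one.
# $1 = directory, $2 = delay in seconds (defaults to 300, i.e. 5 minutes)
run_dir_with_delay() {
    dir="$1"
    delay="${2:-300}"
    for job in "$dir"/*; do
        # Skip subdirectories and anything that is not executable.
        [ -d "$job" ] && continue
        [ -x "$job" ] || continue
        "$job"
        sleep "$delay"
    done
}

# Example call:
# run_dir_with_delay /etc/cron.daily 300
```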
 