Issue Server unresponsive during Daily Maintenance job

Discussion in 'Plesk 12.x for Linux' started by OlgaKM, Apr 18, 2017.

  1. OlgaKM

    OlgaKM Basic Pleskian

    10
    85%
    Joined:
    Aug 31, 2016
    Messages:
    36
    Likes Received:
    1
    Location:
    NY
    Hi,

    I was having issues with this earlier, and the problem is persisting (and getting worse!). The server becomes completely unresponsive while the Daily Maintenance script runs. Just now, when I ran it manually to monitor what was going on, the server was unresponsive for an hour (!). This is completely unacceptable for a production server, which this one is.

    I have tried a couple of things, such as increasing the niceness of the script (default is 10) and limiting iops using cgroups, but none of this has helped.

    I'm not sure where I would even begin to investigate this. Any suggestions?
     
  2. IgorG

    IgorG Forums Analyst Staff Member

    49
    24%
    Joined:
    Oct 27, 2009
    Messages:
    24,385
    Likes Received:
    1,213
    Location:
    Novosibirsk, Russia
    Affiliate:
    https://plesk.com/?a_aid=59ae552b0731c
  3. OlgaKM

    So for clarification, I should run each task 1 by 1 and see which one creates issues?
     
  4. IgorG

    Yes.
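
One way to run the tasks one by one and see which one is slow is to time each `-f <task>` call. The following is only a minimal sketch: `time_task` and `RUNNER` are hypothetical names invented for this example, and the default command line is the one used later in this thread. Task names should be taken from the list for your own Plesk version.

```shell
#!/bin/sh
# Hypothetical helper: run one Daily Maintenance task and report how long it took.
# RUNNER is an assumption; its default is the command line quoted in this thread.
RUNNER=${RUNNER:-"/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f"}

time_task() {
    task=$1
    start=$(date +%s)
    # RUNNER is intentionally unquoted so it splits into command + arguments
    $RUNNER "$task" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$task: $((end - start))s (exit $status)"
}

# Usage (on the Plesk server):
#   time_task CheckForUpdates
#   time_task InstallUpdates
```

Watching `top` or `iostat` in a second terminal while each task runs would show whether the slow task is bound by CPU, memory, or disk.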
     
  5. OlgaKM

    Alright, I will attempt to do so. Probably tomorrow, as it is late here. Are the tasks on the above page listed in the order in which they are normally run? The reason I'm asking is that I know the slowdown starts before updates are installed. However, it looks like InstallUpdates is the 2nd task. Does this mean that the slowdown happens as part of CheckForUpdates?
     
  6. IgorG

    As far as I know, order is not important.
     
  7. OlgaKM

    I have traced the issue. It looks like the cause is spamtrain (possibly there are multiple causes, I haven't finished testing all the tasks individually). I don't even use any anti-spam system on the server! How do I disable spamtrain? I came across the following article, but though it mentions spamtrain in the heading, it only talks about sa-learn and sa-update (which don't seem to be running on my server currently):

    High CPU usage for "sa-learn" and "spamtrain" processes
     
  8. IgorG

    Do you have the /usr/bin/sa-learn file on your Plesk server?
    If yes, just remove SpamAssassin with

    # rpm -e spamassassin psa-spamassassin
     
  9. OlgaKM

    I uninstalled spamassassin, and this has greatly improved the issue, though not fully resolved it. It is notable that I only get the issue when the Daily Maintenance script is run as a whole. Running the individual tasks one-by-one causes no issue.

    I have created a bash script that runs each task one at a time with a 30-second wait between tasks. We will see if this helps.
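
For illustration, a wrapper of that kind could look like the sketch below. It is not the poster's actual script: `run_with_pause`, `PAUSE` and `RUNNER` are hypothetical names, and the default command line is the one quoted later in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: run tasks one after another, sleeping between them.
PAUSE=${PAUSE:-30}   # seconds to wait between tasks (30 s as in the post)
RUNNER=${RUNNER:-"/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f"}

run_with_pause() {
    first=1
    for task in "$@"; do
        # sleep only between tasks, not before the first one
        [ "$first" -eq 1 ] || sleep "$PAUSE"
        first=0
        echo "running $task"
        # RUNNER is intentionally unquoted so it splits into command + arguments
        $RUNNER "$task"
    done
}

# Usage (on the Plesk server):
#   run_with_pause CheckForUpdates InstallUpdates UpdateKeys
```

Keeping the pause in a variable makes it easy to try a longer or shorter wait without editing every line.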
     
  10. OlgaKM

    The step I tried above did not help, but I definitively traced the issue to the ExecuteStatistics task. Google turned up the following article:

    High CPU load during statistics calculation

    I have tried the recommendations described and will see if it helps.
     
    Last edited: Apr 25, 2017
  11. OlgaKM

    The steps in the above article did not help in my case. I ended up using cgroups to finally fix this (at least so far; I am continuing to monitor).

    First, I created a bash file to run each Daily Maintenance task individually called "/root/DailyMaintenanceIndividual.sh":

    Code:
    #!/bin/sh
    
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f CheckForUpdates
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f InstallUpdates
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpdateKeys
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f PleskUsage
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f Sitebuilder
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f MailUsage
    cgexec --sticky -g "cpu,memory,blkio":/throttle /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteStatistics
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ProcessAutoreports
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f OptimizeStatistics
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f AnalyseDomainStatistics
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f AnalyseClientStatistics
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteSpamtrain
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f LoadCustomizations
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpdateApsCache
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpdateApsApplications
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteGlCleaner
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f StoreProtectedConfigs
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f Filesharing
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpgradePanel
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteWebStatistics
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpdateModSecurityRuleSet
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f AutoresponderEndDate
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpdatePhpCurlCertificates
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ExecuteQuotacheck
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f BackupRestoreStats
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f UpgradeExtensions
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php -f ComposerSelfUpdate
    Then I added the following code to /etc/cgconfig.conf:

    Code:
    group throttle {
            cpu {
                    cpu.rt_period_us="1000000";
                    cpu.rt_runtime_us="0";
                    cpu.cfs_period_us="1000000";
                    cpu.cfs_quota_us="500000";
                    cpu.shares="1024";
            }
            memory {
                    memory.memsw.failcnt="0";
                    memory.limit_in_bytes="1073741824";
                    memory.memsw.max_usage_in_bytes="0";
                    memory.move_charge_at_immigrate="0";
                    memory.swappiness="60";
                    memory.use_hierarchy="0";
                    memory.failcnt="0";
                    memory.soft_limit_in_bytes="134217728";
                    memory.memsw.limit_in_bytes="1073741824";
                    memory.max_usage_in_bytes="0";
            }
            blkio {
                    blkio.throttle.write_iops_device="8:0   10";
                    blkio.throttle.read_iops_device="8:0    10";
                    blkio.throttle.write_bps_device="";
                    blkio.throttle.read_bps_device="";
                    blkio.weight="500";
                    blkio.weight_device="";
            }
    }
    
    What this does is 1) limit CPU usage to 50% of 1 core, 2) limit memory usage to 1 GB, 3) limit disk I/O to 10 operations per second. Needless to say, you will need to enable cgroups on your server if it isn't enabled already. On my server, which runs RHEL 6.7, this is accomplished using:

    Code:
    service cgconfig start
    /sbin/chkconfig --add cgconfig
    /sbin/chkconfig --list cgconfig
    /sbin/chkconfig cgconfig on
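
Once cgconfig is running, it can be worth confirming that a process really lands in the `throttle` group. The sketch below assumes the cgroup v1 layout of `/proc/<pid>/cgroup`, where each line reads `hierarchy-ID:controllers:/path`; `in_cgroup` is a hypothetical helper name, not part of libcgroup.

```shell
#!/bin/sh
# Hypothetical check: does a /proc/<pid>/cgroup file list the given cgroup path?
# Each line is expected to look like "hierarchy-ID:controllers:/path" (cgroup v1).
in_cgroup() {
    group=$1                        # e.g. /throttle
    file=${2:-/proc/self/cgroup}    # defaults to the current process
    awk -F: -v g="$group" '$3 == g { found = 1 } END { exit !found }' "$file"
}

# Usage (on the server, with the throttle group defined):
#   cgexec -g cpu,memory,blkio:/throttle sh -c '. ./in_cgroup.sh; in_cgroup /throttle && echo throttled'
```

The file name `in_cgroup.sh` in the usage line is an assumption for the example. If the check fails, the group was probably not created, for instance because cgconfig did not load the edited configuration.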
    Finally, you have to modify the file /etc/cron.daily/50plesk-daily (make a backup first!). Replace the following line:

    Code:
    /usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php >/dev/null 2>&1
    With this one:

    Code:
    sh /root/DailyMaintenanceIndividual.sh >/dev/null 2>&1
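
The backup and the line swap can also be scripted. This is a sketch under assumptions: `replace_daily_line` is a hypothetical helper, and it only matches if the line in your cron file is exactly the one quoted above, with no leading whitespace or other differences.

```shell
#!/bin/sh
# Hypothetical helper: back up a cron script, then swap the Daily Maintenance line.
replace_daily_line() {
    file=$1
    old='/usr/local/psa/bin/sw-engine-pleskrun /usr/local/psa/admin/plib/DailyMaintainance/script.php >/dev/null 2>&1'
    new='sh /root/DailyMaintenanceIndividual.sh >/dev/null 2>&1'
    # keep a copy with the original mode and timestamps
    cp -p "$file" "$file.bak" || return 1
    # compare whole lines literally, so no regex escaping is needed;
    # writing back into "$file" keeps its permissions (including the exec bit)
    awk -v old="$old" -v new="$new" \
        '$0 == old { print new; next } { print }' "$file.bak" > "$file"
}

# Usage (as root):
#   replace_daily_line /etc/cron.daily/50plesk-daily
```

Comparing `$file` with `$file.bak` using `diff` afterwards shows whether exactly one line changed.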
    I'm going to monitor the server for the next few days and see how it does. If I have no problems, I will go ahead and raise the limits that I have set. For reference, my server has 8 CPU cores, 16 GB RAM and the drives are capable of about 75 IOPS. This means that with the current cgroups settings, the ExecuteStatistics task will use at most 1/16 of CPU power, 1/16 of RAM and about 1/8 of disk I/O.
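
The fractions quoted above can be checked with a little shell arithmetic. The helper name `fraction` is made up for this sketch; the numbers are the cgroup limits and the hardware figures from the post.

```shell
#!/bin/sh
# Hypothetical helper: print "1/N" where N = total / limit,
# rounded to the nearest integer.
fraction() {
    limit=$1
    total=$2
    echo "1/$(( (total + limit / 2) / limit ))"
}

# CPU: quota of 500000 us per 1000000 us period = 0.5 core, out of 8 cores
fraction 500000 $((8 * 1000000))            # 1/16
# RAM: 1073741824 bytes (1 GiB) out of 16 GiB
fraction 1073741824 $((16 * 1073741824))    # 1/16
# Disk: 10 IOPS out of roughly 75 IOPS (10/75 is about 1/7.5, rounded to 1/8)
fraction 10 75                              # 1/8
```

Note that the read and write IOPS throttles are separate limits of 10 each, so combined read and write activity could in principle reach 20 operations per second.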
     
  12. OlgaKM

    Well, after a few days where the issue did not recur, it looks like I have solved it. It took the following:

    1. The cgroup settings explained above.
    2. I inserted "sleep 30" between each line in the file DailyMaintenanceIndividual.sh (described above). This causes the script to pause 30 seconds between each task.
    3. I edited the run-parts command (located at /usr/bin/run-parts) to insert a delay at the end of each command run by the script. I added the line "sleep 5m" just before the "done" command of the main loop. This is the script that crontab and anacrontab use to run tasks in the cron directories. The source of mine is posted below. My modifications are near the very bottom.

    That's it!

    Code:
    #!/bin/bash
    # run-parts - concept taken from Debian
    
    # keep going when something fails
    set +e
    
    if [ $# -lt 1 ]; then
            echo "Usage: run-parts <dir>"
            exit 1
    fi
    
    if [ ! -d $1 ]; then
            echo "Not a directory: $1"
            exit 1
    fi
    
    # Ignore *~ and *, scripts
    for i in $(LC_ALL=C; echo $1/*[^~,]) ; do
            [ -d $i ] && continue
            # Don't run *.{rpmsave,rpmorig,rpmnew,swp,cfsaved} scripts
            [ "${i%.cfsaved}" != "${i}" ] && continue
            [ "${i%.rpmsave}" != "${i}" ] && continue
            [ "${i%.rpmorig}" != "${i}" ] && continue
            [ "${i%.rpmnew}" != "${i}" ] && continue
            [ "${i%.swp}" != "${i}" ] && continue
            [ "${i%,v}" != "${i}" ] && continue
    
            # jobs.deny prevents specific files from being executed
            # jobs.allow prohibits all non-named jobs from being run.
            # can be used in conjunction but there's no reason to do so.
            if [ -r $1/jobs.deny ]; then
                    grep -q "^$(basename $i)$" $1/jobs.deny && continue
            fi
            if [ -r $1/jobs.allow ]; then
                    grep -q "^$(basename $i)$" $1/jobs.allow || continue
            fi
    
            if [ -x $i ]; then
                    if [ -r $1/whitelist ]; then
                            grep -q "^$(basename $i)$" $1/whitelist && continue
                    fi
                    logger -p cron.notice -t "run-parts($1)[$$]" "starting $(basename $i)"
                    $i 2>&1 | awk -v "progname=$i" \
                                  'progname {
                                       print progname ":\n"
                                       progname="";
                                   }
                                   { print; }'
                    logger -i -p cron.notice -t "run-parts($1)" "finished $(basename $i)"
            fi
            # Added: pause 5 minutes after each job (my modification)
            sleep 5m
    done
    
    exit 0
     