Question Apache & PHP-FPM memory usage spike and no CPU usage

anthill · Aug 31, 2021

Hi,

I'm struggling to narrow down some really high spikes in Apache and PHP-FPM memory usage, which coincide with a complete drop in Apache and PHP-FPM CPU usage. See attached images.

This is causing a gateway timeout message on the website during these memory spikes.

I've gone through the syslog but nothing stands out as the culprit. Can anyone suggest where to look, or has anyone had similar issues?

Note: This doesn't appear to happen at regular times. It happened yesterday when traffic was relatively low (Bank Holiday Monday).

Bitpalast · Aug 31, 2021

Please check /var/log/messages and /var/log/apache2/* logs (/httpd/* logs) for entries that refer to webserver restarts or other issues. I think that the drop in cpu activity means that the web server is not doing anything any longer. Maybe the logs show the reason for the outage.

anthill · Aug 31, 2021

Thanks for the reply. We don't have /var/log/messages. I can't see anything worthwhile in the httpd logs other than the 504 responses. We are using nginx as a proxy and are seeing lots of these entries in the proxy logs:

2021/08/30 11:35:13 [error] 278301#0: *6391166 upstream timed out (110: Connection timed out) while reading response header from upstream, client

I've also attached the php-fpm settings, which are unchanged from the Plesk defaults.

This has happened today. Nothing obvious in logs or using top/process list at the time. I restarted apache and the site restored.
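To line the proxy errors up against the memory graph, it can help to count the "upstream timed out" entries per minute. The following is a minimal sketch in Python, assuming the nginx error log keeps the timestamp format shown in the entry above; the function name `timeouts_per_minute` is made up for illustration and the path to the proxy error log has to be supplied by whoever runs it.

```python
import re
from collections import Counter

# Timestamp at the start of an nginx error-log line, e.g.
# "2021/08/30 11:35:13 [error] ...". The seconds are left out of the
# captured group so that entries are bucketed by minute.
STAMP = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[error\]")


def timeouts_per_minute(lines):
    """Count 'upstream timed out' errors per minute.

    `lines` is any iterable of log lines, for example an open file.
    Returns a Counter mapping 'YYYY/MM/DD HH:MM' to the number of hits.
    """
    counts = Counter()
    for line in lines:
        if "upstream timed out" not in line:
            continue
        match = STAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file to `timeouts_per_minute` and printing the sorted items gives one line per minute, which should show whether the timeouts begin before, with, or after the memory climb.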

Bitpalast · Aug 31, 2021

What type of server do you have if you don't have /var/log/messages on it? What operating system?

anthill · Aug 31, 2021

Ubuntu 20.04

Bitpalast · Aug 31, 2021

In Ubuntu this would probably be /var/log/syslog.

anthill · Aug 31, 2021

Nothing in the syslog that suggests anything around the times the peak starts (around 11:30)

Aug 30 11:25:01 plesk-user cron[687]: (psaadm) RELOAD (crontabs/psaadm)
Aug 30 11:25:01 plesk-user CRON[1059202]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 30 11:25:01 plesk-user CRON[1059203]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:26:01 plesk-user CRON[1059223]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:26:55 plesk-user snapd[711194]: storehelpers.go:551: cannot refresh: snap has no updates available: "core18", "lxd", "snapd"
Aug 30 11:26:55 plesk-user snapd[711194]: autorefresh.go:513: auto-refresh: all snaps are up-to-date
Aug 30 11:27:01 plesk-user CRON[1059245]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:28:01 plesk-user CRON[1059262]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:29:01 plesk-user CRON[1059296]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:30:01 plesk-user CRON[1059318]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 30 11:30:01 plesk-user CRON[1059320]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:30:01 plesk-user CRON[1059323]: (root) CMD (/var/scripts/process_audit_emails.sh >/dev/null 2>&1)
Aug 30 11:31:01 plesk-user CRON[1059341]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:32:01 plesk-user CRON[1059363]: (psaadm) CMD (/opt/psa/admin/bin/php -dauto_prepend_file=sdk.php '/opt/psa/admin/plib/modules/revisium-antivirus/scripts/ra_executor_run.php')
Aug 30 11:32:01 plesk-user CRON[1059366]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:32:01 plesk-user check-quota[1059392]: Starting the check-quota filter...
Aug 30 11:32:01 plesk-user plesk sendmail[1059391]: handlers_stderr: SKIP
Aug 30 11:32:01 plesk-user plesk sendmail[1059391]: SKIP during call 'check-quota' handler
Aug 30 11:33:01 plesk-user CRON[1059455]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:34:01 plesk-user CRON[1059523]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:35:01 plesk-user CRON[1059545]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:35:01 plesk-user CRON[1059546]: (root) CMD ([ -x /opt/psa/admin/sbin/backupmng ] && /opt/psa/admin/sbin/backupmng >/dev/null 2>&1)
Aug 30 11:35:01 plesk-user CRON[1059549]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 30 11:36:01 plesk-user CRON[1059559]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:37:01 plesk-user CRON[1059580]: (psaadm) CMD (/opt/psa/admin/bin/php -dauto_prepend_file=sdk.php '/opt/psa/admin/plib/modules/sslit/scripts/keep-secured.php')
Aug 30 11:37:01 plesk-user CRON[1059581]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:38:01 plesk-user CRON[1059602]: (panopta-agent) CMD ( /usr/bin/python /usr/bin/panopta-agent/panopta_agent.py --from-cron &> /dev/null)
Aug 30 11:38:01 plesk-user CRON[1059603]: (psaadm) CMD (/opt/psa/admin/bin/php -dauto_prepend_file=sdk.php '/opt/psa/admin/plib/modules/monitoring/scripts/detect-hardware-changes.php')

weltonw · Aug 31, 2021

Are you running on a 512MB server?

anthill · Aug 31, 2021

Server is 6 core, 24 gig ram

Bitpalast · Aug 31, 2021

Your Apache web server simply fails for an as yet unknown reason. That is why the CPU usage drops and the response to Nginx times out. The goal must be to find out why the Apache web server becomes unresponsive.

For example, have you configured graceful restarts for Apache so that Apache does not restart each time a configuration change is applied? There must also be a log entry for either the failure or the web server restart. If it is not in your syslog file, it is somewhere else. Requesting the service status from your operating system should also output a reason for the failure, for example:
# service apache2 status
This should give an excerpt of the latest log entries that might mention an issue.

anthill · Sep 1, 2021

Most of the settings are as they are out of the box, so yes, graceful restarts is checked.

anthill · Sep 30, 2021

Hi,

Apologies for not replying sooner. I've taken your advice and have been doing my own investigations.

We are still having this issue but it's slightly different.

We are still seeing random spikes in memory but not a drop in CPU as before.

Attached is a graph of CPU and Apache & php-fpm memory. When the Apache & php-fpm memory gets above circa 954MiB, we get users complaining that the site is slow.

When it reaches about 1.2GiB, users start seeing gateway timeouts and the site becomes unresponsive. The first thing I did was restart Apache and the php-fpm processes, which caused a very slight dip in the Apache RAM usage, but it immediately started increasing again. The big drop in CPU is when I gave up and rebooted the whole VPS. As soon as it came back online, the usage ramped up once again.

The VPS itself has 6 cores and 24GiB of RAM, so why isn't Apache utilising these resources effectively?

At times such as the spike shown, we are seeing 50 (as allocated) php-fpm processes all using around the same amount of resource, i.e. less than 1% CPU and RAM each. So no single process seems to be the cause.
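One way to put numbers on this is to total the resident memory per process name during a spike. Below is a minimal Python sketch, assuming the input is the text printed by `ps -eo rss=,comm=` (RSS in KiB, then the command name); the function name `memory_by_command` is hypothetical. Summed RSS counts shared pages once per process, so the totals overstate real usage somewhat.

```python
from collections import defaultdict


def memory_by_command(ps_output):
    """Sum resident memory and count processes per command name.

    Expects the text printed by `ps -eo rss=,comm=`: one process per
    line, RSS in KiB followed by the command name.
    Returns {command: (process_count, total_rss_in_mib)}.
    """
    totals = defaultdict(lambda: [0, 0])  # command -> [count, rss KiB]
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        rss_kib, name = int(parts[0]), parts[1].strip()
        totals[name][0] += 1
        totals[name][1] += rss_kib
    return {name: (count, kib / 1024) for name, (count, kib) in totals.items()}
```

If 50 workers together account for roughly 1.2GiB on a 24GiB machine, memory itself is unlikely to be the limit; it is more likely that every worker is occupied and waiting on something else.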

Additionally, at this point we are also seeing a huge spike in database connections (AWS RDS, see other attachment). Alongside the many DB connections, we are also seeing a lot of connections in a sleep state.
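To see how many of those connections are actually idle, the process list can be grouped by command. This is a minimal Python sketch, assuming the RDS instance is MySQL-compatible and that the rows of `SHOW PROCESSLIST` have already been fetched as dictionaries keyed by its column names (Id, User, Host, db, Command, Time, State, Info). The function name `summarise_processlist` and the 60-second idle threshold are assumptions chosen for illustration.

```python
from collections import Counter


def summarise_processlist(rows, idle_seconds=60):
    """Count connections per Command and list long-idle sleepers.

    `rows` is an iterable of dicts keyed by the SHOW PROCESSLIST
    column names. Returns (counts_by_command, ids_of_long_sleepers).
    """
    rows = list(rows)  # allow generators, since the rows are read twice
    by_command = Counter(row["Command"] for row in rows)
    long_sleepers = [
        row["Id"]
        for row in rows
        if row["Command"] == "Sleep" and int(row["Time"]) >= idle_seconds
    ]
    return by_command, long_sleepers
```

A large share of `Sleep` entries with long `Time` values would suggest that connections are being held open without doing work, which fits a persistent-connection setup.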

Any help is much appreciated.

LTUser · Sep 30, 2021

Your problem is that either a search bot is opening a lot of connections very quickly or someone is attempting a DoS attack on the domain.

Check this by opening the log file access_ssl.log for the domain concerned and looking for an IP address that requests many pages in quick succession.

You can then block this IP address with the firewall.
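The log check described above can be sketched in Python as follows. This assumes the access log uses the common/combined log format (client IP first, timestamp in square brackets); the function name `bursts` and the default threshold of 60 requests per minute are assumptions for illustration, not values from this thread.

```python
import re
from collections import Counter

# Common/combined log format: client IP, two ignored fields, then
# "[30/Aug/2021:11:35:13 +0000]". The capture stops at the minute.
LINE = re.compile(r"^(\S+) \S+ \S+ \[([^:]+:\d{2}:\d{2}):\d{2} ")


def bursts(lines, threshold=60):
    """Return (ip, minute, hits) for every IP at or above `threshold`
    requests within a single minute, busiest first."""
    per_minute = Counter()
    for line in lines:
        match = LINE.match(line)
        if match:
            per_minute[(match.group(1), match.group(2))] += 1
    rows = [
        (ip, minute, hits)
        for (ip, minute), hits in per_minute.items()
        if hits >= threshold
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

An empty result during a spike would point away from a single aggressive client and towards something inside the application or database.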

anthill · Sep 30, 2021

Hi. Thanks for your reply.

I've been looking at the access logs and can't see any major spikes. I did see AhrefsBot was crawling links, which I've since prevented in the robots.txt, but I'm not sure if this could have been related. I also forgot to mention the website uses WordPress 5.8 but also has a lot of custom PHP with a persistent db connection.

LTUser · Sep 30, 2021

In that case, you may need to upgrade from 24 to 32 GB of RAM or more to avoid such bottlenecks.

Then, with more RAM, you should check your database settings and carefully increase the value for Connections.

anthill · Sep 30, 2021

But where is the bottleneck? I did try increasing the processes from 10 to 20, then 20 to 50, then 50 to 80, then 80 to 100 (over a period of days), yet still, when the Apache and PHP memory went above circa 954MiB, the same symptoms occurred.

LTUser · Sep 30, 2021

anthill said:
but where is the bottleneck?

There

anthill said:
the website uses WordPress 5.8 but also has a lot of custom PHP with a persistent db connection.

Your note above, together with what is quoted, indicates that somewhere in the chain Database / PHP / web server / Nginx a component is running out of resources and one link in the chain gets stuck (hence the gateway timeout).

Either the settings of those involved must be optimized or the RAM must be increased so that all processes involved can use enough RAM to prevent a crash.

You could also ensure that database connections are closed again immediately after they have been used (minimize the persistent db connections), and thus released. But then your CPU will be more stressed.
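The open, use, close pattern mentioned above looks roughly like this. The sketch is in Python with the standard-library `sqlite3` module purely to illustrate the idea; the site in question uses PHP against AWS RDS, so the function name `fetch_one_value` and the database used here are stand-ins, not the site's code.

```python
import sqlite3
from contextlib import closing


def fetch_one_value(database, query, params=()):
    """Open a connection, run one query, and close the connection again.

    The connection exists only for the duration of this call, so it can
    never linger in an idle state on the database server.
    """
    with closing(sqlite3.connect(database)) as connection:
        with closing(connection.cursor()) as cursor:
            cursor.execute(query, params)
            row = cursor.fetchone()
    return row[0] if row else None
```

The trade-off is the one already named: each request pays the cost of establishing a new connection, in exchange for not holding idle ones open.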
