Input How to overcome intermittent, occasional 502 errors due to Apache or PHP-FPM response failures

Bitpalast · Jul 26, 2017

The 502 error is a frequently reported issue. Most often it is caused by a PHP-FPM service being offline or Apache not running. However, we have also observed occasional 502 errors, PHP-FPM service failures and Apache crashes for seemingly no reason. These occur on systems with many concurrent users or webserver connections, e.g. >200 accounts or >500 domains.

Symptoms:

a) Sometimes after a configuration change, Apache, despite being set to “graceful” restarts in Plesk, does a “restart” (hard restart) instead of a “reload” (graceful restart). During a restart, the configuration is not merely reloaded: the service stops and starts again, leaving a short period during which it is inaccessible. If Nginx sends a request to Apache during that time, the request is not answered, hence a 502 error occurs.

b) PHP-FPM sometimes becomes unresponsive, although “# service plesk-php<version>-fpm status” displays the state as “active” with no errors. When a .php script is requested, PHP-FPM does not answer, and again Nginx sends a 502 to the user. The service must be restarted to bring the websites that use that PHP version with PHP-FPM back up.

c) Apache sometimes crashes altogether on reloading or restarting for seemingly no reason. The service enters a “failed state” and must be restarted manually shortly after. It can be restarted without any difficulties. During the offline period of Apache, Nginx sends a 502 to the user, because Nginx does not receive a response from Apache.
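All three symptoms look the same from outside: Nginx answers with a 502. To find out when this happens, a small probe can log each 502 with a timestamp, so that it can be compared with the times of configuration changes. This is only a minimal sketch, assuming curl is installed; probe_status and log_if_502 are made-up helper names and the URL in the example call is a placeholder.

```shell
#!/bin/sh
# Sketch: log a timestamped line whenever a test URL answers with 502.
# probe_status and log_if_502 are hypothetical helper names.

# Fetch only the HTTP status code of a URL (curl prints 000 if no
# HTTP response was received at all). Timeout: 10 seconds.
probe_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Print a log line for a 502, stay silent for any other status code.
log_if_502() {
    url=$1
    code=$2
    if [ "$code" = "502" ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') 502 from ${url}"
    fi
}

# Example call (placeholder URL), e.g. once a minute from cron:
# u="https://example.com/index.php"; log_if_502 "$u" "$(probe_status "$u")"
```

Splitting the probe from the logging keeps the logging part usable with status codes obtained in any other way.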

Solution:

Step 1) One of the basic prerequisites to overcome frequent 502 errors is to set Apache to graceful restarts according to this Plesk FAQ article: How to enable graceful restart for Apache. Do that first. For the majority of users it will solve most 502 errors without any additional steps.

Step 2) We found that Plesk seems to restart Apache twice upon configuration changes. It sometimes reports that it cannot do a graceful restart, so it does a hard restart right after that. The faulty control adapter not only causes a long wait for reloads or restarts upon configuration changes. The additional or alternative hard restart also interrupts the service for a short time, leading to the 502 error in Nginx if a user requests a file at that very moment. Apply this bug fix if the file apache_control_adapter is dated March 17, 2017, or older: Apache service restarts twice while applying changes in Plesk.
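To see how often Apache was stopped hard rather than reloaded gracefully, the Apache error log can be counted. A rough sketch: it assumes that the log contains the usual notice lines ("caught SIGTERM, shutting down" for a stop, "graceful restart" for a reload), which depends on the MPM and LogLevel in use. count_restarts is a made-up helper name, and the log path in the usage line is an assumption for Red Hat/CentOS.

```shell
#!/bin/sh
# Sketch: count hard stops versus graceful restarts in an Apache error log.
# count_restarts is a hypothetical helper name; the matched message texts
# are assumptions about the notice lines Apache writes.

# Reads log lines on stdin, prints one summary line.
count_restarts() {
    awk '
        /caught SIGTERM, shutting down/ { hard++ }
        /[Gg]raceful restart/           { graceful++ }
        END { printf "hard=%d graceful=%d\n", hard, graceful }
    '
}

# Usage (log path is an assumption for Red Hat/CentOS):
# count_restarts < /var/log/httpd/error_log
```

If the "hard" counter increases with every configuration change in Plesk, the graceful restart setting is not taking effect.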

Step 3) During some Apache reload or restart processes, you might notice an error message in the log like “[core:emerg] No space left on device: Couldn’t create the mpm-accept mutex” while there is enough space on the device. This message might or might not appear. It could also be a different error message that seems strange because enough resources are available. It is also possible that there is no message at all, yet Apache simply enters a "failed state". The real issue behind such messages and behind the Apache or PHP-FPM crashes is that not enough semaphores are available in the operating system. However, that is never reported in a log, so there is no message like “semaphores exhausted”. Instead you will see a strange error with a misleading message or no message at all. The solution is to increase the system semaphore limits.
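Because semaphore exhaustion is not logged, it can help to look at the usage directly. The following is a sketch only, assuming the ipcs tool from util-linux is installed. The helper names and the 90% warning threshold are arbitrary choices for illustration, not something Plesk or the kernel prescribes.

```shell
#!/bin/sh
# Sketch: compare the semaphore arrays in use with the SEMMNI limit.
# count_sem_arrays and report_sem_usage are hypothetical helper names.

# Count semaphore arrays in `ipcs -s` output read from stdin.
# Each array is listed on a line that starts with its key (0x...).
count_sem_arrays() {
    grep -c '^0x' || true
}

# Print usage versus limit; warn at 90% (threshold is an assumption).
report_sem_usage() {
    used=$1
    limit=$2
    echo "semaphore arrays in use: ${used} of ${limit}"
    if [ "$used" -ge $((limit * 9 / 10)) ]; then
        echo "WARNING: close to SEMMNI limit"
    fi
}

# Only query the live system if the tool and the /proc file exist.
if command -v ipcs >/dev/null 2>&1 && [ -r /proc/sys/kernel/sem ]; then
    used=$(ipcs -s | count_sem_arrays)
    limit=$(awk '{print $4}' /proc/sys/kernel/sem)   # 4th field = SEMMNI
    report_sem_usage "$used" "$limit"
fi
```

Running this shortly before and after a configuration change in Plesk shows whether the number of arrays gets close to the limit.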

For example, on a Red Hat/CentOS system, the semaphore defaults (# cat /proc/sys/kernel/sem) are
250 32000 32 128
(SEMMSL, SEMMNS, SEMOPM, SEMMNI)
(max_sem_per_id max_sem_total max_ops_sem_call max_sem_ids)
(max_semaphores_per_id, max_semaphores_total, max_operations_semaphore_call, max_semaphore_ids)

These are too low for today’s average shared hosting environments. Increase the parameters by this scheme:

SEMMSL, max_sem_per_id: X
SEMMNS, max_sem_total: SEMMSL * SEMMNI = X * Y = Z
SEMOPM, max_ops_sem_call: X (should be the same as max_sem_per_id)
SEMMNI, max_sem_ids: Y

For example, on a system with a 6-core processor and 128 GB RAM, values like
1024 131072 1024 128
are much more appropriate than the default.
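The example values follow the scheme above with X = 1024 and Y = 128. A small sketch that derives the complete line from X and Y, so that the total always matches; sem_line is a made-up helper name:

```shell
#!/bin/sh
# Sketch: build a kernel.sem line from X (SEMMSL) and Y (SEMMNI).
# sem_line is a hypothetical helper name.

sem_line() {
    x=$1            # SEMMSL, max semaphores per id
    y=$2            # SEMMNI, max semaphore ids
    z=$((x * y))    # SEMMNS, max semaphores in total = X * Y
    # SEMOPM (3rd value) is set to X, as in the scheme
    echo "kernel.sem = ${x} ${z} ${x} ${y}"
}

sem_line 1024 128   # prints: kernel.sem = 1024 131072 1024 128
```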

On Red Hat/CentOS these values are entered, for example, on a separate line in /etc/sysctl.conf:
kernel.sem = 1024 131072 1024 128
Then reload the values:
# sysctl -p
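After reloading, it is worth checking that the running kernel really uses the new values. A minimal sketch; sem_matches is a made-up helper name, and the expected values are the example values from this post, so adjust them to your own.

```shell
#!/bin/sh
# Sketch: verify that the running kernel.sem values match the expected ones.
# sem_matches is a hypothetical helper name.

# Compare two value lists, ignoring tab/space differences
# (/proc/sys/kernel/sem separates the fields with tabs).
sem_matches() {
    expected=$1
    actual=$2
    # Unquoted echo collapses any whitespace to single spaces.
    [ "$(echo $expected)" = "$(echo $actual)" ]
}

if [ -r /proc/sys/kernel/sem ]; then
    current=$(cat /proc/sys/kernel/sem)
    if sem_matches "1024 131072 1024 128" "$current"; then
        echo "kernel.sem is active"
    else
        echo "kernel.sem differs, currently: $current"
    fi
fi
```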

With steps 1, 2 and 3 combined, you should no longer experience strange Apache or PHP-FPM service failures, excessively long waits on configuration changes in Plesk or similar strange symptoms and, of course, no more strange 502 errors.
