
Issue: Huge performance issues

Tassos Voulgaris

New Pleskian
Server operating system version
Debian 11.8
Plesk version and microupdate number
Plesk Obsidian 18.0.56 Update 4
I have commissioned a very capable server in order to host 40 WordPress & WooCommerce sites using Plesk Obsidian Web Host Edition Version 18.0.56 Update #4. I am experiencing disappointing performance, not only on my sites but on Plesk as well. I will try to provide as much information as I can, so that maybe a sysadmin much more capable than me can provide some insights:

1. Server Info

CPU: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz (12 core(s))
Version: Plesk Obsidian v18.0.56_build1800231106.15 os_Debian 11.0
OS: Debian 11.8
RAM: 256 GB

2. Websites Info
  • 37 WooCommerce sites
  • 3 WordPress sites
  • Optimized with page caching (WP Rocket) and object caching (Redis); images optimized
  • Low traffic in general: an average of 60 simultaneous connections, with occasional spikes up to 300
3. Plesk
  • All sites are served by NGINX alone ("Proxy mode" is not checked)
  • 21 sites run as "FPM application" and 19 as "Dedicated FPM application"
  • All sites run on PHP 8.2
  • Performance Booster options are enabled for all sites
4. MariaDB

I am trying to optimize MariaDB using Releem.
Current MariaDB settings are:


Code:
+--------------------------------+---------------------------+
| Parameter                      | Value                     |
+--------------------------------+---------------------------+
| innodb_change_buffer_max_size  | 25                        |
| innodb_adaptive_flushing_lwm   | 25.000000                 |
| innodb_max_dirty_pages_pct     | 70.000000                 |
| innodb_autoextend_increment    | 48                        |
| thread_stack                   | 524288                    |
| transaction_prealloc_size      | 8192                      |
| thread_cache_size              | 256                       |
| max_connections                | 1000                      |
| query_cache_type               | 1                         |
| query_cache_size               | 134217728                 |
| query_cache_limit              | 33554432                  |
| query_cache_min_res_unit       | 4096                      |
| key_buffer_size                | 8388608                   |
| max_heap_table_size            | 16777216                  |
| tmp_table_size                 | 16777216                  |
| innodb_buffer_pool_instances   | 1                         |
| innodb_buffer_pool_size        | 10066329600               |
| innodb_log_file_size           | 1354760192                |
| innodb_file_per_table          | 1                         |
| sort_buffer_size               | 2097152                   |
| read_rnd_buffer_size           | 262144                    |
| bulk_insert_buffer_size        | 8388608                   |
| myisam_sort_buffer_size        | 134216704                 |
| innodb_page_cleaners           | 1                         |
| innodb_buffer_pool_chunk_size  | 134217728                 |
| join_buffer_size               | 262144                    |
| table_open_cache               | 2000                      |
| table_definition_cache         | 400                       |
| innodb_flush_log_at_trx_commit | 2                         |
| innodb_log_buffer_size         | 16777216                  |
| innodb_write_io_threads        | 4                         |
| innodb_read_io_threads         | 4                         |
| innodb_flush_method            | fsync                     |
| innodb_thread_concurrency      | 0                         |
| optimizer_search_depth         | 62                        |
| innodb_purge_threads           | 4                         |
| thread_handling                | one-thread-per-connection |
| thread_pool_size               | 12                        |
+--------------------------------+---------------------------+

5. NGINX

I have tried to tweak NGINX configuration with the following values:
Code:
#user  nginx;
worker_processes  auto;

#error_log  /var/log/nginx/error.log;
#error_log  /var/log/nginx/error.log  notice;
#error_log  /var/log/nginx/error.log  info;

#pid        /var/run/nginx.pid;

include /etc/nginx/modules.conf.d/*.conf;

events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    #                  '$status $body_bytes_sent "$http_referer" '
    #                  '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;
    #tcp_nodelay        on;

    #gzip  on;
    #gzip_disable "MSIE [1-6]\.(?!.*SV1)";

    server_tokens off;
    client_body_buffer_size 10m;
    client_max_body_size 100m;
    client_body_timeout 60s;
    open_file_cache max=1024 inactive=10s;
    open_file_cache_valid 60s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;


    include /etc/nginx/conf.d/*.conf;
}

# override global parameters e.g. worker_rlimit_nofile
include /etc/nginx/*global_params;

6. Some typical outputs of top, atop, htop

(screenshots attached: top.jpg, atop.jpg, htop.jpg)


I would be happy to provide any other information you think would be helpful, and I would be grateful for any feedback you can give me.
 
The number of php-fpm processes is way too high, and some of them are also under heavy load. There can be a number of reasons. One major reason is bad bots that hit a specific site with lots of traffic. This can be seen in the access_ssl_log of the affected site; the site(s) can be determined from the users listed as owners of the php-fpm processes. The solution is to block the bots, e.g. by using suitable Fail2Ban jails.
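
For illustration, a minimal sketch of such a jail, assuming the standard Fail2Ban layout on Debian; the bot names, thresholds, and ban time are placeholders to adapt to whatever your own access_ssl_log shows:
Code:
#!/bin/bash
# Hypothetical filter: ban clients whose User-Agent (the last quoted
# field of a combined-format log line) contains a listed bot name.
cat > /etc/fail2ban/filter.d/nginx-badbots.conf <<'EOF'
[Definition]
badbots = Amazonbot|MJ12bot|AhrefsBot|SemrushBot
failregex = ^<HOST> .* "[^"]*(?:%(badbots)s)[^"]*"$
ignoreregex =
EOF

# Jail watching every vhost's SSL access log; the numbers are assumptions.
cat > /etc/fail2ban/jail.d/nginx-badbots.local <<'EOF'
[nginx-badbots]
enabled  = true
port     = http,https
filter   = nginx-badbots
logpath  = /var/www/vhosts/*/logs/access_ssl_log
maxretry = 20
findtime = 60
bantime  = 86400
EOF

fail2ban-client reload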

I'd also not trust WP Rocket too much, because it has the habit of visiting your own sites frequently to update its cache. I've seen many cases where WP Rocket caused so much traffic that it became the reason the website was slow, while without it the site worked like a charm. If you see WP Rocket visiting your site(s) often in your access_ssl_log, you've found at least one issue.
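
A quick way to check (the "WP Rocket/Preload" user-agent string is the one reported later in this thread; verify it against your own logs):
Code:
#!/bin/bash
# Count WP Rocket preload hits per vhost in the current SSL access logs.
for f in /var/www/vhosts/*/logs/access_ssl_log; do
    printf '%s: ' "$f"
    grep -c 'WP Rocket/Preload' "$f"
done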

The innodb_buffer_pool_size value of MariaDB might be too large. You have it set to 10 GB. What happens there is that for each access to that pool, a lot of data needs to be read or re-arranged. The larger the pool, the more transactions and data shuffling are needed. At some point this no longer speeds up access to information stored in the database but slows it down, because all the pool processing takes much longer than if you'd just access the tables directly. I do not know, of course, whether this is the case for your server.
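
One way to sanity-check this (a rough heuristic, not a definitive diagnosis) is to compare how much of the pool is actually in use and how often reads still miss it and hit the disk:
Code:
#!/bin/bash
# "plesk db" runs a query against the Plesk MariaDB server as admin.
# Innodb_buffer_pool_reads counts read requests the pool could not serve.
plesk db "SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Innodb_buffer_pool_pages_total',
     'Innodb_buffer_pool_pages_free',
     'Innodb_buffer_pool_read_requests',
     'Innodb_buffer_pool_reads');"
# Miss rate = Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests.
# A pool that is mostly free with a near-zero miss rate won't get faster
# by growing, and shrinking it frees RAM for php-fpm.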

Finally, there could be a very simple issue: when all your websites are WooCommerce sites and get a lot of traffic, they generate a lot of CPU load altogether.
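
If you want to see which sites are producing that load, a quick heuristic (the pool user of each php-fpm worker identifies the subscription, as noted above) is to sum worker CPU per user:
Code:
#!/bin/bash
# Aggregate the current %CPU of php-fpm workers per pool user.
ps -eo user:20,pcpu,comm | awk '/php-fpm/ {cpu[$1] += $2}
    END {for (u in cpu) printf "%6.1f%%  %s\n", cpu[u], u}' | sort -rn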
 
To add to Peter's answer: you need to check your access logs.

You can also add a server-status page and check what is happening there.
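
For NGINX, a minimal sketch of such a status page, assuming the bundled build includes the stub_status module (check with: nginx -V 2>&1 | grep -o http_stub_status_module). The port is an arbitrary free one, and conf.d/*.conf is already included by the http block shown above:
Code:
#!/bin/bash
# Local-only status endpoint showing active connections and request counts.
cat > /etc/nginx/conf.d/zz-status.conf <<'EOF'
server {
    listen 127.0.0.1:8088;
    location /nginx_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
EOF
nginx -t && systemctl reload nginx
curl http://127.0.0.1:8088/nginx_status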
 
Hello, and thank you so much for your valuable insights. Sorry for the late reply, but I was down with corona.
I am trying to evaluate and apply your recommendations. I will come back with an update.
 
One question: which approach is better? Should all sites use the FPM application served by NGINX, or should the most visited sites be set up with a Dedicated FPM application served by NGINX?
 
I wrote a bash script to help me automate the analysis of my logs. Here it is:
Code:
#!/bin/bash

# Analyze the most recent rotated SSL access log of each vhost.
for logfile in /var/www/vhosts/*/logs/access_ssl_log.processed.1.gz; do
    echo "Analyzing $logfile"

    echo "IP addresses"
    zcat "$logfile" | awk '{print $1}' | sort | uniq -c | sort -nr | head
    echo "404"
    zcat "$logfile" | awk '$9 == 404 {print $7}' | sort | uniq -c | sort -nr | head
    echo "common user agents"
    zcat "$logfile" | awk -F\" '{print $6}' | sort | uniq -c | sort -nr | head
    echo "common urls"
    zcat "$logfile" | awk '{print $7}' | sort | uniq -c | sort -nr | head
    echo "post requests"
    # Field 6 includes the opening quote of the request line ("POST ...).
    zcat "$logfile" | awk '$6 == "\"POST" {print $7}' | sort | uniq -c | sort -nr
done

And here is a summary of my findings:

  1. Some sites have huge traffic from Amazonbot
  2. Substantial traffic from the Plesk screenshot bot on all sites
  3. WP Rocket/Preload indeed adds high traffic (but doesn't it help, by caching pages?)
  4. /.well-known/traffic-advice is the most common 404 (see the sketch below)
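
That 404 is Chrome's private-prefetch probe, and it can be answered instead of left to 404. A hedged sketch using Plesk's per-domain custom include (example.com is a placeholder; vhost_nginx.conf, where present, is pulled into the domain's server block):
Code:
#!/bin/bash
cat > /var/www/vhosts/system/example.com/conf/vhost_nginx.conf <<'EOF'
location = /.well-known/traffic-advice {
    default_type application/trafficadvice+json;
    # Disallow the prefetch proxy; change to false to allow it.
    return 200 '[{"user_agent": "prefetch-proxy", "disallow": true}]';
}
EOF
# Regenerate the domain's web server configuration.
plesk sbin httpdmng --reconfigure-domain example.com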
 
WP Rocket preload is sometimes a problem: if you have a lot of WordPress sites and the preload starts scanning them, it adds a lot of load.

It may help, yes, but if you run a preload every day and clear the cache every day, it does not help much. Try running the preload at 3 AM, for example, on all sites, and do not clear the cache every day; only clear it when something changes.
 
I think your problem comes from the bots. I have the same problem with AS32934 (Facebook). The only way is Cloudflare: block the bots there.
 
  • Optimized with page caching (WP Rocket) and object caching (Redis); images optimized
Do I see that right that you have just one Redis instance?
You should have a separate Redis for each dataset, especially if the cache is persistent, as Redis will block while the cache is saved to disk.
 
Do I see that right that you have just one Redis instance?
You should have a separate Redis for each dataset, especially if the cache is persistent, as Redis will block while the cache is saved to disk.
Yes, I have one Redis instance, and I use WP_REDIS_PREFIX to avoid data collisions between the multiple sites. Can you please elaborate on having a separate Redis for each dataset?
 
I faced issues on some servers today: a big website with WP Rocket preload.

Load average went down from 13 to 2 when I disabled the preload...
 
Can you please elaborate on having a separate Redis for each dataset?
The read access time on a hash table grows sublinearly with size, but it still grows.
Write access, to insert new values, is significantly worse, especially when you have many workers, which you do when every site connects to the same Redis.
And you usually want persistence for the session cache, but it is not necessary to save the other caches to disk. So you would have at least one instance that regularly dumps to disk and one that doesn't.
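
A minimal sketch of that split, assuming a Debian-style Redis install (paths, port, and memory cap are placeholders): keep the default persistent instance on 6379 for sessions, and add a cache-only instance on 6380 that never touches the disk:
Code:
#!/bin/bash
install -d -o redis -g redis /var/lib/redis-cache
cat > /etc/redis/redis-cache.conf <<'EOF'
port 6380
dir /var/lib/redis-cache
save ""                       # disable RDB snapshots: nothing to block on
appendonly no                 # no AOF persistence either
maxmemory 2gb                 # assumed cap; size it to your working set
maxmemory-policy allkeys-lru  # evict least-recently-used keys when full
EOF
redis-server /etc/redis/redis-cache.conf --daemonize yes

# Point each site's object cache at it, e.g. in wp-config.php:
#   define('WP_REDIS_PORT', 6380);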
 