
Issue: Huge performance issues

Tassos Voulgaris

New Pleskian
Server operating system version
Debian 11.8
Plesk version and microupdate number
Plesk Obsidian 18.0.56 Update 4
I have commissioned a very capable server in order to host 40 WordPress & WooCommerce sites using Plesk Obsidian Web Host Edition Version 18.0.56 Update #4. I am experiencing disappointing performance, not only on my sites but on Plesk as well. I will try to provide as much information as I can, so that maybe a sysadmin much more capable than me can provide some insights:

1. Server Info

CPU: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz (12 core(s))
Version: Plesk Obsidian v18.0.56_build1800231106.15 os_Debian 11.0
OS: Debian 11.8
RAM: 256 GB

2. Websites Info
  • 37 WooCommerce sites
  • 3 WordPress sites
  • Optimized with page caching (WP Rocket) and object caching (Redis); images optimized
  • Low traffic in general: an average of 60 simultaneous connections, with occasional spikes up to 300
3. Plesk
  • All sites are served by NGINX alone ("Proxy mode" is not checked)
  • 21 sites run as "FPM application" and 19 as "Dedicated FPM application"
  • All sites run on PHP 8.2
  • Performance Booster options are enabled for all sites
4. MariaDB

I am trying to optimize MariaDB using Releem.
Current MariaDB settings are:


Code:
+--------------------------------+---------------------------+
| Parameter                      | Value                     |
+--------------------------------+---------------------------+
| innodb_change_buffer_max_size  | 25                        |
| innodb_adaptive_flushing_lwm   | 25.000000                 |
| innodb_max_dirty_pages_pct     | 70.000000                 |
| innodb_autoextend_increment    | 48                        |
| thread_stack                   | 524288                    |
| transaction_prealloc_size      | 8192                      |
| thread_cache_size              | 256                       |
| max_connections                | 1000                      |
| query_cache_type               | 1                         |
| query_cache_size               | 134217728                 |
| query_cache_limit              | 33554432                  |
| query_cache_min_res_unit       | 4096                      |
| key_buffer_size                | 8388608                   |
| max_heap_table_size            | 16777216                  |
| tmp_table_size                 | 16777216                  |
| innodb_buffer_pool_instances   | 1                         |
| innodb_buffer_pool_size        | 10066329600               |
| innodb_log_file_size           | 1354760192                |
| innodb_file_per_table          | 1                         |
| sort_buffer_size               | 2097152                   |
| read_rnd_buffer_size           | 262144                    |
| bulk_insert_buffer_size        | 8388608                   |
| myisam_sort_buffer_size        | 134216704                 |
| innodb_page_cleaners           | 1                         |
| innodb_buffer_pool_chunk_size  | 134217728                 |
| join_buffer_size               | 262144                    |
| table_open_cache               | 2000                      |
| table_definition_cache         | 400                       |
| innodb_flush_log_at_trx_commit | 2                         |
| innodb_log_buffer_size         | 16777216                  |
| innodb_write_io_threads        | 4                         |
| innodb_read_io_threads         | 4                         |
| innodb_flush_method            | fsync                     |
| innodb_thread_concurrency      | 0                         |
| optimizer_search_depth         | 62                        |
| innodb_purge_threads           | 4                         |
| thread_handling                | one-thread-per-connection |
| thread_pool_size               | 12                        |
+--------------------------------+---------------------------+

5. NGINX

I have tried to tweak NGINX configuration with the following values:
Code:
#user  nginx;
worker_processes  auto;

#error_log  /var/log/nginx/error.log;
#error_log  /var/log/nginx/error.log  notice;
#error_log  /var/log/nginx/error.log  info;

#pid        /var/run/nginx.pid;

include /etc/nginx/modules.conf.d/*.conf;

events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    #                  '$status $body_bytes_sent "$http_referer" '
    #                  '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;
    #tcp_nodelay        on;

    #gzip  on;
    #gzip_disable "MSIE [1-6]\.(?!.*SV1)";

    server_tokens off;
    client_body_buffer_size 10m;
    client_max_body_size 100m;
    client_body_timeout 60s;
    open_file_cache max=1024 inactive=10s;
    open_file_cache_valid 60s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;


    include /etc/nginx/conf.d/*.conf;
}

# override global parameters e.g. worker_rlimit_nofile
include /etc/nginx/*global_params;

6. Some typical outputs of top, atop, htop

(screenshots attached: top.jpg, atop.jpg, htop.jpg)


I would be happy to provide any other information you think would be helpful, and I would be grateful for any feedback you can give me.
 
The number of php-fpm processes is way too high, and some of them are also under heavy load. There can be a number of reasons. One major reason is bad bots that hit a specific site with lots of traffic. This can be seen in the access_ssl_log of the affected site; the site(s) can be determined from the users listed as owners of the php-fpm processes. The solution is to block the bots, e.g. by using suitable Fail2Ban jails.
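
For illustration, a minimal sketch of such a jail, assuming the standard Fail2Ban layout on Debian; the bot names, thresholds, and ban time are placeholders to adapt to whatever your own access_ssl_log shows:
Code:
#!/bin/bash
# Hypothetical filter: ban clients whose User-Agent (the last quoted
# field of a combined-format log line) contains a listed bot name.
cat > /etc/fail2ban/filter.d/nginx-badbots.conf <<'EOF'
[Definition]
badbots = Amazonbot|MJ12bot|AhrefsBot|SemrushBot
failregex = ^<HOST> .* "[^"]*(?:%(badbots)s)[^"]*"$
ignoreregex =
EOF

# Jail watching every vhost's SSL access log; the numbers are assumptions.
cat > /etc/fail2ban/jail.d/nginx-badbots.local <<'EOF'
[nginx-badbots]
enabled  = true
port     = http,https
filter   = nginx-badbots
logpath  = /var/www/vhosts/*/logs/access_ssl_log
maxretry = 20
findtime = 60
bantime  = 86400
EOF

fail2ban-client reload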

I'd also not trust WP Rocket too much, because it has the habit of visiting your own sites frequently to update its cache. I've seen many cases where WP Rocket caused so much traffic that it became the reason the website was slow, while without it the site worked like a charm. If you see WP Rocket visiting your site(s) often in your access_ssl_log, you've found at least one issue.
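
A quick way to check (the "WP Rocket/Preload" user-agent string is the one reported later in this thread; verify it against your own logs):
Code:
#!/bin/bash
# Count WP Rocket preload hits per vhost in the current SSL access logs.
for f in /var/www/vhosts/*/logs/access_ssl_log; do
    printf '%s: ' "$f"
    grep -c 'WP Rocket/Preload' "$f"
done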

The innodb_buffer_pool_size value of MariaDB might be too large. You have it set to 10 GB. What happens there is that for each access to that pool, a lot of data needs to be read or re-arranged. The larger the pool, the more transactions and data shuffling are needed. At some point this no longer speeds up access to information stored in the database but slows it down, because all the pool processing takes much longer than if you'd just access the tables directly. I do not know, of course, whether this is the case for your server.
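
One way to sanity-check this (a rough heuristic, not a definitive diagnosis) is to compare how much of the pool is actually in use and how often reads still miss it and hit the disk:
Code:
#!/bin/bash
# "plesk db" runs a query against the Plesk MariaDB server as admin.
# Innodb_buffer_pool_reads counts read requests the pool could not serve.
plesk db "SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Innodb_buffer_pool_pages_total',
     'Innodb_buffer_pool_pages_free',
     'Innodb_buffer_pool_read_requests',
     'Innodb_buffer_pool_reads');"
# Miss rate = Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests.
# A pool that is mostly free with a near-zero miss rate won't get faster
# by growing, and shrinking it frees RAM for php-fpm.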

Finally, there could be a very simple issue: when all your websites are WooCommerce sites and get a lot of traffic, they generate a lot of CPU load altogether.
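
If you want to see which sites are producing that load, a quick heuristic (the pool user of each php-fpm worker identifies the subscription, as noted above) is to sum worker CPU per user:
Code:
#!/bin/bash
# Aggregate the current %CPU of php-fpm workers per pool user.
ps -eo user:20,pcpu,comm | awk '/php-fpm/ {cpu[$1] += $2}
    END {for (u in cpu) printf "%6.1f%%  %s\n", cpu[u], u}' | sort -rn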
 
To add to Peter's answer: you need to check your access logs.

You can also add a server-status page and check what is happening there.
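
For NGINX, a minimal sketch of such a status page, assuming the bundled build includes the stub_status module (check with: nginx -V 2>&1 | grep -o http_stub_status_module). The port is an arbitrary free one, and conf.d/*.conf is already included by the http block shown above:
Code:
#!/bin/bash
# Local-only status endpoint showing active connections and request counts.
cat > /etc/nginx/conf.d/zz-status.conf <<'EOF'
server {
    listen 127.0.0.1:8088;
    location /nginx_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
EOF
nginx -t && systemctl reload nginx
curl http://127.0.0.1:8088/nginx_status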
 
Hello, and thank you so much for your valuable insights. Sorry for the late reply, but I was down with corona.
I am trying to evaluate and apply your recommendations. I will come back with an update.
 
One question: which approach is better? Should all sites use the FPM application served by NGINX, or should the most visited sites be set up with a Dedicated FPM application served by NGINX?
 
I wrote a bash script to help me automate the analysis of my logs. Here it is:
Code:
#!/bin/bash

# Analyze the most recent rotated SSL access log of each vhost.
for logfile in /var/www/vhosts/*/logs/access_ssl_log.processed.1.gz; do
    echo "Analyzing $logfile"

    echo "IP addresses"
    zcat "$logfile" | awk '{print $1}' | sort | uniq -c | sort -nr | head
    echo "404"
    zcat "$logfile" | awk '$9 == 404 {print $7}' | sort | uniq -c | sort -nr | head
    echo "common user agents"
    zcat "$logfile" | awk -F\" '{print $6}' | sort | uniq -c | sort -nr | head
    echo "common urls"
    zcat "$logfile" | awk '{print $7}' | sort | uniq -c | sort -nr | head
    echo "post requests"
    # Field 6 includes the opening quote of the request line ("POST ...).
    zcat "$logfile" | awk '$6 == "\"POST" {print $7}' | sort | uniq -c | sort -nr
done

And here is a summary of my findings:

  1. Some sites have huge traffic from Amazonbot
  2. Substantial traffic from the Plesk screenshot bot on all sites
  3. WP Rocket/Preload indeed adds high traffic (but doesn't it help, by caching pages?)
  4. /.well-known/traffic-advice is the most common 404 (see the sketch below)
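
That 404 is Chrome's private-prefetch probe, and it can be answered instead of left to 404. A hedged sketch using Plesk's per-domain custom include (example.com is a placeholder; vhost_nginx.conf, where present, is pulled into the domain's server block):
Code:
#!/bin/bash
cat > /var/www/vhosts/system/example.com/conf/vhost_nginx.conf <<'EOF'
location = /.well-known/traffic-advice {
    default_type application/trafficadvice+json;
    # Disallow the prefetch proxy; change to false to allow it.
    return 200 '[{"user_agent": "prefetch-proxy", "disallow": true}]';
}
EOF
# Regenerate the domain's web server configuration.
plesk sbin httpdmng --reconfigure-domain example.com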
 
WP Rocket preload is sometimes a problem: if you have a lot of WordPress sites and the preload starts scanning them, it adds a lot of load.

It may help, yes, but if you run a preload every day and clear the cache every day, it does not help much. Try running the preload at 3 AM, for example, on all sites, and do not clear the cache every day; only clear it when something changes.
 
I think your problem comes from the bots. I have the same problem with AS32934 (Facebook). The only way is Cloudflare: block the bots there.
 
  • Optimized with page caching (WP Rocket) and object caching (Redis); images optimized
Do I see that right that you have just one Redis instance?
You should have a separate Redis for each dataset, especially if the cache is persistent, as Redis will block while the cache is saved to disk.
 
Do I see that right that you have just one Redis instance?
You should have a separate Redis for each dataset, especially if the cache is persistent, as Redis will block while the cache is saved to disk.
Yes, I have one Redis instance, and I use WP_REDIS_PREFIX to avoid data collisions between the multiple sites. Can you please elaborate on having a separate Redis for each dataset?
 
I faced issues on some servers today: a big website with WP Rocket preload.

Load average went down from 13 to 2 when I disabled the preload...
 
Can you please elaborate on having a separate Redis for each dataset?
The read access time on a hash table grows sublinearly with size, but it still grows.
Write access, to insert new values, is significantly worse, especially when you have many workers, which you do when every site connects to the same Redis.
And you usually want persistence for the session cache, but it is not necessary to save the other caches to disk. So you would have at least one instance that regularly dumps to disk and one that doesn't.
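
A minimal sketch of that split, assuming a Debian-style Redis install (paths, port, and memory cap are placeholders): keep the default persistent instance on 6379 for sessions, and add a cache-only instance on 6380 that never touches the disk:
Code:
#!/bin/bash
install -d -o redis -g redis /var/lib/redis-cache
cat > /etc/redis/redis-cache.conf <<'EOF'
port 6380
dir /var/lib/redis-cache
save ""                       # disable RDB snapshots: nothing to block on
appendonly no                 # no AOF persistence either
maxmemory 2gb                 # assumed cap; size it to your working set
maxmemory-policy allkeys-lru  # evict least-recently-used keys when full
EOF
redis-server /etc/redis/redis-cache.conf --daemonize yes

# Point each site's object cache at it, e.g. in wp-config.php:
#   define('WP_REDIS_PORT', 6380);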
 