Issue Since mid June - Random Slow Curl Response - Unnoticed Issue For Everyone?

JohnD

New Pleskian
*** NEED HELP TESTING PLEASE ***

Around the middle of June 2019 I suddenly started getting random slow PHP cURL responses on the server, something you wouldn't notice unless you closely monitored your server.
Response times on page loads randomly jump from 0.3 sec to 5 sec.

Running CentOS Linux 7.6.1810 (Core)
Plesk Onyx v17.8.11, after the update from #55 to #56
Possibly started around the time the kernel was updated to: 3.10.0-957.21.3.el7.x86_64


I first noticed that only my big curl-based cron jobs were timing out, and spent days trying to figure out why.
Any PHP task that contains multiple cURL requests fails randomly, and I get the same result when running it directly in the shell. It happens no matter which PHP version I select in Plesk for a website.
I initially blamed the VM hosting company, as I thought the network speed had dropped off, but I booted a temporary Linux recovery system on the same VM, tested from that, and it worked just fine.
I also noticed that it makes a web page, like WordPress, slow at random but typically fast, so it could go unnoticed!! So instead of loading in 3 sec it would take like 15 sec, with zero server load and all resources at around 10 to 20%.

There is a way to test this, and I was wondering if someone with a similar configuration could run this test and confirm whether it's a universal problem or just my server.

Run a little PHP script, which takes up to 60 seconds to report back, and look for ***, which marks a slow response (over 3 seconds). Note that the times are in seconds, even though the script labels them 'ms'.

HTTP Code: 200 || Response Time: 0.339ms
HTTP Code: 200 || Response Time: 0.665ms
HTTP Code: 200 || Response Time: 0.257ms
HTTP Code: 200 || Response Time: 0.643ms
HTTP Code: 200 || Response Time: 0.256ms
HTTP Code: 200 || Response Time: 5.788ms***
HTTP Code: 200 || Response Time: 0.638ms
HTTP Code: 200 || Response Time: 0.293ms
HTTP Code: 200 || Response Time: 0.258ms
HTTP Code: 200 || Response Time: 0.294ms
HTTP Code: 200 || Response Time: 0.352ms
HTTP Code: 200 || Response Time: 0.768ms
HTTP Code: 200 || Response Time: 0.255ms
HTTP Code: 200 || Response Time: 0.36ms
HTTP Code: 200 || Response Time: 0.751ms
HTTP Code: 200 || Response Time: 0.342ms
HTTP Code: 200 || Response Time: 0.634ms
HTTP Code: 200 || Response Time: 0.737ms
HTTP Code: 200 || Response Time: 0.271ms
HTTP Code: 200 || Response Time: 0.681ms
HTTP Code: 200 || Response Time: 0.35ms
HTTP Code: 200 || Response Time: 0.718ms
HTTP Code: 200 || Response Time: 0.744ms
HTTP Code: 200 || Response Time: 0.761ms
HTTP Code: 200 || Response Time: 0.761ms
HTTP Code: 200 || Response Time: 0.35ms
HTTP Code: 200 || Response Time: 0.761ms
HTTP Code: 200 || Response Time: 0.253ms
HTTP Code: 200 || Response Time: 0.364ms
HTTP Code: 200 || Response Time: 0.769ms
HTTP Code: 200 || Response Time: 0.648ms

To do this, copy the PHP code below into a file, e.g. testcurl.php
Open that file in your web browser about 3 to 5 times (don't forget to give it up to 60 seconds to load each time).
Take note of whether you get any *** (> 3 second) alerts and how many you get.
Please report back if you find anything.

You can change the website it talks to in the code; I just picked one at random that I know is stable.

PHP:
<?php

// Fetch the same URL 31 times and print the timing of each request.
for ($i = 0; $i <= 30; $i++) {
    $curlcheck = get_url_contents('https://blog.serverdensity.com/80-linux-monitoring-tools-know/');
}

// Fetch $url with cURL, print its HTTP code and total time, and return the response.
function get_url_contents($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);          // include response headers in the result
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the response instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);     // treat HTTP codes >= 400 as errors
    $execute = curl_exec($ch);

    if (!curl_errno($ch)) {
        $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        // CURLINFO_TOTAL_TIME is in seconds, despite the 'ms' label printed below.
        $total_time = curl_getinfo($ch, CURLINFO_TOTAL_TIME);
        echo 'HTTP Code: ' . $httpcode . ' || Response Time: ' . round($total_time, 3) . 'ms';
        // Flag any request that took longer than 3 seconds.
        echo ($total_time > 3) ? '***<br>' : '<br>';
    }
    curl_close($ch);

    // Return the fetched response (false if the request failed).
    return $execute;
}

?>

Below are some graphs from my server's own monitoring system that show the issue.
30 Day Graph:
z_perf_issue3-30day-local-site2.jpg

and below the 3hr graph:

z_perf_issue1-3hr-local-site1.jpg
 
Interesting case!
I ran your script on a test server with a default Wordpress installation and so far I was not able to reproduce it.
OS: CentOS 7.6
Plesk: Plesk Onyx 17.8.11 Update #59
Kernel: 3.10.0-957.21.3.el7.x86_64
cURL: curl-7.29.0-51.el7.x86_64
nginx: sw-nginx-1.14.2.1-centos7.19061316.x86_64
PHP: 7.2.19

Maybe a DNS problem? You could run tcpdump and check whether you see any unanswered or re-sent DNS queries while your script is running.
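
To test the DNS theory from inside the script itself, a minimal sketch like the one below could print cURL's per-phase timers next to the total time. CURLINFO_NAMELOOKUP_TIME and CURLINFO_CONNECT_TIME are standard curl_getinfo() constants; the helper names format_phases() and probe() are made up for illustration, and any URL passed to probe() is your own choice.

```php
<?php

// Hypothetical helper: turn cURL timers (all in seconds) into one report line.
function format_phases(float $dns, float $connect, float $total): string
{
    // libcurl timers are cumulative, so the TCP connect share is connect minus DNS.
    $line = sprintf('dns=%.3fs connect=%.3fs total=%.3fs', $dns, max(0.0, $connect - $dns), $total);
    // Mark requests where name resolution alone took more than 3 seconds.
    return $dns > 3 ? $line . ' *** slow DNS' : $line;
}

// Hypothetical helper: fetch $url once and report where the time went.
function probe(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_exec($ch);
    $line = format_phases(
        (float) curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME),
        (float) curl_getinfo($ch, CURLINFO_CONNECT_TIME),
        (float) curl_getinfo($ch, CURLINFO_TOTAL_TIME)
    );
    curl_close($ch);
    return $line;
}

// Formatting example with made-up timer values (no network access needed).
echo format_phases(5.012, 5.030, 5.310), PHP_EOL;
```

Calling probe() in a loop with the URL from the test script should show whether the slow requests spend their time in the dns= part or later in the request.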
 
Thanks for trying. I'll try the tcpdump and see what I get. Good idea.
 
Update: I was indeed able to reproduce it once (5.52 seconds) while running the PHP script multiple times and tracing the PHP-FPM process with "strace". The reason for the delay was an unanswered DNS lookup that had to be re-sent after 5 seconds. Our resolvers use rate limiting, and running the script multiple times triggered the rate limit, which is why the DNS query went unanswered.
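
To look at name resolution on its own, without any HTTP request involved, a rough sketch like this could time repeated lookups and flag the ones that take about as long as a resolver retry. It assumes the 5-second retry seen in the strace; time_lookup() and is_retry_delay() are hypothetical helper names. gethostbynamel() is a real PHP function, but it only performs IPv4 lookups, so it may not follow exactly the same resolver path as cURL.

```php
<?php

// Hypothetical helper: time a single call to $resolver($host), in seconds.
function time_lookup(callable $resolver, string $host): float
{
    $start = microtime(true);
    $resolver($host);
    return microtime(true) - $start;
}

// Hypothetical helper: does $seconds look like at least one resolver retry?
// The 5-second default mirrors the delay seen in the strace (an assumption;
// adjust it if your resolver timeout differs).
function is_retry_delay(float $seconds, float $retryAfter = 5.0): bool
{
    return $seconds >= $retryAfter;
}

// Example run: 20 lookups of a host name passed on the command line.
$host = $argv[1] ?? null;
if ($host !== null) {
    for ($i = 0; $i < 20; $i++) {
        $t = time_lookup('gethostbynamel', $host);
        printf("%s %.3fs%s\n", $host, $t, is_retry_delay($t) ? ' ***' : '');
    }
}
```

Run from the shell with the host name as the only argument; a local cache such as nscd, if present, could hide the delays.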

So maybe you're experiencing something similar?
 
I captured the traffic and didn't see any issues, but I'm going to re-test; maybe it was a fluke.
I also changed the DNS server from the local one at the hosting provider to 1.1.1.1 and 8.8.8.8, but same issue.
Thing is... this was never an issue before the middle of June.
And the same cron job that now keeps failing to complete in 60 seconds has been running for years and usually completes in under 15 seconds.
All those curl requests are local to the VM and don't even leave the server.
And the monitoring software, which uses curl, shows the same issue, and that is just a one-off random check once per minute across a few websites.
Attached is a 30min chart, which used to be a nice flat line.
z_perf_30min.jpg
 
Right... I added that test address to /etc/hosts
And boom, no issue when testing. So it must be a DNS thing, as you said!
However, using either the hosting provider's local DNS IP or an external DNS like 1.1.1.1 still gives the same issue.. :mad:
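
The /etc/hosts test can also be done for a single request without touching system files, using cURL's CURLOPT_RESOLVE option, which takes entries in HOST:PORT:ADDRESS form. A minimal sketch, where the host name, port and IP address are placeholders (203.0.113.10 is a documentation-only address) and resolve_entry() and fetch_pinned() are hypothetical helpers:

```php
<?php

// Hypothetical helper: build one CURLOPT_RESOLVE entry ("HOST:PORT:ADDRESS").
function resolve_entry(string $host, int $port, string $ip): string
{
    return $host . ':' . $port . ':' . $ip;
}

// Hypothetical helper: fetch https://$host/ with the name pinned to $ip,
// so this request needs no DNS lookup. Returns the total time in seconds.
function fetch_pinned(string $host, int $port, string $ip): float
{
    $ch = curl_init('https://' . $host . '/');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_RESOLVE, [resolve_entry($host, $port, $ip)]);
    curl_exec($ch);
    $total = (float) curl_getinfo($ch, CURLINFO_TOTAL_TIME);
    curl_close($ch);
    return $total;
}

// Show the entry format with placeholder values (no network access needed).
echo resolve_entry('www.example.com', 443, '203.0.113.10'), PHP_EOL;
```

Calling fetch_pinned() in a loop with the real host and its real IP, and comparing against the unpinned timings, should give the same answer as the /etc/hosts test.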
 
Also noticed some 'bad checksums' in most of the lookups (names changed to protect the innocent).

myserver.com.54080 > rec1.svc.1u1.uk.domain: [bad udp cksum 0xb333 -> 0x25ba!] 9779+ A? www.test.org. (30)
14:45:02.688924 IP (tos 0x0, ttl 64, id 57334, offset 0, flags [DF], proto UDP (17), length 58)
myserver.com.54080 > rec1.svc.1u1.uk.domain: [bad udp cksum 0xb333 -> 0xdb95!] 28732+ AAAA? www.test.org. (30)
14:45:02.696213 IP (tos 0x0, ttl 58, id 11996, offset 0, flags [DF], proto UDP (17), length 90)
rec1.svc.1u1.uk.domain > myserver.com.54080: [udp sum ok] 9779 q: A? www.test.org. 2/0/0 www.test.org. A 104.24.6.72, www.test.org. A 104.24.7.72 (62)
14:45:02.698010 IP (tos 0x0, ttl 58, id 11998, offset 0, flags [DF], proto UDP (17), length 120)
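
The capture shows an A and an AAAA query leaving from the same source port. One hedged way to check whether the AAAA lookup plays any part is to force IPv4 resolution for a test request with CURLOPT_IPRESOLVE and compare the timings. This is only a diagnostic sketch, not a confirmed fix; ipv4_only_options() and timed_request() are hypothetical helpers, and the URL is a placeholder.

```php
<?php

// Hypothetical helper: cURL options for a request that resolves names to IPv4 only.
function ipv4_only_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_IPRESOLVE      => CURL_IPRESOLVE_V4, // resolve to IPv4 addresses only
    ];
}

// Hypothetical helper: run one request with the given options and
// return its total time in seconds.
function timed_request(array $options): float
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    curl_exec($ch);
    $total = (float) curl_getinfo($ch, CURLINFO_TOTAL_TIME);
    curl_close($ch);
    return $total;
}

// Build the options for a placeholder URL (no request is sent here).
$options = ipv4_only_options('https://www.example.com/');
```

Passing $options to timed_request() in a loop, with the site under test instead of the placeholder, would show whether the random 5-second delays still appear when only IPv4 is resolved.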
 