
Qmail performance problem once again...

niamoman

New Pleskian
Hello
This is a well-known problem on this forum, but I have applied all of the best suggestions (including atomicturtle's) to the qmail configuration and the problem still exists. I've spent many weeks trying to find the cause, without success.

Symptoms: During peak traffic the qmail local queue grows to hundreds or even thousands of messages.
At the same time mail users have problems with SMTP connections (they break after 60 seconds).
The initial FTP connection is delayed by about 5 seconds.
WWW works almost perfectly.
Delays also occur when I'm connecting over SSH.


This is a description of one of my machines.
I have another server with a similar load and system (Debian 4, Plesk 8.6). It has no problems.

Comparison of the two servers (1 is the problem server):
1. Fujitsu Siemens RX300 S4, 10 GB RAM, 1 x Intel Xeon 2 GHz quad-core, hardware RAID with 256 MB cache, SATA 3 Gb/s (4 x 500 GB HDDs in RAID 10),
2. 2 x Xeon 2.8 GHz quad-core, 4 GB RAM, 2 x 120 GB ATA (!) HDDs in md RAID 1.

And look at the "top" output:
1. (problem)
top - 14:33:58 up 13 days, 3:06, 1 user, load average: 2.41, 2.32, 2.83
Tasks: 159 total, 1 running, 158 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.7%us, 0.2%sy, 0.0%ni, 73.7%id, 23.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 10391136k total, 5215652k used, 5175484k free, 440552k buffers
Swap: 15623204k total, 0k used, 15623204k free, 4133648k cached

2.
top - 14:31:20 up 111 days, 8:50, 1 user, load average: 0.56, 0.95, 0.97
Tasks: 153 total, 1 running, 151 sleeping, 0 stopped, 1 zombie
Cpu(s): 10.5%us, 2.3%sy, 0.0%ni, 78.2%id, 8.8%wa, 0.2%hi, 0.1%si, 0.0%st
Mem: 4026176k total, 3650012k used, 376164k free, 332692k buffers
Swap: 1999984k total, 1816k used, 1998168k free, 2581500k cached

At this moment the qmail queue is behaving normally, without growing, but look at the differences in the CPU stats:
Server 1 (problem) has a lot of %wa (I/O wait) and little %us (user CPU time).
Server 2 has little %wa and a lot of %us.
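
To make that suspicion measurable, extended iostat output would help here; a diagnostic sketch, assuming the sysstat package is installed:

iostat -x 2
# watch the await (ms per request) and %util columns for sda;
# sustained %util near 100 with high await means the disk,
# not the CPU, is the bottleneck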

And the I/O:
1 (problem).
saturn:~# iostat -d 2
Linux 2.6.18-6-686-bigmem 2009-02-18

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 44,11 121,35 856,27 137735782 971883544
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 11,50 84,00 260,00 168 520
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 59,70 55,72 2061,69 112 4144
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 70,65 71,64 1464,68 144 2944
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 126,37 55,72 1950,25 112 3920
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 122,89 1086,57 1918,41 2184 3856


2.
neptun:~# iostat -d 2 /dev/md1
Linux 2.6.24-etchnhalf.1-686 2009-02-18

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 151,80 119,57 282,90 1165504794 2757497272
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 170,41 138,78 1346,94 272 2640
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 193,00 260,00 1524,00 520 3048
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 125,00 330,77 957,69 688 1992
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 347,55 250,98 2764,71 512 5640
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 469,46 275,86 3716,26 560 7544

So the second server seems to do even more I/O work, despite running on plain ATA disks and md RAID.

Where do you think the bottleneck could be?
 
My gut feeling here is that this is DNS-related.

First, make sure that the reverse DNS record for the server is valid (you may have to talk to your ISP about this).
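
A quick way to verify, as a sketch (the IP address and hostname below are placeholders, not this server's real values):

host 203.0.113.10
# should print a PTR record, e.g.:
# 10.113.0.203.in-addr.arpa domain name pointer saturn.example.com.
host saturn.example.com
# and the forward lookup of that name should return the same IP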

Then make sure you've got a local name server running on the box and that it is listed first in /etc/resolv.conf. As a side note, the resolver may not parse more than three nameserver entries, which is why you want your local DNS server listed first.
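
For example, a minimal sketch, assuming a caching name server (BIND, dnscache, etc.) is already listening on 127.0.0.1; the second address is a placeholder for your ISP's resolver:

# /etc/resolv.conf
nameserver 127.0.0.1      # local caching name server, queried first
nameserver 203.0.113.53   # ISP resolver as fallback

glibc's resolver only honors the first three nameserver lines (MAXNS is 3), so any entries after that are silently ignored.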

Check the speed of the disks with:
hdparm -t <path to device>

example:

hdparm -t /dev/sda
/dev/sda:
Timing buffered disk reads: 240 MB in 3.02 seconds = 79.56 MB/sec
 
I have similar results (medium server load):

hdparm -tT /dev/sda

/dev/sda:
Timing cached reads: 9388 MB in 2.00 seconds = 4699.34 MB/sec
Timing buffered disk reads: 240 MB in 3.01 seconds = 79.65 MB/sec

Reverse DNS is OK: when I query 'host myMainIpAddress' against the ISP's nameserver, I get exactly my server name.
That server name is also mapped to the main IP address in the hosts file.

I have changed resolv.conf so the local DNS server is now in first position.
I will keep watching.

Thanks for any suggestions.
 
Right now the qmail queue is growing again:

qmail-qstat
messages in queue: 147
messages in queue but not yet preprocessed: 58

HDD performance is much lower:

hdparm -t /dev/sda
/dev/sda:
Timing buffered disk reads: 142 MB in 3.01 seconds = 47.18 MB/sec

I even have to wait a while for mc to open.

And there is plenty of free memory:
Mem: 10391136k total, 4925720k used, 5465416k free, 342888k buffers

And CPU wait:
Cpu(s): 1.9%us, 0.3%sy, 0.0%ni, 73.3%id, 24.4%wa, 0.0%hi, 0.0%si, 0.0%st

Tasks: 327 total, 1 running, 326 sleeping, 0 stopped, 0 zombie

iostat -d 2
Linux 2.6.18-6-686-bigmem 2009-02-19
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 44,99 125,14 873,22 152894006 1066854992
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 57,50 0,00 1968,00 0 3936
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 53,96 79,21 1572,28 160 3176
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 53,50 0,00 1788,00 0 3576
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 56,22 27,86 1436,82 56 2888

What is strange to me is that WWW still works very well.

What could the reason be?
 
I thought about it again.
I have separate partitions: / (main), /var and /opt.
Do you think this could cause extra latency when copying/moving temporary files between /tmp and /var? (A move across filesystems is a full copy plus delete rather than an atomic rename.)
If so, how can I change the tmp directory for SpamAssassin (to /var/tmp, for example)?
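
One possible way, as a sketch: SpamAssassin picks its temporary directory via Perl's File::Spec->tmpdir(), which honors the TMPDIR environment variable, so exporting TMPDIR before spamd starts should work. The file below is an assumption for a Debian-style init setup; the exact place depends on how spamd is launched:

# /etc/default/spamassassin
TMPDIR=/var/tmp       # keep SpamAssassin temp files on the /var partition
export TMPDIR

# then restart the daemon:
/etc/init.d/spamassassin restart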
 
The main problem was slow HDD access under certain server loads.
The solution was a RAID controller firmware upgrade.

Everything is OK now. Thanks!
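
For anyone hitting the same wall: one way to read the installed controller firmware version before flashing, assuming an LSI MegaRAID-based controller (common in the RX300 series; the MegaCli binary path varies by install):

MegaCli -AdpAllInfo -aALL | grep -i 'FW Package'
# prints the installed firmware package build, to compare against
# the latest release from the vendor's support site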
 