Hello
This is a well-known problem on this forum, and I have tried all of the best suggestions (including atomicturtle's) for the qmail configuration, but the problem still exists. I've spent many weeks looking for the cause, without success.
Symptoms: during peak traffic the qmail local queue grows to hundreds or even thousands of messages.
At the same time mail users have problems with SMTP connections (they break after 60 seconds).
The initial FTP connection is delayed by about 5 seconds.
WWW works almost perfectly.
Delays also occur when I connect over SSH.
This is a description of one of my machines.
I have another server with a similar load and the same system (Debian 4, Plesk 8.6), and it has no such problem.
Comparison of the two servers (server 1 is the problem server):
1. Fujitsu Siemens RX300S4, 10 GB RAM, 1 x Intel Xeon 2 GHz quad-core, hardware RAID controller with 256 MB cache, SATA 3 Gb/s (4 x 500 GB HDDs in RAID 10),
2. 2 x Xeon 2.8 GHz quad-core, 4 GB RAM, 2 x 120 GB ATA (!) HDDs in md RAID 1.
And look at the "top" output:
1. (problem)
top - 14:33:58 up 13 days, 3:06, 1 user, load average: 2.41, 2.32, 2.83
Tasks: 159 total, 1 running, 158 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.7%us, 0.2%sy, 0.0%ni, 73.7%id, 23.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 10391136k total, 5215652k used, 5175484k free, 440552k buffers
Swap: 15623204k total, 0k used, 15623204k free, 4133648k cached
2.
top - 14:31:20 up 111 days, 8:50, 1 user, load average: 0.56, 0.95, 0.97
Tasks: 153 total, 1 running, 151 sleeping, 0 stopped, 1 zombie
Cpu(s): 10.5%us, 2.3%sy, 0.0%ni, 78.2%id, 8.8%wa, 0.2%hi, 0.1%si, 0.0%st
Mem: 4026176k total, 3650012k used, 376164k free, 332692k buffers
Swap: 1999984k total, 1816k used, 1998168k free, 2581500k cached
At this moment the qmail queue is working normally and not growing, but look at the differences in the CPU stats:
server 1 (problem) has a lot of %wa (I/O wait) and little %us (user CPU time),
server 2 has little %wa and a lot of %us.
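To check whether the high %wa is sustained rather than a momentary spike, iowait can also be sampled directly from /proc/stat. A rough sketch; the field position assumes the standard 2.6-kernel /proc/stat layout:

```shell
# Sample cumulative iowait twice, 2 s apart, and print the delta.
# On the aggregate "cpu" line, the 5th value after the label is
# iowait jiffies (assumption: standard Linux 2.6+ layout).
read_iowait() { awk '/^cpu /{print $6}' /proc/stat; }
a=$(read_iowait)
sleep 2
b=$(read_iowait)
echo "iowait jiffies accumulated in 2 s: $((b - a))"
```

A delta that stays large across repeated samples would confirm the box is genuinely stuck waiting on the disks, not just catching a burst.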
And the I/O stats:
1 (problem).
saturn:~# iostat -d 2
Linux 2.6.18-6-686-bigmem 2009-02-18
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 44,11 121,35 856,27 137735782 971883544
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 11,50 84,00 260,00 168 520
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 59,70 55,72 2061,69 112 4144
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 70,65 71,64 1464,68 144 2944
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 126,37 55,72 1950,25 112 3920
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 122,89 1086,57 1918,41 2184 3856
2.
neptun:~# iostat -d 2 /dev/md1
Linux 2.6.24-etchnhalf.1-686 2009-02-18
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 151,80 119,57 282,90 1165504794 2757497272
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 170,41 138,78 1346,94 272 2640
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 193,00 260,00 1524,00 520 3048
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 125,00 330,77 957,69 688 1992
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 347,55 250,98 2764,71 512 5640
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
md1 469,46 275,86 3716,26 560 7544
So the second one seems to do even more work, on nothing but ATA disks and md RAID.
What do you think, where could the bottleneck be?
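One way to estimate how busy sda actually is, without extra tools, is to sample the busy-time counter in /proc/diskstats. A sketch only; the field position assumes the standard /proc/diskstats layout, and DEV would need to be changed (e.g. to md1) on the other box:

```shell
DEV=sda   # device to watch; an assumption, adjust per machine
# The 10th value after the device name is total ms spent doing I/O
# (field 13 counting major/minor/name).
busy_ms() { awk -v d="$DEV" '$3==d{print $13}' /proc/diskstats; }
t1=$(busy_ms); sleep 2; t2=$(busy_ms)
if [ -n "$t1" ] && [ -n "$t2" ]; then
    # busy ms over a 2000 ms window, divided by 20 = approx. utilisation %
    echo "approx %util for $DEV: $(( (t2 - t1) / 20 ))"
else
    echo "device $DEV not found in /proc/diskstats"
fi
```

Values near 100 would mean the array is saturated. Where sysstat is new enough, `iostat -x 2` reports the same figure as %util, along with per-request await, which would show directly whether the RAID-10 array is servicing requests slowly.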