• If you are still using CentOS 7.9, it's time to convert to Alma 8 with the free centos2alma tool by Plesk or Plesk Migrator. Please let us know your experiences or concerns in this thread:
    CentOS2Alma discussion

Postfix still sending hundreds 'queue file write error'

# df -h
(output reduced)
tmpfs 1.0G 137M 888M 14% /usr/local/psa/handlers/spool

# cd /usr/local/psa/handlers/spool
# ls -l | wc -l
71
(there are 71 items spooled)
[root@myserver spool]# ls -lh | sort -rn
total 137M
-rw------- 1 mhandlers-user popuser 8.0K Mar 3 21:21 proxy.WJTZGI
-rw------- 1 mhandlers-user popuser 8.0K Mar 3 20:11 proxy.4UHBEl

There seems to be a build up of repeated messages, which are not forwarded to the mailboxes (or rejected). They just sit in this spool, gobbling up memory. However, in the Plesk control panel, there are only 4 entries:

just to confirm, same behavior on my Server the spool has ~150 mails in, most of them are the same mail again and again over some weeks now. It seems they are circulating in the spoolfile and the header date is updated. I believe it is related as I wrote above on the hanging postfixe-queue processes and I found the mail which are stucking in the spool in the maillogfile freqently every 80minutes and with matching timestamp with log entry like:
before-remote[5413]: check handlers for addr: [email protected]
before-queue[5412]: check handlers for addr: [email protected]

The mail queue in plesk looks like:
Messages in queue: 1 deferred, 0 held, 0 incoming, 149 active, 0 corrupted, 150 total.
but in the listing only one mail is visible
1-1 messages of 150 found (150 total)
Realy strange and just show how broken the postfix integration is!

Brujo
 
Guys,

This problem Postfix "queue file write error" is really became priority for us and I would like to receive the following information from you:

Finally!

1. Version of Plesk and OS. Specify, if it is Virtuozzo container.

Debian 5.0 / Plesk 9.3 / No Virtuozzo

2. What of the published Postfix patches have been installed.

Debian 5.0 default Postfix (upgraded to latest version)

3. Do you have enabled any spam protections.

SpamAssassin (default configuration)
 
just to confirm, same behavior on my Server the spool has ~150 mails in, most of them are the same mail again and again over some weeks now. It seems they are circulating in the spoolfile and the header date is updated. I believe it is related as I wrote above on the hanging postfixe-queue processes and I found the mail which are stucking in the spool in the maillogfile freqently every 80minutes and with matching timestamp with log entry like:


The mail queue in plesk looks like: but in the listing only one mail is visible Realy strange and just show how broken the postfix integration is!

Brujo

It gets worse. I've been digging around on the filesystem a bit more. The files listed in Plesk appear to be headers only, stored at /var/spool/postfix/defer/ , whereas the emails which are actually deferred are stored in /var/spool/postfix/deferred/.
 
Plesk 9.3 on CentOS 5.4 queue file write error

Guys,

This problem Postfix "queue file write error" is really became priority for us and I would like to receive the following information from you:

1. Version of Plesk and OS. Specify, if it is Virtuozzo container.
2. What of the published Postfix patches have been installed.
3. Do you have enabled any spam protections.
4. Conditions for occurrence of this error that you have noticed.
5. Your any additional information which you considers necessary to inform about this problem.

Please answer strictly on these points, shortly and with necessary details.
On the basis of this collected information I will prepare the analytical report for developers.
Thank you for cooperation.

Hi. I've got the same problem with queue file write error
1. Plesk 9.3 installed 2 or 3 weeks ago. So it is fresh install
2. None. I have read on the forum that this patches do not work
3. Yes
4. I have configured two frontend email server that they are doing first step of spam and virus control, they make queue buffer and greylist. After that "clean" emails are rerouted to plesk for user spam protection and another virus control. I have noticed that fronts have some emails in queue with this error. First of all I have noticed that this are emails for non existing box. But today I have an email with attachment about 30k, for a box that exists and can not be delivered.
5. If You can specify what else do You need.
 
... Postfix "queue file write error...
Version of Plesk: 9.3.0
OS: CentOS 5.4 Linux 2.6.18-164.11.1.el5
Virtuozzo container: No
Do you have enabled any spam protections? Yes. DomainKeys, SPF, DNSBL service (zen.spamhaus.org;cbl.abuseat.org;bl.spamcop.net;blackholes.mail-abuse.org;dnsbl.sorbs.net)

I attached some Error logs.

Ioan
 

Attachments

  • Errors.txt
    3.3 KB · Views: 7
1. Version of Plesk and OS. Specify, if it is Virtuozzo container.

Plesk 9.3.0 (+all official hotfixes/updates), CentOS 5.4 (entirely up-to-date), no Virtuozzo.
Postfix is version postfix-2.3.3-2.1.el5_2

2. What of the published Postfix patches have been installed.

None as they don't seem to fix the problem.

3. Do you have enabled any spam protections.

None. No SpamAssassin, no anti-virus, no domain keys, no SPF, no DNSBL.

4. Conditions for occurrence of this error that you have noticed.

I don't know.

5. Your any additional information which you considers necessary to inform about this problem.

Well, I notices something that is very strange. The /usr/local/psa/var/log/maillog file did show some weird things, and when reading back this topic, I saw some strange things in some other peoples' logs too.

Watch this:

$ fgrep 'postfix/smtpd' /usr/local/psa/var/log/maillog
Mar 8 04:13:06 shared01 postfix/smtpd[19571]: connect from unknown[xxx.xxx.xxx.xxx]
Mar 8 04:13:08 shared01 postfix/smtpd[19581]: connect from unknown[127.0.0.1]
Mar 8 03:13:08 shared01 postfix/smtpd[19571]: NOQUEUE: client=unknown[xxx.xxx.xxx.xxx]
Mar 8 03:13:08 shared01 postfix/smtpd[19581]: E838A768E4A: client=unknown[xxx.xxx.xxx.xxx]
Mar 8 03:13:10 shared01 postfix/smtpd[19581]: disconnect from unknown[127.0.0.1]
Mar 8 03:13:11 shared01 postfix/smtpd[19571]: disconnect from unknown[xxx.xxx.xxx.xxx]
Mar 8 03:14:28 shared01 postfix/smtpd[19571]: connect from unknown[xxx.xxx.xxx.xxx]
Mar 8 03:14:28 shared01 postfix/smtpd[19581]: connect from unknown[127.0.0.1]
Mar 8 03:14:29 shared01 postfix/smtpd[19571]: NOQUEUE: client=unknown[xxx.xxx.xxx.xxx]
Mar 8 03:14:29 shared01 postfix/smtpd[19581]: 05DA3768E4A: client=unknown[xxx.xxx.xxx.xxx]
Mar 8 03:14:29 shared01 postfix/smtpd[19571]: disconnect from unknown[xxx.xxx.xxx.xxx]
Mar 8 03:14:29 shared01 postfix/smtpd[19581]: disconnect from unknown[127.0.0.1]
Mar 8 04:26:52 shared01 postfix/smtpd[20139]: connect from unknown[xxx.xxx.xxx.xxx]
Mar 8 04:26:54 shared01 postfix/smtpd[20143]: connect from unknown[xxx.xxx.xxx.xxx]
Mar 8 04:26:55 shared01 postfix/smtpd[20148]: connect from unknown[127.0.0.1]
Mar 8 03:26:55 shared01 postfix/smtpd[20139]: NOQUEUE: client=unknown[xxx.xxx.xxx.xxx]
Mar 8 03:26:55 shared01 postfix/smtpd[20148]: 14526768E4E: client=unknown[xxx.xxx.xxx.xxx]
Mar 8 03:26:56 shared01 postfix/smtpd[20143]: lost connection after CONNECT from unknown[xxx.xxx.xxx.xxx]

The strange bit is the apparently random timestamp that jumps back and forth exactly an hour. I have disabled the NTP services, I think it's not related to actual time synchronization, but I'm not sure.

Could it be that Postfix ****s up a connection when it thinks an hour has passed? I find this hard to believe, but I think it may be something wothwile investigating...
It only seems to be with lines from postfix/smtpd, all other lines seem to be ok.

What is this? Time zone? (CET, GMT+1) Daylight savings time? Or is it just a coïncidence?
First message seems to jump 8 hours back in time. Is that customer in timezone GMT+8?

Please read back all posts containing error logs. We're all talking about the error messages here, not about the timestamps, which do appear messed up.
 
So... IgorG, please can you report back as to what the developers' initial thoughts are?

Does the information provided actually help in any way? If so, how?
 
This is worrying.

I see this huge thread with the queue file write error. I also see a lots of stuff on Plesk 9.2. And with a patch which perhaps doesn't work? And in any case would presumably not work with 9.3?

I upgraded two servers from 8.6 to 9.3 a few weeks ago. Both have this problem.

I see Postfix error messages on the lines of "Postfix SMTP server: errors from unknown[8x.14x.23x.xxx]". When I look in the logs I see this kind of thing:

Mar 11 04:43:30 my-mail-server before-remote[25025]: check handlers for addr: [email protected]
Mar 11 04:43:30 my-mail-server before-remote[25025]: check handlers for addr: [email protected]
Mar 11 04:43:30 my-mail-server before-remote[25025]: Unprocessed command: >
Mar 11 04:43:30 my-mail-server before-remote[25025]: errno: Broken pipe
Mar 11 04:43:30 my-mail-server before-remote[25025]: System error: Broken pipe
Mar 11 04:43:30 my-mail-server before-remote[25025]: Unable to write data into incoming socket
Mar 11 04:43:30 my-mail-server before-remote[25025]: Unable to translate command answer: > >>> Bye for now=2C love Ann xx


Surely after all this time there must be a fix for this? (Perhaps there is but there is so MUCH discussion of the problem that I can't see it?)

(It looks to me that Postfix is getting confused by command characters in the email message body. But surely Postfix ITSELF can't be that easily compromised? It's supposed to be so reliable and secure! So does this mean that Plesk have messed with it? If so, why? And why is this still an ongoing problem?)
 
Patch for fixing "Unprocessed command" problem has been published in Workarounds long time ago.
 
Patch for fixing "Unprocessed command" problem has been published in Workarounds long time ago.

Yes, and so has this issue been around for a long, long time ....

Please, we still await any update from the developers as to their thoughts. Why are they not contributing to this critical thread? The number of errors we are experiencing is increasing as each day passes, and as each day passes we hear nothing from these developers.

Ignoring customer requests for information is just a slap in the face; it adds insult to injury and really doesn't win any love or loyalty. Our requests are to help YOU fix YOUR product. If you're not going to fix it, why not release the source code so that some people can?

This whole charade is a perfect example of why proprietary/closed source systems are BAD. Prove me wrong.
 
Believe me, I did all what I can.
The problem is really considered with highest priority and developers make huge efforts for its solution.
 
Sorry Igor, it's not a personal attack on you - your job is probably the most difficult of all ;)

I appreciate you (or another moderator) approving my rants!

The thing is, there is something fundamentally wrong with Parallels' customer support process if a bug this serious is not escalated sufficiently within the development team.

You have many sysadmins contributing their system information to this list, and I'm sure pretty-much everyone here would be happy to investigate further, IF given some direction from your developers.

But what seems to be happening is that we get assurances from you that this is being investigated, probably because you are getting assurances from others, and yet they are not turning around the work. I think we would all be able to resolve this issue if, instead of buffering communication between developers and users, we could interface directly with them.

What is it that they're investigating? Can we help? We have the broken systems, so surely we are in the best position to help, and from my perspective I'm more than willing to help resolve this openly.

Is this website the right medium? How about a private IRC channel that a selected group of system administrators and developers can have an open conference on? Sorry to quote that old cliche but let's start thinking "outside the box" and remove all these barriers to a successful resolution.
 
Hello,

just to inform you that "Unprocessed command" messages are back again:

Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: ^Iby
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: return-path:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: ^Iby
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: ^Itue,
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: ^Iby
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: ^Itue,
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: ^Iid
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: message-id:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: to:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: ^I<35028c7855d540c88c021d5a6364d29c@usppd>
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: mime-version:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: x-msmail-priority:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: x-mimeole:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: x-beenthere:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: precedence:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: list-id:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: ^I<mailto:[email protected]?subject=unsubscribe>
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: list-post:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: list-subscribe:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: content-type:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command: sender:
Mar 23 15:10:06 web before-remote[4473]: Unprocessed command:

web:/var/log# cat /opt/psa/version
9.3.0 Debian 4.0 93091230.06
web:/var/log# postconf smtpd_proxy_timeout
smtpd_proxy_timeout = 3600
web:/var/log# postconf smtpd_timeout
smtpd_timeout = 3600
web:/var/log#

Any suggestion/resolution (if exists) would be really appreciated.

Best regards,

F.
 
Qmail much more stable MTA than Postfix in my opinion.
Probably both have pluses and minuses.
Qmail is very old MTA (11 years now) with very large bunch of patches - I'm afraid not all with original quality.
For me Qmail have nearly useless log files (very loose connections between "received" and "delivered") and this is reason why I migrate to Postfix as fast as I can.
I think that giving something other than Qmail is very good step.
I understand "we always supports qmail and this works" thinking, but I'm afraid than "migrate back" is not solution.

On other hand much of "queue file write error" you find for amavisd and Plesk integration.
So I think even if this is Postfix problem (I do not thin so) then Plesk developers should just contribute fix for this.
Or - if identifying problem is hard - should give possibilities for others (i.e. Postfix developers and maybe us, users) to check as much as we can.

If Postfix problem will not have solution in near future I think you should just tell us that straight.
Then probably I'll try to skip as much as I can of Plesk-Postfix binary integration, but use Plesk SQLlite/MySQL databases and do same "virtual delivery" myself - with Postfix or Exim (that can be easier because of Exim's internal integration with both DrWeb and SpamAssassin and good support of both databases).
 
Patch in final testing stage now and fix should be included to Plesk 9.5.0 which will be release in April.
 
workaround

1. Plesk 9.3.0, Debian 5.0 x86-64, not Virtuozzo
2. Postfix patch for Debian 5 64bit from KB 6074
3. No spam protection (all handled by edge servers)
4. Seems random
5. none

Used hotfix http://kb.odin.com/en/6074 , and sure enough, unprocessed command errors where gone. The ~10 per minute 'unprocessed command' errors were replaced with ~0,5 per minute 'Some error occurred' errors.

Are the chmod commands in the hotfix KB article correct (for all distributions)? Mine were 550 mhandlers-user : root pre-hotfix patch, should they be mhandlers-user : popuser post-hotfix?

Also, post-hotfix we are seeing an increasing number of postfix spawn processes, like so:
postfix 30726 0.0 0.1 38892 2192 ? S 10:23 0:00 spawn -n 127.0.0.1:10027 -t inet user=mhandlers-user argv=/usr/lib/plesk-9.0/postfix-queue 127.0.0.1 10026 before-remote

All in all pretty disappointed with the postfix support at this point.

edit for workarounds:
a few workarounds can make this whole bug a lot less intrusive for clients (keep in mind: the workarounds provided in this thread and in this post are not solutions, just ways to decrease the impact of this problem by stretching timeouts etc);
- if you have edge mail gateways on postfix: use the main.cf smtp_never_send_ehlo line to dumb down connections. For some reason this reduces the timeout errors on the plesk postfix server.
- smtpd_timeout and smtpd_proxy_timeout should be set to something rediculous like 1200s

edit2: workaround:
- disable the whole mess by changing the master.cf, as detailed here: http://se.id.au/journal/files/plesk_postfix_queue_file_write_error.php
this will probably disable all plesk spamfiltering functions etc, so this is only usable if you have edge servers (and have disabled direct smtp to the server.
 
Last edited by a moderator:
Hi IgorG,

Thanks for the update.
Could you please provide us with some hints as of what went wrong in the software? Were it indeed timeout-related errors, or was this all related to the time(zone?)-jumping I mentioned? Or something completely else?
 
Back
Top