
Issue Xinetd service regularly failing on Obsidian / CentOS 7.8

ospsanderm

New Pleskian
Hi,

Recently we have been encountering an issue with the Xinetd service on our Plesk servers hanging.
This issue occurs spontaneously after clients have made several FTP connections to the vhosts on the server, which leads me to believe the issue is tied to the ProFTPD service configuration.

There is only sparse logging available on the server (both the FTP logs and /var/log/messages report nothing), and the service appears healthy when examined with systemctl.
Restarting the Xinetd service resolves the issue, and the following is logged:

Sep 30 11:32:32 dev systemd: Stopping Xinetd A Powerful Replacement For Inetd...
Sep 30 11:32:32 dev xinetd[15887]: unexpected signal: 18 (Continued) in signal pipe
Sep 30 11:32:32 dev xinetd[15887]: Exiting...
Sep 30 11:32:32 dev systemd: Stopped Xinetd A Powerful Replacement For Inetd.
Sep 30 11:32:32 dev systemd: Starting Xinetd A Powerful Replacement For Inetd...
Sep 30 11:32:32 dev systemd: Started Xinetd A Powerful Replacement For Inetd.
Sep 30 11:32:32 dev xinetd[26727]: xinetd Version 2.3.15 started with libwrap loadavg labeled-networking options compiled in.
Sep 30 11:32:32 dev xinetd[26727]: Started working: 2 available services

But I think signal 18 is normal / expected behaviour when restarting the xinetd service (?)

The problem occurs on multiple Plesk servers, all running CentOS 7.8 with Plesk Obsidian (I've encountered the issue on at least 4 of our servers).
I hadn't seen this issue before updating to Obsidian 18.0.29, and all the affected servers are on either 18.0.29 or 18.0.30.

Has anything been changed in the Xinetd or ProFTPD configurations with the 18.0.29 update?
 
Hello!

We are constantly having to restart xinetd on one of our servers. Usually our FTP client just displays a message to the effect of "uploading", but nothing happens: there is no progress and no error on that end.
We restart the service, cancel and then restart the transfer, and everything works fine for a while. This can happen several times a day depending on load.

A few times the server has refused the connection and we have to restart xinetd for the connection to be successful.

Our Plesk version is: Version 18.0.30 Update #2 but this has been happening since around the original upgrade to 18.0.30

We refer to @ospsanderm's technical description since we are experiencing the same issue. No need to post the same content again.
 
In our experience it is the only service we need to reload/restart to correct the issue. Sometimes days pass in which we have no use for FTP and we connect and transfer files using SSH. Then, when we use FTP again, we cannot connect and the clients 'hang' while attempting to connect. They don't freeze or crash; they 'wait' for the connection to complete and never even time out.

Either that, or we work successfully and at some point FTP transfers stop without ever timing out. The only solution is to restart the service and re-queue the transfers.

The clients we use are FileZilla and PhpStorm's built-in client.

To fix it we connect to the server using SSH and reload/restart the service, and everything goes back to normal. I understand that this is only a description of our experience with no technical details, but if you have any suggestions as to which logs might be useful to share here, we will post them.
 
I am not sure if the first diagnosis here is correct. You cannot really derive that there is an issue with either xinetd or proftpd from the fact that it starts working again when you kill xinetd. There could equally well be an issue with network traffic or anything else, e.g. a lack of resources.

My suggestion in this case is to investigate the issue from a different angle:

# netstat -tap
and examine what the ftp port is doing when ftp becomes unresponsive

# ps aux | grep ftp | grep -v grep
This gives you a list of ongoing ftp processes. From there, examine what the processes are actually doing with
# strace -p <process id>
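
If it helps to compare hangs over time, here is a minimal sketch for counting established FTP control connections at the moment FTP stops responding. It assumes numeric netstat output (the -n flag) and the default FTP control port 21; the function name count_ftp_established is made up for this example.

```shell
#!/bin/sh
# Count established TCP connections whose local port is 21 (FTP control).
# Reads `netstat -tan` style output on stdin:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
count_ftp_established() {
    awk '$6 == "ESTABLISHED" && $4 ~ /:21$/ { n++ } END { print n + 0 }'
}

# Usage on the live server (as root) while FTP is unresponsive:
#   netstat -tan | count_ftp_established
#   ps aux | grep '[f]tp'    # same listing as "grep ftp | grep -v grep"
```

Noting this number each time the hang occurs would show whether FTP always stalls at the same connection count.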
 
@Peter Debik

We experienced the issue a few minutes ago and these are the results of the commands you suggested, a bit sanitized.
(removed some lines that seemed unnecessary)
# netstat -tap
1603114581749.png

# ps aux | grep ftp | grep -v grep
(sanitized, no lines removed)
1603114605345.png


# strace -p <process id>
When I ran it against the id of the connection that froze first, I got:
strace: attach: ptrace(PTRACE_SEIZE, 24052): Operation not permitted

I then tried the ids of the others and got the same result.

This morning a reload was not enough. We had to restart xinetd. I am not 100% sure that a reload ever worked in the past. I will add a note so we keep track of this as well.

Thank you for the time you are putting into this. We really appreciate your interest in helping us resolve this matter.
 
Please look into /etc/xinetd.conf and check these lines:

cps = n m
instances = p
per_source = q

What is the value of n, m, p and q in your case?
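
A quick way to pull those values out, sketched as a small shell function. The name show_xinetd_limits is hypothetical, and checking the files under /etc/xinetd.d/ as well is my own assumption: per-service files there can set their own limits, so it may be worth looking at them too.

```shell
#!/bin/sh
# Print the active (non-commented) cps, instances and per_source lines
# from the given xinetd configuration files, prefixed with the file name.
show_xinetd_limits() {
    grep -HE '^[[:space:]]*(cps|instances|per_source)[[:space:]]*=' "$@"
}

# Usage on the server:
#   show_xinetd_limits /etc/xinetd.conf /etc/xinetd.d/*
```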
 
This looks correct. I was hoping that a resource limit in xinetd would explain the problems, because in your case FTP seems to stop at exactly 10 open connections. /etc/proftpd.conf has another parameter, "MaxInstances", but it only applies if proftpd runs stand-alone, so that is probably not the cause of the hiccup either.

We operate several dedicated servers with Obsidian, and we certainly have many more simultaneous FTP connections on them, so it should work in your installation, too. What other limits can you think of on your machine? Is it a CentOS 7.8 default installation from the EPEL repository on a dedicated machine? In that case we would not need to look at the machine any longer, because all the default settings will work. Or is it a virtual server? In that case, maybe there are limits in your virtualization environment that your servers have started hitting lately? One more important question: does it only occur in connections between specific clients and the machine, or with all clients?
 
Plesk lives inside an AWS EC2 m4.xlarge instance.
It was created from a "Plesk AMI", so it was configured as Plesk originally designed the AMI. I am sure we have tweaked a few things over the years, but nothing specific to this and nothing around the time FTP started showing this behavior.

I have personally experienced this while overseas, connected through a VPN to the office on the company-issued laptop. Now that I am back at the office using my workstation directly, with no VPN, I see the same issue.

As far as software goes, I have personally experienced the issue using FileZilla, which uses multiple concurrent connections, and using PhpStorm, which as far as I know uses only a single connection. It is the same on both laptop and workstation.
 
The "Operation not permitted" from strace is strange, too. I have only seen that before when not logged in as root. All of our dedicated systems for customers run CentOS 7.8, and with hours spent on them daily I should have seen this before in some setting, but I have not.

I have checked the Amazon docs on the AWS EC2 m4.xlarge and cannot see any limit that could cause the symptoms.

So, summing it all up from my end: I cannot solve this issue and am not sure where to look further. I still think that Plesk itself is the wrong place to look, because it doesn't touch OS files, neither on initial installation nor on updates.
 
It happened again and I found two processes under the same user that first experienced it. I didn't know how long to run the trace for, so here is the first part.
Attached are somewhat longer versions in case they help.

id 5050
1603126211694.png
and the other id 3342
1603126274545.png
 

Attachments

  • processes-ftp-issue-2020-10-19.zip
It tells us two things:
a) ftp on the server is not dead; the processes are running (otherwise you would not see repeated entries in the process trace)
b) they are waiting on something that never happens
But to me it's not really clear what they are waiting on. It again points to a lack of resources of some kind: something is used up by another process, and the waiting processes are waiting for their share to become available so that they can continue their work. But again, it is not clear what this could be.
 
It might be interesting to see what "tcpdump port ftp" (run as root) shows, especially when a connection attempt is made.
 