
Issue remove-expired-tokens.php process continuously running, not hanging but not completing either

Bitpalast

Last night at midnight, this process started on almost all of our Onyx machines (CentOS 7.8):

/usr/local/psa/admin/plib/modules/letsencrypt/scripts/remove-expired-tokens.php

It has been running continuously ever since. According to strace, the process is continuously doing something, but we have never before seen this or a similar process run for more than 8 hours. The process is listed in crontab as:
Code:
/usr/local/psa/admin/bin/php -dauto_prepend_file=sdk.php '/usr/local/psa/admin/plib/modules/letsencrypt/scripts/remove-expired-tokens.php'

Why is it taking so long to complete on so many machines this time?
 
It has now been running for almost 24 hours with no sign of stopping. It keeps creating and removing temporary directories. The total net CPU time is now approx. 70 minutes on all systems (12-core machines).
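To see how long the cleanup job has been running and how much CPU time it has used, the elapsed time (`etime`) and CPU time (`time`) columns of `ps` are enough. The sketch below is a hypothetical helper, not part of Plesk: it converts the `etime` format `[[dd-]hh:]mm:ss` to seconds and prints the matching jobs that exceed a threshold. The function names and the 8-hour (28800 s) threshold in the example line are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Plesk) for spotting long-running jobs.

# Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        # Fields counted from the right are seconds, minutes, hours, days.
        mult[1] = 1; mult[2] = 60; mult[3] = 3600; mult[4] = 86400
        total = 0
        for (i = NF; i >= 1; i--) total += $i * mult[NF - i + 1]
        print total
    }'
}

# Read "pid etime cputime args" lines on stdin and print
# "pid elapsed_seconds cputime" for every process whose command line
# contains PATTERN and whose elapsed time exceeds LIMIT seconds.
long_running() {
    pattern=$1
    limit=$2
    while read -r pid etime cputime args; do
        case $args in
            *"$pattern"*) ;;
            *) continue ;;
        esac
        secs=$(etime_to_seconds "$etime")
        if [ "$secs" -gt "$limit" ]; then
            echo "$pid $secs $cputime"
        fi
    done
}

# Example: flag the token cleanup job after 8 hours (threshold is an assumption).
#   ps -eo pid=,etime=,time=,args= | long_running remove-expired-tokens 28800
```

As this thread shows, the job did finish on some hosts after more than 24 hours, so a long runtime alone does not prove it is stuck.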

Excerpt from strace:

Code:
...
wait4(26585, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 26585
stat("/usr/local/psa/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=5234688, ...}) = 0
mkdir("/usr/local/psa/tmp/agent51931d0bb74ae3f82b1f51fd166de260", 0777) = 0
stat("/usr/local/psa/tmp/agent51931d0bb74ae3f82b1f51fd166de260", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
mkdir("/usr/local/psa/tmp/agent51931d0bb74ae3f82b1f51fd166de260/.log", 0777) = 0
pipe([7, 8])                            = 0
pipe([9, 10])                           = 0
pipe([11, 12])                          = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f68da566bd0) = 26586
close(8)                                = 0
fstat(7, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fcntl(7, F_SETFD, FD_CLOEXEC)           = 0
close(10)                               = 0
fstat(9, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fcntl(9, F_SETFD, FD_CLOEXEC)           = 0
close(11)                               = 0
fstat(12, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fcntl(12, F_SETFD, FD_CLOEXEC)          = 0
fcntl(7, F_GETFL)                       = 0 (flags O_RDONLY)
fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
fcntl(9, F_GETFL)                       = 0 (flags O_RDONLY)
fcntl(9, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
fcntl(12, F_GETFL)                      = 0x1 (flags O_WRONLY)
fcntl(12, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
select(13, [7 9], [12], [], NULL)       = 1 (out [12])
close(12)                               = 0
select(10, [7 9], [], [], NULL)         = 1 (in [7])
read(7, "/var/www/vhosts/default/htdocs/."..., 8192) = 140
select(10, [7 9], [], [], NULL)         = 2 (in [7 9])
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=26586, si_uid=0, si_status=0, si_utime=0, si_stime=3} ---
read(7, "", 8192)                       = 0
close(7)                                = 0
read(9, "", 8192)                       = 0
close(9)                                = 0
wait4(26586, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 26586
stat("/usr/local/psa/tmp/agent51931d0bb74ae3f82b1f51fd166de260", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/usr/local/psa/tmp/agent51931d0bb74ae3f82b1f51fd166de260", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 7
getdents(7, /* 3 entries */, 32768)     = 72
getdents(7, /* 0 entries */, 32768)     = 0
close(7)                                = 0
lstat("/usr/local/psa/tmp/agent51931d0bb74ae3f82b1f51fd166de260/.log", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/usr/local/psa/tmp/agent51931d0bb74ae3f82b1f51fd166de260/.log", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/usr/local/psa/tmp/agent51931d0bb74ae3f82b1f51fd166de260/.log", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 7
getdents(7, /* 2 entries */, 32768)     = 48
getdents(7, /* 0 entries */, 32768)     = 0
close(7)                                = 0
rmdir("/usr/local/psa/tmp/agent51931d0bb74ae3f82b1f51fd166de260/.log") = 0
rmdir("/usr/local/psa/tmp/agent51931d0bb74ae3f82b1f51fd166de260") = 0
access("/usr/local/psa/admin/bin/modules/letsencrypt/filemng", F_OK) = -1 ENOENT (No such file or directory)
nanosleep({tv_sec=0, tv_nsec=1000}, NULL) = 0
pipe([7, 8])                            = 0
pipe([9, 10])                           = 0
pipe([11, 12])                          = 0
...
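The excerpt shows one repeating cycle: create a directory named `agent<hex>` under `/usr/local/psa/tmp`, spawn a child process, read its output, remove the directory, sleep for about a microsecond and start again. Counting how often that cycle repeats over a fixed interval gives a rough idea of the job's rate. The sketch below is a hypothetical helper; it assumes each top-level agent directory marks one cycle, and it is not confirmed that one cycle corresponds to one token.

```shell
#!/bin/sh
# Hypothetical helper: count top-level agent work directories in strace output.
# Assumption: each mkdir of /usr/local/psa/tmp/agent<hex> starts one cycle of
# the loop in the excerpt; it is not confirmed that a cycle equals one token.
count_agent_dirs() {
    # Requiring ", directly after the hex name skips the nested .log directory.
    # grep -c prints 0 and exits 1 when nothing matches; || true keeps status 0.
    grep -c 'mkdir("/usr/local/psa/tmp/agent[0-9a-f]*",' || true
}

# Example (PID is a placeholder): trace mkdir calls for 60 seconds, then count.
#   timeout 60 strace -f -e trace=mkdir -p "$PID" -o /tmp/le-trace.txt
#   count_agent_dirs < /tmp/le-trace.txt
```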
 
Peter,
On systems with a large number of domains, tokens for unsuccessful challenges were previously not deleted, which cluttered up disk space.
This task was therefore created to clean up those leftover tokens.
I showed this problem to the developers. They find it difficult to say what could cause this behaviour and suggest contacting support so it can be investigated directly on the server.
 
Thank you for your feedback. On some of the hosts, the process finished last night after running for more than 24 hours. On others, it is still running. It is quite possible that there is a huge number of files to process on these machines, as they host thousands of customers and domains. The only problem the process caused was with a monitoring software we use. That software sometimes needs to restart the web server automatically, and the restart must not happen while a Let's Encrypt reconfiguration is in progress: the certificate file names in the configuration files might not match what is on disk at that moment, so the restart would fail with a syntax error. To avoid this, the software checks the process list, and if it sees a running process containing the string "letsencrypt", it skips the web server restart.

I became aware of the long-running task through that service monitoring software, because it reported that it had been unable to execute the expected tasks for a number of minutes while a Let's Encrypt process was running.
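The guard described above (skip the web server restart while any process mentions "letsencrypt") can be sketched in a few lines of shell. This is a hypothetical reconstruction, not the actual monitoring software; the function names and the `RESTART_CMD` placeholder are assumptions. A plain substring match also catches the token cleanup cron job, because its path contains `modules/letsencrypt`, which is why restarts were held back during this long run.

```shell
#!/bin/sh
# Hypothetical sketch of a monitor's "Let's Encrypt busy?" guard.

# Succeeds if any line on stdin (one command line per process) mentions
# letsencrypt. The [l] bracket stops grep from matching its own entry
# when the input comes from ps.
letsencrypt_busy() {
    grep -q '[l]etsencrypt'
}

# Restart only when no matching process is running.
# RESTART_CMD is a placeholder for the monitor's real restart command.
guarded_restart() {
    if ps -eo args= | letsencrypt_busy; then
        echo "deferred: a Let's Encrypt process is running"
        return 1
    fi
    ${RESTART_CMD:-true}
}
```

Whether the cleanup job could safely be excluded from this match is not established in the thread, so the sketch keeps the broad pattern.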

I'll now simply wait until the process finishes on the other machines, too. Since it did on some, it probably will on the others as well.
 
Hello,
We've investigated this behavior and found that the task which deletes expired tokens runs too slowly, so it takes a long time on servers with a huge number of tokens.
We've created a bug report with ID EXTLETSENC-845, which will be fixed in one of the next releases.
At the moment, we have not found a safe way to stop this task:
Stopping the task leads to a temporary inconsistency in Plesk's database. Plesk thinks the task still exists, but after some time the task is marked as failed and everything looks OK. The next run is fast if the tokens were deleted manually.
Cleaning up the tokens manually does not stop the task, because Plesk still stores the tokens' attributes in its database and the task only stops after checking all of them. We did not find that the task became faster when the tokens are absent. But you may try to clean up the tokens anyway with the following command:
Code:
# find /var/www/vhosts/default/htdocs/.well-known/acme-challenge/ -type f -ctime +90 -exec rm -rf {} +;
This command deletes all tokens older than 90 days (measured by ctime, the inode status-change time) from the common challenge directory.
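Before running the deletion command, it may help to see how many files it would remove. The hypothetical helper below wraps the same `find` test (`-type f -ctime +DAYS`) with a dry-run mode that only counts matches and a `--delete` mode that removes them; the function name and options are assumptions. It uses find's `-delete` action (available in GNU find on CentOS) instead of `-exec rm`. Per the reply above, removing the files did not visibly speed up a task that is already running.

```shell
#!/bin/sh
# Hypothetical helper: count (default) or delete challenge tokens in DIR
# whose ctime is more than DAYS days old, mirroring
#   find DIR -type f -ctime +DAYS
# Usage: clean_tokens DIR DAYS [--delete]
clean_tokens() {
    dir=$1
    days=$2
    mode=${3:-}
    # A missing directory simply means there is nothing to clean.
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    if [ "$mode" = "--delete" ]; then
        # -print before -delete so every file handed to -delete is counted.
        find "$dir" -type f -ctime +"$days" -print -delete | wc -l | tr -d ' '
    else
        # Dry run: only count the candidates.
        find "$dir" -type f -ctime +"$days" -print | wc -l | tr -d ' '
    fi
}

# Example: preview, then delete, tokens older than 90 days.
#   clean_tokens /var/www/vhosts/default/htdocs/.well-known/acme-challenge 90
#   clean_tokens /var/www/vhosts/default/htdocs/.well-known/acme-challenge 90 --delete
```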
 