We have already added Webserver Restart delay 60 seconds, graceful restart in the PSA database etc. with no positive effect.
Graceful restarts are an important step toward easing the problem. Next, choose a much longer restart interval. 60 seconds will cause many problems, because one restart won't have finished when the next one is already starting (see the explanation in the next paragraph). Allow at least five minutes (300 seconds), and on high-load servers preferably fifteen minutes (900 seconds), because an Apache restart sequence can take a long time to complete. Apache stays in the shutdown (deactivation) phase for a long time because it is waiting for other processes to finish their work.
The cause seems to be a combination of PHP-FPM values for pm.max_children and pm.max_requests that are too low and long-running PHP scripts in subscriptions. When Apache restarts, it tries to hold off shutting down until the PHP processes that run through the FPM service have completed. If FPM is causing trouble at that point, Apache takes a long time to restart. The more subscriptions use PHP-FPM instead of FastCGI, the more severe the issue becomes. It is not easy to solve, because the load that PHP scripts create cannot be controlled through any settings. If scripts are running that cause many changes to the FPM processes and have a long runtime, the shutdown will simply take a long time. During that shutdown phase, new requests won't be served if no spare FPM children remain or if the remaining children have reached their pm.max_requests limit. Requests keep coming in anyway, which makes matters worse.
To make a long story short: We had seen that effect on one of our machines for several months. Since we removed one customer who was using long-running scripts and increased the max_children and max_requests values on all subscriptions that showed the "has reached max_children" error in the PHP-FPM logs, the situation has greatly improved. Before, we had several FPM and Apache failures and outages (very short, only a few seconds, but they did occur). Now this machine is back to 100% availability and also responds faster to domain creation or changes in the panel.
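To find the subscriptions that hit the limit, you can count the warnings per pool in the FPM log. Here is a rough Python sketch. The log line format, the pool names and the function name are my assumptions (the exact wording differs between PHP versions), so adjust the pattern to what your logs actually contain:

```python
import re
from collections import Counter

# Assumed shape of the FPM warning line; adjust to match your own log.
PATTERN = re.compile(r"\[pool ([^\]]+)\].*reached (?:pm\.)?max_children setting")


def count_max_children_hits(lines):
    """Return a Counter mapping pool name to the number of max_children warnings."""
    hits = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Hypothetical sample lines standing in for a real FPM error log.
sample = [
    "[01-Jan-2020 10:00:00] WARNING: [pool example.com] server reached pm.max_children setting (5), consider raising it",
    "[01-Jan-2020 10:00:05] NOTICE: [pool example.com] child 1234 started",
    "[01-Jan-2020 10:01:00] WARNING: [pool shop.example.org] server reached max_children setting (5), consider raising it",
    "[01-Jan-2020 10:02:00] WARNING: [pool example.com] server reached pm.max_children setting (5), consider raising it",
]

# Print pools ordered by how often they ran out of children.
for pool, count in count_max_children_hits(sample).most_common():
    print(pool, count)
```

Pools that show up most often are the first candidates for a higher max_children value, or for a look at which scripts keep their children busy for so long.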
pm.max_children and pm.max_requests must be set according to the physical characteristics of the server. On a virtual server, max_children=5 and max_requests=2000 can make sense, while on a powerful dedicated server with many hosting accounts, max_children=25 and max_requests=10000 make more sense. There is no good rule for which numbers to choose. They must simply balance RAM usage, process load and overall service behavior on the given platform.
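For the RAM side of that balance, you can at least estimate a ceiling. The following Python sketch is a common rule of thumb, not an exact rule. It assumes the worst case that every pool is fully busy at the same time, which is pessimistic on most servers. The function name and all figures are made up for illustration, and the average memory per FPM child has to be measured on your own server:

```python
def max_children_upper_bound(ram_for_php_mb, avg_child_mb, pool_count=1):
    """Estimate the highest max_children per pool that still fits into RAM.

    ram_for_php_mb: RAM you are willing to give to PHP-FPM in total (assumption).
    avg_child_mb:   measured average memory of one FPM child process.
    pool_count:     number of FPM pools (subscriptions) sharing that RAM.
    """
    if avg_child_mb <= 0 or pool_count <= 0:
        raise ValueError("avg_child_mb and pool_count must be positive")
    # Worst case: all pools run all of their children at once.
    return max(1, int(ram_for_php_mb // (avg_child_mb * pool_count)))


# Hypothetical server: 8 GB reserved for PHP, 64 MB per child, 10 pools.
print(max_children_upper_bound(8192, 64, pool_count=10))  # prints 12
```

In practice you would stay below such a ceiling on a small virtual server and can go beyond the worst-case figure on a shared host where most pools are idle most of the time. Process load and service behavior still have to be watched after every change.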