Denis Gomes Franco
Regular Pleskian
So I just wanted to record in writing what I've been through over the past few hours today with Plesk Obsidian. I immediately started shaking when I realized what had just happened. Maybe writing about it can calm me down a little bit, and prevent others from making the same mistake I did. And I suppose this could also work as feedback to the Plesk team.
This morning some customers complained that they couldn't log in to their webmail. I started investigating and, after quite a while, found that dovecot wasn't running: it was failing with status code 89. While searching for a solution I came across a topic where @IgorG replied: Resolved - Dovecot seems down, but...
So I thought: okay, uninstalling and then reinstalling dovecot might just work, why not give it a try? Then I copied and pasted this command:
# plesk installer --select-product-id plesk --select-release-current --remove-component dovecot
And dovecot started uninstalling.
And then Plesk.
And then other essential components.
Within a few seconds I thought 'Oh f@@@...' and the Plesk panel was just gone. I panicked. First because I thought 'Why would uninstalling dovecot also uninstall the panel?'. Then because I thought 'Oh sh@@, all my domains must be down as well...'
So I started assessing the situation and looking for tech support, since I can do my own maintenance but I'm no expert. Fortunately, Apache, Nginx and MariaDB kept running, so the hosted sites stayed up and kept operating as expected. Now, on to finding a solution to the mess I had just caused.
Restoring from backups was not possible. I'm in the process of setting up a new backup scheme because the current one isn't working and neither are the last backups, and in any case it would take quite some time to put everything back. Setting up a new server would also take time, as would transferring these clients to my other servers, and even if I did transfer them, there is the issue of DNS propagation (which I have witnessed firsthand in the past...)
I actually paid for the Emergency Service at BobCares to see if they could help me sort this thing out, but they were quite slow. Their first response was fast, but they initially suggested simply restoring the backups or installing a fresh server - quite the expensive advice, I must say. Anyway, I managed to LEARN and FIX everything by myself in the time it took them to simply assess my situation.
So, to make a long story short, in about 3 hours:
- I tried running the installation wizard again from the CLI, figuring that "reinstalling over" would solve it. It detected the missing components but the installation produced errors.
- With the help of Plesk's support articles I figured out how to re-add the components that were accidentally uninstalled (Plesk panel, Roundcube, Firewall, etc.).
- I found out that the psa and roundcubemail databases had been dropped, but managed to get them back from the daily dumps that Plesk saves separately from regular backups. The psa database imported fine, but the roundcubemail database couldn't be imported because of errors with foreign key constraints. My fix was to disable foreign key checks before importing.
- I managed to get the panel back up and running, so it was time to fix the last issue: dovecot wouldn't start.
- Learned how to read logs (journalctl) and found out that dovecot had some invalid configuration files left over from a domain that was deleted yesterday. Not sure why Plesk kept these files around... Anyway, I deleted those files and dovecot started running again, but now no one could log in to their mailboxes.
- Investigated a bit more and noticed that I could not create mailboxes or change passwords. Found a Plesk article referencing the error and skimmed over it. Started panicking that I had lost customers' mail or something.
- Decided to run plesk repair mail. I had already run it before and it had caused no harm, so I waited.
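For anyone who hits the same foreign key errors on import, here is a minimal sketch of the "disable foreign key checks before importing" idea. It is not the exact command I ran: the function name is made up for illustration, and the dump file name, database name and MySQL user in the usage comment are assumptions you would need to adapt. It only wraps an already extracted SQL dump with the standard MySQL/MariaDB FOREIGN_KEY_CHECKS session variable.

```shell
#!/bin/sh
# Sketch only: wrap an SQL dump so foreign key checks are off while it imports.
# wrap_dump_without_fk_checks is a hypothetical helper name, not a Plesk tool.

# Print the dump given as $1, preceded by a statement that disables foreign
# key checks for the session and followed by one that re-enables them.
wrap_dump_without_fk_checks() {
    dump_file=$1
    [ -r "$dump_file" ] || { echo "cannot read: $dump_file" >&2; return 1; }
    echo 'SET FOREIGN_KEY_CHECKS=0;'
    cat "$dump_file"
    # Leading newline guards against a dump that lacks a trailing newline.
    printf '\nSET FOREIGN_KEY_CHECKS=1;\n'
}

# Usage (not run here; file, user and database names are assumptions):
#   wrap_dump_without_fk_checks roundcubemail.sql | mysql -u admin -p roundcubemail
```

The setting only affects the session doing the import, so other connections keep their normal constraint checking while the tables are loaded in whatever order the dump lists them.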
And after all of that... I finally had everything back up and running. plesk repair mail had fixed the permissions and everything, and customers could log in to their mailboxes again. No emails were lost, and neither were Roundcube contacts or signatures. Fortunately, most of my customers were quite forgiving and patient during this outage.
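In hindsight, checking the logs first would have saved me the whole detour. A rough sketch of that check follows, assuming a systemd server where the service unit is called dovecot and the configuration lives under /etc/dovecot (both are assumptions about your setup, and the helper names are made up). The first helper just pulls config file paths out of whatever log text you feed it, so you can see which files the errors point at.

```shell
#!/bin/sh
# Sketch only: find which dovecot config files the logs complain about.
# Helper names are hypothetical; unit name and config directory are assumptions.

# Read log text on stdin and print each distinct path under /etc/dovecot once.
dovecot_paths_in_log() {
    grep -o '/etc/dovecot[^[:space:]:,]*' | sort -u
}

# Show the most recent log lines for the dovecot unit (default: last 50).
recent_dovecot_log() {
    journalctl -u dovecot --no-pager -n "${1:-50}"
}

# Usage (not run here):
#   recent_dovecot_log 100 | dovecot_paths_in_log
#   doveconf -n    # dumps the non-default settings; should report an error if the config does not parse
```

Looking at the files it lists (and moving them aside rather than deleting them outright) is a much smaller step than removing and reinstalling a component.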
Now, on to some questions:
- Why did a single command to uninstall dovecot uninstall other components as well? Yes, I'm aware of my stupidity and I know I maybe should have uninstalled it from the GUI.
- And why did it uninstall *critical* components such as the panel itself?
- Why didn't Plesk delete those dovecot configuration files when deleting the domain? Or, why did these files cause problems in the first place?