One fine day, or rather pretty late at night, one of my consulting clients calls me up: their site is down. It’s a Shopware 6 shop on an Ubuntu 22.04 LTS server, and the main developer on the project has already tried restarting the mysql service and the whole server several times, to no avail.
Now it’s my turn to look at the issue.
The website itself shows the generic Shopware “error 500” page. I ssh into the server and check what software is currently running. apache2 is still up and mysqld is there, too, but it gets a new PID every few seconds, which means it is stuck in a restart loop.
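On a systemd-based setup like this one, the restart loop is easy to confirm instead of guessing from the process list; these are plain systemd commands, nothing specific to this server:
# shows "activating (auto-restart)" and a changing Main PID while mysqld keeps crashing
systemctl status mysql
# or watch the crash/restart cycle live
journalctl -u mysql -f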
I check the mysql logs (at /var/log/mysql/error.log):
[ERROR] [MY-011906] [InnoDB] Database page corruption on disk or a failed file read of page [page id: space=0, page number=5]. You may have to recover from a backup.
This error appears over and over. Looks like the database is screwed beyond the point where a simple restart helps.
To stop the endless restart loop, I stop the mysql service and the service that is using it:
systemctl stop mysql
systemctl stop apache2
There is a small chance to recover the corrupted binary database files, though it’s not guaranteed. First of all, I make a backup of all the database files. In my case the files are all in the standard folder /var/lib/mysql.
# make sure the backup folder exists, then copy the InnoDB system files and the schema folder
mkdir -p ~/db-backup-crash
cp /var/lib/mysql/ib* ~/db-backup-crash/
cp -r /var/lib/mysql/{database name} ~/db-backup-crash/
If you are following suit, please replace {database name} with the name of your database.
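If you would rather not pick individual files, a more defensive variant (an assumption on my part, not what I did that night) is to archive the whole data directory while mysqld is stopped:
# snapshot the entire MySQL data directory into a single archive
tar czf ~/db-backup-crash.tar.gz -C /var/lib mysql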
Now, we can try to start mysql in recovery mode. First, find the mysql-config:
# find the right config file
find /etc/mysql -type f -exec grep "\[mysqld\]" '{}' \; -print
This gives me the following output:
[mysqld]
/etc/mysql/mysql.conf.d/mysqld.cnf
Now, I open /etc/mysql/mysql.conf.d/mysqld.cnf and add the following line at the end of the file:
innodb_force_recovery = 1
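On a default Ubuntu install, this file typically starts with a [mysqld] section and contains nothing else, so appending at the end puts the option where it belongs. The tail of the file then looks roughly like this (the existing settings will differ on your system):
[mysqld]
# ... existing settings such as datadir, socket, log-error ...
innodb_force_recovery = 1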
It’s time to start the database again and watch the mysql error logs (at /var/log/mysql/error.log); the exact commands follow below the table. If mysql still won’t start, I’d increase the innodb_force_recovery level. Any level of 4 or above may corrupt the database even more. Here’s a table of what the levels mean:
Level | Type | Effect |
---|---|---|
1 | SRV_FORCE_IGNORE_CORRUPT | Ignores corrupted pages and tries to start anyway |
2 | SRV_FORCE_NO_BACKGROUND | Starts without background tasks (purge and master thread) |
3 | SRV_FORCE_NO_TRX_UNDO | Does not run automatic transaction rollbacks |
… | … | ! Don’t wander beyond this point! Here be dragons ! |
4 | SRV_FORCE_NO_IBUF_MERGE | Skips insert (change) buffer merges. This may corrupt secondary indexes, which then need to be dropped and recreated |
5 | SRV_FORCE_NO_UNDO_LOG_SCAN | Ignores undo logs. All transactions are considered “done”. This is likely to corrupt data |
6 | SRV_FORCE_NO_LOG_REDO | Ignores redo logs. Leaves database pages in an obsolete state, which may cause more corruption down the road |
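Starting the server and watching the log at the same time is nothing fancy; assuming the systemd unit and log path from above, two shells are enough:
# shell 1: watch the error log
tail -f /var/log/mysql/error.log
# shell 2: try to bring mysql back up
systemctl start mysql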
To my great satisfaction, setting innodb_force_recovery to 1 does the trick right away. The mysql server comes right back up. Had level 1 not worked, I would have increased the setting one step at a time. For any level greater than 1, I’d dump all databases, reset mysql to factory settings and import all databases again, roughly along the lines of the sketch below.
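I did not need it that night, but a minimal sketch of that dump-and-rebuild route could look like this; the paths and the re-initialisation step are assumptions and depend on your setup:
# dump everything while the server is still up under force_recovery
mysqldump --all-databases --routines --events > ~/all-databases.sql
# stop mysql and move the broken data directory out of the way
systemctl stop mysql
mv /var/lib/mysql /var/lib/mysql.broken
# re-initialise a fresh data directory, then start and re-import
mkdir /var/lib/mysql && chown mysql:mysql /var/lib/mysql
mysqld --initialize-insecure --user=mysql
systemctl start mysql
mysql < ~/all-databases.sql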
To make sure that there is no permanent damage to the data, I use mysqlcheck:
mysqlcheck --all-databases
It takes a while, but fortunately there are no further problems.
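If only a single schema were suspect, the check could also be narrowed down, reusing the placeholder from above:
mysqlcheck --check {database name}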
It’s time to restart mysql normally, so innodb_force_recovery has to come out of the config again; removing the line or setting it to 0 both work.
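A quick way to take the flag out from the shell, assuming the config path found earlier:
# delete the recovery line again (or open the file and set the value to 0)
sed -i '/innodb_force_recovery/d' /etc/mysql/mysql.conf.d/mysqld.cnf
With that done, I start mysql and apache regularly: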
systemctl start mysql
systemctl start apache2
The system is back up. Database works again.
Now, this was quite some excitement for such a lovely night. I’m happy I could recover the DB without rebuilding gigabytes of databases. It’s still a bit risky not to rebuild everything, but I’m taking my chances here. Rebuilding everything would probably have taken a few more hours. With the quick repair, the site was only down for about two or three hours, and I was only called in after the first hour of downtime.
Happy coding, Manuel