On Sunday the 13th of April, between approximately 4.15am and 11pm (UTC), ShareLaTeX was down for the longest time since it was launched, sorry. The end result was that we had to restore some user accounts and projects back to how they were on Saturday morning (12th April, at 6am UTC) in order to bring the site back up. If your account was one of those affected, you can still access the latest versions of your projects via https://www.sharelatex.com/restore, and we have no reason to think there was any significant data loss - it’s just a bit of a mess, sorry!

The Incident

The purpose of the rest of this post is to let you know what happened in more detail and to explain what we’re going to do to prevent something like this in future.

At approximately 4am UTC, there was a power cut in the data center that contains our servers, and ShareLaTeX was effectively turned off at the wall. The servers that power ShareLaTeX slowly came back online over the next 6 hours, and everything was powered on by 10am.

Unfortunately, when we brought our database servers back online, we discovered that the abrupt power cut had corrupted one of our servers. We run our database servers in a replica set so that if one fails, another one should be able to take over. However, for a reason that we are still investigating, the corruption either replicated to all of the database servers, or put the replica set into a state where it wasn’t happy.

Over the next 8-10 hours, we attempted to repair the database. This took a lot longer than expected, since we first had problems bringing up a duplicate of our database server to attempt the repair on. This was to do with an unfortunate (and seemingly unrelated) hardware problem with our hosting company. We then tried to repair the database twice, each time taking a few hours and both unfortunately failing.

At this point we decided to instead restore any corrupted data from our latest backup. The latest backup that we had was from Saturday morning (around 6am UTC). The power cut happened just before our Sunday morning backup would have been taken. From around 8pm to 11pm, we built up a new database from the backup to replace the corrupted parts of our running database.

This leaves us in the situation we are in now, where some projects and user accounts may be in an older state from Saturday morning. Fortunately we also have another, more up-to-date set of project backups which are unrelated to the database. These cannot easily be added directly into ShareLaTeX, but you can get the latest versions of your projects from https://www.sharelatex.com/restore.

The Future

We plan to make a few changes in our policies in response to this incident.

  • We will be taking more regular database backups, preferably every hour, but certainly more regularly than daily.
  • We are investigating why the entire replica set was affected by a fault with one database server, and how we can prevent this problem in future.
  • We should have begun the procedure of restoring the backup as soon as we knew there was a problem with the database. We might not have needed it, but having it ready could have saved us a few hours of downtime.
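
To illustrate why backup frequency matters here, the following is a minimal sketch (not our actual backup tooling) that works out how old the most recent backup would be at the moment of a failure. The function names are hypothetical, and the failure time of 4.15am is an approximation based on the times given above; the schedule is assumed to run at fixed intervals starting from the Saturday 6am backup.

```python
from datetime import datetime, timedelta, timezone


def last_backup_before(failure, first_backup, interval):
    """Return the time of the most recent scheduled backup at or before `failure`.

    Assumes backups run at a fixed interval starting from `first_backup`.
    """
    if failure < first_backup:
        raise ValueError("no backup exists before the failure")
    # Dividing two timedeltas with // gives the whole number of intervals elapsed.
    completed_intervals = (failure - first_backup) // interval
    return first_backup + completed_intervals * interval


def backup_age_at_failure(failure, first_backup, interval):
    """Return how much recent work the latest backup would be missing."""
    return failure - last_backup_before(failure, first_backup, interval)


UTC = timezone.utc
first_backup = datetime(2014, 4, 12, 6, 0, tzinfo=UTC)  # Saturday morning backup
failure = datetime(2014, 4, 13, 4, 15, tzinfo=UTC)      # approximate, Sunday morning

daily = backup_age_at_failure(failure, first_backup, timedelta(days=1))
hourly = backup_age_at_failure(failure, first_backup, timedelta(hours=1))

print("Daily backups: ", daily)   # 22:15:00
print("Hourly backups:", hourly)  # 0:15:00
```

With a daily schedule the latest backup is almost a full day old in this scenario, whereas an hourly schedule would never leave it more than an hour behind.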

Apologies

Please accept our heartfelt apologies for this incident. Sunday was a very stressful day for us (and for lots of you with deadlines!), but some of your kind comments and words of encouragement were really helpful, thank you. Having done a PhD myself, I know the pain and fear of not being able to access important work, and I am beyond frustrated with our large downtime in this case. We will do everything we can to stop this happening again.

Regards,
James and Henry

Posted by James Allen on 14 Apr 2014

You can now quickly jump forwards and backwards between your code and the same place in the output PDF. We’ve introduced two new buttons between the code and PDF panels:

  • To go from your code to the corresponding place in the PDF, click the top button with the right arrow.
  • To go from the PDF to your code, click the bottom button with the left arrow.

Simple! You can also double-click anywhere in the PDF to jump to that place in the code.

Forward and reverse search

Posted by James Allen on 10 Apr 2014

Over the past year we’ve had loads of feedback about what people want from our history system, and it’s pretty clear that our existing offering wasn’t doing the job. We’re pleased to be rolling out a complete overhaul today which addresses many of the previous problems and hopefully fits in well with your workflow.

Track Changes Screenshot

New Features

  • See exactly who changed what. Every word that has changed is highlighted with the unique color of the person who made it. Mouse over to find out when the change was made. This should make keeping track of your collaborators much easier.

  • See changes over any time period. Want to see everything that changed since you were last in the project? Since a day ago? Between the first Friday of the month and the last full moon? No problem, you can select any range you like!

  • Revert any file back to any point in time. You can undo any changes, in any file, so there’s no need to worry about making any mistakes.

Many thanks to everyone who has helped us test and refine this new feature!

Posted by James Allen on 31 Mar 2014