FAQs for Jan 2021 Server Issues

What were the symptoms of the issue?

As users moved around Tapestry, they were taken to error pages. In some cases that was a 500 error page (a drawing of two dogs, a server, and one of our devs), and in some cases, a 504 error (a text page). Either way, whatever they were trying to do would not have worked and they would have needed to try again.

It affected some Tapestry accounts more severely than others. Most people experienced errors to some extent, but some experienced a lot of them. Error rates were not consistent within accounts either, meaning users may have seen many errors in a short burst and then none for a while.

To help reduce these errors, we turned off ‘background processes’ as soon as we could see that load on our servers was high. In terms of symptoms, this meant:

Notifications stopped going out.
Media uploaded successfully, provided an error page wasn’t shown, but was not ‘processed’ and therefore wasn’t visible.
Documents uploaded successfully, provided an error page wasn’t shown, but the virus check we run on them was not started and therefore they were not downloadable.
PDFs set to export did not become available to download.
Scheduled observations were not published.

These things were not lost. They were put into a queue and went out once our servers were stable, usually overnight.
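
For readers who find code easier to follow, the idea of holding background work in a queue and releasing it later can be sketched as below. This is a simplified illustration only, not Tapestry's actual code: the class name DeferredJobQueue, its methods, and the example jobs are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class DeferredJobQueue:
    """Holds background jobs while the system is under load (hypothetical example)."""

    def __init__(self) -> None:
        self.paused = False
        self._pending: Deque[Callable[[], str]] = deque()

    def submit(self, job: Callable[[], str]) -> List[str]:
        """Queue a job; run it straight away unless processing is paused."""
        self._pending.append(job)
        return [] if self.paused else self.drain()

    def drain(self) -> List[str]:
        """Run every queued job in the order it was submitted."""
        results = []
        while self._pending:
            results.append(self._pending.popleft()())
        return results

    def pending_count(self) -> int:
        """Number of jobs still waiting to run."""
        return len(self._pending)


queue = DeferredJobQueue()
queue.paused = True  # load is high: stop background processing
queue.submit(lambda: "notification sent")
queue.submit(lambda: "media processed")
held = queue.pending_count()  # both jobs are waiting, neither is lost

queue.paused = False  # servers are stable again
completed = queue.drain()  # the held jobs now run in their original order
```

The point of the sketch is that pausing only delays the work. Nothing is discarded, and the jobs run in the order they were submitted once processing resumes.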

Did high traffic cause it?

Whilst very high traffic did trigger the issue, it was not a scalability problem but a deeper technical one. The exceptional level of activity caused a chain reaction on our database servers. Those with technical expertise may want to read our technical explanation.

What did you do to fix it?

As soon as we saw the servers starting to struggle, we turned off what we call ‘background processes’. This reduced the strain on our servers and meant users were more likely to be able to move around Tapestry, view existing posts, and add new posts successfully. However, this was not a long-term solution.

In terms of longer-term fixes, our first steps were to:

Increase our database servers to the largest ones that our hosting company, AWS, provides.
Set up systems to closely record what was happening on our databases when these errors started (this is known as logging). That gave us more detailed insight into the problem.

Across the following week we:

Made background changes to Tapestry to share the load more evenly across our database servers.
Took steps to reduce the chance of certain actions clashing with each other and slowing the system down.

Is it fully resolved?