There are really only two things that stress me out: one is when the electrics on my motorcycle fail, the other is when my servers fail.
This week, both happened.
'Sterling', my 28-year-old Yamaha SR400
You're speeding along and you look down to notice that none of the headlights, brakes or indicators are working. And whilst you continue riding, you can no longer concentrate on the road because your mind is constantly occupied by the problem. Servers are much the same.
The two problems are also similar in that neither has an obvious cause: maybe it's a loose connection, a blown fuse, a corrupted config file, or something else entirely. Whatever the issue, fixing it usually comes down to a bit of trial and error.
This week the Prevue servers experienced some significant issues that prevented the uploading, deleting and editing of images, and in some cases froze accounts entirely. Whilst this has only affected a very small percentage of accounts, I've been working around the clock to diagnose and fix the problem. So in the interest of transparency, I wanted to explain what happened and what's being done.
What happened? The root of the problem was a malfunction in the secure connection that handles the uploading of images, which caused roughly 1 in every 10 uploads to fail. Each failed upload would then cause the affected account to stall for around 60 seconds, during which time every action and page load on that account would fail. After some tweaking, the problem now only seems to affect 1 in every 1,000 uploads, which is an improvement, but by no means a solution.
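To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes each upload fails independently at the quoted rate, which is a simplification; the function name and the batch sizes are made up for illustration and aren't taken from the Prevue codebase.

```python
def chance_of_any_failure(failure_rate: float, uploads: int) -> float:
    """Probability that at least one of `uploads` uploads fails.

    Assumes every upload fails independently with the same probability,
    which is an illustrative simplification.
    """
    return 1.0 - (1.0 - failure_rate) ** uploads


# Failure rates quoted above: roughly 1 in 10 before the tweaks, 1 in 1,000 after.
for label, rate in [("before", 1 / 10), ("after", 1 / 1000)]:
    # Hypothetical batch sizes, chosen only to show the trend.
    for uploads in (10, 100, 500):
        chance = chance_of_any_failure(rate, uploads)
        print(f"{label}: {uploads} uploads -> {chance:.1%} chance of at least one failure")
```

Under that assumption, even at 1 in 1,000 an account that uploads 100 images still has a little under a 10% chance of hitting at least one failed upload (and the stall that comes with it), which is why the current state counts as an improvement rather than a fix.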
What is being done? This weekend will see the release of a new and significantly improved upload system, which has been built with the intention of returning the app to a 100% upload success rate. If you spot any problems in the meantime, please send them my way, and keep an eye on Twitter for updates.
As for the motorcycle, it seems that all my electrical issues were simply caused by a loose connection to the main battery. It took 2 days to get around to, 30 minutes to diagnose and 10 seconds to fix.
If only servers were that easy.