Chief Delphi

[FRC BLOG] The Great Registration System Crash of 2016 (http://www.chiefdelphi.com/forums/showthread.php?t=151471)

bdaroz 22-09-2016 18:10

[FRC BLOG] The Great Registration System Crash of 2016
 

Quote:

The Great Registration System Crash of 2016
Written by Frank Merrick
I am so sorry for the trouble we experienced with initial event registration today. We at FIRST HQ really do understand the investment of resources teams make to participate in our programs, and teams have the right to expect that we will be holding up our end of the deal by making things happen like we say we will. We failed to do that in this instance, and once again, I apologize.

While we did perform extensive load testing on the servers in preparation for this event, something still went wrong. Our Information Technology Department has been working feverishly since the crash to puzzle this out and come up with a plan to prevent recurrence. It’s not yet clear that it was the registration load per se that actually caused the problem. The system crashed immediately, before even a single team was registered, and this is strange behavior.

As you hopefully know from our tweets, Facebook posts, and emails, we are postponing initial registration until next week. Our goal is to be able to announce the specific day and time for registration before the weekend begins, so you can make appropriate plans.

Another goal we have is to be open and honest about what caused this issue and the steps we are taking to correct it. Our IT department has said they will do a guest blog explaining the situation in detail as soon as we have the facts.

Once again, I am sorry. Despite this issue, I do believe we have a great season ahead of us!

Frank

SenorZ 22-09-2016 18:25

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
I remember where I was...

dodar 22-09-2016 18:34

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by SenorZ (Post 1608539)
I remember where I was...

Pepperidge Farm remembers

Jardanium 22-09-2016 19:01

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
It was said that this event caused periods known as the Great Compression and STEAM Bowl to follow...

In all seriousness, I personally appreciate the continued transparency initiative from Frank and the rest of FIRST HQ. It's nice to be in the loop about these things! :)

BenDSterling 22-09-2016 19:03

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
I can't wait to find out exactly what the problem was. I'm really curious what caused it to crash before any teams were even registered.

JB987 22-09-2016 19:16

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
So when is round 2???

BrendanB 22-09-2016 19:22

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by JB987 (Post 1608552)
So when is round 2???

We probably won't hear a time for that until they've remedied the problem. No sense in setting a deadline you can't meet. Thank you Frank for addressing the crash before the close of the day.

Bryan Herbst 22-09-2016 19:43

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
I suspected it was more than just load. If teams are anything like me, they didn't just log on at precisely 12 PM; they were refreshing the page a few minutes beforehand.

The system crashed right when it should have gone live, so I am thinking something with activating registration revealed a different problem.

marshall 22-09-2016 20:10

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Frank's Blog
Another goal we have is to be open and honest about what caused this issue and the steps we are taking to correct it. Our IT department has said they will do a guest blog explaining the situation in detail as soon as we have the facts.

I am really looking forward to this. I expect nginx and IIS logs!

sanddrag 23-09-2016 00:44

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
I appreciate that FIRST is honest and open about it, and they made the right call to reschedule to next week. However, many of us waited an hour at our computers, in an ambiguous state not knowing if we were going to register today or not. I'm not sure they even realize how high the stakes are to get into certain events. Certain events literally fill within 120 seconds, or even less.

What FIRST really should have done was have a Twitter announcement ready to fire out in the event that something like this happened, and a staff member assigned to do it. They should have seen by 9:05 AM that they had a problem they could not resolve. They should have been ready to abort the process by 9:10 AM, and a Twitter announcement should have gone out no later than 9:15 AM, whether they had a statement prepared or not. It's not okay to make that many people wait as long as we did, when it's in the middle of the work day. The second they saw they had an issue that would take more than 1 minute to fix, they should have aborted the whole plan and moved to releasing a statement about rescheduling the registration date.

I agree, load was not the issue. I was getting consistent 5-second page refresh times from 8:45 AM PDT right up to 8:59:56 PDT when I did my last refresh. At 9:00:02 the page immediately loaded with a run-time error.

I appreciate them calling it off, but I just wish it had happened (via Twitter, Facebook, e-mail, etc.) long before 10 AM.

Let this be a lesson for FIRST not only in registration, but in the fact that they need to have plans and procedures in place to widely distribute information out to their teams in a more timely fashion than a 1-hour delay.

DaveL 23-09-2016 04:44

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
If the issue had something to do with volume, FIRST could have a different day for district teams to register.
FIRST could even separate the districts further, by picking a different day for each district to register.

Dave

Billfred 23-09-2016 08:35

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by sanddrag (Post 1608622)
I appreciate them calling it off, but I just wish it had happened (via Twitter, Facebook, e-mail, etc.) long before 10 AM.

Let this be a lesson for FIRST not only in registration, but in the fact that they need to have plans and procedures in place to widely distribute information out to their teams in a more timely fashion than a 1-hour delay.

To split hairs, the abort message hit their Twitter account at 12:45 PM Manchester time. I imagine they were making efforts to get things back up before realizing how borked everything was.
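
For anyone reconciling the time zones quoted in this thread, here is a minimal sketch of the arithmetic. It assumes registration was scheduled for 12:00 PM Eastern on 22 September 2016 (the noon start and the blog's "today" both come from posts above, but the exact pairing is an assumption), and that "Manchester time" means Manchester, NH, i.e. US Eastern time. The variable names are purely illustrative.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumption: "Manchester time" is US Eastern (Manchester, NH).
EASTERN = ZoneInfo("America/New_York")
PACIFIC = ZoneInfo("America/Los_Angeles")

# Assumed scheduled opening: noon Eastern on the day of the blog post.
scheduled_start = datetime(2016, 9, 22, 12, 0, tzinfo=EASTERN)
# Abort tweet time as reported in this thread.
abort_tweet = datetime(2016, 9, 22, 12, 45, tzinfo=EASTERN)

# Minutes between the scheduled opening and the abort announcement.
delay_minutes = (abort_tweet - scheduled_start).total_seconds() / 60

# The same instant expressed in Pacific time, for comparison with
# the Pacific-time timestamps quoted earlier in the thread.
abort_pacific = abort_tweet.astimezone(PACIFIC)

print(f"Delay: {delay_minutes:.0f} minutes")
print(f"Abort tweet in Pacific time: {abort_pacific:%I:%M %p %Z}")
```

Under those assumptions, the announcement came 45 minutes after the scheduled opening, which is 9:45 AM on the West Coast rather than 10 AM.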

Jon Stratis 23-09-2016 09:21

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Billfred (Post 1608636)
To split hairs, the abort message hit their Twitter account at 12:45 PM Manchester time. I imagine they were making efforts to get things back up before realizing how borked everything was.

Having been in situations like this before (in a professional capacity), I can tell you that a 45 minute response time isn't actually that bad. Even if you have everyone needed to analyse the problem standing by, it does take some amount of time to pull the logs, find something relevant in them, and figure out what's going on, or even to figure out if it's a 5 minute or 5 hour fix. Add on top of that having to communicate the details and expectations from the engineers to management, get a decision, and get the communication out.

And then you have to ask what FIRST would consider acceptable. Sure, they want everything to go smoothly, but if there is a hiccup, is starting registration 5 or 10 minutes late acceptable? Where do they decide to draw the line? It's all good to sit back as an armchair quarterback and say that 5 minutes is too late, but that really doesn't take into account the realities of business. This isn't life or death; waiting a few minutes, or even an hour, isn't going to be the end of the world.

Whatever 23-09-2016 10:18

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
So is there a pool on the reason?

If there is, I want: "A team set up an automatic registration routine to make sure they got their first choice, and that routine went nuts, bringing down the server."

marshall 23-09-2016 10:56

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Whatever (Post 1608652)
So is there a pool on the reason?

If there is, I want: "A team set up an automatic registration routine to make sure they got their first choice, and that routine went nuts, bringing down the server."

My bet is on a failed load balancer OR a failed interaction between the login system and the registration system. They could be one system, but I don't think they are.


All times are GMT -5.
