Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   General Forum (http://www.chiefdelphi.com/forums/forumdisplay.php?f=16)
-   -   [FRC BLOG] The Great Registration System Crash of 2016 (http://www.chiefdelphi.com/forums/showthread.php?t=151471)

bdaroz 22-09-2016 18:10

[FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

The Great Registration System Crash of 2016
Written by Frank Merrick
I am so sorry for the trouble we experienced with initial event registration today. We at FIRST HQ really do understand the investment of resources teams make to participate in our programs, and teams have the right to expect that we will be holding up our end of the deal by making things happen like we say we will. We failed to do that in this instance, and once again, I apologize.

While we did perform extensive load testing on the servers in preparation for this event, something still went wrong. Our Information Technology Department has been working feverishly since the crash to puzzle this out and come up with a plan to prevent recurrence. It’s not yet clear that it was the registration load per se that actually caused the problem. The system crashed immediately, before even a single team was registered, and this is strange behavior.

As you hopefully know from our tweets, Facebook posts, and emails, we are postponing initial registration until next week. Our goal is to be able to announce the specific day and time for registration before the weekend begins, so you can make appropriate plans.

Another goal we have is to be open and honest about what caused this issue and the steps we are taking to correct it. Our IT department has said they will do a guest blog explaining the situation in detail as soon as we have the facts.

Once again, I am sorry. Despite this issue, I do believe we have a great season ahead of us!

Frank

SenorZ 22-09-2016 18:25

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
I remember where I was...

dodar 22-09-2016 18:34

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by SenorZ (Post 1608539)
I remember where I was...

Pepperidge Farm remembers

Jardanium 22-09-2016 19:01

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
It was said that this event caused periods known as the Great Compression and STEAM Bowl to follow...

In all seriousness, I personally appreciate the continued transparency initiative from Frank and the rest of FIRST HQ. It's nice to be in the loop about these things! :)

BenDSterling 22-09-2016 19:03

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
I can't wait to find out exactly what the problem was. I'm really curious what caused it to crash before any teams were even registered.

JB987 22-09-2016 19:16

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
So when is round 2???

BrendanB 22-09-2016 19:22

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by JB987 (Post 1608552)
So when is round 2???

We probably won't hear a time for that until they've remedied the problem. No sense in setting a deadline you can't meet. Thank you Frank for addressing the crash before the close of the day.

Bryan Herbst 22-09-2016 19:43

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
I suspected it was more than just load. If teams are anything like me, they didn't just log on at precisely 12pm- they were refreshing the page a few minutes beforehand.

The system crashed right when it should have gone live, so I am thinking something with activating registration revealed a different problem.

marshall 22-09-2016 20:10

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Frank's Blog
Another goal we have is to be open and honest about what caused this issue and the steps we are taking to correct it. Our IT department has said they will do a guest blog explaining the situation in detail as soon as we have the facts.

I am really looking forward to this. I expect nginx and IIS logs!

sanddrag 23-09-2016 00:44

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
I appreciate that FIRST is honest and open about it, and they made the right call to reschedule to next week. However, many of us waited an hour at our computers, in an ambiguous state not knowing if we were going to register today or not. I'm not sure they even realize how high the stakes are to get into certain events. Certain events literally fill within 120 seconds, or even less.

What FIRST really should have done was have a Twitter announcement ready to fire out in the event that something like this happened, and a staff member assigned to do it. They should have seen that they had a problem that they could not resolve by 9:05 AM. They should have been ready to abort the process by 9:10 AM, and a Twitter announcement should have gone out no later than 9:15 AM, whether they had a statement prepared or not. It's not okay to make that many people wait as long as we did, when it's in the middle of the work day. The second they saw they had an issue that would take more than 1 minute to fix, they should have aborted the whole plan, and gone to the plan to release a statement regarding rescheduling of the registration date.

I agree, load was not the issue. I was getting consistent 5-second page refresh times from 8:45 AM PDT right up to 8:59:56 PDT when I did my last refresh. At 9:00:02 the page immediately loaded with a run-time error.

I appreciate them calling it off, but I just wish it had happened (via twitter, facebook, e-mail, etc) long before 10AM.

Let this be a lesson for FIRST not only in registration, but in the fact that they need to have plans and procedures in place to widely distribute information out to their teams in a more timely fashion than a 1-hour delay.

DaveL 23-09-2016 04:44

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
If the issue had something to do with volume, FIRST could have a different day for district teams to register.
FIRST could even separate the districts further, by picking a different day for each district to register.

Dave

Billfred 23-09-2016 08:35

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by sanddrag (Post 1608622)
I appreciate them calling it off, but I just wish it had happened (via twitter, facebook, e-mail, etc) long before 10AM.

Let this be a lesson for FIRST not only in registration, but in the fact that they need to have plans and procedures in place to widely distribute information out to their teams in a more timely fashion than a 1-hour delay.

To split hairs, the abort message hit their Twitter account at 12:45 PM Manchester time. I imagine they were making efforts to get things back up before realizing how borked everything was.

Jon Stratis 23-09-2016 09:21

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Billfred (Post 1608636)
To split hairs, the abort message hit their Twitter account at 12:45 PM Manchester time. I imagine they were making efforts to get things back up before realizing how borked everything was.

Having been in situations like this before (in a professional capacity), I can tell you that a 45 minute response time isn't actually that bad. Even if you have everyone needed to analyse the problem standing by, it does take some amount of time to pull the logs, find something relevant in them, and figure out what's going on, or even to figure out if it's a 5 minute or 5 hour fix. Add on top of that having to communicate the details and expectations from the engineers to management, get a decision, and get the communication out.

And then you have to ask what FIRST would consider acceptable. Sure, they want everything to go smoothly, but if there is a hiccup, is starting registration 5 or 10 minutes late acceptable? Where do they decide to draw the line? It's all good to sit back as an armchair quarterback and say that 5 minutes is too late, but that really doesn't take into account the realities of business. This isn't life or death, waiting a few minutes, or even an hour, isn't going to be the end of the world.

Whatever 23-09-2016 10:18

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
So is there a pool on the reason?

If there is I want: "A team set up automatic registration routine to make sure they got their first choice and that routine went nuts bringing down the server."

marshall 23-09-2016 10:56

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Whatever (Post 1608652)
So is there a pool on the reason?

If there is I want: "A team set up automatic registration routine to make sure they got their first choice and that routine went nuts bringing down the server."

My bet is on a failed load balancer OR a failed interaction between the login system and the registration system. They could be one system, but I don't think they are.

Taylor 23-09-2016 11:20

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Put me down for: A Griswoldian cat chewed on a cable.

Whatever 23-09-2016 11:37

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Any takers for "Woodie Flowers spilled Mango Juice on the server?"

Gary Dillard 23-09-2016 12:53

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Look for metal shavings in a pwm port; that's caused us to crash plenty of times.

Trying to Help 23-09-2016 13:53

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Jardanium (Post 1608546)
In all seriousness, I personally appreciate the continued transparency initiative from Frank and the rest of FIRST HQ. It's nice to be in the loop about these things! :)

Yes, thank you Frank for speaking to this directly.

Conor Ryan 23-09-2016 14:44

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Gary Dillard (Post 1608674)
Look for metal shavings in a pwm port; that's caused us to crash plenty of times.

The barrel plug slipped out of the router when they went over a bump.

bobbysq 23-09-2016 14:50

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Conor Ryan (Post 1608692)
The barrel plug slipped out of the router when they went over a bump.

*Iowa Regional flashbacks*

I need to post a picture of our router and what the power port looked like by the end of quals at Worlds. It hasn't disconnected since.

and then we forgot to tell the top 8 teams, not that we would have been picked anyway

Foster 23-09-2016 14:53

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Jon Stratis (Post 1608645)
Having been in situations like this before (in a professional capacity), I can tell you that a 45 minute response time isn't actually that bad. Even if you have everyone needed to analyze the problem standing by, it does take some amount of time to pull the logs, find something relevant in them, and figure out what's going on, or even to figure out if it's a 5 minute or 5 hour fix. Add on top of that having to communicate the details and expectations from the engineers to management, get a decision, and get the communication out.

Exactly. And from experience that 45 minutes flies by in seconds. With something as complex as a multi connected, load balanced, multi system web application with a database, there is a ton of stuff to look at piece by piece.

Quote:

Originally Posted by Jon Stratis (Post 1608645)
This isn't life or death, waiting a few minutes, or even an hour, isn't going to be the end of the world.

Ummm, the last time I said something like this I got red dots from people saying how critical it was to be able to get into events for their season, so it was pretty close to the end of the world for a high school senior. :rolleyes:

I agree it's not the end of the world. And while I feel for the many dozens of you that lost an hour trying to register, FIRST has said they will do a full reset and start over. So all you are out is an hour of time. As I always say "I've wasted far more on far less" (works both for time, money and robot parts, feel free to add it to your list of phrases)

As a pre-emptive effort, begin to tamp down your anger now when the FIRST parts purchase website is slow and you can't get the "free" parts you want during build season. ;)

bdaroz 23-09-2016 15:45

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
So taking this in a *slightly* different direction....

Kit/Kickoff registration is due to open 9/29.

Perhaps event registration should be pushed back behind Kit/Kickoff to try to work out some glitches there first?

Either way, Event + Kit/Kickoff on the same week runs the risk of turning our IT staff very crispy. :)

Hallry 23-09-2016 16:25

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
An update has now been posted.

runneals 23-09-2016 16:56

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by DaveL (Post 1608628)
If the issue had something to do with volume, FIRST could have a different day for district teams to register.
FIRST could even separate the districts further, by picking a different day for each district to register.

Quote:

Originally Posted by Whatever (Post 1608652)
If there is I want: "A team set up automatic registration routine to make sure they got their first choice and that routine went nuts bringing down the server."

Both of these are very valid ideas that are both worth attempting (although I think I read the second one differently the first time). Why don't they do something like what they do in December with FIRST Choice, where it allows a team to put in events and rank them which they would want 1-5(10). This would allow teams to come up with a list of events that they would want to attend, allow mentors to input them in their free time before the specified date, and then the slotting program would be run at a specified time to slot teams into events based on their choices that they made. For quite a few people, registration is not during their lunch hour so it could be hard for them to do it at work. The "slotting program" could also account for starting at 0 and working its way up so older teams get first pick, but it could also include some spatial algorithm to allow teams to attend their closest event.
Allowing district teams to register on their own day would offer a way for IT to ensure that their systems would work for regional registration.

Just some random ramblings after reading this :)

Koko Ed 26-09-2016 04:29

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Whatever (Post 1608652)
So is there a pool on the reason?

If there is I want: "A team set up automatic registration routine to make sure they got their first choice and that routine went nuts bringing down the server."

Someone unplugged the server so they could charge their smartphone.

ATannahill 26-09-2016 08:19

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Koko Ed (Post 1608965)
Someone unplugged the server so they could charge their smartphone flying machine.

FTFY

nuclearnerd 26-09-2016 08:24

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by runneals (Post 1608714)
Why don't they do something like what they do in December with FIRST Choice, where it allows a team to put in events and rank them which they would want 1-5(10). This would allow teams to come up with a list of events that they would want to attend, allow mentors to input them in their free time before the specified date, and then the slotting program would be run at a specified time to slot teams into events based on their choices that they made. For quite a few people, registration is not during their lunch hour so it could be hard for them to do it at work. The "slotting program" could also account for starting at 0 and working its way up so older teams get first pick, but it could also include some spatial algorithm to allow teams to attend their closest event.

+1
https://www.chiefdelphi.com/forums/s...&postcount=148

Chris is me 26-09-2016 09:03

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by runneals (Post 1608714)
Both of these are very valid ideas that are both worth attempting (although I think I read the second one differently the first time). Why don't they do something like what they do in December with FIRST Choice, where it allows a team to put in events and rank them which they would want 1-5(10). This would allow teams to come up with a list of events that they would want to attend, allow mentors to input them in their free time before the specified date, and then the slotting program would be run at a specified time to slot teams into events based on their choices that they made. For quite a few people, registration is not during their lunch hour so it could be hard for them to do it at work. The "slotting program" could also account for starting at 0 and working its way up so older teams get first pick, but it could also include some spatial algorithm to allow teams to attend their closest event.
Allowing district teams to register on their own day would offer a way for IT to ensure that their systems would work for regional registration.

Just some random ramblings after reading this :)

I don't like this nearly as much as I like the FIRST Choice system.

If you don't get what you want on FIRST Choice, you don't get some free stuff you weren't totally expecting to get. If a regional is full, I want to know as soon as I sign up for it that I'm not on the confirmed list so I can look at my other options and decide.

How can an algorithm decide if I'd rather be on the waitlist for one event rather than confirmed for an event 7 hours away that I know won't be filled? There's too many human decisions in this process for me to want to automate my decision making in an algorithm.

FIRST should be able to handle a few thousand page requests at once. If they can't, any number of outside firms I'm sure would love to have a contract to do this. This isn't an insurmountable challenge.

nuclearnerd 26-09-2016 09:54

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Chris is me (Post 1608978)
How can an algorithm decide if I'd rather be on the waitlist for one event rather than confirmed for an event 7 hours away that I know won't be filled? There's too many human decisions in this process for me to want to automate my decision making in an algorithm.

This is the easiest thing in the world to automate:
1) Each team submits priority list for their first event.
2) first event slots are raffled off. Some teams receive their second pick instead of their first
3) In the time between the first event and second event raffles, teams can adjust their priority list for the second event. We give the option to choose to be on a "waiting list" for a filled event.
4) Rinse and repeat

This would be a much more equitable system than "may the fastest clicker win" (or perhaps, may the fastest-coded-registration-bot win).
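
For what it's worth, a minimal Python sketch of one raffle round along the lines of steps 1 and 2 might look like the following. The names `raffle_round`, `priorities`, and `capacity` are made up for illustration, and it assumes each team submits a simple ordered list of event codes. Waitlists and the second-event round from steps 3 and 4 are left out.

```python
import random


def raffle_round(priorities, capacity):
    """Give each team its highest-ranked event that still has room.

    priorities: dict of team number -> list of event codes, best first.
    capacity:   dict of event code -> number of open slots.
    Returns (assigned, unplaced): a dict of team -> event, and a list of
    teams whose listed events were all full when their turn came up.
    """
    remaining = dict(capacity)   # work on a copy; the caller's dict is untouched
    order = list(priorities)
    random.shuffle(order)        # the "raffle": teams are drawn in random order

    assigned, unplaced = {}, []
    for team in order:
        for event in priorities[team]:
            if remaining.get(event, 0) > 0:
                remaining[event] -= 1
                assigned[team] = event
                break
        else:
            unplaced.append(team)  # nothing on this team's list had room
    return assigned, unplaced
```

With two one-slot events and three teams, exactly one team ends up unplaced no matter how the draw goes, and that team is the natural candidate for a waiting list.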

Foster 26-09-2016 10:00

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Chris is me (Post 1608978)

FIRST should be able to handle a few thousand page requests at once. If they can't, any number of outside firms I'm sure would love to have a contract to do this. This isn't an insurmountable challenge.

Fake News Item: Today FIRST announced that it had outsourced its entire application suite to a major cloud vendor to improve a computer issue that happens one day per year. In related news, FIRST raised the cost of the initial registration and event by $1800. (*)

I'm not sure why a FIRST Choice like thing wouldn't work. You put your choices in by the order you want them. Random selection of team. Is regional / district event available? Yes, then book. No, then look at second request. Is it available? Yes, then book. No, then look at third request, and so on.

Since they hold slots back for rookies, they could easily do this year's rookies first (in a random pick) and then the rest of the pool.

That would be far easier to manage and a better use of mentor time than hovering over keyboards trying to snipe a regional like you do a classic Beanie Baby on eBay.

Edited to add: The idea by nuclearnerd is also pretty good. It appears we were typing at the same time.

(*) Outsourcing / cloud services are not the inexpensive panacea that many people think. Yea, Yea "well I can spin up an Amazon instance in just a few seconds" sounds easy as heck in a robotics forum, not so easy to do in the real world, especially if you need to manage 1000 transactions per second for any time period.

Those of you that are ready to pound on your keyboards to refute me, don't bother. American has a ton of flights into Manchester, book one and go help them out.

Chris is me 26-09-2016 10:00

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by nuclearnerd (Post 1608984)
This is the easiest thing in the world to automate:
1) Each team submits priority list for their first event.
2) first event slots are raffled off. Some teams receive their second pick instead of their first
3) In the time between the first event and second event raffles, teams can adjust their priority list for the second event. We give the option to choose to be on a "waiting list" for a filled event.
4) Rinse and repeat

This would be a much more equitable system than "may the fastest clicker win" (or perhaps, may the fastest-coded-registration-bot win).

Okay, how does a team get to decide if they are put on the waitlist for the first event, or open registration for the 2nd? What if they only want to be on a waitlist that is X teams long or smaller? Where does a team go if they only have one viable event and it fills?

The current system isn't very "fair", but being able to make these decisions in real time has some benefit that we at least have to think about.

bkahl 26-09-2016 10:09

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Foster (Post 1608986)
In related news, FIRST raised the cost of the initial registration and event by $1800. (*)

(*) Outsourcing / cloud services are not the inexpensive panacea that many people think. Yea, Yea "well I can spin up an Amazon instance in just a few seconds" sounds easy as heck in a robotics forum, not so easy to do in the real world, especially if you need to manage 1000 transactions per second for any time period.

Not too much experience here but that's why I am asking.

Would an upgrade to the website for the registration period really cost $5,400,000+ ($1,800*3000+)? It wouldn't even have to be on the upgraded servers for the entire year. This number seems exorbitantly high.

Foster 26-09-2016 10:13

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Chris is me (Post 1608987)
Okay, how does a team get to decide if they are put on the waitlist for the first event, or open registration for the 2nd? What if they only want to be on a waitlist that is X teams long or smaller? Where does a team go if they only have one viable event and it fills?

The current system isn't very "fair", but being able to make these decisions in real time has some benefit that we at least have to think about.

How complicated do you want it to be? (I didn't know that the current system will tell you how deep in the waitlist you are).

So your choice list would look like this:

if event1 (available) book_it
if event1 (waitlist < 10) book_it
if event2 (available) book_it
if event1 (waitlist < 25) book_it
if event2 (waitlist < 10) book_it
if event3 (available) book_it

It just goes down the list until something works and then you are done.

Write some specs up, there are about 3000 coders lurking here that would love to write a sample program on how it could work.
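
As a starting point for such a spec, here is a rough Python sketch of walking the choice list above until a rule matches. Everything named here (`first_matching_rule`, the `rules` tuples, the `open` / `waitlist` fields) is a hypothetical illustration, not how the FIRST system works, and the actual booking step is left out.

```python
def first_matching_rule(rules, events):
    """Return (event_code, status) for the first rule that matches, else None.

    rules:  list of (event_code, max_waitlist) pairs in preference order.
            max_waitlist=None means "only if a confirmed slot is open".
    events: dict of event_code -> {"open": int, "waitlist": int}.
    """
    for code, max_waitlist in rules:
        state = events[code]
        if state["open"] > 0:
            return code, "confirmed"    # an open slot satisfies any rule for this event
        if max_waitlist is not None and state["waitlist"] < max_waitlist:
            return code, "waitlisted"   # full, but the wait list is short enough
    return None                         # nothing on the list worked


# The six-line choice list from the post, written as data.
rules = [
    ("event1", None), ("event1", 10), ("event2", None),
    ("event1", 25), ("event2", 10), ("event3", None),
]
events = {
    "event1": {"open": 0, "waitlist": 12},
    "event2": {"open": 0, "waitlist": 30},
    "event3": {"open": 5, "waitlist": 0},
}
print(first_matching_rule(rules, events))  # ('event1', 'waitlisted')
```

In this made-up scenario the fourth rule (event1 with a wait list under 25) is the first one that matches, so the team is waitlisted at its first-choice event rather than confirmed at event3.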

Chris is me 26-09-2016 10:25

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Foster (Post 1608989)
How complicated do you want it to be? (I didn't know that the current system will tell you how deep in the waitlist you are).

So your choice list would look like this:

if event1 (available) book_it
if event1 (waitlist < 10) book_it
if event2 (available) book_it
if event1 (waitlist < 25) book_it
if event2 (waitlist < 10) book_it
if event3 (available) book_it

It just goes down the list until something works and then you are done.

Write some specs up, there are about 3000 coders lurking here that would love to write a sample program on how it could work.

The current system tells you a green, yellow, red indication of roughly how long the waitlist is, but not an exact number. So I guess it only needs to be that specific. I guess it wouldn't be that hard to code something like this, but I'm hesitant to rely on even more custom software and logic for this sort of thing, considering the track record from FIRST recently of their web interfaces and whatnot.

Foster 26-09-2016 10:28

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by bkahl (Post 1608988)
Not too much experience here but that's why I am asking.

Would an upgrade to the website for the registration period really cost $5,400,000+ ($1,800*3000+)? It wouldn't even have to be on the upgraded servers for the entire year. This number seems exorbitantly high.

It was a high guess based on needing to do a total rewrite of the existing functionality and databases by one of the big 6 IT consulting firms. From what I remember there are about 30 screens of stuff on the website for TIMs, award submission and event registration stuff. I guessed at the number of function points (Google Function Point Analysis) required and used $1800 per function point (from the last project I worked on that had that kind of transaction rate, a mutual fund user-facing transaction system that had to do 500 TPS) and got to just under $3.5 million. Add 50% overrun, etc. and there you go.

It's an estimate, could it be done for less, sure. Point was it's going to cost money and FIRST teams are not happy with what they pay now. Even if it was $300 per team (~$1 million) there would be lots of complaining.
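
For anyone who wants to check the arithmetic, here is a small Python sketch of the estimate. The dollar figures come from the posts above; the function point count was not stated, so it is back-computed here from the rounded $3.5 million figure and is only an implied number. The 3,000 team count is the "3000+" from the question.

```python
COST_PER_FUNCTION_POINT = 1_800  # dollars per function point, from the post
BASE_ESTIMATE = 3_500_000        # "just under $3.5 million", rounded up here
OVERRUN = 0.50                   # "Add 50% overrun"
TEAMS = 3_000                    # "3000+" teams, per the question

# Back-computed, not stated in the post: how many function points that implies.
implied_function_points = BASE_ESTIMATE / COST_PER_FUNCTION_POINT
total = BASE_ESTIMATE * (1 + OVERRUN)
per_team = total / TEAMS

print(round(implied_function_points))  # 1944
print(total)                           # 5250000.0
print(per_team)                        # 1750.0
print(300 * TEAMS)                     # 900000, the "~$1 million" case
```

So the $1800 per team in the fake news item is roughly the $5.25 million total spread over about 3,000 teams.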

There may be better ways to do it than brute force. (A concept that works for computer systems and also works for robots :cool: )

Thanks for the question, hope this helped.

Foster 26-09-2016 10:47

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Chris is me (Post 1608990)
The current system tells you a green, yellow, red indication of roughly how long the waitlist is, but not an exact number. So I guess it only needs to be that specific. I guess it wouldn't be that hard to code something like this, but I'm hesitant to rely on even more custom software and logic for this sort of thing, considering the track record from FIRST recently of their web interfaces and whatnot.

The nice thing is that it can be a standalone system.

Let me blue sky this for a second.

Let's use a simple scripting language like Lua. Let's assume that all the events have a 4 letter code. Users would create a Lua script and submit via text box.

Code:

-- EVNA, ABCD and XYZY are placeholder event codes; register() is a made-up call
if EVNA.available then return EVNA.register() end  -- first choice
if EVNA.green then return EVNA.register() end      -- short wait list, that works
if ABCD.available then return ABCD.register() end  -- let's see about event #2
if EVNA.yellow then return EVNA.register() end     -- didn't get it, check on event 1 list
if ABCD.yellow then return ABCD.register() end     -- rats, how long is the wait for #2
if XYZY.available then return XYZY.register() end  -- sigh, ok what about event 3
-- nothing available that I want, just register on the first event and hope
return EVNA.register()                             -- fingers crossed

The website just does a parse check to see if the code compiles, then saves it to a text field in the database.

On registration day, just pull the scripts out one by one and run them. I'd save the "random number" order so that if something bad happened I could re-run the same order again. (or run with a specific seed to the random function)
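
The "specific seed" idea could be sketched in Python roughly as follows. The function name `draw_order` and the seed value are invented for the example. The point is only that a recorded seed reproduces the same draw order if the run has to be repeated.

```python
import random


def draw_order(team_numbers, seed):
    """Return the teams in a pseudo-random order that the seed fully determines."""
    rng = random.Random(seed)     # private generator; global random state is untouched
    order = sorted(team_numbers)  # normalise the input so only the seed matters
    rng.shuffle(order)
    return order


first_run = draw_order([254, 1114, 148, 2056], seed=2016)
rerun = draw_order([2056, 148, 1114, 254], seed=2016)
assert first_run == rerun  # same seed gives the same order, so a re-run is repeatable
```

Sorting before shuffling is a design choice: it means the order the scripts happen to come out of the database does not affect the draw.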

marshall 26-09-2016 12:04

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by bkahl (Post 1608988)
Not too much experience here but that's why I am asking.

Would an upgrade to the website for the registration period really cost $5,400,000+ ($1,800*3000+)? It wouldn't even have to be on the upgraded servers for the entire year. This number seems exorbitantly high.


frcguy 27-09-2016 11:55

Re: [FRC BLOG] The Great Registration System Crash of 2016
 
Quote:

Originally Posted by Gary Dillard (Post 1608674)
Look for metal shavings in a pwm port; that's caused us to crash plenty of times.

At Chezy Champs, the field supervisor found a bumper pin in a team's MXP port. I wonder why they were having issues...


All times are GMT -5.