Chief Delphi


Greg Marra 04-04-2012 02:50

The Blue Alliance - Data Loss
 
UPDATE: We experienced some data loss on The Blue Alliance. I'm working to restore the data, and it should be fully restored by the end of April 4th, 2012.

Those curious about technical details can read more in this Google Doc, where I am recording what I am doing to fix it.

Most of 2012's data is recovered already. Not all events have been, so some pages will continue to generate errors. Whatever doesn't heal itself with cronjobs will get fixed tomorrow night.

Thanks,
Greg

Greg Marra 04-04-2012 03:00

Re: The Blue Alliance - Data Loss
 
my.usfirst.org appears to be down, which is preventing The Blue Alliance from rescraping data from FIRST.

http://www.usfirst.org/whatsgoingon fails to load its iframe, for example.

Greg Marra 04-04-2012 03:33

Re: The Blue Alliance - Data Loss
 
At this point, nearly all data has been recovered. 2012 Events and Matches are partially missing. 2011 Events and Matches are entirely missing. This data will be recovered when FIRST's pages come back online.

Some teams who have not competed in 2012 have lost their details like nickname. These will be restored when FIRST's pages come back online.

Will write more about backup measures we should take in the future in the document, but not tonight.

Greg Marra 05-04-2012 01:29

Re: The Blue Alliance - Data Loss
 
FIRST's servers are rejecting our scraping attempts from our production server. I've emailed frcteams@usfirst.org to attempt to resolve the issue. Does anyone know anyone else I can get in touch with?

"Google access to this page has been blocked due to repeated failure to respect robots.txt"

Mark McLeod 05-04-2012 07:56

Re: The Blue Alliance - Data Loss
 
What are you violating? The 10 sec page request delay or something else?
You'll want to assure them you'll respect current settings and future changes in robots.txt.

I've talked to their IT dept before, but normal changeover almost assures that it's different people by now.
New employees are publicized in the FIRST Newsletter. I've attached a list, but beware, the earlier people may be gone or re-positioned within the organization as time passed.

Greg Marra 05-04-2012 17:09

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by Mark McLeod (Post 1154493)
What are you violating? The 10 sec page request delay or something else?
You'll want to assure them you'll respect current settings and future changes in robots.txt.

I got in touch with the FIRST web team, and we're going to make some changes to how we request pages. Things should be back up and running later tonight :-)
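
For anyone writing their own scraper, here is a minimal sketch of honoring robots.txt, including a crawl delay, using Python's standard `urllib.robotparser`. The robots.txt content, the `example-scraper` user agent, and the example.org URLs are all made up for illustration; this is not the change TBA actually made, just one way the idea could look.

```python
import urllib.robotparser

# Hypothetical robots.txt content; the real file served by FIRST may differ.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

def build_parser(robots_text):
    """Parse robots.txt text without making a network request."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

def polite_fetch_plan(parser, user_agent, urls, default_delay=10):
    """Return (url, wait_seconds) pairs for the URLs robots.txt allows."""
    # Fall back to default_delay when robots.txt states no Crawl-delay.
    delay = parser.crawl_delay(user_agent) or default_delay
    return [(url, delay) for url in urls if parser.can_fetch(user_agent, url)]

robots = build_parser(SAMPLE_ROBOTS)
plan = polite_fetch_plan(
    robots,
    "example-scraper",  # hypothetical user agent
    ["http://example.org/teams", "http://example.org/private/admin"],
)
for url, wait in plan:
    print(f"fetch {url}, then wait {wait}s before the next request")
```

A real scraper would call `time.sleep(wait)` between requests and re-read robots.txt periodically so that future changes are picked up.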

Tetraman 05-04-2012 20:27

Re: The Blue Alliance - Data Loss
 
Thank you for keeping such a website up and running. It's a great tool to use and FIRST wouldn't be the same without it.

Barry Bonzack 05-04-2012 23:22

Re: The Blue Alliance - Data Loss
 
Thanks Greg for all your hard work. The Blue Alliance is a true asset to the community.

Greg Marra 06-04-2012 02:58

Re: The Blue Alliance - Data Loss
 
I believe all data is restored now. We lost some metadata that had been manually edited, but the Events and Matches should be back. We'll monitor our logs for errors in the next few days, and fix anything else that crops up.

We're now investigating backup options :-)

Akash Rastogi 06-04-2012 03:09

Re: The Blue Alliance - Data Loss
 
Thanks for your hard work Greg! I needed this back up ASAP to do some scouting.

Greg Marra 06-04-2012 17:38

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by Akash Rastogi (Post 1154851)
Thanks for your hard work Greg! I needed this back up ASAP to do some scouting.

:D Glad we managed to fix everything midweek instead of over a weekend.

What event is 2012oj? We don't seem to have it, and I can't figure out what it is. People are trying to get to it though, and it's throwing errors.

Billfred 06-04-2012 17:58

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by Greg Marra (Post 1155012)
What event is 2012oj? We don't seem to have it, and I can't figure out what it is. People are trying to get to it though, and it's throwing errors.

Considering this throws an error, I suspect there is no such event.

Greg Marra 06-04-2012 18:57

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by Billfred (Post 1155017)
Considering this throws an error, I suspect there is no such event.

I'll just put another thing in the list of "why we should make proper 404 pages". Thanks!
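
A rough sketch of what a "proper 404" could look like for an unknown event key such as 2012oj. The `EVENTS` table, the `event_response` function, and the assumption that a key is a four-digit year followed by a lowercase code are all hypothetical here, not how TBA is actually written.

```python
import re

# Hypothetical event table; real keys and names would come from the datastore.
EVENTS = {"2012abc": "Example Regional"}

# Assumed key shape: four-digit year followed by a lowercase event code.
EVENT_KEY_RE = re.compile(r"^(\d{4})([a-z]+)$")

def event_response(key, events=EVENTS):
    """Return an (HTTP status, body) pair instead of raising on unknown keys."""
    match = EVENT_KEY_RE.match(key)
    if match is None:
        return 404, f"'{key}' is not a valid event key."
    if key not in events:
        year, code = match.groups()
        return 404, f"No event with code '{code}' found for {year}."
    return 200, events[key]

print(event_response("2012oj"))
```

The point is only that a missing event becomes a clean 404 with a readable message rather than an unhandled error in the logs.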

Littleboy 06-04-2012 21:39

Re: The Blue Alliance - Data Loss
 
Many team names have been lost and replaced with just the number.

~Cory~ 06-04-2012 22:13

Re: The Blue Alliance - Data Loss
 
Quote:

3 Apr 2012, 21:27 - Reporting user replies that methods were unknowingly called during probe of security hole. Thus, data was lost in prod.
This sounds very unethical, especially with TBAv4 being open source. Said user should have run tests on a local machine.

Can you expand a little more on this portion?

Greg Marra 07-04-2012 22:20

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by Littleboy (Post 1155074)
Many team names have been lost and replaced with just the number.

Yea, we don't have fully up to date team names for teams that didn't compete in 2012. If you find any missing for 2012 teams, let me know.

For pre-2012 teams, I've opened an issue to fix this (we've never done it right): https://github.com/gregmarra/the-blu...nce/issues/108

Quote:

Originally Posted by ~Cory~ (Post 1155082)
Said user should have run tests on a local machine.

Can you expand a little more on this portion?

There was no malicious intent. This was a combination of curiosity and us having a big bug in our configurations. Everyone learned something.

dodar 07-04-2012 22:22

Re: The Blue Alliance - Data Loss
 
Team 801 is missing name and sponsors.

Greg Marra 09-04-2012 01:54

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by dodar (Post 1155355)
Team 801 is missing name and sponsors.

Hmm, trying to figure out why we're not re-grabbing these right. Thanks for the pointers. FIRST's page structure makes scraping this information tricky (you can't just look up a team with its team number), so I think something may have changed in the meantime.

Will dig in later this week, thanks.

kmehta 09-04-2012 02:27

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by Greg Marra (Post 1155739)
FIRST's page structure makes scraping this information tricky (you can't just look up a team with its team number), so I think something may have changed in the meantime.

You could use frclinks.com/t/####.

The Lucas 09-04-2012 03:26

Re: The Blue Alliance - Data Loss
 
Oddly enough, it looks like you only have team info for defunct team numbers like 40, 47 & 65. Wonder if that is stale backup data because it is not overloading them in failed scrapes.


All the team info TBA uses is available in one (easy to parse) tab-delimited page
https://my.usfirst.org/frc/scoring/i...?page=teamlist
It would be easier to just scrape that page. Plus, that is only 1 page request instead of thousands.
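
As a rough illustration of why a single tab-delimited page is easy to work with, here is a minimal sketch using Python's standard `csv` module. The sample text and the column names (`team_number`, `nickname`, `city`) are made up; the real page's headers and fields may differ.

```python
import csv
import io

# Hypothetical sample; the real page's column names and values may differ.
TEAM_SAMPLE = (
    "team_number\tnickname\tcity\n"
    "1\tExample Team\tExampleville\n"
    "2\tOther Team\tSampletown\n"
)

def parse_team_list(text):
    """Parse tab-delimited text with a header row into a dict keyed by team number."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in names as plain text
    )
    return {int(row["team_number"]): row for row in reader}

teams = parse_team_list(TEAM_SAMPLE)
print(teams[2]["nickname"])
```

One request and one pass over the rows would replace a separate page fetch per team.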

Alternatively, you could ask 358 for its database (especially if you want to fill in defunct team data)

Great job getting it back up, Greg!

Eugene Fang 09-04-2012 03:47

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by The Lucas (Post 1155757)
Oddly enough, it looks like you only have team info for defunct team numbers like 40, 47 & 65. Wonder if that is stale backup data because it is not overloading them in failed scrapes.


All the team info TBA uses is available in one (easy to parse) tab-delimited page
https://my.usfirst.org/frc/scoring/i...?page=teamlist
It would be easier to just scrape that page. Plus, that is only 1 page request instead of thousands.

Alternatively, you could ask 358 for its database (especially if you want to fill in defunct team data)

Great job getting it back up, Greg!

Just found https://my.usfirst.org/frc/scoring/i...page=eventlist too!
This is amazing...

qzrrbz 09-04-2012 09:52

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by EugeneF (Post 1155760)

Don't believe the headers on this one though! Seems the "num_teams" field isn't there. :confused:
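
A quick way to check for this kind of mismatch is to compare the header row against the widest data row. This is only a sketch: the sample text and column names other than `num_teams` are invented, and it assumes the missing field is a trailing column, which may not be true of the real page.

```python
import csv
import io

# Hypothetical sample: the header names four columns, the rows carry only three.
EVENT_SAMPLE = (
    "event_id\tevent_code\tevent_name\tnum_teams\n"
    "101\tabc\tExample Regional\n"
    "102\txyz\tSample District\n"
)

def missing_columns(text):
    """Return trailing header names for which no data row supplies a value."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    widest = max((len(row) for row in reader), default=0)
    return header[widest:]

print(missing_columns(EVENT_SAMPLE))
```

Running a check like this before trusting the headers would let a scraper skip or log the absent field instead of failing on it.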

Joseph Bisch 09-04-2012 09:54

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by The Lucas
All the team info TBA uses is available in one (easy to parse) tab-delimited page
https://my.usfirst.org/frc/scoring/i...?page=teamlist
It would be easier to just scrape that page. Plus, that is only 1 page request instead of thousands.

Quote:

Originally Posted by EugeneF (Post 1155760)

While those would reduce the page requests, those seem to only provide data for 2012. I tried tacking on "&year=2011" and "&season_FRC=2011" to both URLs, but neither provided 2011 data. Is there something I am missing? :confused:

I think we should go ahead with implementing it and use it when we need data for the current season.

The Lucas 09-04-2012 10:30

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by Joseph Bisch (Post 1155812)
While those would reduce the page requests, those seem to only provide data for 2012. I tried tacking on "&year=2011" and "&season_FRC=2011" to both URLs, but neither provided 2011 data. Is there something I am missing? :confused:

I think we should go ahead with implementing it and use it when we need data for the current season.

You are not missing anything. That page is for the FMS so it only needs this year's data. Again, I recommend talking to 358 for any historical team data.

Other FMS pages you might find useful
2012 Event List
Sample Event Team List ('12 CMP)
You can get to the event team list by using index.lasso?page=event_teamlist&ID_event=<ID on the Event List Page>
I have been using these pages for scouting/stat purposes since they parse more easily in Excel than the /myarea/ ones linked from the FRC regional page (if you copy and paste a team list from there, each team's town shows up in its own row, and the list may span multiple pages).
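
The event team list request described above can be built programmatically. This sketch only constructs the relative `index.lasso?...` part from the parameters named in this post; the function name and the ID value 1234 are made up, and real IDs come from the Event List page.

```python
from urllib.parse import urlencode

def event_teamlist_query(event_id):
    """Build the relative index.lasso request for one event's team list."""
    params = {"page": "event_teamlist", "ID_event": event_id}
    return "index.lasso?" + urlencode(params)

# 1234 is a made-up ID; real IDs come from the Event List page.
print(event_teamlist_query(1234))  # index.lasso?page=event_teamlist&ID_event=1234
```

Using `urlencode` rather than string concatenation keeps the query well-formed if an ID ever contains characters that need escaping.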

Greg Marra 09-04-2012 20:20

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by The Lucas (Post 1155757)
All the team info TBA uses is available in one (easy to parse) tab-delimited page
https://my.usfirst.org/frc/scoring/i...?page=teamlist
It would be easier to just scrape that page. Plus, that is only 1 page request instead of thousands.

:ahh: !!

This page is amazing!!! Issue opened to switch our scrapers to this - it's so much simpler!

I made a FIRST Wiki page to trade notes on how to scrape FIRST pages. Borrowing Pat Fairbank's TPID scraper made scraping Team pages possible before, but I don't think that's a widely known technique. It needs more love, but an OK start.

Eugene Fang 10-04-2012 01:39

Re: The Blue Alliance - Data Loss
 
I'm currently super confused about the event code for the Michigan State Championship. http://frclinks.frclinks.com/ and https://my.usfirst.org/myarea/index....=2012&event=gl both suggest it should be "GL"

However http://www2.usfirst.org/2012comp/eve...edulequal.html and http://www2.usfirst.org/2012comp/eve...edulequal.html show similar but conflicting data, both for Gull Lake District.

I'm guessing Gull Lake (MIGL) accidentally started posting to GL, and they changed halfway through? With GL being the Michigan State Championship?
Can anyone shed some light on this discrepancy? Thanks.

The Lucas 10-04-2012 09:19

Re: The Blue Alliance - Data Loss
 
Quote:

Originally Posted by EugeneF (Post 1156124)
I'm guessing Gull Lake (MIGL) accidentally started posting to GL, and they changed halfway through? With GL being the Michigan State Championship?
Can anyone shed some light on this discrepancy? Thanks.

The recently updated MSC teamlist suggests that it is still GL since its query includes "&year=2012&event=gl"

Perhaps this is the reason why MIGL is so odd among event codes. It is the only 4-character event code and doesn't follow FiM's usual standard of basing the event code on the Michigan county where it is held (it would probably be KZ for Kalamazoo County).


All times are GMT -5.
