I am getting ready to update The Blue Alliance to have 2010 data for all teams. However, FIRST has changed their website, and I can’t figure out how to find the URL for a team’s information page.
This makes it appear that you can find a FIRST team’s page by setting the “tpid” parameter to “28456 + Team Number”. By that formula, Team 111’s page should be https://my.usfirst.org/myarea/index.lasso?page=team_details&tpid=28567, but that URL is actually the page for Team 115.
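For illustration, here is a minimal sketch of the naive formula described above. The constant and function names are hypothetical, and the offset of 28456 is only the observed relationship from the post. As noted, it does not hold in general.

```python
# URL pattern taken from the post; only the tpid value changes per team.
TEAM_DETAILS_URL = "https://my.usfirst.org/myarea/index.lasso?page=team_details&tpid={tpid}"

# Hypothetical name for the offset quoted above (tpid = 28456 + team number).
NAIVE_OFFSET = 28456


def naive_team_url(team_number: int) -> str:
    """Build a team page URL assuming tpid = 28456 + team number.

    This assumption is NOT reliable: for team 111 it yields tpid 28567,
    which is actually team 115's page.
    """
    return TEAM_DETAILS_URL.format(tpid=NAIVE_OFFSET + team_number)


print(naive_team_url(111))
# prints https://my.usfirst.org/myarea/index.lasso?page=team_details&tpid=28567
```

The arithmetic checks out (28456 + 111 = 28567), so the mismatch comes from how FIRST assigns tpid values, not from the formula being applied incorrectly.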
Is there any way to turn a team number into a USFIRST.org team information page URL without any prior knowledge of how their database is structured? Pat Fairbank seems to have figured it out for frclinks.com, but I am not sure how he did it.
FRCLinks uses a JavaScript redirect. TBA currently links to team pages through FRCLinks, but I need to do a full scrape of FIRST’s pages so that the team names are accurate. Wget doesn’t follow JavaScript redirects, so I may need to put together something a bit fancier: either parse the redirects out of FRCLinks, or parse them out of FIRST’s team data when I scrape event attendance.
I wish this were easier.
I was able to get this URL for listing teams, but it’s hardcoded to list at most 250 teams. I was hoping to get every team on one page and then scrape out all the team URLs in one go, so this approach won’t work on its own.
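Once a listing page has been downloaded, pulling the team URLs out of it is mostly a matter of finding the tpid values. A minimal sketch, assuming (hypothetically) that each team on the listing page links to its details page with a query string containing `tpid=<digits>`; the function name is also hypothetical.

```python
import re
from typing import List

# Assumption: team links on the listing page contain "tpid=<digits>".
# The "&" before tpid may appear as "&amp;" in raw HTML, which this
# pattern does not care about since it only anchors on "tpid=".
TPID_PATTERN = re.compile(r"tpid=(\d+)")


def extract_tpids(listing_html: str) -> List[int]:
    """Return the unique tpid values on a listing page, in page order."""
    tpids = (int(m.group(1)) for m in TPID_PATTERN.finditer(listing_html))
    # dict.fromkeys de-duplicates while preserving first-seen order.
    return list(dict.fromkeys(tpids))
```

Because the listing caps out at 250 teams, this would have to be run over each page of results and the lists combined.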
Attached is a CSV listing the team number and tpid for every team competing in 2010. The tpid is the number FIRST uses to identify a team. Hopefully this will be useful to someone in the future.
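With a team-number-to-tpid table like that, building a team page URL becomes a lookup rather than a formula. A minimal sketch, assuming the CSV has two columns (team number first, tpid second) and possibly a header row; the actual column layout of the attachment is not stated, and the function names are hypothetical.

```python
import csv
import io
from typing import Dict

TEAM_DETAILS_URL = "https://my.usfirst.org/myarea/index.lasso?page=team_details&tpid={tpid}"


def load_tpid_map(csv_text: str) -> Dict[int, int]:
    """Map team number -> tpid from CSV text.

    Assumes two columns (team number, tpid). Rows whose first column is
    not an integer, such as a header row, are skipped.
    """
    mapping: Dict[int, int] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].strip().isdigit():
            continue
        mapping[int(row[0])] = int(row[1])
    return mapping


def team_url(team_number: int, tpid_map: Dict[int, int]) -> str:
    """Look up a team's tpid and build its details page URL."""
    return TEAM_DETAILS_URL.format(tpid=tpid_map[team_number])


# Example using the one pairing mentioned in this thread (team 115, tpid 28567).
tpids = load_tpid_map("team,tpid\n115,28567\n")
print(team_url(115, tpids))
```

A team number missing from the table raises `KeyError`, which is arguably better than silently producing another team’s URL.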
Running the regex `window.location.?=.?"(.*)"` over the content of the FRCLinks page is a pretty simple way of grabbing Pat’s redirect; that is how frcfeed does it. Just grab the content of group 1.
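Here is a minimal Python sketch of that approach, using the regex exactly as quoted. The function names are hypothetical, and the caller supplies whatever FRCLinks URL it wants resolved.

```python
import re
import urllib.request
from typing import Optional

# The pattern quoted above; group 1 captures the redirect target.
REDIRECT_RE = re.compile(r'window.location.?=.?"(.*)"')


def extract_redirect(page_source: str) -> Optional[str]:
    """Return the JavaScript redirect target in page_source, if any."""
    match = REDIRECT_RE.search(page_source)
    return match.group(1) if match else None


def fetch_redirect(url: str) -> Optional[str]:
    """Download a page and extract its JavaScript redirect target."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_redirect(html)


sample = '<script>window.location = "https://my.usfirst.org/myarea/index.lasso?page=team_details&tpid=28567";</script>'
print(extract_redirect(sample))
# prints https://my.usfirst.org/myarea/index.lasso?page=team_details&tpid=28567
```

Note that `(.*)` is greedy, so it runs to the last double quote on the line. That is fine for a one-line redirect, but `(.*?)` is safer if other quoted strings share the line.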
I agree. I talked with Pat and decided it was easiest to just re-scrape the data from FIRST myself, since it required minimal modifications to existing TBA scraping scripts. Pat’s service is great, and I’m going to keep using it on TBA where it makes sense.