We are preparing a presentation and would like some historical data to compare the first year our team competed with the present. Some of you history buffs may already have a way to extract both the 2010 and 2018 numbers. We are looking for the number of teams.
In 2018 Minnesota had 214 teams and globally 3616 teams.
In 2010 Minnesota had ??? teams and globally ??? teams.
As far as I know, you can’t get this easily from the user interface; I got it by getting all the 2010 teams from the API. You need to get an API key. Then I used this shell script which I have at the FIRSTmap Scraper GitHub repository. I put “2010” in a file named YEAR, and my TBA key in one named TBA-auth.
#!/bin/bash
# get_lists: download every page of the team list for the year named in YEAR
api="https://www.thebluealliance.com/api/v3/teams/"
auth="?X-TBA-Auth-Key=$(cat TBA-auth)"   # API key passed as a URL parameter
year=$(cat YEAR)
i=0
TZ=utc date -Iseconds > data/team-time
while true
do
    wget -O "data/teams.$i" "$api$year/$i$auth"
    # count the lines containing "key"; a page past the end has none
    go=$(grep -c -s key "data/teams.$i")
    if [ "$go" -ge 1 ]
    then
        echo "got data/teams.$i"
    else
        echo "empty file for data/teams.$i; terminating"
        rm "data/teams.$i"
        exit 0
    fi
    i=$((i + 1))
done
This creates a number of files in the data subdirectory (made separately). I got the number of Minnesota teams with something like “cat data/teams.* | grep -c 'state_prov.*Minnesota'” and the total number of teams with something like “cat data/teams.* | grep -c key.:”.
OBTW, the line beginning with TZ isn’t necessary for this function; it is used by other parts of the scraper.
If you end up being interested in another year or other regions in the future, a few months ago I pulled all of the data for all of the regions from 1992 to the present. I also took it from TBA, using a pretty simple Python script. The results are attached. My data is a little bit different from what Gus posted; not sure exactly why.
edit: I have 105 teams in MN in 2010 with a total of 1806 teams competing worldwide.
My guess would be that FIRST has not been terribly good in the past at cleaning data. I’ve seen multiple instances where the same state has different abbreviations (MI, Mi, mi), or where there’s no abbreviation and only a state name, or where the state field includes the country or is just the abbreviation.
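One quick way to see how many spellings are in play is to tally the distinct values of the state field across the downloaded pages. This is only a rough sketch: it assumes the files are the TBA team lists saved by the script earlier in the thread and that the field is named state_prov, and the function name tally_states is made up for illustration.

```bash
#!/bin/bash
# tally_states (hypothetical helper): read team-list JSON on stdin and print
# each distinct state_prov value with the number of times it occurs,
# most common first. Assumes the field is spelled "state_prov".
tally_states() {
    grep -o '"state_prov": *"[^"]*"' |   # pull out just the field
        sed 's/^.*: *"//; s/"$//' |       # keep only the value
        sort | uniq -c | sort -rn         # count each spelling and rank
}
```

Something like “cat data/teams.* | tally_states” would then list every spelling with its count, so entries such as MN and Minnesota show up side by side.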
Dirty data doesn’t even begin to describe it. Sure, depending on what year the data is, you can get state info a little differently. That’s dirty.
But take, for example, team 4626 this year. They registered for the Medtronic Foundation Regional, but didn’t show up. They never played a match, weren’t even placed on the schedule. But they still show up in the team list for the event even now, months after we know they weren’t at the event. And it was their only event this year. That’s just plain inaccurate data at this point, something that really should have been corrected.
My independent data (non-TBA) shows 105 MN teams and 1808 total teams for 2010.
It was vetted at the time of collection, but as Jon says it’s only as good as the original official FIRST reporting tempered with personal knowledge.
Yup. The JSON endpoint I use for getting FIRST data from FIRST’s site doesn’t distinguish between active and inactive teams, so I have taken to checking that the events array isn’t empty. Even that isn’t a sure thing.
You have to keep in mind that what FIRST is concerned with is whether or not they got their $5000 from the team, not whether they actually played an event. So removing that team would create a discrepancy in their system between the number of teams and the number of $5000 registration fees they received.
I did this in a few minutes this morning, and only checked for “Minnesota” in the state/province field. If there are some MN or Mn entries, I missed them. The data I pulled are still on disk at the house; I’ll verify this evening.
Edit: I DO have team 2512.
I did not have the following six teams, whose state is listed as “MN”.
2535
2561
2845
3261
3263
3312
This brings my list to 105, in concurrence with Jon Stratis and Marc McLeod.
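For anyone repeating the count, a match that accepts both spellings and ignores case would avoid missing the “MN” entries. A minimal sketch, again assuming the field is named state_prov; the function name count_state and its two arguments are illustrative, not part of the scraper.

```bash
#!/bin/bash
# count_state NAME ABBREV (hypothetical helper): count state_prov values on
# stdin that equal NAME or ABBREV, ignoring case. Counts occurrences rather
# than lines, so it does not depend on how the JSON is laid out.
count_state() {
    grep -i -o -E "\"state_prov\": *\"($1|$2)\"" | wc -l | tr -d ' '
}
```

Usage would be something like “cat data/teams.* | count_state Minnesota MN”. It still misses values that carry extra text, such as a state name with the country appended.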