Number of teams in Minnesota and globally in 2010?

We are preparing a presentation and would like some historical data to compare the first year our team competed with the present. Some of you history buffs may already have a way to extract both the 2010 and 2018 numbers. We are looking for the number of teams.

In 2018 Minnesota had 214 teams and globally 3616 teams.
In 2010 Minnesota had ??? teams and globally ??? teams.

Pulled from TBA just now:

In 2010, MN had 99 teams, globally 1807 teams.

I knew TBA would have it but didn’t know how to get it. Maybe a lesson on that would be good for future data collection and for others.

Thanks

As far as I know, you can’t get this easily from the user interface; I got it by getting all the 2010 teams from the API. You need to get an API key. Then I used this shell script which I have at the FIRSTmap Scraper GitHub repository. I put “2010” in a file named YEAR, and my TBA key in one named TBA-auth.

#!/bin/bash
# get_lists: download every page of the TBA team list for the year named in YEAR
api="https://www.thebluealliance.com/api/v3/teams/"
# the API key is read from the file TBA-auth and passed as a URL parameter
auth="?X-TBA-Auth-Key=$(cat TBA-auth)"
year=$(cat YEAR)
i=0
TZ=utc date -Iseconds > data/team-time
while true
do	wget -O "data/teams.$i" "$api$year/$i$auth"
	# count the lines containing "key"; a page past the last team has none
	go=$(grep -c -s key "data/teams.$i")
	if [ "$go" -ge 1 ]
	then	echo "got data/teams.$i"
	else	echo "empty file for data/teams.$i; terminating"
		rm "data/teams.$i"
		exit 0
	fi
	i=$((i + 1))
done

This creates a number of files in the data subdirectory (made separately). I got the number of Minnesota teams with something like "cat data/teams.* | grep -c 'state_prov.*Minnesota'" and the total number of teams with something like "cat data/teams.* | grep -c key.:".

OBTW, the line beginning with TZ isn’t necessary for this function; it is used by other parts of the scraper.

If you end up being interested in another year or other regions in the future, a few months ago I pulled all of the data for all of the regions 1992-present. I also took it from TBA, using a pretty simple python script. The results are attached. My data is a little bit different from what Gus posted; not sure exactly why.

edit: I have 105 teams in MN in 2010 with a total of 1806 teams competing worldwide.

region_timeline_.csv (6.8 KB)

Hmmm… I show 104 teams in MN for 2010.

1816
2052
2129
2169
2175
2177
2181
2207
2220
2225
2227
2232
2239
2264
2450
2470
2472
2479
2480
2491
2498
2499
2500
2501
2502
2503
2508
2509
2511
2513
2515
2518
2525
2526
2529
2530
2531
2532
2535
2538
2545
2549
2561
2574
2606
2654
2667
2705
2823
2825
2845
2846
2847
2855
2861
2879
2883
2957
2977
2987
2989
3007
3018
3023
3026
3036
3038
3042
3054
3055
3056
3058
3081
3082
3090
3100
3102
3122
3130
3184
3202
3206
3212
3244
3261
3263
3267
3275
3276
3277
3278
3290
3291
3292
3293
3294
3297
3298
3299
3300
3312
3313
3367
3407

Any idea where the difference is between your data set and mine?

My guess would be that FIRST has not been terribly good in the past with cleaning data. I’ve seen multiple instances where the same state appears under different abbreviations (MI, Mi, mi), where there’s no abbreviation and only a state name, or where the state name includes the country or just the abbreviation.

Basically, it’s dirty data.

It’s more consistent now, but 2010 data isn’t very reliable. I would definitely check between “MN”, “Mn”, and “Minnesota”.
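For anyone who wants to check all three spellings in one pass, here is a rough sketch. It assumes the downloaded files have one JSON field per line and that the state field is named state_prov; both are assumptions about the TBA output, and count_mn is just a name I made up.

```shell
#!/bin/sh
# count_mn: count the lines on stdin whose state field looks like Minnesota.
# Assumptions: one JSON field per line, and a field named state_prov.
# -i makes the match case-insensitive, so "MN", "Mn" and "mn" all count;
# "Minnesota" may be followed by extra text such as a country name.
count_mn() {
	grep -E -i -c '"state_prov": *"(Minnesota[^"]*|MN)"' || [ $? -eq 1 ]
}

# Usage, with the files from the download script:
#   cat data/teams.* | count_mn
```

grep exits with status 1 when nothing matches, so the trailing test only keeps a count of zero from looking like a failure.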

Dirty data doesn’t even begin to describe it. Sure, depending on what year the data is, you can get state info a little differently. That’s dirty.

But take, for example, team 4626 this year. They registered for the Medtronic Foundation Regional, but didn’t show up. They never played a match, weren’t even placed on the schedule. But they still show up in the team list for the event even now, months after we know they weren’t at the event. And it was their only event this year. That’s just plain inaccurate data at this point, something that really should have been corrected.

Your set is missing team 2512
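If it helps anyone else compare lists, a small sketch of how a gap like this can be found. It assumes each list is saved as a plain file with one team number per line; missing_from and the file names are hypothetical.

```shell
#!/bin/sh
# missing_from: print the team numbers that are in file $1 but not in file $2.
# Assumption: each file holds one team number per line.
# -F treats each line of $2 as a fixed string, -x requires a whole-line match,
# and -v inverts the result, leaving only the lines of $1 that $2 lacks.
missing_from() {
	grep -v -x -F -f "$2" "$1" || [ $? -eq 1 ]
}

# Usage, with two hypothetical list files:
#   missing_from my_mn_2010.txt your_mn_2010.txt
```

Running it in both directions shows what each list has that the other does not.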

My independent data (non-TBA) shows 105 MN teams and 1808 total teams for 2010.
It was vetted at the time of collection, but as Jon says it’s only as good as the original official FIRST reporting tempered with personal knowledge.

Good catch. I wonder how I missed them… probably a copy/paste error on my part.

Yup. The JSON endpoint I use for getting FIRST data from FIRST’s site doesn’t distinguish between active and inactive teams. So I have taken to checking that the events array isn’t empty. Even that isn’t a sure thing.

You have to keep in mind that what FIRST is concerned with is whether or not they got their $5000 from the team, not whether they actually played an event. So removing that team would create a discrepancy in their system between the number of teams and the number of $5000 registration fees they received.

I did this in a few minutes this morning, and only checked for “Minnesota” in the state/province field. If there are some MN or Mn entries, I missed them. The data I pulled are still on disk at the house; I’ll verify this evening.

Edit: I DO have team 2512.

I did not have the following six teams, whose state is listed as “MN”.

  • 2535
  • 2561
  • 2845
  • 3261
  • 3263
  • 3312

This brings my list to 105, in concurrence with Jon Stratis and Marc McLeod.