Just a preview with data from two hourly snapshots, but I’m trying to look at how well Chief Delphi represents FRC.
A Python script running hourly on a Raspberry Pi tracks active CD users and how many times they’re seen, geolocates their teams, and calculates each team’s average rank in the current season.
Neat, kind of interesting to see how limited the scope of FRC teams on CD is too. It’d also be neat to see a ranking of the average number of active users on CD categorized by team number.
This was only 12 AM / 1 AM EST; it’s quite a bit busier during the day. I have data for users per team too and will share it once I’ve collected data for longer.
This is super interesting data and I would love to see it expanded to, perhaps, all active posts from stop build onwards. I have always assumed that CD posters generally made up a much stronger subset of teams than the general FRC population, but to see it in visual data is even better.
I think that understanding how teams on CD differ from the overall set of FRC teams can help us understand why opinions popular on CD may not be as popular within the overall FRC population. Also, this may give some insight into why CD discussions about hot button issues (e.g. “Mentor-run” vs “Student-run”, Regionals vs Districts, Stop Build or Not) are often more one-sided on ChiefDelphi than in the larger FIRST community.
If possible, could you make another chart (or upload the data so I can… haha) that uses percentiles instead? For instance, if I finish 7 of 40 in my first event and 8 of 62 at my second event, my average on that chart gets worse (from 7 to 7.5). If we are going on percentiles, my average would improve, since 8 of 62 is a better relative finish than 7 of 40.
Plus, at an IN event, we couldn’t have fallen past 38 at any event. On that chart, there would be like 5 or 6 teams that we’d be above by default if we came last at our only event. Does that make sense?
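To make the percentile idea concrete, here is a minimal sketch using the 7 of 40 and 8 of 62 example from above. The function name and the exact percentile definition (share of the field finishing at or below your rank) are my own assumptions, not anything from the original script.

```python
def rank_percentile(rank, field_size):
    """Share of the field finishing at or below this rank (1.0 = first place)."""
    return (field_size - rank + 1) / field_size


def average(values):
    values = list(values)
    return sum(values) / len(values)


# (final rank, number of teams at the event), taken from the example above
events = [(7, 40), (8, 62)]

# Average raw rank: a bigger number is worse.
raw_after_one = average(rank for rank, _ in events[:1])
raw_after_two = average(rank for rank, _ in events)

# Average percentile: a bigger number is better.
pct_after_one = average(rank_percentile(r, n) for r, n in events[:1])
pct_after_two = average(rank_percentile(r, n) for r, n in events)

print(raw_after_one, raw_after_two)
print(round(pct_after_one, 3), round(pct_after_two, 3))
```

With this definition, last place at a 38-team event still scores 1/38 rather than sitting above teams from larger events by default.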
I’m interested to see what sorts of results you would find if you looked at chief delphi active users/teams from say… the past ten years. Does the average team on CD get stronger or weaker? How has CD growth correlated to FIRST growth? It’d also be neat to track individual users and how their team affiliation shifts.
I’m also adding district / non-district fields for teams.
Currently the script is parsing the CD front page and pulling out user ID numbers to see who’s online. I can look at the historical performance of those teams, but dealing with users active over the last 10 years seems like it would be quite a bit more difficult (do I scrape all 43000+ CD user profiles to get their last activity?).
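For anyone curious what pulling user IDs out of the front page might look like, here is a rough sketch. The `member.php?u=` link pattern and the sample HTML are assumptions for illustration; the actual markup and the actual script may well differ.

```python
import re

# Assumed profile-link format; the real front-page markup may differ.
USER_LINK = re.compile(r"member\.php\?u=(\d+)")


def online_user_ids(html):
    """Return the distinct user IDs linked in the given HTML as a set of ints."""
    return {int(uid) for uid in USER_LINK.findall(html)}


# Made-up snippet standing in for the "who's online" block of the page.
sample_html = """
<a href="member.php?u=101">user_a</a>, <a href="member.php?u=202">user_b</a>,
<a href="member.php?u=101">user_a</a>
"""

ids = online_user_ids(sample_html)
print(sorted(ids))
```

Returning a set means a user linked twice on the same page is only counted once per snapshot.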
It wouldn’t need to be ten years, even just the change between 2013 and now would be interesting.
I wonder if you could just sort a database of CD posts by year, then sort each year group by team number, any team that meets a certain threshold of posts can be considered “active.”
Make a graph for each year similar to the one that started this thread and play a relaxing game of spot the difference (CD version).
This might be easier than going through user profiles individually. But I admit, I don’t know much about auto scripting or any of these kinds of techniques.
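The post-count idea above could be sketched roughly like this, assuming you already had posts as (year, team number) pairs. The data layout, the threshold value, and the team numbers are all made up for illustration.

```python
from collections import Counter, defaultdict


def active_teams_by_year(posts, min_posts):
    """Group posts by year, count per team, keep teams with >= min_posts posts."""
    counts = defaultdict(Counter)
    for year, team in posts:
        counts[year][team] += 1
    return {
        year: sorted(team for team, n in per_team.items() if n >= min_posts)
        for year, per_team in counts.items()
    }


# Hypothetical rows: (year of post, poster's team number)
posts = [
    (2013, 9991), (2013, 9991), (2013, 9992),
    (2014, 9991), (2014, 9992), (2014, 9992),
]

active = active_teams_by_year(posts, min_posts=2)
print(active)
```

Each year's list could then feed a chart like the one that started the thread.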
Am I correct in thinking that if someone did not post between 12 AM and 1 AM EST, their team would not be in the chart posted?
It looks like the chart was posted in a bitmap graphic format (.jpg). Can it be posted in a format that can be searched, say PDF? If you expand it to include more teams, it will become difficult to search manually, and at some point the text might become too small to read even when zoomed in if it stays a bitmap.
As others have indicated, this gives great insights into the CD community as a subset of the FRC community. One has to wonder if the skew towards “the top 50 percentile” of teams is a cause or effect (or both) of team members being active on CD.
It would also be interesting to see what percentage of each ranking is active on CD (just a bar chart, no team numbers needed). It is pretty obvious that a very low percentage of the low ranking teams are active here, but I suspect that this is also true of the very top level teams, with a few exceptions.
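A rough sketch of how that bar chart's numbers might be computed, assuming every FRC team already has a rank percentile between 0 and 1 and you have the set of teams seen on CD. The function, the four-bucket split, and the sample teams are hypothetical.

```python
from collections import Counter


def share_active_by_bucket(team_percentiles, active_teams, n_buckets=4):
    """Return {bucket index: fraction of teams in that bucket seen on CD}.

    Bucket 0 holds the lowest-ranked teams, bucket n_buckets - 1 the highest.
    Buckets that contain no teams are omitted.
    """
    totals, active = Counter(), Counter()
    for team, pct in team_percentiles.items():
        idx = min(int(pct * n_buckets), n_buckets - 1)  # pct == 1.0 goes in the top bucket
        totals[idx] += 1
        if team in active_teams:
            active[idx] += 1
    return {idx: active[idx] / totals[idx] for idx in sorted(totals)}


# Made-up teams and percentiles for illustration only.
percentiles = {9001: 0.10, 9002: 0.20, 9003: 0.60, 9004: 0.90, 9005: 0.95}
seen_on_cd = {9002, 9004, 9005}

shares = share_active_by_bucket(percentiles, seen_on_cd)
print(shares)
```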
You don’t have to POST in the time frame, just be online on CD when the bot pulls the list of online users. I don’t want to continuously scrape CD, so the script only runs once an hour, and the chart only had data from the first two runs.
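As a sketch of how the hourly runs could add up to "how many times a user is seen", assuming each run yields a set of online user IDs (the names and sample IDs here are hypothetical):

```python
from collections import Counter


def tally_sightings(snapshots):
    """Count, per user ID, the number of snapshots in which that user was online."""
    seen = Counter()
    for online in snapshots:
        seen.update(set(online))  # set() guards against duplicates within one run
    return seen


# Two made-up hourly snapshots of online user IDs.
snapshots = [{101, 202}, {202, 303}]

seen = tally_sightings(snapshots)
print(seen.most_common())
```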
I’m making the charts in Tableau, so there will be an interactive version.
I’ll think about how to do percentages of rankings, seems like it shouldn’t be too hard!
If you use a lot more data (i.e. from several years) then the team numbers would likely disappear, but the overall shape of the curve would be more meaningful.
It may be interesting to see how many events each team has done too. I’d assume most teams on CD (outside of districts) play more than the average team (2-3 regionals before CMP).
I can add this. To compare the CD population to FRC as a whole for this, I guess I’ll pull event data for all teams from TBA. Shouldn’t be too difficult.