Twitter decoding program

Well, while it is easy to read the Twitter results from the FMS (after getting used to it), I was wondering if someone has already made a program that will take the Twitter feed and organize it in a more orderly fashion?

The Twitter I am talking about is the one that is done by the FMS at the regionals.
http://twitter.com/#!/frcfms

I have some Python code that will do this.

http://www.chiefdelphi.com/forums/frcspy.php?
http://www.chiefdelphi.com/forums/showthread.php?t=104057
http://www.chiefdelphi.com/forums/showpost.php?p=1137515&postcount=4

I decided to make my own Twitter-decoding program that tries to compute the average points for each team.

It outputs a CSV file with the teams in one column, and the average foul points, score, etc., over the matches each team played in.
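For anyone curious what that averaging step might look like, here is a minimal sketch in C++ (the `MatchRecord` layout and function names are my own assumptions, not the attached code):

```cpp
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-team match record, assumed already parsed from the feed.
struct MatchRecord { int team; double score; double fouls; };

// Average each column over all matches a team played in.
std::map<int, std::pair<double, double>> averages(const std::vector<MatchRecord>& records) {
    std::map<int, std::pair<double, double>> sums;
    std::map<int, int> counts;
    for (const auto& r : records) {
        sums[r.team].first += r.score;
        sums[r.team].second += r.fouls;
        ++counts[r.team];
    }
    for (auto& kv : sums) {
        kv.second.first /= counts[kv.first];
        kv.second.second /= counts[kv.first];
    }
    return sums;
}

// Emit one CSV row per team: team number, average score, average fouls.
void writeCsv(const std::map<int, std::pair<double, double>>& avgs, std::FILE* out) {
    std::fprintf(out, "team,avg_score,avg_fouls\n");
    for (const auto& kv : avgs)
        std::fprintf(out, "%d,%.2f,%.2f\n", kv.first, kv.second.first, kv.second.second);
}
```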

main.cpp (5.29 KB)
twittercurl.cpp (1.36 KB)
twittercurl.h (459 Bytes)
exampleResults.xls (22 KB)



Does anyone know:

  1. why there was (and is) no Twitter data for Traverse City, and

  2. is this data lost forever?

It seems this also holds true for the Oregon regional…

The twitter feed isn’t the only source of data about match results. Here is the Traverse City data, and here is the Oregon regional data.

While those sites provide the match results, the twitter feed provides this as well as a breakdown of how the points were scored by each alliance (IIRC: bridge points, foul points, hybrid points, and tele-operated points). I have yet to find another source for this kind of data.
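For anyone wanting to consume that breakdown programmatically: the tweets are essentially a list of tag:value tokens, so a generic tokenizer is enough to get them into a usable structure. A sketch in C++ (the tag names in the test below are made up for illustration; the real field tags are season-specific):

```cpp
#include <map>
#include <sstream>
#include <string>

// Split a whitespace-separated "TAG:value" tweet into a field map.
// This deliberately assumes nothing about the specific tags, since
// the FMS field names vary from season to season.
std::map<std::string, std::string> parseTweet(const std::string& tweet) {
    std::map<std::string, std::string> fields;
    std::istringstream in(tweet);
    std::string token;
    while (in >> token) {
        auto colon = token.find(':');
        if (colon != std::string::npos)
            fields[token.substr(0, colon)] = token.substr(colon + 1);
    }
    return fields;
}
```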

Originally Posted by Kpchem

[quote]Originally Posted by quinxorin

The twitter feed isn’t the only source of data about match results. Here is the Traverse City data, and here is the Oregon regional data.

While those sites provide the match results, the twitter feed provides this as well as a breakdown of how the points were scored by each alliance (IIRC: bridge points, foul points, hybrid points, and tele-operated points).[/quote]

Correct.

Also, see attachment. Does anyone know how the 205 TeleOp points number was computed for Team 67 at Waterford? Unlike the Hybrid and Bridge points, it does not seem to be equal to the total of the alliance TeleOp points scored in the 12 qual matches by the alliance in which Team 67 was a member.






It appears to be the sum of the twitter Teleop points and foul points. This raises the question as to whether the FMS is using the correct tiebreaker for ranking (teleop hoop points).

It would be great if someone with a working twitter parser could compare the ranking results when QS, hybrid, and bridge scores are all tied, to see which is being used as the tiebreaker. Looking at the Alamo regional, there were two cases of this: 2721 tied with 4162, and 2583 tied with 2969. Unfortunately, the differences between the teams’ TP were large enough that including foul points would be unlikely to change the ranking. I did not see any such ties at Kansas City, BAE, or Smokey Mountain, but there are still a lot of events I didn’t look at.

I did ask about this on Q/A.

I have now added least-squares solving in order to better find the “impact” each team had on the score. The results are now much more accurate and better predict the results in the final matches (the total of the average scores from the members of an alliance is within about ±5 of that alliance’s total score, excluding outliers like Team 93).

(I am only using data from the qualifying matches to predict the final rounds.)

While of course hand recording the individual scores of each team would be more accurate, this should be a great help in determining which teams provide the most “positive” points to help in the finals.
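To make the setup concrete, here is a sketch of how the qualification data maps onto the least-squares problem Ax ≈ b (struct and function names here are illustrative, not the attached main.cpp): each alliance in each qual match contributes one row of A with a 1 in each member team’s column, and that alliance’s score goes into b; once x is solved for, a predicted alliance score is just the sum of its members’ impact values.

```cpp
#include <vector>

// One alliance's result in one qualification match (illustrative layout).
struct Alliance { int teams[3]; double score; };

// Append one row of the design matrix A and one entry of b:
// a 1 in each member team's column, the alliance score on the right.
void addRow(const Alliance& a, int nTeams,
            std::vector<std::vector<double>>& A, std::vector<double>& b) {
    std::vector<double> row(nTeams, 0.0);
    for (int t : a.teams) row[t] = 1.0;
    A.push_back(row);
    b.push_back(a.score);
}

// Given the least-squares solution x (e.g. from Eigen), the predicted
// score of an alliance is the sum of its members' impact values.
double predict(const std::vector<double>& x, const Alliance& a) {
    return x[a.teams[0]] + x[a.teams[1]] + x[a.teams[2]];
}
```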

exampleResults.xls (24 KB)
main.cpp (7.05 KB)



Just a heads-up: the Twitter data has completeness issues:

http://www.chiefdelphi.com/forums/showpost.php?p=1144595

http://www.chiefdelphi.com/forums/showpost.php?p=1144727

Oh, and a couple questions: What linear algebra library are you using, and is there a reason you are using SVD?

The math library is Eigen.

The reason I am using SVD is that Eigen’s tutorial describes it as the way to perform a least-squares solve (http://eigen.tuxfamily.org/api/TutorialLinearAlgebra.html#TutorialLinAlgLeastsquares).

I don’t think the missing scores are going to be that bad. As long as most of the scores are posted, there should be enough data to get a reasonably accurate result. If anything, the main problem with my model is that it is very limited: it does not account for defense, autonomous performance, etc.

SVD is just one (of many) ways to compute least squares. The choice of the “best” method to use (like choosing the right tool for a job) depends on the problem domain.

For this application, LDLT would be far faster* and plenty accurate.

I don’t think missing scores is going to be that bad. As long as most of the scores are posted, there should be enough data to get a reasonably accurate result.

Be aware: There are already two events (Oregon and Traverse City) for which data is missing for the entire event. No guarantee that won’t happen for more events as the season rolls on. Not counting these two events, over 12% of the data is missing for the other events.

  * For computing least squares for single events, the matrix is small enough that the time difference is probably not even noticeable. But if you ever intend to expand the functionality to compute least squares for a matrix containing all the data from an entire year’s worth of events, I believe there would be a very noticeable difference in speed. If you have the time and are so inclined, it would be interesting if you would try SVD with 2011’s data and see what the computation time is. For reference, LDLT takes 12 seconds on my 8-year-old PC to do least squares on a matrix populated with all the qual data from all the events in 2011.





Twitter data is missing entirely for the San Diego, Oregon, Traverse City, Sacramento, and Utah events.

For this application, LDLT would be far faster* and plenty accurate.

The alliance selection algorithm in the qualification match scheduling software used for FRC events pretty much guarantees that the design matrix A (i.e. Ax ≈ b) will be full rank and well-conditioned. This means that forming the normal equations (Px = AᵀAx = Aᵀb = S) and solving with Cholesky decomposition (LLᵀ or LDLᵀ) will give excellent numerical stability and accuracy and be far faster than other methods (and require much less memory). Furthermore, the normal equations can be formed directly in one pass (without the need to form the design matrix and multiply it by its transpose) when the raw data is read and parsed.

Regarding Sacramento, I talked with the field crew/FTA and their Twitter posts were apparently being blocked by the firewall. If the same happens at SVR, we will have a team member manually copy the data down so it won’t be lost.

It is admirable and I applaud your team for being willing to do that.

But in all seriousness, why should that even be necessary?

This data has significant statistical value and historical interest. The data is already in electronic form. Does anyone know: Is there a compelling reason why the data is being discarded instead of being saved at the point of origin?