I was interested in both FRC and learning the R language, so I created an R package (firstapiR) to download data from the FIRST API server and convert it to R data frames. Perhaps others might find it interesting.
I installed the package. Right now, I’m a little too busy but I’ll try it later.
Edit: I’ve used R some but am no expert. Last year I used clustering in R via Tableau to separate the wheat from the chaff. I also liked making Shiny apps and tried to make a scouting app, but didn’t end up using it. Most data I pulled from TBA or directly from the FIRST page. I didn’t use the API, just direct import. Having some tools to do this in R will be useful. I have been working on getting better at writing Python programs to access the TBA API, but will give this a try. I don’t yet know which will work better with the students, but thanks for adding this.
ngreen, let me know if you run into any glitches with the package, either here or in the issues section of the GitHub repository. I was only able to test it on a couple of different computers.
I considered both R and Python with the Pandas package for developing this package, and I initially played around with the FIRST API in both languages. I think Python is better as a general purpose scripting language and I found the documentation for Python to be much easier to understand than R’s documentation. But for some random reason, early on, I made more progress in R than in Python.
I’m experimenting with a Shiny app as well. I think Shiny could work well for building an application to present our scouting data to the drive crew – I’m going to see if our scouting group is interested in pursuing that project. If we come up with any interesting R data visualization techniques we’ll post them on the GitHub pages for the firstapiR package.
Thanks for starting the API! With the right community for R, we can really bring the beginnings of data science to the masses. Machine Learning for alliance selection? K-Means for determining balanced alliances? We can really go a step further here than what exists today.
I’ve been doing a lot of work with R Shiny at my job lately and it is easily the best interface for rapid prototyping of R data applications. Highly recommended: in 20 minutes you can have a quality interface. If you get proficient at it there are a lot of great job opportunities out there.
After R the next leap forward is Scala, but this is a great place to be today.
Thanks for posting this. I had been thinking about something along R analysis of match data and a Shiny based scouting system. This is perfect.
One question: I looked at the whole API project on TeamForge and joined the project, but it was not clear whether that would automatically get me an authorization token, or whether there was some other way to request one.
Here’s a re-statement of the question to make it accessible to R gurus who may not be familiar with OPR computation.
I want to know if R can solve for the least-squares solution to the overdetermined system of linear equations [A][x] = [b], given the attached [A] and [b] files.
An equivalent question framed in the language of statistics would be: [A] is a table (in plaintext sparse format) of 17842 sets of values for 2696 independent variables, and b is a column of 17842 corresponding values for the dependent variable. Can R read the attached files and compute multiple linear regression?
It’s been a few months, but I’m pretty sure that I received my token by joining the TeamForge project. I received an email from Alex Herreid a few days later at the email address I used to join the project. The email contained my username and token.
According to the TeamForge page, they’ll only be reviewing requests for tokens monthly now that the competition season is over, but they’ll review weekly once build season starts up again. It may be a few weeks before you hear from Alex.
I’m intrigued. I’m relatively new to FIRST, so bear with me. OPR refers to offensive power rating, correct? You’re referring to using the Choleski decomposition to produce an estimate of how many points any single team should be expected to contribute to an alliance score, based on past performance?
There’s an R package that supposedly does this – it’s called optR and it’s available on the CRAN repository. It has a function called choleskilm that should do the trick. Of course the raw data has to be shaped into a positive definite matrix first.
The size of the matrix could be a problem. The description that I found on this method for calculating OPR focused on using this method for data from a single competition – generally no more than a 100 x 100 matrix. But 2700 x 2700? I’ll experiment with smaller data sets over the weekend and see if I can figure out what the computation time will be.
You’re referring to using the Choleski decomposition
I did not explicitly mention Cholesky, but yes that factorization can be used to factor the Normal Equations matrix
to produce an estimate of how many points any single team should be expected to contribute to an alliance score, based on past performance?
Yes. A very rough estimate, since the model assumptions are not very realistic.
There’s an R package that supposedly does this – it’s called optR and it’s available on the CRAN repository. It has a function called choleskilm that should do the trick.
For the small matrix associated with a single event, yes.
Of course the raw data has to be shaped into a positive definite matrix first.
… AND the Aij design matrix (attached to post 9 in this thread) must be read by R before it can be used to compute the normal equations matrix N.
The size of the matrix could be a problem.
A big problem
The description that I found on this method for calculating OPR focused on using this method for data from a single competition – generally no more than a 100 x 100 matrix.
Yes.
But 2700 x 2700?
The computation time for full-matrix Cholesky grows roughly as O(n^3).
(2696/40)^3 = 306182
Big problem. Unless you use sparse matrix algorithms.
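For anyone who wants to check that arithmetic, here is a minimal Python sketch. It assumes dense Cholesky work scales as n^3 and uses the 40-team single-event size from the line above. The function name is made up for illustration.

```python
def dense_cholesky_cost_ratio(n_large: int, n_small: int) -> float:
    """Relative work of dense Cholesky for two matrix sizes, assuming cost ~ n**3."""
    return (n_large / n_small) ** 3


# 2696 teams worldwide versus a 40-team single event (both figures from the post above).
ratio = dense_cholesky_cost_ratio(2696, 40)
print(round(ratio))
```

The ratio only compares operation counts. It says nothing about absolute run time, and sparse algorithms can do far better than this dense estimate.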
I’ll experiment with smaller data sets over the weekend and see if I can figure out what the computation time will be.
I’ve had a request for a downloadable version of the firstapiR package, so that there is another option for obtaining firstapiR that doesn’t require the devtools package and installing from GitHub (https://irwinsnet.github.io/).
The file downloads are hosted on my MediaFire account. I apologize for the ads, but I didn’t want to post publicly available links to cloud drives where I keep personal files (and I’m too cheap to pay for ad-free downloads :)).
I have no idea if this is the recommended way to use R for this sort of calculation, but my solution is posted at https://irwinsnet.github.io/opr.html.
My method requires 13 seconds on a Surface 4 with an i5 processor and 8 GB of memory. It actually takes more time to create the A matrix (9 seconds) than it does to solve for the OPRs once the A and B matrices are prepared (3 seconds). The remaining second is required for reading the data and preparing the B matrix.
The results and source code are available at the link above.
Cut 13 seconds in half? I’m excited that sirwin got it to work at all on that data set! At 13 seconds, the WOPR rankings can be recalculated between matches before the first verse of “Cotton eye Joe” gets played. :rolleyes:
Good job sirwin on the code, and an even better write up of how you did it.
With respect to speeding up the calculation, I had a thought.
It takes my algorithm 9 seconds to produce the 2696 x 2696 A matrix – but updating the A matrix with new match data should be nearly instantaneous. For every new match, we just have to add 1 to each of 18 elements in the A matrix (nine per alliance, one for every pairing of the three teams on that alliance); all other A matrix elements remain unchanged. Each element of the A matrix is directly accessible by team numbers. So it takes 13 seconds to calculate world OPR from scratch, but as long as we retain the A matrix, it should only take about 5 seconds to update world OPR with new match data. That’s 2 seconds for updating the A and B matrices and 3 seconds for solving for OPR.
Of course, this is assuming that R’s underlying C code doesn’t do anything stupid, like extracting an array element by going through every array element until it gets to the right one. Also, the A matrix would need to be maintained in memory – disk access would slow things down.
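To make the update idea concrete, here is a rough Python sketch (not the R code from the write-up). It assumes A is a square list-of-lists indexed through a team-number lookup, B holds each team's summed alliance scores, and every alliance has three teams. The team numbers, scores, and the `add_match` helper are all hypothetical.

```python
from itertools import product


def add_match(A, B, index_of, red, blue, red_score, blue_score):
    """Fold one match into A and B in place (hypothetical helper)."""
    for teams, score in ((red, red_score), (blue, blue_score)):
        rows = [index_of[team] for team in teams]
        # Every pairing of teams on the alliance (including a team with itself)
        # gains one shared match: 3 x 3 = 9 elements per alliance.
        for i, j in product(rows, rows):
            A[i][j] += 1
        # Each team on the alliance is credited with the alliance score.
        for i in rows:
            B[i] += score


# Hypothetical six-team example with made-up team numbers and scores.
teams = [101, 102, 103, 104, 105, 106]
index_of = {team: i for i, team in enumerate(teams)}
A = [[0] * len(teams) for _ in teams]
B = [0] * len(teams)

add_match(A, B, index_of, red=(101, 102, 103), blue=(104, 105, 106),
          red_score=60, blue_score=45)
print(sum(map(sum, A)))  # number of increments applied to A
```

The update touches a fixed number of elements per match regardless of how many teams are in the matrix, which is why it should be cheap compared with rebuilding A from scratch.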
I like to call that symmetric positive definite 2696x2696 normal equations matrix the N matrix… to distinguish it from the 17842x2696 dichotomous design matrix A.
Ax = b … overdetermined system of linear equations (has no exact solution)
N = A’*A
d = A’*b
Nx = d … linear system of normal equations (whose solution is the least squares solution for Ax=b)
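Here is a small Python sketch of those four lines on a toy problem, in case it helps anyone follow along. It builds N and d from a made-up 4x3 design matrix (three teams, two-team alliances, invented scores) and solves Nx = d with a plain dense Cholesky factorization. The function names are illustrative, and this is not how the world-sized sparse system should be solved.

```python
import math


def normal_equations(A, b):
    """Return N = A'A and d = A'b for a dense row-major design matrix A."""
    n = len(A[0])
    N = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    d = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    return N, d


def cholesky_solve(N, d):
    """Solve N x = d for symmetric positive definite N using N = L L'."""
    n = len(N)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = N[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution: L y = d
    y = [0.0] * n
    for i in range(n):
        y[i] = (d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L' x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


# Made-up data: each row is one alliance, 1 marks a team that played on it.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
b = [30, 50, 40, 32]  # invented alliance scores

N, d = normal_equations(A, b)
x = cholesky_solve(N, d)
print(N, d, x)
```

Teams 1 and 2 played together twice with different scores (30 and 32), so Ax = b has no exact solution. The least-squares answer makes their predicted alliance score fall between the two observed values.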