
R Package for Downloading FIRST API Data


sirwin
22-09-2016, 21:53
I was interested in both FRC and learning the R language, so I created an R package (firstapiR) to download data from the FIRST API server and convert it to R data frames. Perhaps others might find it interesting.

Here is the link to the github repository that contains the package. (https://github.com/irwinsnet/firstapiR)

There are instructions on how to set up your R environment and download and install the package in the README section of the repository.

Stacy Irwin
Business Operations Mentor
FRC 1318
Issaquah Robotics Society

Foster
24-09-2016, 07:01
Thanks for doing this! Easy access to data could spur a new generation of stat mavens. Maybe in 2017 we will see a new OPR.

ngreen
24-09-2016, 12:18
I installed the package. Right now, I'm a little too busy but I'll try it later.

Edit: I've used R some but am no expert. Last year I used clustering in R via Tableau to separate the wheat from the chaff. I also liked making Shiny apps and tried to build a scouting app, but didn't end up using it. Most of the data I pulled from TBA or directly from the FIRST page -- I didn't use the API, just direct import. Having some tools to do this in R will be useful. I've been working on getting better at writing Python programs to access the TBA API, but I'll give this a try. I don't yet know which will work better with the students, but thanks for adding this.

sirwin
24-09-2016, 19:47
Thanks for the positive feedback.

sirwin
25-09-2016, 10:10
ngreen, let me know if you run into any glitches with the package, either here or in the issues section of the github repository. I was only able to test it on a couple different computers.

I considered both R and Python with the Pandas package for developing this package, and I initially played around with the FIRST API in both languages. I think Python is better as a general purpose scripting language and I found the documentation for Python to be much easier to understand than R's documentation. But for some random reason, early on, I made more progress in R than in Python.

I'm experimenting with a Shiny app as well. I think Shiny could work well for building an application to present our scouting data to the drive crew -- I'm going to see if our scouting group is interested in pursuing that project. If we come up with any interesting R data visualization techniques we'll post them on the github webpages for the firstapiR package.

Conor Ryan
29-09-2016, 22:45
Thanks for starting the API! With the right community for R, we can really bring the beginnings of data science to the masses. Machine Learning for alliance selection? K-Means for determining balanced alliances? We can really go a step further here than what exists today.
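As a quick illustration of the K-means idea, here is a minimal base R sketch. The scouting metrics and numbers below are entirely made up for demonstration -- not real FRC data:

```r
# Hypothetical per-team scouting metrics (made-up numbers).
set.seed(42)
stats <- data.frame(
  auto   = c(10, 12,  2,  3, 20, 22,  5, 11),
  teleop = c(30, 35, 10, 12, 50, 55, 15, 33)
)
rownames(stats) <- paste0("team", 1:8)

# Standardize the columns, then partition the eight teams into
# three clusters of broadly similar performance.
km <- kmeans(scale(stats), centers = 3, nstart = 10)
km$cluster  # cluster assignment for each team
```

With real data the inputs would come from scouting or the FIRST API, and the number of clusters would be chosen to fit the strategy question at hand.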


I'm experimenting with a Shiny app as well. I think Shiny could work well for building an application to present our scouting data to the drive crew -- I'm going to see if our scouting group is interested in pursuing that project. If we come up with any interesting R data visualization techniques we'll post them on the github webpages for the firstapiR package.

I've been doing a lot of work with R Shiny at work lately and it is easily the best interface for rapid prototyping of R data applications. Highly recommended; in 20 minutes you can have a quality interface. If you get proficient at it, there are a lot of great job opportunities out there.

After R the next leap forward is Scala, but this is a great place to be today.

Bald & Bearded
30-09-2016, 11:17
Thanks for posting this. I had been thinking about something along the lines of R analysis of match data and a Shiny-based scouting system. This is perfect.

One question: I looked at the whole API project on TeamForge and joined the project, but it wasn't clear whether joining automatically gets me an authorization token, or whether there's some other way to request one.

Ether
30-09-2016, 15:29
Thanks for doing this.

Question for R gurus:

Attached is a table of qual match scores for 17842 alliances (8921 matches) involving 2696 teams.

Each row has 8 fields:

red1 red2 red3 blue1 blue2 blue3 red_score blue_score

What is the recommended way to use R to efficiently compute "World OPR" for this large dataset?

Ether
02-10-2016, 20:21
Thanks for doing this.

Question for R gurus:



Here's a re-statement of the question to make it accessible to R gurus who may not be familiar with OPR computation.

I want to know if R can solve for the least-squares solution to the overdetermined system of linear equations [A][x] = [b], given the attached [A] and [b] files.

An equivalent question framed in the language of statistics would be: [A] is a table (in plaintext sparse format) of 17842 sets of values for 2696 independent variables, and b is a column of 17842 corresponding values for the dependent variable. Can R read the attached files and compute multiple linear regression?
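For readers following along: base R can solve such an overdetermined system directly via a QR factorization. Here is a minimal sketch on a toy four-match, three-team dataset (made-up numbers, with two-team alliances for brevity; the real design matrix is 17842 x 2696):

```r
# Each row of A marks which teams played in that alliance;
# b holds the corresponding alliance score (toy data).
A <- rbind(c(1, 1, 0),
           c(0, 1, 1),
           c(1, 0, 1),
           c(1, 1, 0))
b <- c(30, 50, 40, 30)

# qr.solve() returns the least-squares solution of the
# overdetermined system A x = b.
x <- qr.solve(A, b)
x  # per-team contributions: 10, 20, 30
```

The same computation can be phrased as a regression with `lm(b ~ A - 1)`; QR avoids explicitly forming the normal equations.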

sirwin
04-10-2016, 16:49
Thanks for posting this. I had been thinking about something along the lines of R analysis of match data and a Shiny-based scouting system. This is perfect.

One question: I looked at the whole API project on TeamForge and joined the project, but it wasn't clear whether joining automatically gets me an authorization token, or whether there's some other way to request one.

It's been a few months, but I'm pretty sure I received my token by joining the TeamForge project. I received an email from Alex Herreid a few days later at the email address I used to join the project. The email contained my username and token.

According to the TeamForge page, they'll only be reviewing requests for tokens monthly now that the competition season is over, but they'll review weekly once build season starts up again. It may be a few weeks before you hear from Alex.

Ether
04-10-2016, 17:14
I'd like to hear from anyone who has successfully computed World OPR (https://www.chiefdelphi.com/forums/showpost.php?p=1041684) using R.

sirwin
05-10-2016, 15:42
I'd like to hear from anyone who has successfully computed World OPR (https://www.chiefdelphi.com/forums/showpost.php?p=1041684) using R.




I'm intrigued. I'm relatively new to FIRST, so bear with me. OPR refers to offensive power rating, correct? You're referring to using the Choleski decomposition to produce an estimate of how many points any single team should be expected to contribute to an alliance score, based on past performance?

There's an R package that supposedly does this -- it's called optR and it's available on the CRAN repository. It has a function called choleskilm that should do the trick. Of course the raw data has to be shaped into a positive definite matrix first.

The size of the matrix could be a problem. The description that I found on this method for calculating OPR focused on using this method for data from a single competition -- generally no more than a 100 x 100 matrix. But 2700 x 2700? I'll experiment with smaller data sets over the weekend and see if I can figure out what the computation time will be.
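For the curious, the Cholesky route is also available in base R without any extra packages, via chol() on the normal-equations matrix. A sketch on toy data (made-up numbers and two-team alliances; the real matrix would be 2696 x 2696):

```r
# Toy design matrix and alliance scores.
A <- rbind(c(1, 1, 0),
           c(0, 1, 1),
           c(1, 0, 1),
           c(1, 1, 0))
b <- c(30, 50, 40, 30)

N <- crossprod(A)     # t(A) %*% A, symmetric positive definite
d <- crossprod(A, b)  # t(A) %*% b

# chol() gives upper-triangular R with N = t(R) %*% R; two
# triangular solves then yield the least-squares solution.
R <- chol(N)
x <- backsolve(R, forwardsolve(t(R), d))
```

This explicitly forms the positive definite matrix mentioned above; for badly conditioned data the QR route is numerically safer.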

Ether
07-10-2016, 15:36
I'm intrigued. I'm relatively new to FIRST...

Welcome!

so bear with me.

Not to worry.

OPR refers to offensive power rating, correct?

yes

You're referring to using the Choleski decomposition

I did not explicitly mention Cholesky, but yes that factorization can be used to factor the Normal Equations matrix

to produce an estimate of how many points any single team should be expected to contribute to an alliance score, based on past performance?

Yes. A very rough estimate, since the model assumptions are not very realistic.

There's an R package that supposedly does this -- it's called optR and it's available on the CRAN repository. It has a function called choleskilm that should do the trick.

For the small matrix associated with a single event.

Of course the raw data has to be shaped into a positive definite matrix first.

... AND the Aij design matrix (attached to post 9 in this thread) must be read by R before it can be used to compute the normal equations matrix N.

The size of the matrix could be a problem.

A big problem

The description that I found on this method for calculating OPR focused on using this method for data from a single competition -- generally no more than a 100 x 100 matrix.

Yes.

But 2700 x 2700?

Full-matrix Cholesky runtime grows as O(n^3).

(2696/40)^3 ≈ 306,182 -- roughly 300,000 times the work of factoring the matrix for a typical 40-team event.

Big problem. Unless you use sparse matrix algorithms.

I'll experiment with smaller data sets over the weekend and see if I can figure out what the computation time will be.

Please let us know what you find out:)
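On the sparse-matrix point: the Matrix package (which ships with R) stores the design matrix in sparse form, and its solve() method dispatches to a sparse Cholesky factorization for symmetric matrices. A sketch on the same kind of toy data (made-up numbers, two-team alliances):

```r
library(Matrix)

# Sparse design matrix: row i of A has a 1 in column j when
# team j played in alliance i.
A <- sparseMatrix(i = c(1, 1, 2, 2, 3, 3, 4, 4),
                  j = c(1, 2, 2, 3, 1, 3, 1, 2),
                  x = rep(1, 8))
b <- c(30, 50, 40, 30)

N <- crossprod(A)     # sparse normal-equations matrix
d <- crossprod(A, b)

# For a symmetric sparse N, Matrix's solve() uses a sparse
# Cholesky solver -- this is where the big speedup comes from.
x <- solve(N, d)
```

Because most team pairs never play together, N is mostly zeros, and a sparse factorization skips nearly all of the O(n^3) work a dense one would do.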

sirwin
16-10-2016, 10:46
I've had a request for a downloadable version of the firstapiR package, so that there's an option for obtaining firstapiR that doesn't require the devtools package or installing from GitHub. I've made both a source and a binary version of the package available at this website. (https://irwinsnet.github.io/)

The file downloads are hosted on my MediaFire account. I apologize for the ads, but I didn't want to post publicly available links to the cloud drives where I keep personal files (and I'm too cheap to pay for ad-free downloads :)).

Stacy

sirwin
16-10-2016, 19:44
Thanks for doing this.

Question for R gurus:

Attached is a table of qual match scores for 17842 alliances (8921 matches) involving 2696 teams.

Each row has 8 fields:

red1 red2 red3 blue1 blue2 blue3 red_score blue_score

What is the recommended way to use R to efficiently compute "World OPR" for this large dataset?




I have no idea if this is the recommended way to use R for this sort of calculation, but my solution is posted at https://irwinsnet.github.io/opr.html.

My method requires 13 seconds on a Surface 4 with an i5 processor and 8 GB of memory. It actually takes more time to create the A matrix (9 seconds) than it does to solve for the OPRs once the A and B matrices are prepared (3 seconds). The remaining second is spent reading the data and preparing the B matrix.

The results and source code are available at the link above.

Stacy

Ether
18-10-2016, 01:41
my solution is posted at https://irwinsnet.github.io/opr.html.

My method requires 13 seconds on a Surface 4 with an i5 process and 8 Gb of memory.

Reps to you for all the fine work :)

Reps to the first person who can cut that time in half.

Foster
19-10-2016, 06:36
Reps to you for all the fine work :)

Reps to the first person who can cut that time in half.


Cut 13 seconds in half? I'm excited that sirwin got it to work at all on that data set! At 13 seconds, the WOPR rankings can be recalculated between matches before the first verse of "Cotton Eye Joe" gets played. :rolleyes:

Good job, sirwin, on the code, and an even better write-up of how you did it.

sirwin
19-10-2016, 08:57
With respect to speeding up the calculation, I had a thought.

It takes my algorithm 9 seconds to produce the 2696 x 2696 A matrix -- but updating the A matrix with new match data should be nearly instantaneous. For every new match, we just have to add 1 to at most 18 elements of the A matrix (all other elements remain unchanged), and each element of the A matrix is directly addressable by team number. So it takes 13 seconds to calculate world OPR from scratch, but as long as we retain the A matrix, it should only take about 5 seconds to update world OPR with new match data: 2 seconds for updating the A and B matrices and 3 seconds for solving for the OPRs.

Of course, this is assuming that R's underlying C code doesn't do anything stupid, like extracting an array element by going through every array element until it gets to the right one. Also, the A matrix would need to be maintained in memory -- disk access would slow things down.
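The update idea above can be sketched as a small base R helper. This is a hypothetical function, not part of firstapiR: `N` here is the teams x teams normal-equations matrix and `d` the accumulated score vector, with teams identified by row/column index.

```r
# Adding one match touches at most 18 cells of N (a 3x3 block per
# alliance) and 6 entries of d -- everything else is unchanged.
update_opr_data <- function(N, d, red, blue, red_score, blue_score) {
  N[red,  red]  <- N[red,  red]  + 1   # red-alliance block
  N[blue, blue] <- N[blue, blue] + 1   # blue-alliance block
  d[red]  <- d[red]  + red_score       # each red team "earns" the score
  d[blue] <- d[blue] + blue_score
  list(N = N, d = d)
}
```

After each update, re-running only the solve step (the ~3-second part above) would refresh the OPRs without rebuilding the matrices from scratch. R's matrix indexing is constant-time per element, so the worry about scanning the whole array doesn't apply.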

Ether
19-10-2016, 12:26
It takes my algorithm 9 seconds to produce the 2696 x 2696 A matrix

I like to call that symmetric positive definite 2696x2696 normal equations matrix the N matrix... to distinguish it from the 17842x2696 dichotomous design matrix A.

Ax = b .. overdetermined system of linear equations (has no exact solution)

N = A'*A
d = A'*b

Nx = d .. linear system of normal equations (whose solution is the least squares solution for Ax=b)

Ether
19-10-2016, 18:50
Cut 13 seconds in half?

OK, I was being generous. Cut it by a factor of 10. Seriously.

Ether
23-10-2016, 16:48
Cut it by a factor of 10. Seriously.

If there is any interest in learning how to do this within the R community please let me know and we can work it out together.

Joey1939
25-10-2016, 15:31
I took the world OPR data and made a graph.

EDIT: I made a better graph.

Ether
29-10-2016, 11:24
If there is any interest in learning how to do this within the R community please let me know and we can work it out together.

https://www.chiefdelphi.com/forums/showthread.php?p=1614009#post1614009

sirwin
31-12-2016, 12:30
I created version 2.0.0 of the firstapiR package for downloading and manipulating FIRST API scouting data using the R language.

Here's a link:

http://irwinsnet.github.io

Once I started using the package to do OPR calculations, I realized it needed some improvements, specifically in how the data frames are shaped.

Stacy Irwin

Ether
31-12-2016, 16:48
@ Stacy: thanks for posting this. I just sent you a PM.

Skyehawk
31-12-2016, 20:45
My method requires 13 seconds on a Surface 4 with an i5 processor and 8 GB of memory. It actually takes more time to create the A matrix (9 seconds) than it does to solve for the OPRs once the A and B matrices are prepared (3 seconds). The remaining second is spent reading the data and preparing the B matrix.
Stacy

I am aware I am beating a dead horse here :deadhorse: , but using sirwin's unmodified code I am getting ~12 seconds with an i7-6560U (2.2 GHz) and 8 GB of RAM; memory usage is low, so we can eliminate that as a factor. The bottleneck is (unsurprisingly) CPU clock speed.

I'm using firstapiR v2.0.0

Skye Leake

Ether
31-12-2016, 21:11
I am getting ~12 seconds with an i7-6560U (2.2 GHz) and 8 GB of RAM; memory usage is low, so we can eliminate that as a factor. The bottleneck is (unsurprisingly) CPU clock speed.

I am getting ~0.93 seconds on a 10-year-old Pentium D desktop running 32-bit XP with 1 GB of RAM.

Skyehawk
31-12-2016, 21:24
:ahh: I obviously am doing something not quite right...

Ether
31-12-2016, 21:26
:ahh: I obviously am doing something not quite right...

Your results look about right for R code that uses dense matrix technology.

https://www.chiefdelphi.com/forums/showpost.php?p=1612297

https://www.chiefdelphi.com/forums/showpost.php?p=1612581

https://www.chiefdelphi.com/forums/showpost.php?p=1613118

https://www.chiefdelphi.com/forums/showpost.php?p=1614125

Conor Ryan
01-01-2017, 11:50
In case anybody wants to learn R, here is a great way to get introduced: http://swirlstats.com