fast OPR computation with R using sparse matrix technology

*Thread created for discussion of this paper.

Thanks for posting this Ether. Out of curiosity, have you quantified the time savings?

Look at the timings in the log file in the ZIP file.

Then compare those to this](

*Any questions?

I don’t have any experience with R (on my to do list though). Here is what I think is going on, please correct me if I am wrong.

Stacy Irwin used Rs “solve” method to calculate season OPRs, more info here. She created functions to generate the A and b matrices and used R’s “solve” method to get OPRs. I’m having trouble finding good documentation on this method, the best I could get was this. The problem is that this method requires A to be square, which it isn’t. This method seems to use LU decomposition. Solving for OPRs after creating the A and b matrices took approximately 3 seconds.

You used the sparseM package to calculate season OPRs. The key difference from Stacy’s method was that you converted the A matrix into a sparse matrix and then used the slm method to get OPRs. This method uses Cholesky decomposition. Solving for OPRs after creating the A and b matrices took approximately 0.50 seconds.

These results are roughly in line with the fact that Cholesky decomposition is around twice as fast as LU decomposition.

from Wikipedia

The [Cholesky decomposition] algorithms described below all involve about (n^3)/3 FLOPs, where n is the size of the matrix A. Hence, they are half the cost of the LU decomposition, which uses (2n^3)/3 FLOPs (see Trefethen and Bau 1997).

Stacy created N = A’A, which is square. See below.

you converted the A matrix into a sparse matrix

No. I created a sparse matrix representation of the design matrix A directly from the raw qual score data, in one pass through the raw data, with no nested loops, and with no intermediate dense matrix. This is where the bulk of the time savings is. The b and T vectors are created in the same single pass that creates the sparse design matrix. See below.

Let A be the 17842-by-2696 dichotomous (ones and zeros) design matrix of the overdetermined linear system
corresponding to the raw qual match data (alliances & scores) in the file “8col Hill.dat”.

Let b be the 17842-by-1 column vector of alliance scores

Let N be the 2696-by-2696 matrix of the coefficients of the normal equations N=A’A

Let d be A’b

Using Stacy’s R code as a dense matrix performance baseline, it takes 10 seconds to create the N matrix and d vector, and another 3 seconds to compute the OPR from Nx=d

Using sparse matrix technology, it takes 0.68 seconds to create the b column vector and the sparse matrix A, and 0.27 seconds to compute OPR from Ax=b.

*Reps to the first R guru who will write an R script that reads the file “8col Hill.dat” and creates data frames for Aij, Adim, b, and T in less than half a second. See the file “8col to A b T.AWK” for example code.

Reps to anyone that can write a working script in less than half a second. /s

That’s easy. Here you go:

edit: You didn’t say the script had to do anything, just that it had to “work”.

You are correct, enjoy your negative reps*.

*I did not actually give him negative rep.