I’m about 2 months late, but I think I have generated an accurate report. Some of the OPR and CCWM values are identical to the 1114/2834 databases, while others are a few values off. I don’t know whether this is an input error, matrix inversion error, or general code bug.

Thanks for posting. Maybe this will revive interest in the thread.

Did you time how long it took the computer to go from raw input data to finished output report? That’s the challenge.


Oops, sorry for not posting it. Over 10 runs, it averaged 0.823266 seconds to generate the output.
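The posts don’t show the timing harness used; a minimal sketch of averaging over 10 runs with Python’s `time.perf_counter` (the dense solve here is just a stand-in for the real report-generation work):

```python
import time
import numpy as np

def average_runtime(fn, runs=10):
    """Average wall-clock time of fn() over the given number of runs."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs

# Stand-in workload: a small dense solve in place of the full report generation.
a = np.random.rand(100, 100) + 100 * np.eye(100)
b = np.random.rand(100)
avg = average_runtime(lambda: np.linalg.solve(a, b))
```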

As for the matrix inversion, I hadn’t realized that the Cholesky solving library I used doesn’t find the inverse; rather, it finds the Cholesky factorization of the matrix.
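For reference, scipy’s `cho_factor` returns exactly that factorization, and `cho_solve` then uses it to solve the system without ever forming an inverse; a minimal sketch (the 2×2 system is just illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# A small symmetric positive definite system, like the normal equations N x = d.
N = np.array([[4.0, 2.0],
              [2.0, 3.0]])
d = np.array([2.0, 5.0])

# cho_factor returns a packed Cholesky factorization of N -- not N's inverse.
c, low = cho_factor(N)

# cho_solve applies that factorization to solve N x = d; N^-1 is never formed.
x = cho_solve((c, low), d)
```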

Read from file to create data structures: 0.3636 seconds

Make sorted team lists and Bs in Ax=B: 0.0019 seconds

Make matrix A in Ax=B: 0.0675 seconds

Create Cholesky factorization and find OPR and CCWM values (DPR = OPR - CCWM): 0.2793 seconds

Write to file: 0.0242 seconds
(Recording these times: 0.0128 seconds)

Total: 0.7494 seconds

I wrote this in Python using the numpy and scipy libraries to manipulate matrices. The software is running on a Windows 8 machine with an Intel i7-4700MQ running at 2.4 GHz, with 12 GB of RAM.

Fair enough. It’s been a while since I’ve messed around with this, but I dug up the MATLAB script I originally did this with. I got it down to 0.518726 seconds (average over 10 runs).

This version includes some optimizations. For example, using sparse matrices lets MATLAB do a much quicker job of left division (which is not the same as inverting the matrix). MATLAB is actually fairly smart about choosing which form of factorization to use based on the input data, so it will (generally) pick the fastest method. Sparse is very fast.
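In Python, `scipy.sparse.linalg.spsolve` plays roughly the role of MATLAB’s sparse left divide; a small sketch (the 3×3 system is illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

# A small sparse SPD system; spsolve plays the role of MATLAB's N \ d.
N = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 3.0, 0.0],
                         [0.0, 0.0, 2.0]]))
d = np.array([1.0, 2.0, 4.0])

# Solves N x = d via a sparse factorization rather than forming an inverse.
x = spsolve(N, d)
```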

There are some remnants in the comments of me messing with gpuArrays in MATLAB, and I’m too lazy to get rid of them. I have a Quadro K1100M graphics card in my laptop, so I was messing around with using CUDA cores. It turns out that MATLAB has not implemented sparse matrices with CUDA processing, so they have to be full matrices, which actually slows it down (fairly significantly) compared to using sparse matrices with calculation on the CPU. For the record, my CPU is an Intel i7-4702HQ @ 2.2 GHz with 16 GB of RAM (my computer is a Dell Precision M3800).

EDIT: I just realized there was some code in there slowing it down. The "gather"s used in building the "out" matrix are more remnants of GPU computing that I didn’t take out; they move variables from GPU memory back to RAM. Removing them brought total computation time down to about 0.48 seconds.

Made it a bit faster (by 0.02 seconds). I expected the improvement to be a lot more, but surprisingly, reading the file into MATLAB is the longest operation at 0.31 seconds (that’s reading in the file and creating my A and B matrices). It surprises me that it takes that long with the SSD I have.

I’ve been a bit busy, but since there’s light at the end of the tunnel, I met with Jim, a mathematician on our team, to see if he would do this with our sparse matrix tools. The first screenshot shows the breakdown of times. The second is the code written in LV.

He is going to tinker to see if he can find a better way to build the sparse matrix, since most of the time is spent before invoking the solver.

This was timed on a Windows VM with 4 cores running on my MacBook’s 2.7 GHz Core i7. Jim was running on a desktop machine whose details I don’t have, and he wasn’t writing the file; his was somewhat faster. My cores are only about 70% utilized, since most of the time isn’t spent in the solver.

I just wrote, compiled, and ran a 32-bit single-core native app on an 8-year-old Pentium D machine running 32-bit XP Pro SP3, and timed it using RDTSC instructions embedded in the code.

It took 11.9 milliseconds to read the raw data file (cached in RAM) and generate the alliance score vectors and the sparse design matrix.

Using 16-bit unsigned integers for the team numbers and scores, generating the sparse matrix directly from the raw data, and compiling to native code saves a lot of runtime.
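A sketch of what "generating the sparse matrix directly from the raw data" can look like, here in Python with scipy rather than native code; the team numbers and scores are made up, and the 8-column (r1 r2 r3 b1 b2 b3 rs bs) layout is assumed:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical raw rows in the 8-column (r1 r2 r3 b1 b2 b3 rs bs) layout;
# team numbers and scores are made up for illustration.
matches = [
    (11, 22, 33, 44, 55, 66, 90, 75),
    (11, 44, 66, 22, 33, 55, 60, 80),
]

teams = sorted({t for m in matches for t in m[:6]})
col = {t: i for i, t in enumerate(teams)}  # team number -> column index

rows, cols, scores = [], [], []
for i, (r1, r2, r3, b1, b2, b3, rs, bs) in enumerate(matches):
    red_row, blue_row = 2 * i, 2 * i + 1   # one matrix row per alliance
    for t in (r1, r2, r3):
        rows.append(red_row); cols.append(col[t])
    for t in (b1, b2, b3):
        rows.append(blue_row); cols.append(col[t])
    scores.extend([rs, bs])

# Binary design matrix A: entry (i, j) is 1 if team j played on alliance i.
A = coo_matrix((np.ones(len(rows)), (rows, cols)),
               shape=(2 * len(matches), len(teams)))
b = np.array(scores, dtype=float)
```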

Based on some PMs I have received, I should clarify a few things.

[ul]
[li]The posted Delphi code reads the raw 8-column (r1 r2 r3 b1 b2 b3 rs bs) whitespace-delimited alliance scores text data file (from cached RAM) and constructs two matrices [At] and [b]. It takes ~12ms to do this on an 8-year-old Pentium D machine.
[/li]

[li][A]=[At]’ is the binary design matrix
[/li]

[li][b] is the matrix of alliance scores (not the teamOPR scores)
[/li]

[li][A][x]≈[b] is the overdetermined system of linear equations
[/li]

[li][A][x]≈[b] can be solved with one line of code in Octave (or MatLab) as follows: [x]=[A]\[b]. [x] will be the matrix whose column vectors minimize the sum of the squares of the corresponding column vectors in the residuals matrix [r]=[b]-[A][x]
[/li]

[li]But it is much faster (and acceptably stable and accurate for OPR purposes) to create and solve the Normal Equations [N][x]=[d] instead
[/li]

[li][N] and [d] can be formed from [At] and [b] as follows: [N]=[At][At]’ and [d]=[At][b]
[/li]

[li][N][x]=[d] can be solved with one line of code in Octave (or MatLab) as follows: [x]=[N]\[d]
[/li]

[li]solving [N][x]=[d] is faster than solving [A][x]≈[b] because 1) [N] is a smaller matrix than [A], and 2) [N] is symmetric positive definite so Cholesky factorization can be used
[/li]

[li]the computations [A]=[At]', [N]=[At][A], and [d]=[At][b] take about 20ms on the 8-year-old Pentium D machine
[/li]

[li]the Normal Equations solution [x]=[N]\[d] takes about 210ms on the 8-year-old Pentium D machine, using a single core
[/li]

[li]there may be Cholesky factoring algorithms which would permit multiple cores to be used for factoring
[/li][/ul]
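The normal-equations recipe in the bullets above can be sketched in Python (random data for illustration; scipy’s `cho_factor`/`cho_solve` do the Cholesky step, and the result matches the direct least-squares solution):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Random overdetermined system A x ~= b (20 alliance rows, 5 teams);
# stacking an identity block guarantees A has full column rank.
rng = np.random.default_rng(0)
A = np.vstack([np.eye(5), (rng.random((15, 5)) < 0.5).astype(float)])
b = rng.random(20) * 100

# Normal equations: N = A' A (symmetric positive definite), d = A' b.
N = A.T @ A
d = A.T @ b

# Cholesky-based solve of N x = d.
x_chol = cho_solve(cho_factor(N), d)

# Reference least-squares solution of A x ~= b.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```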