This matrix is quite small compared to those generally solved in finite elements, CFD, or other common codes. As mentioned earlier, the biggest speedup comes from treating everything as sparse matrices.
On my 2.0 GHz MacBook Air running MATLAB Student R2012a, I can run:
tic
d = load('d.dat');
N = load('N.dat');
toc
tic
output = N\d;
toc
and get the output:
Elapsed time is 2.768235 seconds. <-- loading files into memory
Elapsed time is 0.404477 seconds. <-- solving the matrix
If I now change the code to:
tic
d = load('d.dat');
N = load('N.dat');
toc
tic
Ns = sparse(N);
toc
tic
output = Ns\d;
toc
I get the output:
Elapsed time is 2.723927 seconds. <-- loading files
Elapsed time is 0.040358 seconds. <-- conversion to sparse
Elapsed time is 0.017368 seconds. <-- solving
There are only 82,267 nonzero elements in the N matrix (vs. 2509 x 2509 ~ 6.3 million entries), so the sparse solver runs much faster: it skips over the entries that are zero entirely, so it never has to do that part of the factorization work.
Here's an iterative method for solving the same problem. I haven't tuned any iteration parameters for bicgstab (biconjugate gradients, stabilized), so it could do a bit better, but the mean squared error is already quite small.
tic
d = load('d.dat');
N = load('N.dat');
toc
tic
Ns = sparse(N);
toc
tic
output = bicgstab(Ns,d);
toc
% compute a true output
output_true = Ns\d;
% compute mean squared error of OPR
output_mse = sum((output_true - output).^2)/length(output)
Elapsed time is 2.728844 seconds.
Elapsed time is 0.040895 seconds.
bicgstab stopped at iteration 20 without converging to the desired tolerance 1e-06
because the maximum number of iterations was reached.
The iterate returned (number 20) has relative residual 2e-06.
Elapsed time is 0.015128 seconds.
output_mse =
9.0544e-07
Not much benefit from the iterative method here; the matrix is quite small. The speedup becomes much more significant when you solve similarly sparse matrices that are huge. In my industry and research career, my finite element models can produce matrices that are millions by millions or larger; at that point you need sophisticated algorithms. But for the size of the OPR matrix, unless we get TONS more FRC teams soon, simply running it with sparse tools should be sufficient for it to run quite fast. Octave and MATLAB have them built in, and I believe NumPy/SciPy distributions do as well. There are also C++ and Java libraries for sparse computation.
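For anyone working in Python, here is a minimal SciPy sketch of the same dense-vs-sparse comparison. The matrix here is a small random diagonally dominant stand-in for the OPR normal-equations matrix N (you would load your own N and d instead); the variable names are illustrative only.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)

# Build a small symmetric, mostly-zero matrix standing in for N.
n = 500
A = np.diag(rng.uniform(1.0, 2.0, n))
for i, j in rng.integers(0, n, size=(200, 2)):
    A[i, j] += 0.01
    A[j, i] += 0.01
d = rng.standard_normal(n)

x_dense = np.linalg.solve(A, d)   # dense solve (like N\d)
As = csc_matrix(A)                # convert to sparse storage (like sparse(N))
x_sparse = spsolve(As, d)         # sparse solve (like Ns\d)

print(As.nnz, "nonzeros out of", n * n)
print(np.allclose(x_dense, x_sparse))
```

As with the MATLAB run above, the two solves agree; only the amount of work done differs.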
A final suggestion: if you construct your matrices explicitly in sparse form from the start (not N itself, but the precursor to it), you can also cut the data-loading time to a small fraction of what it is now.
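In SciPy terms, building the matrix sparse from the start means assembling (row, column, value) triplets as you process match data and never forming a dense array at all. A sketch, with hard-coded triplets standing in for whatever your real assembly loop produces:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Each (row, col, value) triple is one nonzero entry. In practice these
# arrays would be filled in while reading the match data, not hard-coded.
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 2])
vals = np.array([4.0, 1.0, 3.0, 1.0, 5.0])

# COO (triplet) form is convenient for assembly; convert to CSR to solve.
N = coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()
print(N.toarray())
```

The dense 3x3 array is only printed here for inspection; the point is that nothing dense ever needs to be written to or read from disk.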
Hope that helps.
Added: I did check the structure of N, and it is consistent with a sparse least-squares matrix. It is also symmetric and positive definite. These properties are why I chose bicgstab instead of gmres or another iterative algorithm. If you don't want to solve it iteratively, Cholesky factorization is also very well suited to symmetric positive definite matrices.
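A sketch of both options for a symmetric positive definite system, again in SciPy with a small random SPD stand-in for N (the construction A = B'B + I guarantees SPD). Conjugate gradients (cg) is the classic iterative method that exploits SPD structure, and cho_factor/cho_solve do the direct Cholesky solve:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import cg

# Small SPD stand-in for N: B'B + I is always symmetric positive definite.
rng = np.random.default_rng(1)
B = rng.standard_normal((50, 50))
A = B.T @ B + np.eye(50)
d = rng.standard_normal(50)

# Direct solve via Cholesky factorization.
x_chol = cho_solve(cho_factor(A), d)

# Iterative solve via conjugate gradients on the sparse form.
x_cg, info = cg(csr_matrix(A), d)

print(info)  # 0 means cg converged
print(np.linalg.norm(A @ x_cg - d) / np.linalg.norm(d))
```

For a matrix this small the direct Cholesky solve is the easy choice; the iterative route pays off as the system grows.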