#1
Re: OPR-computation-related linear algebra problem
Using MATLAB 2012a on an Intel Core i7-3615QM:
Using linear equation solver (backslash operator): 0.26977 seconds
Using invert-and-multiply: 2.4433 seconds
Code:
N = dlmread('N.dat');
d = dlmread('d.dat');
numIters = 100;
tic;
for i = 1:numIters
    r = N \ d;
end
disp(['Linear solver ' num2str(toc/numIters)]);
numIters = 10;
tic;
for i = 1:numIters
    r = inv(N) * d;
end
disp(['Invert and multiply ' num2str(toc/numIters)]);
#2
Re: OPR-computation-related linear algebra problem
Wouldn't it be much less computationally intensive to reduce the matrix to reduced row echelon form instead?
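For concreteness, a minimal sketch of that idea in MATLAB, reusing the N.dat/d.dat files from post #1 (just the idea, not tuned code):
Code:
N = dlmread('N.dat');
d = dlmread('d.dat');
R = rref([N d]);   % row-reduce the augmented matrix [N | d]
x = R(:, end);     % the solution appears in the last column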
#3
Re: OPR-computation-related linear algebra problem
Using Python with NumPy.
System:
Code:
Ubuntu 12.04 32-bit
Kernel Linux 3.2.0-43-generic-pae
Memory 3.8 GiB
Processor Intel Core 2 Duo T9400 @ 2.53 GHz x 2
Code:
import sys
import time

import numpy
import scipy
import psutil

n_runs = 1000

print ""
print ""
print "Python version %s" % (sys.version)
print "Numpy version %s" % (numpy.__version__)
print "Scipy version %s" % (scipy.__version__)
print "Psutil version %s" % (psutil.__version__)
print ""

N = numpy.loadtxt(open('N.dat'))
d = numpy.loadtxt(open('d.dat'))

data = []
for i in range(1, n_runs + 1):
    start = time.time()
    x = numpy.linalg.solve(N, d)
    end = time.time()
    # record the solve time plus per-CPU usage to help spot throttling
    row = [end - start]
    row.extend(psutil.cpu_percent(interval=1, percpu=True))
    s = "\t".join([str(item) for item in row])
    data.append(s)

f = open('times.dat', 'w')
f.write("\n".join(data))
f.close()

x = numpy.linalg.solve(N, d)
print ", ".join([str(f) for f in x])
print ""
Standard Deviation: 5.1 seconds

The file output.txt contains versions and the solution for x. The file runs.txt contains the run data. Note that I was doing work while letting this run in the background, which skews the times. I collected CPU usage data to try to account for this. One interesting note is that there are different clusters of execution times - I believe this is from my laptop throttling the CPU when I unplugged and was running off battery for a while (if you plot runs over time, you will see three distinct sections where the execution times are consistently higher).

Last edited by DMetalKong : 26-05-2013 at 01:32.
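For anyone who wants to reproduce that plot, a minimal sketch (in MATLAB, for consistency with the rest of the thread, and assuming the times.dat layout written above: solve time in column 1, per-CPU usage after it):
Code:
T = dlmread('times.dat');   % tab-separated: time, then per-CPU usage
plot(T(:, 1), '.');         % solve time for each run, in order
xlabel('Run number');
ylabel('Solve time [s]');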
#4
Re: OPR-computation-related linear algebra problem
This test was run on my 6-year-old Core 2 Duo (T7200 @ 2.00GHz) laptop with MATLAB R2010a. Sometime later this week I'll see about running the matrix solve on a real computer, maybe one with a little extra horsepower.
Code:
sizes = floor(logspace(1, 2.5, 10));
times = zeros(length(sizes), 3);
for s = 1:length(sizes)
    A = rand(sizes(s));
    b = rand(sizes(s), 1);
    %% Gaussian elimination
    tic;
    nIters = 1;
    for ii = 1:nIters
        r = rref([A b]);
        x = r(:, end);
    end
    times(s, 1) = toc / nIters;
    %% Invert and multiply
    tic;
    nIters = 50;
    for ii = 1:nIters
        x2 = inv(A) * b;
    end
    times(s, 2) = toc / nIters;
    %% Direct solve in MATLAB
    tic;
    nIters = 50;
    for ii = 1:nIters
        x3 = A \ b;
    end
    times(s, 3) = toc / nIters;
end
plot(sizes, times, '-x');
xlabel('Matrix size');
ylabel('Computation time [s]');
legend('Gaussian elimination (rref)', 'Invert and multiply', 'Direct solve');
EDIT: It's been pointed out to me that matrix inversion is also inherently O(n^3), so something else must be making rref slow. In this case, the catch is that rref() is written in plain MATLAB code (try "edit rref"), while inversion and solving are implemented as highly optimized C code. Gaussian elimination is not the fastest, but it's not nearly as bad as I made it out to be. Thanks to those who pointed this out. Obviously I need to go study some more linear algebra; that's on the schedule for the fall.

Last edited by StevenB : 26-05-2013 at 22:29. Reason: Corrected by much more knowledgeable people
#5
Re: OPR-computation-related linear algebra problem
Code:
tic;
r1 = N \ d;
t1 = toc;
dlmwrite('r1.dat', r1);  % save r1 to a file so the computation is not optimized out
disp(['Linear solver ' num2str(t1)]);
tic;
r2 = inv(N) * d;
t2 = toc;
dlmwrite('r2.dat', r2);  % save r2 to a file so the computation is not optimized out
disp(['Invert and multiply ' num2str(t2)]);
PS - can someone with a working Octave installation please run this? Also SciLab and R.

Last edited by Ether : 26-05-2013 at 08:35.
#6
Re: OPR-computation-related linear algebra problem
Couple of things.

In a PDE class I took for CFD, we had to solve really large sparse matrices. The trick was to never actually store the entire matrix. However, ours was much more structured and more sparse, so I'm not sure if I can apply something similar in this case.

What accuracy are you looking for? You could use some iterative methods for much faster results. I think you could pick an accuracy of 1e-1 (inf norm) and be fine for OPRs.

Loading it into my GTX 580 GPU right now to get some values. Will do that with and without the time taken to load it into GPU memory and back.
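A minimal MATLAB sketch of the iterative idea (pcg assumes the matrix is symmetric positive definite, which a later post confirms N is; note that pcg's tolerance is a relative residual rather than an inf norm, so treat the 1e-1 as approximate):
Code:
N = dlmread('N.dat');
d = dlmread('d.dat');
% conjugate gradients with a deliberately loose tolerance and an
% iteration cap - trades accuracy for speed
x = pcg(sparse(N), d, 1e-1, 200);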
#7
Re: OPR-computation-related linear algebra problem
This matrix is quite small compared to those generally solved in finite elements, CFD, or other common codes. As was mentioned a little earlier, the biggest speedup comes from processing everything as sparse matrices.
On my 2.0 GHz MacBook Air running MATLAB Student R2012a, I can run:
Code:
tic
d = load('d.dat');
N = load('N.dat');
toc
tic
output = N\d;
toc
and get the output:
Code:
Elapsed time is 2.768235 seconds.   <-- loading files into memory
Elapsed time is 0.404477 seconds.   <-- solving the matrix
If I now change the code to:
Code:
tic
d = load('d.dat');
N = load('N.dat');
toc
tic
Ns = sparse(N);
toc
tic
output = Ns\d;
toc
the output becomes:
Code:
Elapsed time is 2.723927 seconds.   <-- load files
Elapsed time is 0.040358 seconds.   <-- conversion to sparse
Elapsed time is 0.017368 seconds.   <-- solving
There are only 82,267 nonzero elements in the N matrix (vs. 2509*2509, about 6.3 million), so the sparse solve runs much faster - it essentially skips over the entries that are zero, and so avoids that part of the factorization work.

Here's an iterative method solving the problem. I haven't tuned any iteration parameters for bicgstab (biconjugate gradients, stabilized), so it could be a bit better, but the mean squared error is already pretty small:
Code:
tic
d = load('d.dat');
N = load('N.dat');
toc
tic
Ns = sparse(N);
toc
tic
output = bicgstab(Ns,d);
toc
% compute a true output
output_true = Ns\d;
% compute mean squared error of OPR
output_mse = sum((output_true - output).^2)/length(output)
with output:
Code:
Elapsed time is 2.728844 seconds.
Elapsed time is 0.040895 seconds.
bicgstab stopped at iteration 20 without converging to the desired tolerance 1e-06 because the maximum number of iterations was reached. The iterate returned (number 20) has relative residual 2e-06.
Elapsed time is 0.015128 seconds.
output_mse = 9.0544e-07
Not much benefit to the iterative method here - the matrix is quite small. The speedup is much more considerable when you are solving similarly sparse matrices that are huge. In industry and research, my finite element models can reach matrices that are millions by millions or more; at that point you need sophisticated algorithms. But for the size of the OPR matrix, unless we get TONS more FRC teams soon, just running it with sparse tools should be sufficient for it to run quite fast. Octave and MATLAB have sparse support built in, and I believe NumPy/SciPy distributions do as well. There are also C++ and Java libraries for sparse computation.

A final suggestion: if you construct your matrices in sparse form explicitly from the get-go (not N itself, but the precursor to it), you can cut even the data loading time to a small fraction of what it is now. Hope that helps.

Added: I did check the structure of N, and it is consistent with a sparse least squares matrix. It is also symmetric and positive definite. These properties are why I chose bicgstab instead of gmres or another iterative algorithm. If you don't want to solve it iteratively, Cholesky factorization is also very good for dealing with symmetric positive definite matrices.

Last edited by Nikhil Bajaj : 26-05-2013 at 11:01.
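To make those last two suggestions concrete, here is a minimal MATLAB sketch; the triplet vectors i, j, and v are hypothetical stand-ins for however the match data actually gets assembled:
Code:
% build N directly in sparse triplet form, never storing the dense matrix
% (i, j, v are hypothetical row/column/value vectors from the match data)
N = sparse(i, j, v, 2509, 2509);
% N is symmetric positive definite, so Cholesky gives N = R'*R and the
% solve reduces to two triangular substitutions
R = chol(N);
x = R \ (R' \ d);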
#8
Re: OPR-computation-related linear algebra problem
Sounds great. I had to actually code up some different solvers in C. We could use MATLAB, but we weren't allowed to use any functions more complicated than addition and the like. Nice to see some of the MATLAB tools that do this.

Just wondering, where do you work?
#9
Re: OPR-computation-related linear algebra problem
Thank you, Borna. I am currently a Ph.D. student in mechatronics and control systems at Purdue University. I did my Master's degree in heat transfer and design optimization, and the tools I learned through that included finite element methods for structural, thermal, and fluid flow analysis, as well as the mathematical underpinnings of those methods and their numerical implementation. I also spent a lot of time looking at optimization algorithms. Some of my work was industry sponsored, so I got to help solve large problems that way.

I also did an internship at Alcatel-Lucent Bell Labs where I did CFD modeling for electronics cooling, and I use finite elements often when designing parts for my current research.

If you are interested in coding some of these algorithms in C by hand, one of the best possible references is Matrix Computations by Golub and Van Loan, which will get you much of the way there.
#10
Re: OPR-computation-related linear algebra problem
C code implementing a Cholesky decomposition-based solver. With minimal optimization, the calculation runs in 3.02 seconds on my system.
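For readers following along, the factorization at the heart of such a solver looks roughly like this - a textbook sketch written in MATLAB-style loops for readability, not the actual C code from this post:
Code:
% textbook Cholesky factorization of an SPD matrix: N = L*L'
n = size(N, 1);
L = zeros(n);
for j = 1:n
    % diagonal entry: subtract contributions of earlier columns
    L(j,j) = sqrt(N(j,j) - L(j,1:j-1) * L(j,1:j-1)');
    for i = j+1:n
        % entries below the diagonal in column j
        L(i,j) = (N(i,j) - L(i,1:j-1) * L(j,1:j-1)') / L(j,j);
    end
end
% then solve N*x = d by forward and back substitution:
x = L' \ (L \ d);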
#11
Re: OPR-computation-related linear algebra problem
Quote:
ryan.exe 2509 N.dat d.dat x.dat
So I dug up an old piece of code I wrote back in 1990 with a Cholesky factoring algorithm in it [1], modified it for this application, and ran it. It took about 22.5 seconds:
Code:
Nx=d build 5/26/2013 921p
If your code took only 3 seconds to run on your machine but 80 on mine, I'm wondering what the Rice algorithm would do on your machine.

[1] John Rischard Rice, Numerical Methods, Software, and Analysis, 1983, page 139 (see attachments)
#12
Re: OPR-computation-related linear algebra problem
Hi Ether,

Makes me wonder what you may have done better in your coding of the algorithm.

EDIT: Changing the order of the summations got me down to 2.68 seconds, and changing to in-place computation like your code got me to 2.58. Beyond that, any improvements would seem to be in the way the Pascal compiler is generating code.

Best,

Last edited by RyanCahoon : 27-05-2013 at 22:32.
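To illustrate the in-place idea with the reordered summation, a rough sketch in MATLAB-style loops (the real changes were in the C code, so this is only the shape of the computation, not the code itself):
Code:
% in-place Cholesky: overwrite the lower triangle of N with L,
% accumulating each inner sum in a scalar (the reordered summation)
n = size(N, 1);
for j = 1:n
    for i = j:n
        s = N(i,j);
        for k = 1:j-1
            s = s - N(i,k) * N(j,k);
        end
        if i == j
            N(j,j) = sqrt(s);
        else
            N(i,j) = s / N(j,j);
        end
    end
end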
#13
Re: OPR-computation-related linear algebra problem
...
#14
Re: OPR-computation-related linear algebra problem
Finally had time to speak with the math guys.

The built-in LV linear algebra I was using links to an older version of Intel's MKL, but if I had used the SPD option on the solver, it would indeed have been faster than the general version. There is a toolkit called "Multicore Analysis and Sparse Matrix Toolkit", and they ran the numbers using that tool as well. Thanks to a newer version of MKL, its general solver is much faster. The right column converts the matrix into sparse form and uses a sparse solver.

Greg McKaskle
#15
Re: OPR-computation-related linear algebra problem
Thanks Greg. Are the "time" units in the attachment milliseconds?