#1
Re: OPR-computation-related linear algebra problem
New code, based on what James put up. I just added some disp calls so that the results would be clearer; the disp calls are outside the tic/toc pairs. I did not find any bugs, though I had to change the #'s to %'s.
Code:
clc
disp('Loading Data...')
tic
d = load('d.dat');
N = load('N.dat');
toc
Ns = sparse(N);
D = diag(Ns);
Ds = sparse(diag(D)); %This was a bug... maybe it still is!
% Reference Solution
disp('Reference Solution:')
tic
output1 = Ns\d;
toc
% CG Solution
disp('CG Solution:');
tic
output2 = pcg(Ns,d);
toc
% Diagonal PCG Solution
disp('Diagonal PCG Solution:');
tic
output3 = pcg(Ns,d,[],[],Ds);
toc
% Reverse Cuthill-McKee re-ordering
disp('Re-ordering (Reverse Cutthill-McKee:');
tic
p = symrcm(Ns); % permutation array
Nr = Ns(p,p); % re-ordered problem
toc
% Re-ordered Solve
disp('Re-ordered Solution:');
tic
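y = Nr\d(p); % the right-hand side has to be permuted the same way as N
toc
output4 = zeros(size(d));
output4(p) = y; % undo the permutation so output4 lines up with output1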
Code:
Loading Data...
Elapsed time is 3.033846 seconds.
Reference Solution:
Elapsed time is 0.014136 seconds.
CG Solution:
pcg stopped at iteration 20 without converging to the desired tolerance 1e-06 because the maximum number of iterations was reached.
The iterate returned (number 20) has relative residual 4.8e-05.
Elapsed time is 0.007545 seconds.
Diagonal PCG Solution:
pcg converged at iteration 17 to a solution with relative residual 8.9e-07.
Elapsed time is 0.009216 seconds.
Re-ordering (Reverse Cutthill-McKee:
Elapsed time is 0.004523 seconds.
Re-ordered Solution:
Elapsed time is 0.015021 seconds.
Thanks for calling me out. And you're right, I should have used pcg. Thanks for the suggestion.
#2
Re: OPR-computation-related linear algebra problem
Quote:
#3
Re: OPR-computation-related linear algebra problem
Quote:
Quote:
For comparison, I compiled your code using Free Pascal and the Cholesky decomposition ran in 11.9 seconds on my computer.
#4
Re: OPR-computation-related linear algebra problem
I haven't used Pascal in a long time, but I seem to remember it storing 2D arrays with different elements adjacent. It was column-major and C was row-major. The notation isn't important, but accessing adjacent elements in the cache will be far faster than jumping by 20 KB to pick up the next element.
Greg McKaskle
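To make the cache point concrete, here is a minimal C sketch. It is not taken from anyone's posted code: the function names are made up for illustration, and it assumes a row-major matrix held in one contiguous block of 8-byte doubles. Both functions add up the same elements; only the order of the memory reads differs.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an n-by-n matrix stored row-major in one contiguous block, reading
   elements in the order they sit in memory (stride of one double). */
double sum_row_order(const double *m, size_t n)
{
    assert(m != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            total += m[i * n + j];
    return total;
}

/* Same elements, but walking down each column: consecutive reads are
   n doubles apart, i.e. 2509 * 8 bytes (about 20 KB) for this problem. */
double sum_col_order(const double *m, size_t n)
{
    assert(m != NULL);
    double total = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            total += m[i * n + j];
    return total;
}
```

Timing the two on the 2509x2509 matrix should show the gap on a given machine; how large it is depends on the cache sizes and the compiler, so no figure is claimed here.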
#5
Re: OPR-computation-related linear algebra problem
Quote:
Code:
#define ELEMENT(M, i,j) (M[(i)*((i)+1)/2+(j)])
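For anyone reading the macro cold: the index expression matches packed storage of the lower triangle of a symmetric matrix, with the rows stored back to back. The sketch below spells that out with two small helper functions. The names packed_size and packed_index are hypothetical and only there for illustration; the assumption that j <= i is mine, inferred from the formula.

```c
#include <assert.h>
#include <stddef.h>

/* Number of stored entries for an n-by-n lower triangle (diagonal included). */
size_t packed_size(size_t n)
{
    return n * (n + 1) / 2;
}

/* Offset of entry (i, j), with j <= i, when the rows of the lower triangle
   are stored back to back: row i starts after 1 + 2 + ... + i = i*(i+1)/2
   entries, which is the same arithmetic as the ELEMENT macro. */
size_t packed_index(size_t i, size_t j)
{
    assert(j <= i);
    return i * (i + 1) / 2 + j;
}
```

For the 2509x2509 problem this stores 3,148,795 values instead of 6,295,081, at the cost of a multiply and a divide every time an index is computed from scratch.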
#6
Re: OPR-computation-related linear algebra problem
Quote:
I was worried about the same thing. In the most recent version of the code I posted, that macro is only used once (not in a loop) to calculate a pointer to the end of the matrix.
Last edited by RyanCahoon : 30-05-2013 at 23:07.
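As a rough illustration of that pattern (this is not RyanCahoon's actual code; scale_packed is a made-up function), the macro can be evaluated once to find the end of the packed storage, after which the loop just advances a pointer and does no index arithmetic per element:

```c
#include <assert.h>
#include <stddef.h>

#define ELEMENT(M, i, j) (M[(i)*((i)+1)/2+(j)])

/* Multiply every stored entry of a packed lower-triangular n-by-n matrix
   by factor, walking the storage linearly. */
void scale_packed(double *m, size_t n, double factor)
{
    assert(m != NULL);
    if (n == 0)
        return;
    /* One past the last stored entry (n-1, n-1): the only use of the macro. */
    double *end = &ELEMENT(m, n - 1, n - 1) + 1;
    for (double *p = m; p != end; p++)
        *p *= factor;
}
```

A linear walk like this also reads memory in storage order, which ties in with the cache discussion earlier in the thread.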
#7
Re: OPR-computation-related linear algebra problem
Quote:
Walking the matrix in column order caused two memory accesses per read: one for the pointer (plus the time to do the reference fetch) and one for the data element itself. You could see a 10x reduction in speed doing column order vs row order.
Thanks for the "back when" reminder.
Ether: Just for grins, I tried RLab on our 16-way cluster. For some reason I'm not getting responses to tic()/toc(). What Windows OS are you running RLab under?
#8
Re: OPR-computation-related linear algebra problem
I don't have direct access to my desktop at the moment. I was doing that remotely with LogMeIn, but for some reason I lost the connection and have not got it back yet.
I tried it on a GTX555M with only 24 CUDA cores. It was 50% slower than my laptop processor (Core i7 2670QM, quad core running at 2.2 GHz). I will post here as soon as I get to my desktop. I was able to get 0.015 seconds using sparse matrices; however, GPU processing does not support sparse matrices directly. I doubt that I can get any faster results than that.
#9
Re: OPR-computation-related linear algebra problem
Quote:
Linux, Windows XP/7, 32 or 64?
#10
Re: OPR-computation-related linear algebra problem
Matlab 2012b
Here are the results:
Normal matrices (CPU and GPU (555M)):
Using inv(N)*d: CPU 1.874s, GPU 2.146s
Using N\d: CPU 0.175s, GPU 0.507s
Sparse matrices (CPU only):
Using inv(N)*d: CPU 0.967s
Using N\d: CPU 0.015s
Cannot get sparse matrices into the GPU easily. The times are only for the solve operation.
#11
Re: OPR-computation-related linear algebra problem
I just got a new computer at work with a Xeon E5-1620 and a Quadro 4000 GPU (256 CUDA cores). Using Matlab 2012b:
CPU Invert and Multiply: 1.4292s
CPU Linear Solver: 0.22521s
CPU Sparse Linear Solver: 0.034423s
GPU Invert and Multiply: 0.537926s
GPU Linear Solver: 0.218403s
Loading the N matrix into the GPU was 1.890602s.
Creating the Sparse Matrix for N was 0.032979s.
#12
Re: OPR-computation-related linear algebra problem
Quote:
Linear solver 0.19269
Invert and multiply 1.8698
Code:
N = dlmread('N.dat');
d = dlmread('d.dat');
tic;
r1 = N \ d;
t1 = toc;
% also save r1 to a file here so the computation is not optimized out.
dlmwrite('r_solver.dat', r1);
disp(['Linear solver ' num2str(t1)]);
tic;
r2 = inv(N) * d;
t2 = toc;
% also save r2 to a file here so the computation is not optimized out.
dlmwrite('r_invmult.dat', r2);
disp(['Invert and multiply ' num2str(t2)]);
#13
Re: OPR-computation-related linear algebra problem
Quote:
MATLAB 2012b:
Code:
>> N = dlmread('N.dat');
>> d = dlmread('d.dat');
>> tic ; r = N \ d; toc
Elapsed time is 0.797772 seconds.
Code:
octave:1> N = dlmread('N.dat');
octave:2> d = dlmread('d.dat');
octave:3> tic ; r = N \ d; toc
Elapsed time is 0.624047 seconds.
#14
Re: OPR-computation-related linear algebra problem
Thanks flameout.
Quote:
Just installed SciLab 5.4.1 with Intel Math Kernel Library 10.3 on a 7-year-old desktop PC:
Code:
-->stacksize(70000000);
-->tic; N=read("N.dat",2509,2509); toc
ans = 1.672
-->d=read("d.dat",2509,1);
-->tic; x=N\d; toc
ans = 1.672
-->tic; Ns=sparse(N); toc
ans = 0.141
-->tic(); xs = umfpack(Ns,'\',d); toc
ans = 0.14
Last edited by Ether : 26-05-2013 at 20:33.
#15
Re: OPR-computation-related linear algebra problem
Quote:
MATLAB R2012b:
Code:
>> tic ; N = dlmread('N.dat'); toc
Elapsed time is 3.074810 seconds.
>> tic ; d = dlmread('d.dat'); toc
Elapsed time is 0.006744 seconds.
>> tic ; r = N \ d; toc
Elapsed time is 0.323021 seconds.
>> tic ; dlmwrite('out.dat', r); toc
Elapsed time is 0.124947 seconds.
GNU Octave 3.6.2:
Code:
octave:1> tic ; N = dlmread('N.dat'); toc
Elapsed time is 1.87042 seconds.
octave:2> tic ; d = dlmread('d.dat'); toc
Elapsed time is 0.00241804 seconds.
octave:3> tic ; r = N \ d; toc
Elapsed time is 0.528489 seconds.
octave:4> tic ; dlmwrite('out.dat', r); toc
Elapsed time is 0.00820613 seconds.
Scilab 5.3.3:
Code:
-->stacksize(70000000);
-->tic; N=read("N.dat", 2509, 2509); toc
ans =
1.21
-->tic; d=read("d.dat", 2509, 1); toc
ans =
0.003
-->tic; x=N\d; toc
ans =
1.052
-->tic; Ns=sparse(N); toc
ans =
0.081
-->tic(); xs = umfpack(Ns,'\',d); toc
ans =
0.081
FreeMat 4.0:
Code:
--> tic ; N = dlmread('N.dat'); toc
ans =
2.8630
--> tic ; d = dlmread('d.dat'); toc
ans =
0.0080
--> tic ; r = N \ d; toc
ans =
3.4270