Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   General Forum (http://www.chiefdelphi.com/forums/forumdisplay.php?f=16)
-   -   OPR-computation-related linear algebra problem (http://www.chiefdelphi.com/forums/showthread.php?t=117072)

Ether 25-05-2013 16:30

OPR-computation-related linear algebra problem
 
1 Attachment(s)

Looking for interested parties to crunch the numbers and report how long it takes to solve Nx=d for x with the tools, libraries, and computing platforms you use.

Attached ZIP file contains N and d. N is a symmetric positive definite 2509x2509 square matrix; d is a 2509 element column vector.



Greg McKaskle 25-05-2013 22:05

Re: OPR-computation-related linear algebra problem
 
My linear algebra is very rusty and isn't part of my day job, so nothing special and I hope I did it right.

The time to invert and multiply is shown on the panel and is about 5.5 seconds, plus another 1.something seconds to load the data. This is in a VM on a MacBook Pro. It was clearly running on a single CPU. If you are interested, I can talk to the math guys on Tuesday.

CD seems to have problems uploading at the moment, so the first few elements are 10.1758, 29.6333, 11.1155.

Greg McKaskle

RyanCahoon 25-05-2013 23:54

Re: OPR-computation-related linear algebra problem
 
1 Attachment(s)
Using MATLAB 2012a on an Intel Core i7-3615QM:

Using linear equation solver (backslash operator): 0.26977 seconds
Using invert-and-multiply: 2.4433 seconds

Code:

N = dlmread('N.dat');
d = dlmread('d.dat');

numIters = 100;
tic;
for i=1:numIters
    r = N \ d;
end
disp(['Linear solver ' num2str(toc/numIters)]);

numIters = 10;
tic;
for i=1:numIters
    r = inv(N) * d;
end
disp(['Invert and multiply ' num2str(toc/numIters)]);


Michael Hill 26-05-2013 00:17

Re: OPR-computation-related linear algebra problem
 
Wouldn't it be much less computationally intensive to reduce the matrix to reduced row echelon form?

DMetalKong 26-05-2013 01:30

Re: OPR-computation-related linear algebra problem
 
3 Attachment(s)
Using Python with Numpy

System:
Code:

Ubuntu 12.04 32-bit
Kernel Linux 3.2.0-43-generic-pae
Memory 3.8 GiB
Processor Intel Core 2 Duo T9400 @ 2.53 GHz x 2

Code:
Code:

import sys
import numpy
import time
import scipy
import psutil

n_runs = 1000

print ""
print ""
print "Python version %s" % (sys.version)
print "Numpy version %s" % (numpy.__version__)
print "Scipy version %s" % (scipy.__version__)
print "Psutil version %s" % (psutil.__version__)
print ""


N = numpy.loadtxt(open('N.dat'))
d = numpy.loadtxt(open('d.dat'))

data = []
for i in range(1,n_runs+1):
    start = time.time()
    x = numpy.linalg.solve(N,d)
    end = time.time()
    row = [end - start]
    row.extend(psutil.cpu_percent(interval=1,percpu=True))
    s = "\t".join([str(item) for item in row])
    data.append(s)
   
f = open('times.dat','w')
f.write("\n".join(data))
f.close()

x = numpy.linalg.solve(N,d)
print ", ".join([str(f) for f in x])
print ""

Average run time: 10.1 seconds
Standard Deviation: 5.1 seconds

The file output.txt contains versions and the solution for x.
The file runs.txt contains the run data. Note that I was doing work while letting this run in the background, which skews the times. I collected CPU usage data to try to account for this; one interesting note is that there are two different clusters of execution times - I believe this is from my laptop throttling the CPU when I unplugged it and was running off battery for a while (if you plot the runs over time, you will see three distinct sections where the execution times are consistently higher).

StevenB 26-05-2013 02:47

Re: OPR-computation-related linear algebra problem
 
1 Attachment(s)
Quote:

Originally Posted by Michael Hill (Post 1277213)
Wouldn't it be much less computationally intensive to reduce the matrix to reduced row echelon form?

Interestingly, no. Gaussian elimination is O(N^3), which gets ugly really fast. When you get into the realm of hundreds or thousands of elements, there are much better ways to do it, which computational packages like MATLAB take full advantage of. I've attached a graph showing the computation times for Gaussian elimination, invert-and-multiply, and direct solve for a variety of matrix sizes. By the time you get to 300 elements, Gaussian elimination is already painfully slow, but even the invert-and-multiply has hardly broken a sweat (less than 0.02 seconds).

This test was run on my 6-year-old Core 2 Duo (T7200 @ 2.00GHz) laptop with MATLAB R2010a. Sometime later this week I'll see about running the matrix solve on a real computer, maybe one with a little extra horsepower.

Code:

sizes = floor(logspace(1, 2.5, 10));
times = zeros(length(sizes), 3);

for s = 1:length(sizes);
  A = rand(sizes(s));
  b = rand(sizes(s), 1);
 
  %% Gaussian elimination
  tic;
  nIters = 1;
  for ii = 1:nIters;
    r = rref([A b]);
    x = r(:, end);
  end
  times(s, 1) = toc / nIters;
 
  %% Invert and multiply
  tic;
  nIters = 50;
  for ii = 1:nIters;
    x2 = inv(A) * b;
  end
  times(s, 2) = toc / nIters;
 
  %% Direct solve in MATLAB
  tic;
  nIters = 50;
  for ii = 1:nIters;
    x3 = A \ b;
  end
  times(s, 3) = toc / nIters;

end

plot(sizes, times, '-x');
xlabel('Matrix size');
ylabel('Computation time [s]');
legend('Gaussian elimination (rref)', 'Invert and multiply', 'Direct solve')


EDIT: It's been pointed out to me that a matrix inversion is also inherently O(n^3), and so there's something else at work making it slow. In this case, the catch is that rref() is written in plain MATLAB code (try "edit rref"), while inversion and solving are implemented as highly-optimized C code. Gaussian elimination is not the fastest, but it's not nearly as bad as I made it out to be.

Thanks to those who pointed this out. Obviously I need to go study some more linear algebra. :o That's on the schedule for the fall.
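For anyone curious what that cost actually looks like, here is a minimal pure-Python sketch of Gaussian elimination with partial pivoting (a hypothetical toy illustration, not the MATLAB rref internals); the three nested loops are exactly where the O(n^3) work comes from:

```python
def gauss_solve(A, b):
    """Solve Ax = b by forward elimination with partial pivoting,
    then back substitution. A is a list of row lists; b is a list."""
    n = len(A)
    # Work on an augmented copy [A | b] so the inputs are untouched.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):                      # outer loop: n pivots
        # partial pivoting for numerical stability
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):           # eliminate below the pivot
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):       # update the whole row
                M[i][j] -= f * M[k][j]
    # back substitution on the resulting upper-triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Example: 2x + y = 4, x + 3y = 7 has solution x = [1.0, 2.0]
print(gauss_solve([[2.0, 1.0], [1.0, 3.0]], [4.0, 7.0]))
```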

Greg McKaskle 26-05-2013 06:15

Re: OPR-computation-related linear algebra problem
 
2 Attachment(s)
CD let me attach again. I attached the things I intended for the previous post.

As with many of the analysis functions, this calls into a DLL, to the function InvMatrixLUDri_head. So it seems to be using LU decomposition. I think the matrix qualifies as sparse, so that helps with performance.

The direct solver was almost ten seconds.

Greg McKaskle

Ether 26-05-2013 08:09

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by RyanCahoon (Post 1277212)
Using MATLAB 2012a on an Intel Core i7-3615QM:

Using linear equation solver (backslash operator): 0.26977 seconds

Ryan, could you please re-run this without iterating? I want to eliminate the possibility that MATLAB is optimizing out the iteration.

Code:

tic;
    r1 = N \ d;
t1 = toc;

% also save r1 to a file here so the computation is not optimized out.

disp(['Linear solver ' num2str(t1)]);


tic;
    r2 = inv(N) * d;
t2 = toc;

% also save r2 to a file here so the computation is not optimized out.

disp(['Invert and multiply ' num2str(t2)]);

Thanks.


PS - can someone with a working Octave installation please run this? also SciLab and R.



BornaE 26-05-2013 09:34

Re: OPR-computation-related linear algebra problem
 
Couple of things.

In a PDE class I took for CFD, we had to solve really large sparse matrices. The trick was to never actually store the entire matrix. However, ours was much more structured and more sparse, so I'm not sure if I can apply something similar in this case.

What accuracy are you looking for? You could use iterative methods for much faster results. I think you could pick an accuracy of 1e-1 (inf norm) and be fine for OPRs.

Loading it into my GTX 580 GPU right now to get some values. Will do that with and without the time taken to load it into the GPU memory and back.

RyanCahoon 26-05-2013 10:54

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Ether (Post 1277233)
Ryan, could you please re-run this, without iterating? I want to eliminate the possibility that Matlab is optimizing out the iteration.

I had tried this originally, and the results were consistent with the iterated/averaged result, but I was getting some variation in timing so I wanted to take the average case. Interestingly, the average from the iterated trials was consistently higher than any of the trials running single-shot.

Linear solver 0.19269
Invert and multiply 1.8698

Code:

N = dlmread('N.dat');
d = dlmread('d.dat');

tic;
    r1 = N \ d;
t1 = toc;

% also save r1 to a file here so the computation is not optimized out.
dlmwrite('r_solver.dat', r1);

disp(['Linear solver ' num2str(t1)]);


tic;
    r2 = inv(N) * d;
t2 = toc;

% also save r2 to a file here so the computation is not optimized out.
dlmwrite('r_invmult.dat', r2);

disp(['Invert and multiply ' num2str(t2)]);


Nikhil Bajaj 26-05-2013 10:58

Re: OPR-computation-related linear algebra problem
 
This matrix is quite small compared to those generally solved in finite element, CFD, or other common codes. As was mentioned a little earlier, the biggest speedup comes from processing everything as sparse matrices.

On my 2.0 GHz MacBook Air running MATLAB Student R2012a, I can run:

tic
d = load('d.dat');
N = load('N.dat');
toc
tic
output = N\d;
toc

and get the output:
Elapsed time is 2.768235 seconds. <--loading files into memory
Elapsed time is 0.404477 seconds. <--solving the matrix

If I now change the code to:

tic
d = load('d.dat');
N = load('N.dat');
toc
tic
Ns = sparse(N);
toc
tic
output = Ns\d;
toc

With output:
Elapsed time is 2.723927 seconds. <--load files
Elapsed time is 0.040358 seconds. <--conversion to sparse
Elapsed time is 0.017368 seconds. <--solving

There are only 82267 nonzero elements in the N matrix (vs 2509*2509 ~ 6.3 million), so the sparse matrix runs much faster - the solver essentially skips over the entries that are zero, so it doesn't have to do that part of the work.

Here's an iterative method solving the problem. I haven't tuned any iteration parameters for bicgstab (biconjugate gradients, stabilized) so it could be a bit better but the mean squared error is pretty small.

tic
d = load('d.dat');
N = load('N.dat');
toc
tic
Ns = sparse(N);
toc
tic
output = bicgstab(Ns,d);
toc
% compute a true output
output_true = Ns\d;
% compute mean squared error of OPR
output_mse = sum((output_true - output).^2)/length(output)

Elapsed time is 2.728844 seconds.
Elapsed time is 0.040895 seconds.
bicgstab stopped at iteration 20 without converging to the desired tolerance 1e-06
because the maximum number of iterations was reached.
The iterate returned (number 20) has relative residual 2e-06.
Elapsed time is 0.015128 seconds.

output_mse =

9.0544e-07

Not much benefit in the iterative method here...the matrix is quite small. The speedup is much more considerable when you are solving similarly sparse matrices that are huge. In industry and research in my career my finite element models can get to matrices that are millions by millions or more...at that point you need sophisticated algorithms. But for the size of the OPR matrix, unless we get TONS more FRC teams soon, just running it with sparse tools should be sufficient for it to run quite fast. Octave and MATLAB have it built in, and I believe NumPy/SciPy distributions do as well. There are also C++ and Java libraries for sparse computation.

A final suggestion would be that if you construct your matrices in the sparse form explicitly from the get-go (not N, but the precursor to it) you can alleviate even the data loading time to a small fraction of what it is now.

Hope that helps.

Added: I did check the structure of N, and it is consistent with a sparse least squares matrix. It is also symmetric and positive definite. These properties are why I chose bicgstab instead of gmres or another iterative algorithm. If you don't want to solve it iteratively, Cholesky Factorization is also very good for dealing with symmetric positive definite matrices.
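To make the "skips over the zeros" point concrete, here is a hypothetical pure-Python sketch of the compressed sparse row (CSR) storage idea behind MATLAB's sparse() and SciPy's csr_matrix (function names here are made up for illustration): only the nonzeros are stored, so a matrix-vector product touches nnz entries instead of n*n.

```python
def to_csr(dense):
    """Convert a dense row-list matrix to (values, col_indices, row_ptr)."""
    vals, cols, ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                vals.append(v)   # keep only nonzero entries
                cols.append(j)   # remember which column each came from
        ptr.append(len(vals))    # row i occupies vals[ptr[i]:ptr[i+1]]
    return vals, cols, ptr

def csr_matvec(vals, cols, ptr, x):
    """y = A @ x using only the stored nonzeros."""
    y = []
    for i in range(len(ptr) - 1):
        y.append(sum(vals[k] * x[cols[k]] for k in range(ptr[i], ptr[i + 1])))
    return y

A = [[4.0, 0.0, 0.0],
     [0.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
vals, cols, ptr = to_csr(A)
print(csr_matvec(vals, cols, ptr, [1.0, 1.0, 1.0]))   # -> [4.0, 6.0, 4.0]
```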

BornaE 26-05-2013 11:20

Re: OPR-computation-related linear algebra problem
 
Sounds great. I had to actually code up some different solvers in C. We could use MATLAB, but we weren't allowed to use any functions more complicated than addition, etc.

Nice to see some of the MATLAB tools for doing that.

Just wondering, where do you work?

Quote:

Originally Posted by Nikhil Bajaj (Post 1277238)
This matrix is quite small compared to those generally solved in finite elements, CFD, or other common codes. As was mentioned a little bit earlier, the biggest benefit to speedup can be done by processing everything as sparse matrices.

...


Nikhil Bajaj 26-05-2013 11:47

Re: OPR-computation-related linear algebra problem
 
Thank you, Borna. I am currently a Ph.D. student in mechatronics and control systems at Purdue University. I did my Master's Degree in Heat Transfer and Design Optimization, and the tools I learned through that included finite element methods for structural, thermal, and fluid flow analysis, as well as the mathematical underpinnings of those methods and the numerical implementation. I also spent a lot of time looking at optimization algorithms. Some of my work was industry sponsored and so I got to help solve large problems that way.

I also did an internship at Alcatel-Lucent Bell Labs where I did CFD modeling for electronics cooling. I also use finite elements often when designing parts for my current research.

For coding some of these algorithms in C by hand, if you are interested, one of the best possible references is Matrix Computations by Golub and Van Loan, which will get you much of the way there.

RyanCahoon 26-05-2013 13:25

Re: OPR-computation-related linear algebra problem
 
1 Attachment(s)
C code implementing Cholesky decomposition-based solver. With minimal optimization, the calculation runs in 3.02 seconds on my system.

Michael Hill 26-05-2013 14:17

Re: OPR-computation-related linear algebra problem
 
The reason I say it's computationally intensive is this article: http://www.johndcook.com/blog/2010/0...t-that-matrix/

Nikhil Bajaj 26-05-2013 14:56

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Michael Hill (Post 1277260)
The reason I say it's computationally intensive is this article: http://www.johndcook.com/blog/2010/0...t-that-matrix/

That article is 100% correct. The solutions above that are solving in a handful of seconds or less are not inverting the matrix. Reducing the matrix to reduced-row echelon form is related to what methods like LU and Cholesky factorization do.

Even normal Gaussian elimination will be pretty fast on a sparse matrix (though still slower than most of the methods above), but it has problems with numerical stability that get worse and worse as matrix size increases, and for that reason it is avoided by most people solving numerical linear algebra problems.
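As an illustration of why Cholesky factorization is a natural fit for symmetric positive definite systems like this one, here is a hedged pure-Python sketch (a toy implementation of the textbook algorithm, not the optimized library code): factor A = L L^T, then do one forward and one backward triangular solve.

```python
import math

def cholesky(A):
    """Return lower-triangular L with L * L^T == A (A must be SPD)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)   # diagonal entry
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]  # below-diagonal entry
    return L

def cholesky_solve(A, b):
    """Solve Ax = b via L y = b (forward), then L^T x = y (backward)."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# SPD example: 4x + y = 6, x + 3y = 7 has solution close to [1.0, 2.0]
print(cholesky_solve([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0]))
```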

DMetalKong 26-05-2013 14:57

Re: OPR-computation-related linear algebra problem
 
2 Attachment(s)
Re-ran using Scipy's sparse matrix solver.

Average run time: 0.085s
Standard deviation: 0.005s

Code:

import sys
import numpy
import time
import scipy
import scipy.sparse
import scipy.sparse.linalg
import psutil

n_runs = 1000

print ""
print ""
print "Python version %s" % (sys.version)
print "Numpy version %s" % (numpy.__version__)
print "Scipy version %s" % (scipy.__version__)
print "Psutil version %s" % (psutil.__version__)
print ""


N = numpy.loadtxt(open('N.dat'))
d = numpy.loadtxt(open('d.dat'))

Ns = scipy.sparse.csr_matrix(N)

data = []
for i in range(1,n_runs+1):
    start = time.time()
    x = scipy.sparse.linalg.spsolve(Ns,d)
    end = time.time()
    row = [end - start]
    row.extend(psutil.cpu_percent(interval=1,percpu=True))
    s = "\t".join([str(item) for item in row])
    data.append(s)
   
f = open('times2.dat','w')
f.write("\n".join(data))
f.close()

_x = scipy.sparse.linalg.spsolve(Ns,d)
print ", ".join([str(f) for f in _x])
print ""


James Critchley 26-05-2013 15:20

Re: OPR-computation-related linear algebra problem
 
Nikhil,
Is there a reason why you are not using the "pcg" function, which assumes symmetric positive definite inputs? This should be faster. Also, please consider using the diagonal as a preconditioner. Unfortunately I do not have access to MATLAB at the moment. Could you please try the following? And sorry in advance for any bugs:

Ns = sparse(N);
D = diag(Ns);
Ds = sparse(diag(D)); #This was a bug... maybe it still is!

# Reference Solution
tic
output = Ns\d;
toc

# CG Solution
tic
output = pcg(Ns,d)
toc

# Diagonal PCG Solution
tic
output = pcg(Ns,d,[],[],Ds)
toc

# Reverse Cuthill-McKee re-ordering
tic
p = symrcm(Ns); # permutation array
Nr = Ns(p,p); # re-ordered problem
toc

# Re-ordered Solve
tic
output = Nr\d; #answer is stored in a permuted matrix indexed by 'p'
toc

Another advantage to the conjugate gradient methods is concurrent form of the solution within each iteration (parallel processing).

Best regards

Nikhil Bajaj 26-05-2013 17:04

Re: OPR-computation-related linear algebra problem
 
New code, based on what James put up (I just added some disp's so that the results would be more clear; the disps are outside of the tics and tocs). I did not find any bugs, though I had to change the #'s to %'s.

Code:

clc
disp('Loading Data...')
tic
d = load('d.dat');
N = load('N.dat');
toc
Ns = sparse(N);
D = diag(Ns);
Ds = sparse(diag(D)); %This was a bug... maybe it still is!

% Reference Solution
disp('Reference Solution:')
tic
output1 = Ns\d;
toc


% CG Solution
disp('CG Solution:');
tic
output2 = pcg(Ns,d);
toc

% Diagonal PCG Solution
disp('Diagonal PCG Solution:');
tic
output3 = pcg(Ns,d,[],[],Ds);
toc

% Reverse Cuthill-McKee re-ordering
disp('Re-ordering (Reverse Cuthill-McKee):');
tic
p = symrcm(Ns); % permutation array
Nr = Ns(p,p); % re-ordered problem
toc

% Re-ordered Solve
disp('Re-ordered Solution:');
tic
output4 = Nr\d; %answer is stored in a permuted matrix indexed by 'p'
toc

Output:
Code:

Loading Data...
Elapsed time is 3.033846 seconds.
Reference Solution:
Elapsed time is 0.014136 seconds.
CG Solution:
pcg stopped at iteration 20 without converging to the desired tolerance 1e-06
because the maximum number of iterations was reached.
The iterate returned (number 20) has relative residual 4.8e-05.
Elapsed time is 0.007545 seconds.
Diagonal PCG Solution:
pcg converged at iteration 17 to a solution with relative residual 8.9e-07.
Elapsed time is 0.009216 seconds.
Re-ordering (Reverse Cuthill-McKee):
Elapsed time is 0.004523 seconds.
Re-ordered Solution:
Elapsed time is 0.015021 seconds.

I didn't precondition earlier because I was being sloppy/lazy :). Thanks for calling me out. :yikes: And you're right, I should have used pcg. Thanks for the suggestion.
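For the curious, the idea behind pcg with a diagonal (Jacobi) preconditioner can be sketched in a few lines of pure Python. This is a hypothetical toy version of preconditioned conjugate gradient, not MATLAB's actual implementation; each iteration only needs one matrix-vector product, which is what makes the method attractive for large sparse SPD systems.

```python
def matvec(A, x):
    """Dense matrix-vector product (stand-in for a sparse matvec)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def pcg_jacobi(A, b, tol=1e-10, max_iter=100):
    """Preconditioned CG for SPD A, with M = diag(A) as preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                 # residual b - A x (x starts at 0)
    z = [r[i] / A[i][i] for i in range(n)]   # apply M^{-1} = diag(A)^{-1}
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if max(abs(ri) for ri in r) < tol:   # converged
            break
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# SPD example: 4x + y = 6, x + 3y = 7 has solution close to [1.0, 2.0]
print(pcg_jacobi([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0]))
```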

flameout 26-05-2013 19:08

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Ether (Post 1277233)
PS - can someone with a working Octave installation please run this? also SciLab and R

Since no one has done Octave yet, I'll go ahead and do it (along with MATLAB for comparison). I can't do SciLab or R because I don't know how to use those :p

MATLAB 2012b:
Code:

>> N = dlmread('N.dat');
>> d = dlmread('d.dat');
>> tic ; r = N \ d; toc
Elapsed time is 0.797772 seconds.

GNU Octave 3.6.2:
Code:

octave:1> N = dlmread('N.dat');
octave:2> d = dlmread('d.dat');
octave:3> tic ; r = N \ d; toc
Elapsed time is 0.624047 seconds.

This is on an Intel i5 (2 core + hyperthreading) with Linux as the host OS (kernel version 3.7.6).

Ether 26-05-2013 20:18

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by flameout (Post 1277290)
Since no-one has done Octave yet, I'll go ahead and do it

Thanks flameout.

Quote:

I can't do SciLab or R because I don't know how to use those :p

Just installed SciLab 5.4.1 with Intel Math Kernel Library 10.3 on a 7-year-old desktop PC:
  • Intel Pentium D 3.4GHz (x86 Family 15 Model 6 Stepping 4)
  • 32-bit XP Pro SP3
  • 500GB Seagate Barracuda 7200

Code:

-->stacksize(70000000);
 
-->tic; N=read("N.dat",2509,2509); toc
 ans  = 1.672 
 
-->d=read("d.dat",2509,1);
 
-->tic; x=N\d; toc
 ans  = 1.672 
 
-->tic; Ns=sparse(N); toc
 ans  = 0.141 
 
-->tic(); xs = umfpack(Ns,'\',d); toc
 ans  = 0.14




Ether 26-05-2013 22:02

Re: OPR-computation-related linear algebra problem
 
2 Attachment(s)
Quote:

Originally Posted by RyanCahoon (Post 1277252)
C code implementing Cholesky decomposition-based solver. With minimal optimization, the calculation runs in 3.02 seconds on my system.

Hi Ryan. I compiled it with Borland C++ 5.5 and ran it on the computer described in this post. It took 80 seconds:

ryan.exe 2509 N.dat d.dat x.dat

Reading: 1.312000 seconds
Calculation: 79.953000 seconds


So I dug up an old piece of code I wrote back in 1990 with a Cholesky factoring algorithm in it [1] and modified it for this application and ran it. It took about 22.5 seconds:

Nx=d build 5/26/2013 921p

CPU Hz (example 3.4e9 for 3.4GHz): 3.4e9
N matrix size (example 2509): 2509
N matrix filename (example N.dat): N.dat
d vector filename (example d.dat): d.dat
output filename (example x.dat): x.dat

reading N & d...
0.59 seconds

Cholesky...
22.37 seconds

Fwd & Back Subst...
0.08 seconds

Writing solution x...
0.01 seconds

done. press ENTER


If your code took only 3 seconds to run on your machine, but 80 on mine, I'm wondering what the Rice algorithm would do on your machine.


[1] John Rischard Rice, Numerical Methods, Software, and Analysis, 1983, Page 139 (see attachments)


flameout 26-05-2013 22:11

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Ether (Post 1277294)
Just installed SciLab 5.4.1 with Intel Math Kernel Library 10.3 on a 7-year-old desktop PC:

Now that I have working SciLab code, I'll go ahead and re-do the tests (with additional instrumentation on the reading). This is on the same computer as before (dual-core, hyperthreaded Intel i5, Linux 3.7.6).

MATLAB R2012b:
Code:

>> tic ; N = dlmread('N.dat'); toc
Elapsed time is 3.074810 seconds.
>> tic ; d = dlmread('d.dat'); toc
Elapsed time is 0.006744 seconds.
>> tic ; r = N \ d; toc
Elapsed time is 0.323021 seconds.
>> tic ; dlmwrite('out.dat', r); toc
Elapsed time is 0.124947 seconds.

I noticed that the solve time was very different from my previous run. This may be due to dynamic frequency scaling -- the results in this post (for all software) are with the frequency locked at the highest setting, 2.501 GHz. It may also be due to a disk read -- I had not loaded MATLAB prior to running the previous test; now it's definitely in the disk cache. The solve is now consistently taking the time reported above, about a third of a second.

GNU Octave 3.6.2:
Code:

octave:1> tic ; N = dlmread('N.dat'); toc
Elapsed time is 1.87042 seconds.
octave:2> tic ; d = dlmread('d.dat'); toc
Elapsed time is 0.00241804 seconds.
octave:3> tic ; r = N \ d; toc
Elapsed time is 0.528489 seconds.
octave:4> tic ; dlmwrite('out.dat', r); toc
Elapsed time is 0.00820613 seconds.

Octave seems more consistent. The solve time is higher than for MATLAB, but the I/O times are consistently better.

Scilab 5.3.3:
Code:

-->stacksize(70000000);
 
-->tic; N=read("N.dat", 2509, 2509); toc
 ans  =
 
    1.21 
 
-->tic; d=read("d.dat", 2509, 1); toc
 ans  =
 
    0.003 
 
-->tic; x=N\d; toc
 ans  =
 
    1.052 
 
-->tic; Ns=sparse(N); toc
 ans  =
 
    0.081 
 
-->tic(); xs = umfpack(Ns,'\',d); toc
 ans  =
 
    0.081

Scilab failed to read the provided d.dat out of the box (reporting an EOF before it was done reading 2509 rows). I was able to correct this by adding a single newline to the end of d.dat.

FreeMat 4.0:
Code:

--> tic ; N = dlmread('N.dat'); toc
ans =
    2.8630
--> tic ; d = dlmread('d.dat'); toc
ans =
    0.0080
--> tic ; r = N \ d; toc
ans =
    3.4270

FreeMat did not have a dlmwrite function, so I haven't reported the write times for it. The time it took to solve the equations was significantly slower than any of the other programs. This did not improve with subsequent runs.

Ether 26-05-2013 22:34

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Ether (Post 1277172)
Attached ZIP file contains N and d.

BTW, in case anyone was wondering...

Ax ~ b (overdetermined system)

A^T A x = A^T b (normal equations; least squares solution of Ax ~ b)

Let N = A^T A and d = A^T b (N is symmetric positive definite),

then Nx = d.

A is the binary design matrix of alliances and b is the vector of alliance scores for all the qual matches for the 2013 season, including 75 Regionals and Districts, plus MAR and MSC, plus Archi, Curie, Galileo, and Newton.

So solving Nx=d for x is solving for 2013 World OPR.
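As a toy illustration of the construction above (made-up numbers, not the real 2013 match data): A is a 0/1 design matrix with one row per alliance and one column per team, and b holds the alliance scores.

```python
def transpose(A):
    """Return A^T as a list of row lists."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain triple-loop matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# 4 alliances, 3 teams: each row marks which teams played together.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 1, 1]]
b = [[30.0], [50.0], [40.0], [60.0]]   # alliance scores

At = transpose(A)
N = matmul(At, A)   # N = A^T A (symmetric positive definite here)
d = matmul(At, b)   # d = A^T b
# Solving N x = d now gives the least-squares OPR estimate per team.
print(N)
print([row[0] for row in d])
```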




RyanCahoon 27-05-2013 20:43

Re: OPR-computation-related linear algebra problem
 
1 Attachment(s)
Hi Ether,

Quote:

Originally Posted by Ether (Post 1277300)
I compiled it with Borland C++ 5.5 and ran it [...] It took 80 seconds

That's quite a large difference in runtime. I compiled mine with Visual Studio 2010. I had wondered if VS was able to do any vectorized optimizations, but I don't see evidence of that in the disassembly.

Quote:

Originally Posted by Ether (Post 1277300)
If your code took only 3 seconds to run on your machine, but 80 on mine, I'm wondering what the Rice algorithm would do on your machine.

If I'm reading the pseudocode you posted correctly, I think I'm using the same algorithm (I got mine from the formulae on Wikipedia). The only differences I could find are that I didn't handle the case of roundoff errors leading to slightly negative sums for the diagonal elements, and that I do some of the sums in reverse order - but unless there are some drastically bad cache effects, I don't see that impacting the runtime.

Makes me wonder what you may have done better in your coding of the algorithm.

EDIT: Changing the order of the summations got me down to 2.68 and changing to in-place computation like your code got me to 2.58. Beyond that, any improvements would seem to be in the way the Pascal compiler is generating code.

Best,

Ether 27-05-2013 21:13

Re: OPR-computation-related linear algebra problem
 
1 Attachment(s)
Quote:

Originally Posted by RyanCahoon (Post 1277437)
Makes me wonder what you may have done better in your coding of the algorithm.

...

Greg McKaskle 28-05-2013 16:35

Re: OPR-computation-related linear algebra problem
 
1 Attachment(s)
Finally had time to speak with the math guys.

The built-in LV linear algebra I was using links to an older version of Intel's MKL, but if I had used the SPD option on the solver it would indeed have been faster than the general version.

There is a toolkit called "Multicore Analysis and Sparse Matrix Toolkit", and they ran the numbers using that tool as well. Due to a newer version of MKL, the general solver is much faster. The right column converts the matrix into sparse form and uses a sparse solver.

Greg McKaskle

Ether 28-05-2013 17:42

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Greg McKaskle (Post 1277589)
Finally had time to speak with the math guys...

Thanks Greg. Are the "time" units in the attachment milliseconds?



Greg McKaskle 28-05-2013 17:50

Re: OPR-computation-related linear algebra problem
 
Yes, they are in milliseconds. SPD stands for symmetric positive definite. Column three enables the algorithms to utilize more than one core -- though this doesn't seem to help that much.

Greg McKaskle

Ether 28-05-2013 18:16

Re: OPR-computation-related linear algebra problem
 

I did the computation on this computer using a slightly modified version of DMetalKong's Python code.

Python 2.7.5
SciPy 0.12.0
NumPy 1.7.1

Code:

>>> import numpy
>>> import time
>>> import scipy
>>> import scipy.sparse
>>> import scipy.sparse.linalg
>>>
>>> # Read N & d ...
... start = time.time()
>>> N = numpy.loadtxt(open('E:\z\N.dat'))
>>> d = numpy.loadtxt(open('E:\z\d.dat'))
>>> end = time.time()
>>> print "%f seconds" % (end-start)
6.532000 seconds
>>>
>>> # solve...
... start = time.time()
>>> x = numpy.linalg.solve(N,d)
>>> end = time.time()
>>> print "%f seconds" % (end-start)
15.281000 seconds
>>>
>>> # Convert to sparse...
... start = time.time()
>>> Ns = scipy.sparse.csr_matrix(N)
>>> end = time.time()
>>> print "%f seconds" % (end-start)
0.234000 seconds
>>>
>>> # solve sparse...
... start = time.time()
>>> xs = scipy.sparse.linalg.spsolve(Ns,d)
>>> end = time.time()
>>> print "%f seconds" % (end-start)
0.453000 seconds
>>>

I had expected Python to be at least as fast as SciLab.

Perhaps there's an MKL for Python I need to install?



MikeE 29-05-2013 01:57

Re: OPR-computation-related linear algebra problem
 
Sorry I'm a bit late to the party.

I'm running Octave 3.2.4, so an older version than flameout's.
Hardware is Dell E6420 laptop (CPU i7-2640M @ 2.8GHz) Win 7 64bit

Code:

octave:2> N = load('N.dat');
octave:3> d = load('d.dat');
octave:4> tic; Ns = sparse(N); toc
Elapsed time is 0.136 seconds.
octave:5> tic; r = Ns \ d; toc
Elapsed time is 0.096 seconds.
octave:6> tic; r = N \ d; toc
Elapsed time is 0.921 seconds.

So solving the sparse matrix is around 230ms and direct solution is around 900ms.

Ether 29-05-2013 13:23

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by RyanCahoon (Post 1277437)
EDIT: Changing the order of the summations got me down to 2.68 and changing to in-place computation like your code got me to 2.58. Beyond that, any improvements would seem to be in the way the Pascal compiler is generating code.

Thanks for doing this Ryan.

Interestingly, the Pascal and C++ compilers I used are essentially identical. Only the front ends are different (for the different languages).

Is it possible that the difference in timing is due to the differences in the memory access due to the data structures we used?



Ether 29-05-2013 13:30

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Greg McKaskle (Post 1277589)
There is a toolkit called "Multicore Analysis and Sparse Matrix Toolkit", and they ran the numbers using that tool as well.

Greg,

What machine & OS was used?



Ether 29-05-2013 14:17

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by BornaE (Post 1277235)
Loading it into my GTX 580 GPU right now to get some values. Will do that with and without the time taken to load it into the GPU memory and back.

Did you ever get a chance to do this?



RyanCahoon 30-05-2013 01:24

Re: OPR-computation-related linear algebra problem
 
1 Attachment(s)
Quote:

Originally Posted by Ether
Quote:

Originally Posted by RyanCahoon (Post 1277212)
Using MATLAB 2012a on a Intel Core i7-3615QM

Ryan,

Could you edit your post to indicate what O/S you are using?

Windows 7 x64 Home Premium

Quote:

Originally Posted by Ether (Post 1277688)
Is it possible that the difference in timing is due to the differences in the memory access due to the data structures we used?

We're both using 8-byte floating point numbers. Assuming Pascal stores a multidimensional array in a contiguous memory block, we're basically using the same data structure - the only difference being that I'm only storing the elements below the diagonal and I believe you store all the elements of the matrix (at least, this is the way that I wrote the read/write code). Maybe this would affect caching.

For comparison, I compiled your code using Free Pascal and the Cholesky decomposition ran in 11.9 seconds on my computer.

Greg McKaskle 30-05-2013 07:35

Re: OPR-computation-related linear algebra problem
 
I haven't used Pascal in a long time, but I seem to remember it storing 2D arrays with different elements adjacent than C does: it was column-major and C was row-major. The notation isn't important, but accessing adjacent elements in the cache will be far faster than jumping by 20 KB to pick up the next element.

Greg McKaskle
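The cache-locality effect Greg describes is easy to demonstrate in any row-major setting. A small NumPy sketch (timings are illustrative and machine-dependent; NumPy arrays are C-ordered, i.e. row-major, by default):

```python
import time
import numpy as np

M = np.random.rand(2000, 2000)   # C (row-major) order: rows are contiguous

def sum_by_rows(M):
    # walks memory contiguously -- cache-friendly
    total = 0.0
    for i in range(M.shape[0]):
        total += M[i, :].sum()
    return total

def sum_by_cols(M):
    # each element access strides by one full row (16 KB here) -- cache-hostile
    total = 0.0
    for j in range(M.shape[1]):
        total += M[:, j].sum()
    return total

t0 = time.perf_counter(); r = sum_by_rows(M); t_rows = time.perf_counter() - t0
t0 = time.perf_counter(); c = sum_by_cols(M); t_cols = time.perf_counter() - t0
```

Both functions compute the same sum, but on most machines the column walk is measurably slower, purely because of the memory access pattern.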

BornaE 30-05-2013 08:09

Re: OPR-computation-related linear algebra problem
 
I don't have direct access to my desktop at the moment. I was doing that remotely with LogMeIn, but for some reason I lost the connection and haven't gotten it back yet.

I tried it on a GTX 555M with only 24 CUDA cores. It was 50% slower than my laptop processor (a Core i7-2670QM quad core running at 2.2 GHz).
I will post here as soon as I get to my desktop.

I was able to get 0.015 seconds using sparse matrices; however, GPU processing does not support sparse matrices directly. I doubt that I can get any faster results than that.


Quote:

Originally Posted by Ether (Post 1277700)
Did you ever get a chance to do this?




Ether 30-05-2013 09:29

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by BornaE (Post 1277768)
laptop Core i7 2670QM quad core 2.2GHz...

0.015 seconds using sparse matrices

Matlab, Octave, SciLab, Python, or something else?

Linux, Windows XP/7, 32 or 64 ?



BornaE 30-05-2013 09:42

Re: OPR-computation-related linear algebra problem
 
Matlab 2012b

here are the results

Normal Matrices(CPU and GPU(555M))
Using inv(N)*d:
CPU 1.874s
GPU 2.146s
using N\d:
CPU 0.175s
GPU 0.507s

Sparse Matrices(Only CPU)
Using inv(N)*d:
CPU 0.967s
using N\d:
CPU 0.015s

Cannot get sparse matrices into the GPU easily.

The times are only for the solve operation.




Quote:

Originally Posted by Ether (Post 1277777)
Matlab, Octave, SciLab, Python, or something else?

Linux, Windows XP/7, 32 or 64 ?




Ether 30-05-2013 13:34

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Greg McKaskle (Post 1277764)
I haven't used Pascal in a long time, but I seem to remember it storing 2D arrays with different elements adjacent than C does: it was column-major and C was row-major. The notation isn't important, but accessing adjacent elements in the cache will be far faster than jumping by 20 KB to pick up the next element.

I was thinking the extra computation required for this might be the culprit:

Code:

#define ELEMENT(M, i,j) (M[(i)*((i)+1)/2+(j)])

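For reference, that macro maps element (i, j), with j <= i, of a lower triangle stored row by row into a flat array: rows 0 through i-1 hold 1 + 2 + ... + i = i*(i+1)/2 elements, so row i starts at that offset. A small Python sketch of the same indexing (illustrative only, not Ether's actual code):

```python
import numpy as np

def element(M_packed, i, j):
    """Index (i, j), j <= i, of a lower triangle packed row by row."""
    return M_packed[i * (i + 1) // 2 + j]

# pack the lower triangle of a small dense matrix and verify the mapping
n = 4
A = np.arange(n * n, dtype=float).reshape(n, n)
packed = np.concatenate([A[i, :i + 1] for i in range(n)])

ok = all(element(packed, i, j) == A[i, j]
         for i in range(n) for j in range(i + 1))
```

The per-access multiply and divide is exactly the overhead suspected above, though a compiler can often strength-reduce it away when i is a loop counter.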

Ether 30-05-2013 14:23

Re: OPR-computation-related linear algebra problem
 

Just installed Octave3.6.4_gcc4.6.2_20130408 on this computer.

Results:

GNU Octave, version 3.6.4

Octave was configured for "i686-pc-mingw32".

octave:1> tic; d = load('k:/MyOctave/d.dat'); toc
Elapsed time is 0.0625 seconds

octave:2> tic; N = load('k:/MyOctave/N.dat'); toc
Elapsed time is 90.0469 seconds

octave:3> tic; r = N \ d; toc
Elapsed time is 1.03125 seconds

octave:4> tic; r = N \ d; toc
Elapsed time is 0.953125 seconds

octave:5> tic; Ns = sparse(N); toc
Elapsed time is 0.0625 seconds
octave:6> tic; rs = Ns \ d; toc
Elapsed time is 0.1875 seconds

octave:7> tic; Ns = sparse(N); toc
Elapsed time is 0.046875 seconds
octave:8> tic; rs = Ns \ d; toc
Elapsed time is 0.171875 seconds


Anybody know why Octave takes so long to load N?

Load N times, all on the same computer:
Delphi....0.6 seconds

C.........1.3 seconds

SciLab....1.7 seconds

RLab......3.7 seconds

Python....6.5 seconds

Octave...90.0 seconds


Greg McKaskle 30-05-2013 15:13

Re: OPR-computation-related linear algebra problem
 
LV times were on this computer

Win 7 Professional 64-bit
Xeon CPU E5-1620 3.6GHz
16G RAM

Greg McKaskle

Ether 30-05-2013 21:17

Re: OPR-computation-related linear algebra problem
 

RLaB is not a contender for fastest speed, but it's definitely the tiniest linear algebra app out there. It weighs in at under 1.5MB.

Makes a wonderful pop-up for that quick calculation, or for high-school students just learning linear algebra.

Very easy to use... and free.

Welcome to RLaB. New users type `help INTRO'
RLaB version 2.1.05 Copyright (C) 1992-97 Ian Searle

This is free software, and you are welcome to redistribute it under
certain conditions; type `help conditions' for details

> tic(); N=readm("N.dat"); toc()
3.66

> d=readm("d.dat");

> tic(); x=N\d; toc()
28.5

> tic(); x=solve(N,d,"S"); toc()
14.5
>
The second solution method (x=solve(N,d,"S")) tells RLaB that the matrix is symmetric, so it uses the LAPACK subroutine DSYTRF (Bunch-Kaufman diagonal pivoting) to factor it, which cuts the time in half.
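The same symmetry hint can be given in SciPy (the assume_a keyword needs SciPy 0.19 or later, which is newer than what was used in this thread): scipy.linalg.solve accepts assume_a='sym' (Bunch-Kaufman, like DSYTRF) or assume_a='pos' (Cholesky) instead of the default general LU path. A sketch on a small synthetic SPD system:

```python
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(0)
A = rng.random((300, 300))
N = A @ A.T + 300 * np.eye(300)      # symmetric positive definite by construction
d = rng.random(300)

x_lu  = solve(N, d)                  # general LU (like a plain N\d)
x_sym = solve(N, d, assume_a='sym')  # Bunch-Kaufman, analogous to the "S" flag
x_pos = solve(N, d, assume_a='pos')  # Cholesky, valid here since N is SPD
```

All three give the same answer; the specialized factorizations simply do less work on large matrices.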



RyanCahoon 30-05-2013 21:52

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Greg McKaskle (Post 1277764)
I haven't used Pascal in a long time, but seem to remember it storing 2D arrays with different elements adjacent. It was column-major and C was row-major.

I tried reversing the indices and it took 59.1 seconds (vs. 11.9). A couple of other websites say Pascal is row-major.

Quote:

Originally Posted by Ether (Post 1277824)
I was thinking the extra computation required for this might be the culprit:

Code:

#define ELEMENT(M, i,j) (M[(i)*((i)+1)/2+(j)])

I was worried about the same thing. In the most recent version of the code I posted, that macro is only used once (not in a loop) to calculate a pointer to the end of the matrix.

Ether 31-05-2013 09:06

Re: OPR-computation-related linear algebra problem
 

Is anybody else running Octave on a machine with 32-bit-XP Pro?

Are you having the same 30 second delay for Octave to load, and 90 seconds to load the 12MB N.dat file?



Nikhil Bajaj 31-05-2013 09:22

Re: OPR-computation-related linear algebra problem
 
I can check after work in the lab this evening if no one gets to it first, Ether!

Foster 31-05-2013 10:30

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by RyanCahoon (Post 1277890)
I tried reversing the indices and it took 59.1 (vs 11.9) seconds. A couple other websites say Pascal is row-major.

Long, long ago I was the support for a Pascal compiler that ran on the Unisys A-Series. That compiler was row-major, since that matched the underlying architecture of the stack machine. A 2D array consisted of an array of pointers, each pointer referencing a row. Once you got the row, it was fast to "walk the row" since the elements were contiguous in memory.

Walking the matrix in column order caused two memory accesses per read: one for the pointer (plus the time to do the reference fetch) and one for the data element itself. You could see a ~10x reduction in speed doing column order vs. row order. -- Thanks for the "back when" reminder.

Ether: Just for grins, I tried RLaB on our 16-way cluster. For some reason I'm not getting responses to tic()/toc(). What Windows OS are you running RLaB under?

Ether 31-05-2013 10:57

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Foster (Post 1277971)
Ether: Just for grins, I tried RLab on our 16 way cluster. For some reason I'm not getting responses to tic()/toc(); What Windows OS are you running RLab under?

32-bit XP Pro SP3 on a Pentium D 3.4GHz

Don't forget to leave the semi-colon off the toc() if you want it to display.

Here's the code:
Code:

diary("output.txt");

tic();
d=readm("d.dat");
toc()


tic();
N=readm("N.dat");
toc()


tic();
x = solve(N,d,"S");
toc()


Here's the output.txt file:
Code:

// RLaB diary file: output.txt. Opened Fri May 31 10:53:02 2013


tic();
d=readm("d.dat");
toc()
        0 


tic();
N=readm("N.dat");
toc()
      3.8 


tic();
x = solve(N,d,"S");
toc()
    14.5

If you run it interactively, by cutting-and-pasting the code directly into the RLaB window (or by typing the commands at the RLaB prompt one at a time), here's the output that will display on the screen:

Code:

Welcome to RLaB. New users type `help INTRO'
RLaB version 2.1.05 Copyright (C) 1992-97 Ian Searle
RLaB comes with ABSOLUTELY NO WARRANTY; for details type `help warranty'
This is free software, and you are welcome to redistribute it under
certain conditions; type `help conditions' for details
>
> tic();
> d=readm("d.dat");
> toc()
        0
>
>
> tic();
> N=readm("N.dat");
> toc()
    3.67
>
>
> tic();
> x = solve(N,d,"S");
> toc()
    14.4
>



Ether 31-05-2013 10:59

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Nikhil Bajaj (Post 1277967)
I can check after work in the lab this evening if no one gets to it first, Ether!

Thank you.



Foster 31-05-2013 14:46

Re: OPR-computation-related linear algebra problem
 
Code:

Welcome to RLaB. New users type `help INTRO'
RLaB version 2.1.05 Copyright (C) 1992-97 Ian Searle
RLaB comes with ABSOLUTELY NO WARRANTY; for details type `help warranty'
This is free software, and you are welcome to redistribute it under
certain conditions; type `help conditions' for details
> tic();
> d=readm("d.dat");
> toc()
    21.9
> tic();
> N=readm("N.dat");
> toc()
    28.8
> tic();
> x = solve(N,d,"S");
> toc()
    34.8
>

And right, if you do toc(); you don't get the output, which wasn't what I expected.

Xeon 2.93 GHz cluster. But this code may not be doing any multithreading.

Joe Ross 21-06-2013 15:10

Re: OPR-computation-related linear algebra problem
 
I just got a new computer at work with a Xeon E5-1620 and a Quadro 4000 GPU (256 CUDA cores). Using Matlab 2012b:

CPU Invert and Multiply: 1.4292s
CPU Linear Solver: 0.22521s
CPU Sparse Linear Solver: 0.034423s

GPU Invert and Multiply: 0.537926s
GPU Linear Solver: 0.218403s

Loading the N matrix into the GPU was 1.890602s.
Creating the Sparse Matrix for N was 0.032979s.



Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.
Copyright © Chief Delphi