Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   General Forum (http://www.chiefdelphi.com/forums/forumdisplay.php?f=16)
-   -   OPR-computation-related linear algebra problem (http://www.chiefdelphi.com/forums/showthread.php?t=117072)

Nikhil Bajaj 26-05-2013 14:56

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Michael Hill (Post 1277260)
The reason I say it's computationally intensive is this article: http://www.johndcook.com/blog/2010/0...t-that-matrix/

That article is 100% correct. The solutions above that are solving in a handful of seconds or less are not inverting the matrix. Reducing the matrix to reduced-row echelon form is related to what methods like LU and Cholesky factorization do.

Even normal Gaussian elimination will be pretty fast on a sparse matrix (though still slower than most methods above), but its numerical stability problems get worse as the matrix size increases. That is the main reason most people solving numerical linear algebra problems avoid it.

DMetalKong 26-05-2013 14:57

Re: OPR-computation-related linear algebra problem
 
2 Attachment(s)
Re-ran using Scipy's sparse matrix solver.

Average run time: 0.085s
Standard deviation: 0.005s

Code:

import sys
import numpy
import time
import scipy
import scipy.sparse
import scipy.sparse.linalg
import psutil

n_runs = 1000

print ""
print ""
print "Python version %s" % (sys.version)
print "Numpy version %s" % (numpy.__version__)
print "Scipy version %s" % (scipy.__version__)
print "Psutil version %s" % (psutil.__version__)
print ""


N = numpy.loadtxt(open('N.dat'))
d = numpy.loadtxt(open('d.dat'))

Ns = scipy.sparse.csr_matrix(N)

data = []
for i in range(1,n_runs+1):
    start = time.time()
    x = scipy.sparse.linalg.spsolve(Ns,d)
    end = time.time()
    row = [end - start]
    row.extend(psutil.cpu_percent(interval=1,percpu=True))
    s = "\t".join([str(item) for item in row])
    data.append(s)
   
f = open('times2.dat','w')
f.write("\n".join(data))
f.close()

_x = scipy.sparse.linalg.spsolve(Ns,d)
print ", ".join([str(f) for f in _x])
print ""


James Critchley 26-05-2013 15:20

Re: OPR-computation-related linear algebra problem
 
Nikhil,
Is there a reason why you are not using the "pcg" function, which assumes symmetric positive definite inputs? This should be faster. Also, please consider using the diagonal as a preconditioner. Unfortunately I do not have access to MATLAB at the moment. Could you please try the following? Sorry in advance for any bugs:

Ns = sparse(N);
D = diag(Ns);
Ds = sparse(diag(D)); #This was a bug... maybe it still is!

# Reference Solution
tic
output = Ns\d;
toc

# CG Solution
tic
output = pcg(Ns,d)
toc

# Diagonal PCG Solution
tic
output = pcg(Ns,d,[],[],Ds)
toc

# Reverse Cuthill-McKee re-ordering
tic
p = symrcm(Ns); # permutation array
Nr = Ns(p,p); # re-ordered problem
toc

# Re-ordered Solve
tic
output = Nr\d(p); #d must be permuted too; output(i) is the unknown with original index p(i)
toc

Another advantage of the conjugate gradient methods is that the work within each iteration can be done concurrently (parallel processing).
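For readers without MATLAB, the diagonally preconditioned iteration can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MATLAB's `pcg` implementation: the names `pcg_jacobi` and `matvec`, the default tolerance and iteration limit, and the 3x3 test matrix are all hypothetical and stand in for the thread's N.dat and d.dat.

```python
def pcg_jacobi(matvec, diag, b, tol=1e-6, max_iter=20):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner.

    matvec(v) returns N*v, diag holds the diagonal of N, b is the
    right-hand side. Returns (x, iterations_used, relative_residual).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - N*x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # apply the preconditioner
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    b_norm = sum(bi * bi for bi in b) ** 0.5 or 1.0
    rel = 1.0
    for k in range(1, max_iter + 1):
        Np = matvec(p)
        alpha = rz / sum(pi * qi for pi, qi in zip(p, Np))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Np)]
        rel = sum(ri * ri for ri in r) ** 0.5 / b_norm
        if rel < tol:
            return x, k, rel
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter, rel


# Tiny symmetric positive definite example (made-up data).
N = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
d = [1.0, 2.0, 3.0]


def matvec(v):
    """Dense matrix-vector product; a sparse format would skip the zeros."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in N]


x, iters, rel = pcg_jacobi(matvec, [N[i][i] for i in range(3)], d)
```

Each dot product and vector update inside the loop is a sum or element-wise operation over independent entries, which is the part that lends itself to parallel processing.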

Best regards

Nikhil Bajaj 26-05-2013 17:04

Re: OPR-computation-related linear algebra problem
 
New code, based on what James put up. I just added some disp's so that the results would be clearer; the disp's are outside of the tics and tocs. I did not find any bugs, though I had to change #'s to %'s.

Code:

clc
disp('Loading Data...')
tic
d = load('d.dat');
N = load('N.dat');
toc
Ns = sparse(N);
D = diag(Ns);
Ds = sparse(diag(D)); %This was a bug... maybe it still is!

% Reference Solution
disp('Reference Solution:')
tic
output1 = Ns\d;
toc


% CG Solution
disp('CG Solution:');
tic
output2 = pcg(Ns,d);
toc

% Diagonal PCG Solution
disp('Diagonal PCG Solution:');
tic
output3 = pcg(Ns,d,[],[],Ds);
toc

% Reverse Cuthill-McKee re-ordering
disp('Re-ordering (Reverse Cutthill-McKee:');
tic
p = symrcm(Ns); % permutation array
Nr = Ns(p,p); % re-ordered problem
toc

% Re-ordered Solve
disp('Re-ordered Solution:');
tic
output4 = Nr\d(p); %d must be permuted too; output4(i) is the unknown with original index p(i)
toc

Output:
Code:

Loading Data...
Elapsed time is 3.033846 seconds.
Reference Solution:
Elapsed time is 0.014136 seconds.
CG Solution:
pcg stopped at iteration 20 without converging to the desired tolerance 1e-06
because the maximum number of iterations was reached.
The iterate returned (number 20) has relative residual 4.8e-05.
Elapsed time is 0.007545 seconds.
Diagonal PCG Solution:
pcg converged at iteration 17 to a solution with relative residual 8.9e-07.
Elapsed time is 0.009216 seconds.
Re-ordering (Reverse Cutthill-McKee:
Elapsed time is 0.004523 seconds.
Re-ordered Solution:
Elapsed time is 0.015021 seconds.

I didn't precondition earlier because I was being sloppy/lazy :). Thanks for calling me out. :yikes: And you're right, I should have used pcg. Thanks for the suggestion.

flameout 26-05-2013 19:08

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Ether (Post 1277233)
PS - can someone with a working Octave installation please run this? also SciLab and R

Since no-one has done Octave yet, I'll go ahead and do it (along with MATLAB for comparison). I can't do SciLab or R because I don't know how to use those :p

MATLAB 2012b:
Code:

>> N = dlmread('N.dat');
>> d = dlmread('d.dat');
>> tic ; r = N \ d; toc
Elapsed time is 0.797772 seconds.

GNU Octave 3.6.2:
Code:

octave:1> N = dlmread('N.dat');
octave:2> d = dlmread('d.dat');
octave:3> tic ; r = N \ d; toc
Elapsed time is 0.624047 seconds.

This is on an Intel i5 (2 core + hyperthreading) with Linux as the host OS (kernel version 3.7.6).

Ether 26-05-2013 20:18

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by flameout (Post 1277290)
Since no-one has done Octave yet, I'll go ahead and do it

Thanks flameout.

Quote:

I can't do SciLab or R because I don't know how to use those :p

Just installed SciLab 5.4.1 with Intel Math Kernel Library 10.3 on a 7-year-old desktop PC:
  • Intel Pentium D 3.4GHz (x86 Family 15 Model 6 Stepping 4)
  • 32-bit XP Pro SP3
  • 500GB Seagate Barracuda 7200

Code:

-->stacksize(70000000);
 
-->tic; N=read("N.dat",2509,2509); toc
 ans  = 1.672 
 
-->d=read("d.dat",2509,1);
 
-->tic; x=N\d; toc
 ans  = 1.672 
 
-->tic; Ns=sparse(N); toc
 ans  = 0.141 
 
-->tic(); xs = umfpack(Ns,'\',d); toc
 ans  = 0.14




Ether 26-05-2013 22:02

Re: OPR-computation-related linear algebra problem
 
2 Attachment(s)
Quote:

Originally Posted by RyanCahoon (Post 1277252)
C code implementing Cholesky decomposition-based solver. With minimal optimization, the calculation runs in 3.02 seconds on my system.

Hi Ryan. I compiled it with Borland C++ 5.5 and ran it on the computer described in this post. It took 80 seconds:

ryan.exe 2509 N.dat d.dat x.dat

Reading: 1.312000 seconds
Calculation: 79.953000 seconds


So I dug up an old piece of code I wrote back in 1990 with a Cholesky factoring algorithm in it [1], modified it for this application, and ran it. It took about 22.5 seconds:

Nx=d build 5/26/2013 921p

CPU Hz (example 3.4e9 for 3.4GHz): 3.4e9
N matrix size (example 2509): 2509
N matrix filename (example N.dat): N.dat
d vector filename (example d.dat): d.dat
output filename (example x.dat): x.dat

reading N & d...
0.59 seconds

Cholesky...
22.37 seconds

Fwd & Back Subst...
0.08 seconds

Writing solution x...
0.01 seconds

done. press ENTER


If your code took only 3 seconds to run on your machine, but 80 on mine, I'm wondering what the Rice algorithm would do on your machine.


[1] John Rischard Rice, Numerical Methods, Software, and Analysis, 1983, Page 139 (see attachments)


flameout 26-05-2013 22:11

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Ether (Post 1277294)
Just installed SciLab 5.4.1 with Intel Math Kernel Library 10.3 on a 7-year-old desktop PC:

Now that I have working SciLab code, I'll go ahead and re-do the tests (with additional instrumentation on the reading). This is on the same computer as before (dual-core, hyperthreaded Intel i5, Linux 3.7.6).

MATLAB R2012b:
Code:

>> tic ; N = dlmread('N.dat'); toc
Elapsed time is 3.074810 seconds.
>> tic ; d = dlmread('d.dat'); toc
Elapsed time is 0.006744 seconds.
>> tic ; r = N \ d; toc
Elapsed time is 0.323021 seconds.
>> tic ; dlmwrite('out.dat', r); toc
Elapsed time is 0.124947 seconds.

I noticed that the solve time was very different from my previous run. This may be due to dynamic frequency scaling -- the results in this post (for all software) are with the frequency locked at the highest setting, 2.501 GHz. It may also be due to a disk read -- I had not loaded MATLAB prior to running the previous test; now it's definitely in the disk cache. The solve is now consistently taking the time reported above, about a third of a second.
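One way to reduce the effect of a cold cache or a clock-speed ramp on a single measurement is to discard a few warm-up runs and then report statistics over repeated runs. The sketch below is illustrative only: `time_repeated` and `dummy_solve` are hypothetical names, and the workload is a stand-in rather than the actual solve.

```python
import statistics
import time


def time_repeated(func, n_runs=20, n_warmup=2):
    """Call func repeatedly; return (mean, stdev, best) wall time in seconds."""
    for _ in range(n_warmup):
        func()                         # warm-up runs, results discarded
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples), min(samples)


def dummy_solve():
    """Stand-in workload (assumption); replace with the real solve call."""
    return sum(i * i for i in range(10000))


mean_s, stdev_s, best_s = time_repeated(dummy_solve)
print("mean %.6f s, stdev %.6f s, best %.6f s" % (mean_s, stdev_s, best_s))
```

The best time is often the most repeatable figure, while the standard deviation shows how much the machine state is influencing the result.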

GNU Octave 3.6.2:
Code:

octave:1> tic ; N = dlmread('N.dat'); toc
Elapsed time is 1.87042 seconds.
octave:2> tic ; d = dlmread('d.dat'); toc
Elapsed time is 0.00241804 seconds.
octave:3> tic ; r = N \ d; toc
Elapsed time is 0.528489 seconds.
octave:4> tic ; dlmwrite('out.dat', r); toc
Elapsed time is 0.00820613 seconds.

Octave seems more consistent. The solve time is higher than for MATLAB, but the I/O times are consistently better.

Scilab 5.3.3:
Code:

-->stacksize(70000000);
 
-->tic; N=read("N.dat", 2509, 2509); toc
 ans  =
 
    1.21 
 
-->tic; d=read("d.dat", 2509, 1); toc
 ans  =
 
    0.003 
 
-->tic; x=N\d; toc
 ans  =
 
    1.052 
 
-->tic; Ns=sparse(N); toc
 ans  =
 
    0.081 
 
-->tic(); xs = umfpack(Ns,'\',d); toc
 ans  =
 
    0.081

Scilab failed to read the provided d.dat out of the box (reporting an EOF before it was done reading 2509 rows). I was able to correct this by adding a single newline to the end of d.dat.

FreeMat 4.0:
Code:

--> tic ; N = dlmread('N.dat'); toc
ans =
    2.8630
--> tic ; d = dlmread('d.dat'); toc
ans =
    0.0080
--> tic ; r = N \ d; toc
ans =
    3.4270

FreeMat did not have a dlmwrite function, so I haven't reported the write times for it. The solve was significantly slower than in any of the other programs. This did not improve with subsequent runs.

Ether 26-05-2013 22:34

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Ether (Post 1277172)
Attached ZIP file contains N and d.

BTW, in case anyone was wondering...

Ax ≈ b (overdetermined system)

A^T A x = A^T b (normal equations; least-squares solution of Ax ≈ b)

Let N = A^T A and d = A^T b (N is symmetric positive definite)

then Nx = d

A is the binary design matrix of alliances and b is the vector of alliance scores for all the qual matches for the 2013 season, including 75 Regionals and Districts, plus MAR and MSC, plus Archi, Curie, Galileo, and Newton.

So solving Nx=d for x is solving for 2013 World OPR.
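To make the construction concrete, here is a small sketch with made-up data: four teams, two-team alliances, and scores generated from assumed contributions of 10, 20, 30 and 40 points. The function names `normal_equations` and `solve_spd` are hypothetical, and the tiny elimination routine is only a stand-in for the solvers discussed in this thread. Because A is binary, entry (i, j) of N is simply the number of alliances that contain both team i and team j, so N and d can be accumulated without storing A.

```python
def normal_equations(alliances, scores, n_teams):
    """Build N = A^T A and d = A^T b without forming A explicitly.

    alliances[i] lists the team indices on alliance i; scores[i] is its score.
    """
    N = [[0.0] * n_teams for _ in range(n_teams)]
    d = [0.0] * n_teams
    for teams, score in zip(alliances, scores):
        for i in teams:
            d[i] += score              # contribution to row i of A^T b
            for j in teams:
                N[i][j] += 1.0         # count of alliances shared by i and j
    return N, d


def solve_spd(N, d):
    """Gaussian elimination without pivoting; fine for a tiny SPD example."""
    n = len(d)
    M = [row[:] + [rhs] for row, rhs in zip(N, d)]    # augmented copy
    for k in range(n):
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


# Made-up example: every pairing of four teams plays once.
alliances = [(0, 1), (2, 3), (0, 2), (1, 3), (0, 3), (1, 2)]
scores = [30.0, 70.0, 40.0, 60.0, 50.0, 50.0]

N, d = normal_equations(alliances, scores, 4)
opr = solve_spd(N, d)
```

In this example the scores are exactly consistent, so the least-squares solution reproduces the assumed contributions; with real match data the system is overdetermined and the result is only a best fit.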




RyanCahoon 27-05-2013 20:43

Re: OPR-computation-related linear algebra problem
 
1 Attachment(s)
Hi Ether,

Quote:

Originally Posted by Ether (Post 1277300)
I compiled it with Borland C++ 5.5 and ran it [...] It took 80 seconds

That's quite a large difference in runtime. I compiled mine with Visual Studio 2010. I had wondered if VS was able to do any vectorized optimizations, but I don't see evidence of that in the Disassembly.

Quote:

Originally Posted by Ether (Post 1277300)
If your code took only 3 seconds to run on your machine, but 80 on mine, I'm wondering what the Rice algorithm would do on your machine.

If I'm reading the pseudocode you posted correctly, I think I'm using the same algorithm (I got mine from the formulae on Wikipedia). The only differences I could find are that I didn't handle the case of roundoff errors leading to slightly negative sums for the diagonal elements, and that I do some of the sums in reverse order. Unless there are some drastically bad cache effects, I don't see that impacting the runtime.

Makes me wonder what you may have done better in your coding of the algorithm.

EDIT: Changing the order of the summations got me down to 2.68 and changing to in-place computation like your code got me to 2.58. Beyond that, any improvements would seem to be in the way the Pascal compiler is generating code.
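For reference, the textbook Cholesky solve under discussion can be sketched as follows. This is not the C or Pascal code from the attachments: `cholesky_solve` is a hypothetical name, the factor overwrites the lower triangle of a working copy, and raising an error when a diagonal sum is not positive is just one possible way to treat the roundoff case (an assumption, not necessarily what Rice's algorithm does).

```python
import math


def cholesky_solve(N, d):
    """Solve N x = d for symmetric positive definite N via N = L L^T."""
    n = len(d)
    L = [row[:] for row in N]          # working copy; lower triangle becomes L
    for j in range(n):
        s = L[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            # Roundoff (or a non-SPD input) can push this sum to zero or below.
            raise ValueError("matrix is not numerically positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            t = L[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = t / L[j][j]
    # Forward substitution: L y = d
    y = [0.0] * n
    for i in range(n):
        y[i] = (d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


# Small made-up SPD system whose exact solution is [1, 2, 3].
N = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
d = [14.0, 21.0, 26.0]
x = cholesky_solve(N, d)
```

The inner sums run along rows of L, so in a row-major language the memory access pattern is contiguous; the order in which those sums are accumulated is the kind of detail that can change cache behaviour without changing the algorithm.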

Best,

Ether 27-05-2013 21:13

Re: OPR-computation-related linear algebra problem
 
1 Attachment(s)
Quote:

Originally Posted by RyanCahoon (Post 1277437)
Makes me wonder what you may have done better in your coding of the algorithm.

...

Greg McKaskle 28-05-2013 16:35

Re: OPR-computation-related linear algebra problem
 
1 Attachment(s)
Finally had time to speak with the math guys.

The built-in LV linear algebra I was using links to an older version of Intel's MKL, but if I had used the SPD option on the solver it would indeed have been faster than the general version.

There is a toolkit called "Multicore Analysis and Sparse Matrix Toolkit", and they ran the numbers using that tool as well. Due to a newer version of MKL, the general solver is much faster. The right column converts the matrix into sparse form and uses a sparse solver.

Greg McKaskle

Ether 28-05-2013 17:42

Re: OPR-computation-related linear algebra problem
 
Quote:

Originally Posted by Greg McKaskle (Post 1277589)
Finally had time to speak with the math guys...

Thanks Greg. Are the "time" units in the attachment milliseconds?



Greg McKaskle 28-05-2013 17:50

Re: OPR-computation-related linear algebra problem
 
Yes, they are in milliseconds. SPD stands for symmetric positive definite. Column three enables the algorithms to utilize more than one core -- though this doesn't seem to help that much.

Greg McKaskle

Ether 28-05-2013 18:16

Re: OPR-computation-related linear algebra problem
 

I did the computation on this computer using a slightly modified version of DMetalKong's Python code.

Python 2.7.5
SciPy 0.12.0
NumPy 1.7.1

Code:

>>> import numpy
>>> import time
>>> import scipy
>>> import scipy.sparse
>>> import scipy.sparse.linalg
>>>
>>> # Read N & d ...
... start = time.time()
>>> N = numpy.loadtxt(open('E:\z\N.dat'))
>>> d = numpy.loadtxt(open('E:\z\d.dat'))
>>> end = time.time()
>>> print "%f seconds" % (end-start)
6.532000 seconds
>>>
>>> # solve...
... start = time.time()
>>> x = numpy.linalg.solve(N,d)
>>> end = time.time()
>>> print "%f seconds" % (end-start)
15.281000 seconds
>>>
>>> # Convert to sparse...
... start = time.time()
>>> Ns = scipy.sparse.csr_matrix(N)
>>> end = time.time()
>>> print "%f seconds" % (end-start)
0.234000 seconds
>>>
>>> # solve sparse...
... start = time.time()
>>> xs = scipy.sparse.linalg.spsolve(Ns,d)
>>> end = time.time()
>>> print "%f seconds" % (end-start)
0.453000 seconds
>>>

I had expected Python to be at least as fast as SciLab.

Perhaps there's an MKL for Python I need to install?




All times are GMT -5.

Copyright © Chief Delphi