Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   Programming (http://www.chiefdelphi.com/forums/forumdisplay.php?f=51)
-   -   OPR Programming Challenge (http://www.chiefdelphi.com/forums/showthread.php?t=130783)

Ether 11-10-2014 11:10

OPR Programming Challenge
 

Given the dataset Michael Hill posted here, calculate OPR, DPR, and CCWM; and create a report sorted by team number.

The goal is to see who can go from raw input data to finished output report the fastest.

@ Michael Hill: please refrain since you and I have discussed this.



Arhowk 11-10-2014 11:30

Re: OPR Programming Challenge
 
by "fastest" do you mean the first response or fastest operational efficiency?

Nathan Streeter 11-10-2014 19:14

Re: OPR Programming Challenge
 
I believe operational efficiency is the goal; however, Ether can speak for himself on that...

Michael Hill 11-10-2014 19:44

Re: OPR Programming Challenge
 
*Actively Refraining* ...if that's possible

saikiranra 05-12-2014 01:40

Re: OPR Programming Challenge
 
1 Attachment(s)
I'm about 2 months late, but I think I have generated an accurate report. Some of the OPR and CCWM values are identical to the 1114/2834 databases, while others are a few values off. I don't know whether this is an input error, matrix inversion error, or general code bug.

Ether 05-12-2014 09:34

Re: OPR Programming Challenge
 
Quote:

Originally Posted by saikiranra (Post 1411651)
I'm about 2 months late, but I think I have generated an accurate report.

Thanks for posting. Maybe this will revive interest in the thread.

Did you time how long it took the computer to go from raw input data to finished output report ? That's the challenge.


Quote:

Some of the OPR and CCWM values are identical to the 1114/2834 databases, while others are a few values off. I don't know whether this is an input error, matrix inversion error, or general code bug.
http://www.chiefdelphi.com/forums/sh...29&postcount=7



saikiranra 05-12-2014 20:59

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Ether (Post 1411679)

Did you time how long it took the computer to go from raw input data to finished output report ? That's the challenge.

http://www.chiefdelphi.com/forums/sh...29&postcount=7



Oops, sorry for not posting it. Over 10 runs, it averaged 0.823266 seconds to generate the output.

As for the matrix inversion, I didn't realize that the Cholesky solving library I used didn't find the inverse, rather it found Cholesky factorization of the matrix.

Ether 05-12-2014 21:23

Re: OPR Programming Challenge
 
Quote:

Originally Posted by saikiranra (Post 1411851)
Over 10 runs, it averaged 0.823266 seconds to generate the output.

That's pretty good.

The challenge includes the total time to do all these steps*:
  1. - read Michael's 8-column fixed-width data file
  2. - convert that data into a set of linear equations
  3. - compute the solution vectors OPR, DPR, & CCWM
  4. - sort the solution vector elements in ascending order by team number
  5. - write the output report to a disk file.

Please report the times for each of the above steps.

What hardware did you run this on? What software did you use?

*if you used different steps, please explain and provide timings

saikiranra 05-12-2014 23:56

Re: OPR Programming Challenge
 
Out of 50 runs, these are the average times to:

1. Read from file to create data structures: 0.3636 seconds
2. Make sorted team lists and Bs in Ax=B: 0.0019 seconds
3. Make matrix A in Ax=B: 0.0675 seconds
4. Create Cholesky factorization and find OPR and CCWM values (DPR = OPR - CCWM): 0.2793 seconds
5. Write to file: 0.0242 seconds
(Record these times: 0.0128 seconds)

Total: 0.7494 seconds

I wrote this in Python using the numpy and scipy libraries to manipulate matrices. The software is running on a Windows 8 machine with an Intel i7-4700MQ running at 2.4 GHz, with 12 GB of RAM.

Ether 06-12-2014 12:21

Re: OPR Programming Challenge
 
Quote:

Originally Posted by saikiranra (Post 1411889)
Out of 50 runs, these are the average times to:

1. Read from file to create data structures: 0.3636 seconds
2. Make sorted team lists and Bs in Ax=B: 0.0019 seconds
3. Make matrix A in Ax=B: 0.0675 seconds
4. Create Cholesky factorization and find OPR and CCWM values (DPR = OPR - CCWM): 0.2793 seconds
5. Write to file: 0.0242 seconds
(Record these times: 0.0128 seconds)

Total: 0.7494 seconds

I wrote this in Python using the numpy and scipy libraries to manipulate matrices. The software is running on a Windows 8 machine with an Intel i7-4700MQ running at 2.4 GHz, with 12 GB of RAM.

Nice work.

For those of you reading along, the timing above is for a dataset containing 2696 teams, 8921 matches, 17842 alliance scores.

@ Michael Hill: feel free to jump in here now if you want :-)






Michael Hill 06-12-2014 16:35

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Ether (Post 1411921)
Nice work.

For those of you reading along, the timing above is for a dataset containing 2696 teams, 8921 matches, 17842 alliance scores.

@ Michael Hill: feel free to jump in here now if you want :-)

Fair enough. It's been a while since I've messed around with this, but I dug up my MATLAB script I originally did this with. I got it down to 0.518726 seconds (average over 10 runs.)

There are some optimizations that are done with this. For example, using sparse matrices lets MATLAB do a much quicker job of left divides (which is not the same as inverting the matrix). MATLAB is actually fairly smart and decides which form of factorization to use based on the data that's inputted, so it (generally) will choose the fastest method. Sparse is very fast.

Code:

tic
alldata = dlmread('C:\Users\Michael Hill\Documents\Projects\OPR\scores.dat');
teams = [alldata(:,1:3); alldata(:,4:6)];
reddiff = [alldata(:,7)-alldata(:,8)];
bluediff = [alldata(:,8)-alldata(:,7)];
scores = [alldata(:,7); alldata(:,8)];
diffscores = [reddiff; bluediff];
a = size(teams);
A = zeros(max(max(teams)));
B = zeros(max(max(teams)),2);
for i = 1:a(1)
    A(teams(i,1),teams(i,1)) = A(teams(i,1),teams(i,1)) + 1;
    A(teams(i,1),teams(i,2)) = A(teams(i,1),teams(i,2)) + 1;
    A(teams(i,1),teams(i,3)) = A(teams(i,1),teams(i,3)) + 1;
    A(teams(i,2),teams(i,1)) = A(teams(i,2),teams(i,1)) + 1;
    A(teams(i,2),teams(i,2)) = A(teams(i,2),teams(i,2)) + 1;
    A(teams(i,2),teams(i,3)) = A(teams(i,2),teams(i,3)) + 1;
    A(teams(i,3),teams(i,1)) = A(teams(i,3),teams(i,1)) + 1;
    A(teams(i,3),teams(i,2)) = A(teams(i,3),teams(i,2)) + 1;
    A(teams(i,3),teams(i,3)) = A(teams(i,3),teams(i,3)) + 1;
    B(teams(i,1),1) = B(teams(i,1),1) + scores(i);
    B(teams(i,2),1) = B(teams(i,2),1) + scores(i);
    B(teams(i,3),1) = B(teams(i,3),1) + scores(i);
    B(teams(i,1),2) = B(teams(i,1),2) + diffscores(i);
    B(teams(i,2),2) = B(teams(i,2),2) + diffscores(i);
    B(teams(i,3),2) = B(teams(i,3),2) + diffscores(i);
end

%gpuA = gpuArray(A);
%gpuB = gpuArray(B);
A = sparse(A);
B = sparse(B);
A(~any(A,2), :) = [];
A(:, ~any(A,1)) = [];
B(~any(B,2), :) = [];
X = A\B(:,1);
Y = A\B(:,2);
D = X-Y;

tdupe = sort([teams(:,1);teams(:,2);teams(:,3)]);
[junk, index] = unique(tdupe,'first');
teamList = tdupe(sort(index));
numTeams = size(teamList);
out = zeros(numTeams(1),4);
for i = 1:size(teamList)
    out(i,1) = teamList(i);
    out(i,2) = gather(X(i));
    out(i,3) = gather(Y(i));
    out(i,4) = gather(D(i));
end

% expectedRed = zeros(a(1)/2,1);
% expectedBlue = zeros(a(1)/2,1);
% for i=1:a(1)/2
%    index1 = find(out(:,1)==alldata(i,1));
%    index2 = find(out(:,1)==alldata(i,2));
%    index3 = find(out(:,1)==alldata(i,3));
%    index4 = find(out(:,1)==alldata(i,4));
%    index5 = find(out(:,1)==alldata(i,5));
%    index6 = find(out(:,1)==alldata(i,6));
%    opr1 = out(index1,2);
%    opr2 = out(index2,2);
%    opr3 = out(index3,2);
%    opr4 = out(index4,2);
%    opr5 = out(index5,2);
%    opr6 = out(index6,2);
%    expectedRed(i) = opr1+opr2+opr3;
%    expectedBlue(i) = opr4+opr5+opr6;
% end
%
% diffExpRed = expectedRed - alldata(:,7);
% diffExpBlue = expectedBlue - alldata(:,8);
%
% diffExp = [diffExpBlue; diffExpRed];
%
% C = zeros(gather(max(max(teams))));
% for i = 1:a(1)
%    C(teams(i,1)) = C(teams(i,1)) + diffExp(i);
%    C(teams(i,2)) = C(teams(i,2)) + diffExp(i);
%    C(teams(i,3)) = C(teams(i,3)) + diffExp(i);
% end
% gpuC = gpuArray(C);
% gpuC(~any(gpuC,2), :) = [];
% Z = gpuA\gpuC(:,1);
% for i = 1:size(teamList)
%    out(i,5) = gather(Z(i));
% end
disp(out);
toc

There are some remnants in the comments of me messing with gpuArrays on MATLAB, and I'm too lazy to get rid of them. I have a Quadro K1100M Graphics card in my laptop, so I was messing around with using CUDA cores. It turns out that MATLAB has not implemented sparse matrices with CUDA processing, so they have to be full matrices, which actually slows it down (fairly significantly) over using sparse matrices with calculation on the CPU. For the record, my CPU is an Intel i7-4702HQ @ 2.2 GHz with 16 GB of RAM (My Computer is a Dell Precision M3800).

EDIT: I actually just realized there was some code that slows it down in there. the "gather"s in developing the "out" matrix are more remnants of GPU computing that I didn't take out. They are used for getting the variables out of GPU memory to my RAM. After removing them, It took total computation time to about 0.48 seconds.

Ether 06-12-2014 16:44

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Michael Hill (Post 1411941)
EDIT: I actually just realized there was some code that slows it down in there. the "gather"s in developing the "out" matrix are more remnants of GPU computing that I didn't take out. They are used for getting the variables out of GPU memory to my RAM. After removing them, It took total computation time to about 0.48 seconds.

Here's another optimization you can try. Replace this code:

X = A\B(:,1);
Y = A\B(:,2);

... with this:

XY = A\B

... then you'll have only one left-divide, and the XY vector will be a 2-column vector containing X and Y.



Michael Hill 06-12-2014 19:13

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Ether (Post 1411942)
Here's another optimization you can try. Replace this code:

X = A\B(:,1);
Y = A\B(:,2);

... with this:

XY = A\B

... then you'll have only one left-divide, and the XY vector will be a 2-column vector containing X and Y.



Made it a bit faster (0.02 seconds). I expected it to be a lot more, but surprisingly, reading the file into MATLAB is the longest operation at 0.31 seconds (That's reading in the file and creating my A and B matrices). It surprises me it takes that long with a the SSD I have.

Ether 09-12-2014 11:40

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Michael Hill (Post 1411968)
reading the file into MATLAB is the longest operation at 0.31 seconds (That's reading in the file and creating my A and B matrices)

Could I talk you into putting separate tic/toc statements around

Code:

alldata = dlmread('C:\Users\Michael Hill\Documents\Projects\OPR\scores.dat');
and

Code:

for i = 1:a(1)
    A(teams(i,1),teams(i,1)) = A(teams(i,1),teams(i,1)) + 1;
    A(teams(i,1),teams(i,2)) = A(teams(i,1),teams(i,2)) + 1;
    A(teams(i,1),teams(i,3)) = A(teams(i,1),teams(i,3)) + 1;
    A(teams(i,2),teams(i,1)) = A(teams(i,2),teams(i,1)) + 1;
    A(teams(i,2),teams(i,2)) = A(teams(i,2),teams(i,2)) + 1;
    A(teams(i,2),teams(i,3)) = A(teams(i,2),teams(i,3)) + 1;
    A(teams(i,3),teams(i,1)) = A(teams(i,3),teams(i,1)) + 1;
    A(teams(i,3),teams(i,2)) = A(teams(i,3),teams(i,2)) + 1;
    A(teams(i,3),teams(i,3)) = A(teams(i,3),teams(i,3)) + 1;
    B(teams(i,1),1) = B(teams(i,1),1) + scores(i);
    B(teams(i,2),1) = B(teams(i,2),1) + scores(i);
    B(teams(i,3),1) = B(teams(i,3),1) + scores(i);
    B(teams(i,1),2) = B(teams(i,1),2) + diffscores(i);
    B(teams(i,2),2) = B(teams(i,2),2) + diffscores(i);
    B(teams(i,3),2) = B(teams(i,3),2) + diffscores(i);
end





Michael Hill 11-12-2014 17:15

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Ether (Post 1412465)
Could I talk you into putting separate tic/toc statements around

Code:

alldata = dlmread('C:\Users\Michael Hill\Documents\Projects\OPR\scores.dat');
and

Code:

for i = 1:a(1)
    A(teams(i,1),teams(i,1)) = A(teams(i,1),teams(i,1)) + 1;
    A(teams(i,1),teams(i,2)) = A(teams(i,1),teams(i,2)) + 1;
    A(teams(i,1),teams(i,3)) = A(teams(i,1),teams(i,3)) + 1;
    A(teams(i,2),teams(i,1)) = A(teams(i,2),teams(i,1)) + 1;
    A(teams(i,2),teams(i,2)) = A(teams(i,2),teams(i,2)) + 1;
    A(teams(i,2),teams(i,3)) = A(teams(i,2),teams(i,3)) + 1;
    A(teams(i,3),teams(i,1)) = A(teams(i,3),teams(i,1)) + 1;
    A(teams(i,3),teams(i,2)) = A(teams(i,3),teams(i,2)) + 1;
    A(teams(i,3),teams(i,3)) = A(teams(i,3),teams(i,3)) + 1;
    B(teams(i,1),1) = B(teams(i,1),1) + scores(i);
    B(teams(i,2),1) = B(teams(i,2),1) + scores(i);
    B(teams(i,3),1) = B(teams(i,3),1) + scores(i);
    B(teams(i,1),2) = B(teams(i,1),2) + diffscores(i);
    B(teams(i,2),2) = B(teams(i,2),2) + diffscores(i);
    B(teams(i,3),2) = B(teams(i,3),2) + diffscores(i);
end





Sorry it's taken so long, I've been somewhat busy the last few days at home.

The first one is ~0.025 sec and the second (around the for loop) is ~0.235 sec

Ether 11-12-2014 18:15

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Michael Hill (Post 1413281)
...the second (around the for loop) is ~0.235 sec

I wonder would happen if you wrote that in C and compiled it to a callable routine.



Greg McKaskle 13-12-2014 21:02

Re: OPR Programming Challenge
 
2 Attachment(s)
I've been a bit busy, but since the light is at the end of the tunnel, I met with Jim, a mathematician on our team to see if he would do this with our sparse matrix tools. The first screenshot shows the breakdown of times. The second is the code written in LV.

He is going to tinker to see if he can find a better way to build the sparse matrix, since most of the time is spent before invoking the solver.

This was timed on a Windows VM that has 4 cores running on my macbook 2.7GHz core i7. Jim was running on a desktop machine which I don't have details for, and he wasn't writing the file. His was somewhat faster. My cores are only about 70% utilized since most of the time isn't spent in the solver.

Greg McKaskle

Ether 14-12-2014 19:31

Re: OPR Programming Challenge
 

OK guys.

I just wrote, compiled, and ran a 32-bit single-core native app on an 8-year-old Pentium D machine running 32-bit XP Pro SP3, and timed it using RDTSC instructions embedded in the code.

It took 11.9 milliseconds to read the raw data file (cached in RAM) and generate the alliance score vectors and the sparse design matrix.

Using 16-bit unsigned integers for the team numbers and scores, generating the sparse matrix directly from the raw data, and compiling to native code saves a lot of runtime.



Ether 15-12-2014 18:56

Re: OPR Programming Challenge
 
1 Attachment(s)
Quote:

Originally Posted by Ether (Post 1414055)
I just wrote, compiled, and ran a 32-bit single-core native app on an 8-year-old Pentium D machine running 32-bit XP Pro SP3, and timed it using RDTSC instructions embedded in the code.

Here's the Delphi code. I wonder how fast it would run on a modern 64-bit machine. Maybe somebody would like to port it to C and try it.

Ether 18-12-2014 15:29

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Ether (Post 1414224)
Here's the Delphi code. I wonder how fast it would run on a modern 64-bit machine. Maybe somebody would like to port it to C and try it.

Based on some PMs I have received, I should clarify a few things.
  • The posted Delphi code reads the raw 8-column (r1 r2 r3 b1 b2 b3 rs bs) whitespace-delimited alliance scores text data file (from cached RAM) and constructs two matrices [At] and [b]. It takes ~12ms to do this on an 8-year-old Pentium D machine.

  • [A]=[At]' is the the binary design matrix

  • [b] is the matrix of alliance scores (not the team OPR scores)

  • [A][x]≈[b] is the overdetermined system of linear equations

  • [A][x]≈[b] can be solved with one line of code in Octave (or MatLab) as follows: [x]=[A]\[b]. [x] will be the matrix whose column vectors minimize the sum of the squares of the corresponding column vectors in the residuals matrix [r]=[b]-[A][x]

  • But it is much faster (and acceptably stable and accurate for OPR purposes) to create and solve the Normal Equations [N][x]=[d] instead

  • [N] and [d] can be formed from [At] and [b] as follows: [N]=[At][At]' and [d]=[At][b]

  • [N][x]=[d] can be solved with one line of code in Octave (or MatLab) as follows: [x]=[N]\[d]

  • solving [N][x]=[d] is faster than solving [A][x]≈[b] because 1) [N] is a smaller matrix than [A], and 2) N is symmetric positive definite so Cholesky factorization can be used

  • the computations [A]=[At]', N=[At][A], and [d]=[At][b] take about 20ms on the 8-year-old Pentium D machine

  • the Normal Equations solution [x]=[N]\[d] takes about 210ms on the 8-year-old Pentium D machine, using a single core

  • there may be Cholesky factoring algorithms which would permit multiple cores to be used for factoring

See this post for some additional details:
http://www.chiefdelphi.com/forums/sh...00#post1404300


Greg McKaskle 19-12-2014 22:32

Re: OPR Programming Challenge
 
3 Attachment(s)
Less busy now, so I met with Jim a few we made a few more passes. The attached image shows the top level LV diagram. And the zip has the code saved in LV 2014. The other image shows the time breakdown for the different portions.

The code runs in around 20ms on my laptop running a VM. It iterates until the residual is about 3 digits after the decimal.

Building the sparse matrix creates the diagonal terms and upper portion independently, which substantially speeds the elimination of duplicate terms. The complete solver was swapped out for one based on conjugate gradient.

The commented code loads from disk, the enabled code has the data in RAM, in a constant. Loading from disk adds another 20ms.

Greg McKaskle

Ether 19-12-2014 22:59

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Greg McKaskle (Post 1415357)
Less busy now, so I met with Jim a few we made a few more passes. The attached image shows the top level LV diagram. And the zip has the code saved in LV 2014. The other image shows the time breakdown for the different portions.

Thanks for posting this Greg. Very impressive performance.

Quote:

It iterates until the residual is about 3 digits after the decimal.
What did you use for the initial guess for the iterative solver?



Greg McKaskle 20-12-2014 07:11

Re: OPR Programming Challenge
 
The initial guess is the zero vector. It takes 11 iterations to converge.

Greg McKaskle


All times are GMT -5. The time now is 22:16.

Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.
Copyright © Chief Delphi