Chief Delphi

OPR Programming Challenge (http://www.chiefdelphi.com/forums/showthread.php?t=130783)

Ether 11-12-2014 18:15

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Michael Hill (Post 1413281)
...the second (around the for loop) is ~0.235 sec

I wonder what would happen if you wrote that in C and compiled it to a callable routine.



Greg McKaskle 13-12-2014 21:02

Re: OPR Programming Challenge
 
2 Attachment(s)
I've been a bit busy, but since the light is at the end of the tunnel, I met with Jim, a mathematician on our team, to see if he would do this with our sparse matrix tools. The first screenshot shows the breakdown of times. The second is the code written in LV.

He is going to tinker to see if he can find a better way to build the sparse matrix, since most of the time is spent before invoking the solver.

This was timed on a Windows VM that has 4 cores, running on my 2.7GHz Core i7 MacBook. Jim was running on a desktop machine which I don't have details for, and he wasn't writing the file. His was somewhat faster. My cores are only about 70% utilized since most of the time isn't spent in the solver.

Greg McKaskle

Ether 14-12-2014 19:31

Re: OPR Programming Challenge
 

OK guys.

I just wrote, compiled, and ran a 32-bit single-core native app on an 8-year-old Pentium D machine running 32-bit XP Pro SP3, and timed it using RDTSC instructions embedded in the code.

It took 11.9 milliseconds to read the raw data file (cached in RAM) and generate the alliance score vectors and the sparse design matrix.

Using 16-bit unsigned integers for the team numbers and scores, generating the sparse matrix directly from the raw data, and compiling to native code saves a lot of runtime.
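
For readers who want to see the shape of that step, here is a minimal Python sketch. It is not the Delphi code from this thread, only an illustration under stated assumptions: the input is the 8-column (r1 r2 r3 b1 b2 b3 rs bs) whitespace-delimited text, each alliance becomes one row of the design matrix stored as three column indices, and the function name `build_design` and the sample team numbers are made up for the example. The `array` typecode `'H'` (unsigned short, 16 bits on common platforms) stands in for the 16-bit unsigned integers.

```python
from array import array

def build_design(text):
    """Parse 8-column match lines into a sparse design matrix and a score vector.

    Returns (teams, rows, scores):
      teams  - sorted list of team numbers; the position is the column index
      rows   - flat array of column indices, 3 per alliance (the nonzeros of [A])
      scores - one alliance score per row of [A]
    """
    matches = [tuple(int(tok) for tok in line.split())
               for line in text.splitlines() if line.strip()]
    teams = sorted({t for m in matches for t in m[:6]})
    col = {t: i for i, t in enumerate(teams)}

    rows = array('H')     # unsigned 16-bit column indices
    scores = array('H')   # unsigned 16-bit alliance scores
    for r1, r2, r3, b1, b2, b3, rs, bs in matches:
        rows.extend((col[r1], col[r2], col[r3]))   # red alliance row
        scores.append(rs)
        rows.extend((col[b1], col[b2], col[b3]))   # blue alliance row
        scores.append(bs)
    return teams, rows, scores

# Hypothetical two-match sample, not real event data.
SAMPLE = """\
101 102 103 104 105 106 60 150
101 104 105 102 103 106 100 110
"""
teams, rows, scores = build_design(SAMPLE)
```

Because every row of the design matrix has exactly three ones, storing three indices per row is enough; no values need to be stored at all.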



Ether 15-12-2014 18:56

Re: OPR Programming Challenge
 
1 Attachment(s)
Quote:

Originally Posted by Ether (Post 1414055)
I just wrote, compiled, and ran a 32-bit single-core native app on an 8-year-old Pentium D machine running 32-bit XP Pro SP3, and timed it using RDTSC instructions embedded in the code.

Here's the Delphi code. I wonder how fast it would run on a modern 64-bit machine. Maybe somebody would like to port it to C and try it.

Ether 18-12-2014 15:29

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Ether (Post 1414224)
Here's the Delphi code. I wonder how fast it would run on a modern 64-bit machine. Maybe somebody would like to port it to C and try it.

Based on some PMs I have received, I should clarify a few things.
  • The posted Delphi code reads the raw 8-column (r1 r2 r3 b1 b2 b3 rs bs) whitespace-delimited alliance scores text data file (from cached RAM) and constructs two matrices [At] and [b]. It takes ~12ms to do this on an 8-year-old Pentium D machine.

  • [A]=[At]' is the binary design matrix

  • [b] is the matrix of alliance scores (not the team OPR scores)

  • [A][x]≈[b] is the overdetermined system of linear equations

  • [A][x]≈[b] can be solved with one line of code in Octave (or MatLab) as follows: [x]=[A]\[b]. [x] will be the matrix whose column vectors minimize the sum of the squares of the corresponding column vectors in the residuals matrix [r]=[b]-[A][x]

  • But it is much faster (and acceptably stable and accurate for OPR purposes) to create and solve the Normal Equations [N][x]=[d] instead

  • [N] and [d] can be formed from [At] and [b] as follows: [N]=[At][At]' and [d]=[At][b]

  • [N][x]=[d] can be solved with one line of code in Octave (or MatLab) as follows: [x]=[N]\[d]

  • solving [N][x]=[d] is faster than solving [A][x]≈[b] because 1) [N] is a smaller matrix than [A], and 2) [N] is symmetric positive definite so Cholesky factorization can be used

  • the computations [A]=[At]', [N]=[At][A], and [d]=[At][b] take about 20ms on the 8-year-old Pentium D machine

  • the Normal Equations solution [x]=[N]\[d] takes about 210ms on the 8-year-old Pentium D machine, using a single core

  • there may be Cholesky factoring algorithms which would permit multiple cores to be used for factoring
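
The Normal Equations route described in the list above can be sketched in plain Python as follows. This is an illustration only, not the code that produced the timings: the six teams, the five matches, and the scores are invented so that the "true" OPRs are 10, 20, ..., 60, and the dense Cholesky routine is a textbook version rather than a sparse or tuned one. It assumes [N] is positive definite, which requires [A] to have full column rank.

```python
import math

# Hypothetical data: each entry is (team column indices, alliance score).
ALLIANCES = [
    ((0, 1, 2), 60),  ((3, 4, 5), 150),
    ((0, 3, 4), 100), ((1, 2, 5), 110),
    ((0, 1, 5), 90),  ((2, 3, 4), 120),
    ((0, 2, 4), 90),  ((1, 3, 5), 120),
    ((0, 2, 3), 80),  ((1, 4, 5), 130),
]

def normal_equations(alliances, n):
    """Form [N]=[At][A] and [d]=[At][b] directly from the alliance rows."""
    N = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for teams, score in alliances:
        for i in teams:
            d[i] += score
            for j in teams:
                N[i][j] += 1.0
    return N, d

def cholesky_solve(N, d):
    """Solve N x = d via N = L L' followed by two triangular solves."""
    n = len(N)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = N[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    y = [0.0] * n                      # forward substitution: L y = d
    for i in range(n):
        y[i] = (d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n                      # back substitution: L' x = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

N, d = normal_equations(ALLIANCES, 6)
opr = cholesky_solve(N, d)
```

Since the invented scores are exact sums of the assumed OPRs, `opr` should come back as 10, 20, ..., 60 up to rounding. In Octave the whole solve is the one-liner [x]=[N]\[d] mentioned above.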

See this post for some additional details:
http://www.chiefdelphi.com/forums/sh...00#post1404300


Greg McKaskle 19-12-2014 22:32

Re: OPR Programming Challenge
 
3 Attachment(s)
Less busy now, so I met with Jim and we made a few more passes. The attached image shows the top level LV diagram. And the zip has the code saved in LV 2014. The other image shows the time breakdown for the different portions.

The code runs in around 20ms on my laptop running a VM. It iterates until the residual is about 3 digits after the decimal.

Building the sparse matrix creates the diagonal terms and upper portion independently, which substantially speeds the elimination of duplicate terms. The complete solver was swapped out for one based on conjugate gradient.
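
To make the diagonal/upper split concrete for readers without LV, here is a rough Python sketch of the idea. It is an assumption about the general approach, not a translation of the LV diagram: the diagonal of the normal matrix counts how many alliances each team appeared in, and each upper-triangle entry counts how many times a pair of teams were partners, so duplicates collapse into counts as they are added. The names `build_split` and `sym_matvec` and the sample alliances are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_split(alliances):
    """Accumulate the diagonal and strict upper triangle of [N] separately.

    alliances: iterable of team-index tuples, one per alliance.
    Returns (diag, upper) where diag[i] is N[i][i] and upper[(i, j)], i < j,
    is N[i][j]. Repeated partnerships are merged by the counters.
    """
    diag = Counter()
    upper = Counter()
    for teams in alliances:
        diag.update(teams)                              # one appearance each
        upper.update(combinations(sorted(teams), 2))    # each partner pair, i < j
    return diag, upper

def sym_matvec(diag, upper, n, p):
    """Compute N p using only the diagonal and upper triangle (N is symmetric)."""
    out = [diag[i] * p[i] for i in range(n)]
    for (i, j), count in upper.items():
        out[i] += count * p[j]
        out[j] += count * p[i]
    return out

# Hypothetical alliances for illustration.
SAMPLE_ALLIANCES = [(0, 1, 2), (3, 4, 5), (0, 3, 4), (1, 2, 5)]
diag, upper = build_split(SAMPLE_ALLIANCES)
```

Storing only the upper triangle halves the off-diagonal bookkeeping, and a matrix-vector product like `sym_matvec` is all an iterative solver such as conjugate gradient needs from the matrix.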

The commented code loads from disk; the enabled code has the data in RAM, in a constant. Loading from disk adds another 20ms.

Greg McKaskle

Ether 19-12-2014 22:59

Re: OPR Programming Challenge
 
Quote:

Originally Posted by Greg McKaskle (Post 1415357)
Less busy now, so I met with Jim and we made a few more passes. The attached image shows the top level LV diagram. And the zip has the code saved in LV 2014. The other image shows the time breakdown for the different portions.

Thanks for posting this, Greg. Very impressive performance.

Quote:

It iterates until the residual is about 3 digits after the decimal.

What did you use for the initial guess for the iterative solver?



Greg McKaskle 20-12-2014 07:11

Re: OPR Programming Challenge
 
The initial guess is the zero vector. It takes 11 iterations to converge.
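
For anyone who wants to experiment outside LV, a textbook conjugate gradient loop starting from the zero vector looks roughly like this in Python. It is a sketch, not the LV solver: the tolerance of 1e-3 on the residual norm is my reading of "about 3 digits after the decimal", the matrix is built dense for simplicity, and the six-team data set is invented (scores are exact sums of assumed OPRs 10, 20, ..., 60), so the iteration count here says nothing about the real data.

```python
def conjugate_gradient(matvec, d, tol=1e-3, max_iter=1000):
    """Solve N x = d for symmetric positive definite N, given p -> N p.

    Starts from x = 0 and stops when the residual 2-norm drops below tol.
    Returns (x, iterations_used).
    """
    n = len(d)
    x = [0.0] * n          # zero initial guess
    r = list(d)            # residual d - N x equals d when x = 0
    p = list(r)
    rs = sum(v * v for v in r)
    for it in range(max_iter):
        if rs ** 0.5 < tol:
            return x, it
        Np = matvec(p)
        alpha = rs / sum(pi * qi for pi, qi in zip(p, Np))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Np)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, max_iter

# Hypothetical data: (team column indices, alliance score).
CG_ALLIANCES = [
    ((0, 1, 2), 60),  ((3, 4, 5), 150),
    ((0, 3, 4), 100), ((1, 2, 5), 110),
    ((0, 1, 5), 90),  ((2, 3, 4), 120),
    ((0, 2, 4), 90),  ((1, 3, 5), 120),
    ((0, 2, 3), 80),  ((1, 4, 5), 130),
]

def build_dense(alliances, n):
    """Form the normal matrix N and right-hand side d as dense lists."""
    N = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for teams, score in alliances:
        for i in teams:
            d[i] += score
            for j in teams:
                N[i][j] += 1.0
    return N, d

N_cg, d_cg = build_dense(CG_ALLIANCES, 6)
x_cg, iters = conjugate_gradient(
    lambda p: [sum(a * b for a, b in zip(row, p)) for row in N_cg], d_cg)
```

The solver only touches the matrix through `matvec`, so a sparse representation can be swapped in without changing the loop.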

Greg McKaskle


All times are GMT -5.

Copyright © Chief Delphi