Quote:
Originally Posted by Michael Hill
EDIT: I actually just realized there was some code in there that slows it down. The "gather"s used in building the "out" matrix are leftovers from GPU computing that I didn't take out; they copy the variables from GPU memory to my RAM. After removing them, the total computation time dropped to about 0.48 seconds.
Here's another optimization you can try. Replace this code:
X = A\B(:,1);
Y = A\B(:,2);
... with this:
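XY = A\B(:,1:2);
... then you'll have only one left-divide, and XY will be a 2-column matrix holding X and Y (X = XY(:,1), Y = XY(:,2)). If B has exactly two columns, XY = A\B; does the same thing. MATLAB then factorizes A once instead of twice, so for a large dense A the solve step should take roughly half as long.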
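A minimal sketch to check the one-left-divide suggestion. It assumes a random, well-conditioned square A and a two-column B as stand-ins (the real A and B come from the poster's own code, which isn't shown), solves both ways, and compares the results and timings:

```matlab
% Stand-in data (assumption): a diagonally dominant, hence
% well-conditioned, non-symmetric square matrix and two RHS columns.
n = 500;
A = rand(n) + n*eye(n);
B = rand(n, 2);

% Original approach: two separate left-divides, so A is factorized twice
tic;
X = A \ B(:,1);
Y = A \ B(:,2);
tSeparate = toc;

% Suggested approach: one left-divide with a two-column right-hand side,
% so A is factorized once and both columns are solved with that factorization
tic;
XY = A \ B(:,1:2);
tCombined = toc;

% Both approaches agree up to round-off
fprintf('max difference in X: %g\n', max(abs(X - XY(:,1))));
fprintf('max difference in Y: %g\n', max(abs(Y - XY(:,2))));
fprintf('separate: %.4f s, combined: %.4f s\n', tSeparate, tCombined);
```

Exact timings depend on the machine and the matrix size. The point is that both versions give the same X and Y up to round-off, while the combined form only factorizes A once.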