calling all "R Statistical Package" gurus

Given m-by-n matrix [A] (m > n) and m-by-1 column vector [b] for the overdetermined system of linear equations

[A][x] = [b],

what is the recommended way to use R to find the n-by-1 column vector [x] which minimizes the L1 norm of [b] - [A][x]?

aka min_x |[b] - [A][x]|_1, aka Least Absolute Deviation (LAD)?


This paper might be of use to you.

I have personally used rgenoud, but my code for it is sitting on the desktop in my lab at university and wouldn’t be much use to you anyway, because it maximizes the cost function described in this paper.

Whew. I somehow managed to get an MS in physics without taking a single linear algebra course (I don’t know how, and I don’t advise it!). When I’ve had this sort of problem since entering the workforce, I just trusted MATLAB to figure out the best method.

I already have an Octave (Matlab) script that does this. This thread is about the recommended way to do it using R.

I found this statement:

Package quantreg contains variations of simplex and of interior point routines ( nlrq(), crq()). It provides an interface to L1 regression in the R code of function rq(). [SPLP, LP, IPM]

… at this webpage:

I also found this statement:

Koenker’s quantile regression package quantreg contains L1 (aka LAD, least absolute deviations)-regression as a special case

… at this webpage:

http://cran.cnr.berkeley.edu/web/views/Robust.html

Attached is a zip file containing an [A] matrix and a [b] column vector in CSV format. Would some kind R guru please show how to use the above method to do L1 regression on the attached data? (see original post in this thread)


A and b.zip (1.23 KB)
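Not claiming guru status, but here is a minimal sketch of how rq() from quantreg could be applied here. It assumes the zip unpacks to files named A.csv and b.csv with no header rows (hypothetical names; adjust to whatever the attachment actually contains):

library(quantreg)

A <- as.matrix(read.csv("A.csv", header = FALSE))
b <- as.numeric(read.csv("b.csv", header = FALSE)[[1]])

# tau = 0.5 (the median) makes quantile regression equivalent to L1/LAD
# regression; "- 1" drops the intercept so we fit [A][x] = [b] as posed.
fit <- rq(b ~ A - 1, tau = 0.5)
x <- coef(fit)
sum(abs(b - A %*% x))   # the minimized L1 norm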

I should add the following qualifier: the above is true for small datasets (like analyzing one event). For larger datasets (like doing world OPR for an entire season’s worth of events), not so much.

Matlab/Octave has good support for sparse matrices, which helps when processing those larger datasets efficiently.
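For what it’s worth, R is not without sparse support either; the Matrix package covers similar ground. A minimal sketch (my aside, not a claim made in this thread):

library(Matrix)

# A 10000 x 500 design matrix with ~1% non-zeros stays cheap to store:
A_sparse <- rsparsematrix(10000, 500, density = 0.01)
print(object.size(A_sparse), units = "MB")            # well under 1 MB
print(object.size(as.matrix(A_sparse)), units = "MB") # ~40 MB dense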

Is the difference just in how easy it is to write the code and input the data, or do they do the calculations differently? If the latter, would the various CAS programs fall into a few groups, or would each do the calculations differently?

R uses the rq() function in the quantreg package, which does quantile regression. It converts the problem to a linear program and solves it. If you set the tau parameter of rq() to 0.5 (the 50% quantile, i.e. the median), it does L1 regression.
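To make the tau = 0.5 connection concrete, here is a small self-contained check (synthetic data of my own making, not the attachment from this thread). The rq() fit should achieve a smaller L1 residual norm than ordinary least squares:

library(quantreg)
set.seed(1)

A <- matrix(rnorm(200 * 5), 200, 5)
b <- as.numeric(A %*% (1:5) + rcauchy(200, scale = 0.1))  # heavy-tailed noise

x_lad <- coef(rq(b ~ A - 1, tau = 0.5))  # L1 / LAD fit
x_ls  <- coef(lm(b ~ A - 1))             # L2 / least-squares fit

sum(abs(b - A %*% x_lad))  # smaller than ...
sum(abs(b - A %*% x_ls))   # ... the L1 norm of the least-squares residuals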

Python has a convex optimization package that contains a function for L1 regression. You don’t have to know how to set up the problem; just pass it your A and b data.

AMPL is a modeling language that lets you state the problem as a constrained minimization; it then expands that into a linear-program format which can be passed to your linear solver of choice.

Octave has glpk, the GNU Linear Programming Kit. You have to code the problem as a linear program by hand and then pass it to glpk() for solution (an R version of that encoding is sketched below).
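Since this thread is about R, here is a hedged sketch of that same hand-coded LP in R using the lpSolve package (my choice of solver interface, not something mentioned above). Each absolute residual is bounded by an auxiliary variable t_i, and x is split into non-negative parts because lp() assumes all variables are >= 0:

library(lpSolve)

lad_lp <- function(A, b) {
  m <- nrow(A); n <- ncol(A)
  I <- diag(m)
  # Decision variables: [x_pos (n), x_neg (n), t (m)], all non-negative,
  # with x = x_pos - x_neg. Objective: minimize sum(t).
  obj <- c(rep(0, 2 * n), rep(1, m))
  # |b - A x| <= t, written as two sets of row inequalities:
  #   A x - t <= b   and   -A x - t <= -b
  con <- rbind(cbind( A, -A, -I),
               cbind(-A,  A, -I))
  sol <- lp("min", obj, con, rep("<=", 2 * m), c(b, -b))
  sol$solution[1:n] - sol$solution[(n + 1):(2 * n)]   # recover x
}

The result should agree with coef(rq(b ~ A - 1, tau = 0.5)) to within solver tolerance.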

Matlab has the linprog() function, which does the same thing as Octave’s glpk(). There is also the CVX package, which lets you use Matlab itself as a modeling language, though it is not intended for large problems.