loading a COO file into Python
Greetings Python gurus. I'm looking for a better (faster) way to load a plaintext COO (row,column,value tuple) file into Python. The values are all integers. File Aijv.dat is a ~7.5 MB plaintext file in COO format representing a 178420x2696 sparse matrix with 535260 non-zero entries. Here's a portion of code showing Python taking almost 19 seconds to load Aijv.dat:
Code:
Python 2.7.5 (default, May 15 2013, 22:43:36)
Code:
GNU Octave, version 3.6.4 |
Re: loading a COO file into Python
Write a native module to do it faster? :rolleyes:
|
Re: loading a COO file into Python
I could certainly do that as a last resort if Python doesn't already have a built-in function to do it. Long experience has taught me to explore the capabilities of the language to avoid re-inventing the wheel. |
Re: loading a COO file into Python
The simplest solution is often the easiest one. I'd personally just parse it myself and read line by line with a file reader, parse into an object and add each one into an array.
My 2c. |
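For reference, the line-by-line approach described above might look something like the sketch below. It assumes each line holds three integers (row, column, value) separated by whitespace; the actual delimiter in Aijv.dat isn't shown in this thread, so `sep` is left adjustable. `load_coo_lines` is a made-up helper name, not something from a library.

```python
def load_coo_lines(path, sep=None):
    """Read a plaintext COO file into three parallel lists of ints.

    sep=None splits on any run of whitespace (an assumption about the
    file layout); pass sep="," for comma-separated triples.
    """
    rows, cols, vals = [], [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            i, j, v = line.split(sep)
            rows.append(int(i))
            cols.append(int(j))
            vals.append(int(v))
    return rows, cols, vals
```

This is pure Python, so it is easy to follow but unlikely to be the fastest option for half a million entries.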
Re: loading a COO file into Python
Here's a link to the 7zipped Aij.dat file. If your solution is the simplest and easiest one, please show us how you propose to "parse it myself and read line by line with a file reader, parse into an object and add each one into an array". |
Re: loading a COO file into Python
Apparently pandas has something for this that outperforms numpy.loadtxt?
http://wesmckinney.com/blog/a-new-hi...ne-for-pandas/ And here: http://akuederle.com/stop-using-numpy-loadtxt |
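A minimal sketch of the pandas route, assuming the file has no header row and the three integer fields are whitespace-separated (change `sep` if they are comma-separated). The function name and column names are just placeholders for illustration.

```python
import pandas as pd


def load_coo_pandas(path, sep=r"\s+"):
    """Load (row, column, value) integer triples with pandas.read_csv.

    Returns an (n, 3) NumPy integer array. The default sep is an
    assumption about the file layout, not something stated in the thread.
    """
    df = pd.read_csv(
        path,
        sep=sep,
        header=None,                   # the file is assumed to have no header line
        names=["row", "col", "val"],   # placeholder column names
        dtype="int64",                 # all values are integers
    )
    return df.values
```

Usage would be along the lines of `Aijv = load_coo_pandas("Aijv.dat")`, after which `Aijv[:, 0]`, `Aijv[:, 1]` and `Aijv[:, 2]` hold the row indices, column indices and values.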
Re: loading a COO file into Python
Thanks Dustin. I'll try that. |
Re: loading a COO file into Python
I'm curious as to why you're using Python in the first place if performance is a concern.
|
Re: loading a COO file into Python
Well this is a bit unexpected. Looks like I get to play the role of Python guru momentarily :) Python is supported by an immense library of highly optimized math and science routines. It's as fast (and faster in some cases) as Matlab/Octave, and in some cases actually easier to use. It's free, has a large installation base, and the code is quite readable... all of which makes it a convenient vehicle for sharing solutions... such as how to efficiently compute World OPR or other metrics using both min |Ax-b|2 and min |Ax-b|1 etc. |
Re: loading a COO file into Python
This only took 3.4 seconds to run on my Win 8.1 system.
Code:
import os |
Re: loading a COO file into Python
Quote:
It took ~5.3 seconds on my 10-year-old PentiumD Win32 machine:
Code:
>>> start = time.time( )
Also:
Code:
>>> start=time.time()
BTW, the benchmark time on my machine appears to be about 0.5 seconds. That's the total time it took a compiled Win32 app to read Aijv.dat into a transposed array. |
Re: loading a COO file into Python
Quote:
Just ran it on a slower machine and it loaded Aijv.dat in less than one second. You are my Python guru Dustin:) |
Re: loading a COO file into Python
Quote:
My guess is that because data isn't already a numpy array, numpy has to do a lot of work to convert it to one first, and only then can it do the transpose operation. If I'm correct, a second transpose on the result of the first transpose would be very fast, similar to the operation on Aijv. If I understand what transpose does, it most likely just rearranges the pointers to the various axes, which is why it's so fast. My understanding of how numpy arrays work is that they try really hard not to actually move data around, and many common operations can be done by just creating or moving various pointers. |
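A small illustration of that guess, using a tiny made-up list rather than the real file: converting a Python list to an array copies every element, while transposing an existing array returns a view onto the same buffer with the strides swapped.

```python
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]    # plain Python list of lists (toy data)
a = np.array(data)               # list -> ndarray: every element is copied
t = a.T                          # transposing an ndarray returns a view

shares = np.shares_memory(a, t)  # True: a and t use the same buffer
is_view = t.base is a            # True: t does not own its data

t[0, 1] = 99                     # writing through the view...
changed = a[1, 0]                # ...changes the original array: 99

strides = (a.strides, t.strides) # same bytes, axes' strides swapped
```

So the slow part of transposing a list would be the conversion, not the transpose itself.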
Copyright © Chief Delphi