#1
loading a COO file into Python
Greetings, Python gurus. I'm looking for a better (faster) way to load a plaintext COO (row, column, value tuple) file into Python. The values are all integers. File Aijv.dat is a ~7.5 MB plaintext file in COO format representing a 178420x2696 sparse matrix with 535260 non-zero entries. Here's a portion of code showing Python taking almost 19 seconds to load Aijv.dat:
Code:
Python 2.7.5 (default, May 15 2013, 22:43:36)
[MSC v.1500 32 bit (Intel)] on win32
>>> import numpy
>>> import time
>>> import scipy
>>> import scipy.sparse as sp
>>> import scipy.sparse.linalg
>>> start=time.time()
>>> Aijv = numpy.loadtxt('Aijv.dat', 'int')
>>> time.time()-start
18.844000101089478
For comparison, here is GNU Octave loading the same file with dlmread:
Code:
GNU Octave, version 3.6.4
+ tic;
+ Aijv = dlmread ('Aijv.dat');
+ toc
Elapsed time is 5.547 seconds.
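Whichever loader ends up fastest, the end goal is a SciPy sparse matrix. A minimal sketch of that last step, assuming the N x 3 `(row, col, value)` integer array that `numpy.loadtxt` returns above and 1-based indices as Octave would write them (`coo_from_triples` and its `one_based` flag are hypothetical; set the flag to match the real file):

```python
import numpy as np
import scipy.sparse as sp


def coo_from_triples(aijv, shape=None, one_based=True):
    """Build a scipy.sparse.coo_matrix from an (N, 3) array of (row, col, value).

    Hypothetical helper; one_based=True assumes Octave/Matlab-style indices.
    """
    aijv = np.asarray(aijv)
    rows = aijv[:, 0]
    cols = aijv[:, 1]
    vals = aijv[:, 2]
    if one_based:
        # Shift 1-based indices to the 0-based indices SciPy expects.
        rows = rows - 1
        cols = cols - 1
    return sp.coo_matrix((vals, (rows, cols)), shape=shape)


if __name__ == "__main__":
    # Tiny stand-in for Aijv.dat: three non-zeros of a 3x2 matrix.
    triples = np.array([[1, 1, 5], [2, 2, 7], [3, 1, -1]])
    A = coo_from_triples(triples, shape=(3, 2))
    print(A.toarray())
```

For Aijv.dat the call would be `coo_from_triples(Aijv, shape=(178420, 2696))`, passing the shape explicitly so trailing empty rows or columns are not lost.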
#2
Re: loading a COO file into Python
Write a native module to do it faster?
#3
Re: loading a COO file into Python
I could certainly do that as a last resort if Python doesn't already have a built-in function for it. Long experience has taught me to explore the capabilities of the language before re-inventing the wheel.
#4
Re: loading a COO file into Python
The simplest solution is often the easiest one. I'd personally just parse it myself and read line by line with a file reader, parse into an object and add each one into an array.
My 2c.
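A minimal sketch of that approach in Python, assuming one whitespace-separated `row col value` triple per line; `Entry` and `read_entries` are illustrative names, not anything from the thread:

```python
from collections import namedtuple

# One record per non-zero entry: (row, col, value), all integers.
Entry = namedtuple("Entry", ["row", "col", "value"])


def read_entries(path):
    """Read a plaintext COO file line by line into a list of Entry objects."""
    entries = []
    with open(path) as f:
        for line in f:
            fields = line.split()  # no argument: split on any run of whitespace
            if not fields:
                continue  # skip blank lines
            row, col, value = (int(x) for x in fields)
            entries.append(Entry(row, col, value))
    return entries
```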
#5
Re: loading a COO file into Python
Here's a link to the 7zipped Aijv.dat file. If your solution is the simplest and easiest one, please show us how you propose to "parse it myself and read line by line with a file reader, parse into an object and add each one into an array".
#6
Re: loading a COO file into Python
Apparently pandas has something for this that outperforms numpy.loadtxt?
http://wesmckinney.com/blog/a-new-hi...ne-for-pandas/
And here: http://akuederle.com/stop-using-numpy-loadtxt
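The linked posts aren't reproduced here, but a sketch of the pandas route, assuming the same space-separated integer triples with no header row, could look like this (`load_coo_pandas` is a hypothetical helper; `sep=r"\s+"` is the pandas `read_csv` setting for splitting on runs of whitespace):

```python
import numpy as np
import pandas as pd


def load_coo_pandas(path):
    """Read whitespace-separated integer triples with pandas.read_csv."""
    df = pd.read_csv(
        path,
        sep=r"\s+",      # any run of whitespace separates fields
        header=None,     # the file has no header row
        dtype=np.int64,  # every column holds integers
    )
    # Plain (N, 3) integer array, the same layout numpy.loadtxt(path, 'int') gives.
    return df.to_numpy()
```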
#7
Re: loading a COO file into Python
Thanks, Dustin. I'll try that.
#8
Re: loading a COO file into Python
I'm curious as to why you're using Python in the first place if performance is a concern.
#9
Re: loading a COO file into Python
Well, this is a bit unexpected. Looks like I get to play the role of Python guru momentarily. Python is supported by an immense library of highly optimized math and science routines. It's as fast as Matlab/Octave (faster in some cases), and sometimes actually easier to use. It's free, has a large installation base, and the code is quite readable, all of which makes it a convenient vehicle for sharing solutions, such as how to efficiently compute World OPR or other metrics using both min ||Ax-b||_2 and min ||Ax-b||_1, etc.
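For the 2-norm case, SciPy's `scipy.sparse.linalg.lsqr` minimizes ||Ax-b||_2 directly on a sparse `A`. A toy sketch; the 3x2 system below is made up for illustration and is not OPR data:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

# Toy overdetermined system (3 equations, 2 unknowns) given as COO triples:
# A = [[1, 1],
#      [0, 1],
#      [1, 0]]
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 1, 1, 0])
vals = np.array([1.0, 1.0, 1.0, 1.0])
A = sp.coo_matrix((vals, (rows, cols)), shape=(3, 2)).tocsr()
b = np.array([3.0, 2.0, 2.0])

# lsqr minimizes ||Ax - b||_2; the solution vector is the first item it returns.
x = lsqr(A, b, atol=1e-12, btol=1e-12)[0]
print(x)
```

The 1-norm version is not a least-squares problem and needs a different solver.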
#10
Re: loading a COO file into Python
This only took 3.4 seconds to run on my Win 8.1 system.
Code:
import time
time_start = time.time( )
data = [ ]
# Read one 'row col value' line at a time and store it as a list of three ints
for line in open( r'D:\temp\Aijv.dat', 'r' ):
    data.append( [ int( x ) for x in line.rstrip( ).split( ' ' ) ] )
print( '{0:.2f} secs'.format( time.time( ) - time_start ) )
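A variation on the loop above, sketched under the same one-triple-per-line assumption (`read_columns` is a hypothetical name): instead of building one small list per line, it appends to three typed `array.array` columns, so the data comes out already split by column and no separate transpose step is needed afterwards.

```python
from array import array


def read_columns(path):
    """Read 'row col value' lines straight into three integer columns."""
    # 'l' = signed C long; wide enough for these row/column indices and values.
    rows, cols, vals = array("l"), array("l"), array("l")
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            row, col, val = fields
            rows.append(int(row))
            cols.append(int(col))
            vals.append(int(val))
    return rows, cols, vals
```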
#11
Re: loading a COO file into Python
Thanks. That's quite an improvement.
It took ~5.3 seconds on my 10-year-old Pentium D Win32 machine:
Code:
>>> start = time.time( )
>>> data = [ ]
>>> for line in open( r'k:/data/Aijv big data.dat', 'r' ):
...     data.append( [ int( x ) for x in line.rstrip( ).split( ' ' ) ] )
...
>>> time.time()-start
5.296999931335449
>>> start=time.time()
>>> Aijv = np.loadtxt('k:/data/Aijv big data.dat', 'int');
>>> time.time()-start
18.858999967575073
Also:
Code:
>>> start=time.time()
>>> Ajiv = np.transpose(Aijv)
>>> time.time()-start
0.0
>>> start=time.time()
>>> datajiv = np.transpose(data)
>>> time.time()-start
3.937999963760376
BTW, the benchmark time on my machine appears to be about 0.5 seconds. That's the total time it took a compiled Win32 app to read Aijv.dat into a transposed array.
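NumPy can also parse the whole file without any Python-level loop: `numpy.fromfile` switches to text mode when given a non-empty `sep`, and per the NumPy docs a space in `sep` matches zero or more whitespace characters, so newlines are skipped as well. A sketch, assuming every line holds exactly three integers (`load_coo_fromfile` is a hypothetical helper; no timing is claimed for it):

```python
import numpy as np


def load_coo_fromfile(path):
    """Parse a whitespace-separated integer COO file into an (N, 3) array."""
    # A non-empty sep puts fromfile in text mode; " " matches any whitespace run.
    flat = np.fromfile(path, dtype=np.int64, sep=" ")
    if flat.size % 3:
        raise ValueError("expected a multiple of 3 integers, got %d" % flat.size)
    return flat.reshape(-1, 3)
```

The three columns are then `a.T[0]`, `a.T[1]` and `a.T[2]`; taking `.T` of an existing array copies nothing.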
#12
Re: loading a COO file into Python
Quote:
Just ran it on a slower machine and it loaded Aijv.dat in less than one second. You are my Python guru, Dustin.
Last edited by Ether : 10-10-2016 at 15:36.
#13
Re: loading a COO file into Python
Quote:
My guess is that because data isn't already a numpy array, numpy has to do a lot of work to convert it to one first, and only then do the transpose. If I'm correct, a second transpose on the result of the first would be very fast, similar to the operation that produced Ajiv. If I understand what transpose does, it most likely just rearranges the pointers to the various axes, which is why it's so fast. My understanding of numpy arrays is that they try really hard not to move data around; many common operations can be done just by creating or rearranging these pointers.
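That guess can be checked on a small scale. A sketch using a tiny made-up stand-in for `data` shows that converting the list builds a new array, while `.T` on an existing array is just a view of the same buffer with the shape and strides swapped:

```python
import numpy as np

# A tiny stand-in for `data`: a plain Python list of [row, col, value] lists.
data = [[1, 1, 5], [2, 2, 7], [3, 1, -1], [4, 2, 3]]

# Converting the list allocates a new array and copies every value into it.
# np.transpose(data) has to perform this conversion before it can transpose.
arr = np.asarray(data)

# Transposing an existing array copies nothing: the result is a view of the
# same buffer with the shape and strides swapped.
arr_t = arr.T
print(arr.shape, arr.strides)
print(arr_t.shape, arr_t.strides)
print(np.shares_memory(arr, arr_t))  # True: both read the same memory
```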