  #1   Spotlight this post!  
Unread 06-10-2016, 15:40
Ether Ether is offline
systems engineer (retired)
no team
 
Join Date: Nov 2009
Rookie Year: 1969
Location: US
Posts: 8,012
loading a COO file into Python


Greetings Python gurus.

I'm looking for a better (faster) way to load a plaintext COO (row,column,value tuple) file into Python. The values are all integers.

File Aijv.dat is a ~7.5 MB plaintext file in COO format representing a
178420x2696 sparse matrix with 535260 non-zero entries.
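
For readers unfamiliar with the format, here is a minimal sketch of what a COO file encodes. The three sample triples, the 0-based indexing, and the whitespace separator are assumptions made for illustration; they are not taken from Aijv.dat.

```python
import numpy as np

# Hypothetical sample: each line is "row column value" (0-based indices assumed).
coo_text = """0 0 1
0 2 1
2 1 1"""

# Parse each line into a (row, column, value) triple of integers.
triples = [tuple(int(tok) for tok in line.split()) for line in coo_text.splitlines()]

# Scatter the triples into a dense array; entries not listed stay zero.
n_rows = max(r for r, _, _ in triples) + 1
n_cols = max(c for _, c, _ in triples) + 1
dense = np.zeros((n_rows, n_cols), dtype=int)
for r, c, v in triples:
    dense[r, c] = v

print(dense)
```

A matrix the size of the one described above would normally stay in sparse form; the dense array here is only to make the meaning of the triples visible.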

Here's a portion of code showing Python taking almost 19 seconds to load Aijv.dat:
Code:
Python 2.7.5 (default, May 15 2013, 22:43:36)
[MSC v.1500 32 bit (Intel)] on win32

>>> import numpy
>>> import time
>>> import scipy
>>> import scipy.sparse as sp
>>> import scipy.sparse.linalg

>>> start=time.time()
>>> Aijv = numpy.loadtxt('Aijv.dat', 'int')
>>> time.time()-start
18.844000101089478
The exact same file takes only 5.5 seconds to load in Octave:
Code:
GNU Octave, version 3.6.4

+ tic;
+ Aijv = dlmread ('Aijv.dat');
+ toc
Elapsed time is 5.547 seconds.
  #2   Spotlight this post!  
Unread 06-10-2016, 16:34
euhlmann euhlmann is offline
CTO, Programmer
AKA: Erik Uhlmann
FRC #2877 (LigerBots)
Team Role: Leadership
 
Join Date: Dec 2015
Rookie Year: 2015
Location: United States
Posts: 304
Re: loading a COO file into Python

Write a native module to do it faster?
__________________
Creator of SmartDashboard.js, an extensible nodejs/webkit replacement for SmartDashboard


https://ligerbots.org
  #3   Spotlight this post!  
Unread 06-10-2016, 16:47
Ether Ether is offline
systems engineer (retired)
no team
 
Join Date: Nov 2009
Rookie Year: 1969
Location: US
Posts: 8,012
Re: loading a COO file into Python


I could certainly do that as a last resort if Python doesn't already have a built-in function to do it.

Long experience has taught me to explore the capabilities of the language to avoid re-inventing the wheel.


  #4   Spotlight this post!  
Unread 06-10-2016, 18:00
tjf tjf is offline
FIRST Year, Best Year
AKA: Tim Flynn
FRC #1257 (Parallel Universe)
Team Role: College Student
 
Join Date: Jun 2016
Rookie Year: 2016
Location: New Jersey
Posts: 123
Re: loading a COO file into Python

The simplest solution is often the easiest one. I'd personally just parse it myself and read line by line with a file reader, parse into an object and add each one into an array.

My 2c.
__________________

1257 (2016) - Student
1257 (2017) - Business Mentor (up in your business!)
KD2KRT
  #5   Spotlight this post!  
Unread 06-10-2016, 18:12
Ether Ether is offline
systems engineer (retired)
no team
 
Join Date: Nov 2009
Rookie Year: 1969
Location: US
Posts: 8,012
Re: loading a COO file into Python


Here's a link to the 7zipped Aijv.dat file.

If your solution is the simplest and easiest one, please show us how you propose to "parse it myself and read line by line with a file reader, parse into an object and add each one into an array".


  #6   Spotlight this post!  
Unread 06-10-2016, 23:36
virtuald virtuald is offline
RobotPy Guy
AKA: Dustin Spicuzza
FRC #1418 (), FRC #1973, FRC #4796, FRC #6367 ()
Team Role: Mentor
 
Join Date: Dec 2008
Rookie Year: 2003
Location: Boston, MA
Posts: 1,035
Re: loading a COO file into Python

Apparently pandas has something for this that outperforms numpy.loadtxt?

http://wesmckinney.com/blog/a-new-hi...ne-for-pandas/

And here:

http://akuederle.com/stop-using-numpy-loadtxt
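
A minimal sketch of what the pandas route might look like, assuming the file is whitespace-separated with no header row. The sample file contents, the temporary path, and the column names `row`, `col`, `val` are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import pandas as pd

# Write a tiny hypothetical COO file so the sketch is self-contained.
sample = "0 0 1\n0 2 1\n2 1 1\n"
path = os.path.join(tempfile.mkdtemp(), "sample_coo.dat")
with open(path, "w") as f:
    f.write(sample)

# header=None: the file has no header row; sep=r"\s+" splits on whitespace.
frame = pd.read_csv(path, sep=r"\s+", header=None, names=["row", "col", "val"])

# .values hands back a plain NumPy array, comparable to what numpy.loadtxt returns.
Aijv = frame.values
print(Aijv.shape, Aijv.dtype)
```

For the real file, `path` would simply point at Aijv.dat; how much faster this is than `numpy.loadtxt` will depend on the machine and library versions.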
__________________
Maintainer of RobotPy - Python for FRC
Creator of pyfrc (Robot Simulator + utilities for Python) and pynetworktables/pynetworktables2js (NetworkTables for Python & Javascript)

2017 Season: Teams #1973, #4796, #6369
Team #1418 (remote mentor): Newton Quarterfinalists, 2016 Chesapeake District Champion, 2x Innovation in Control award, 2x district event winner
Team #1418: 2015 DC Regional Innovation In Control Award, #2 seed; 2014 VA Industrial Design Award; 2014 Finalists in DC & VA
Team #2423: 2012 & 2013 Boston Regional Innovation in Control Award


Resources: FIRSTWiki (relaunched!) | My Software Stuff
  #7   Spotlight this post!  
Unread 07-10-2016, 00:36
Ether Ether is offline
systems engineer (retired)
no team
 
Join Date: Nov 2009
Rookie Year: 1969
Location: US
Posts: 8,012
Re: loading a COO file into Python


Thanks Dustin.

I'll try that.


  #8   Spotlight this post!  
Unread 07-10-2016, 09:12
euhlmann euhlmann is offline
CTO, Programmer
AKA: Erik Uhlmann
FRC #2877 (LigerBots)
Team Role: Leadership
 
Join Date: Dec 2015
Rookie Year: 2015
Location: United States
Posts: 304
Re: loading a COO file into Python

I'm curious as to why you're using Python in the first place if performance is a concern.
  #9   Spotlight this post!  
Unread 07-10-2016, 10:12
Ether Ether is offline
systems engineer (retired)
no team
 
Join Date: Nov 2009
Rookie Year: 1969
Location: US
Posts: 8,012
Re: loading a COO file into Python


Well, this is a bit unexpected. Looks like I get to play the role of Python guru momentarily.

Python is supported by an immense library of highly optimized math and science routines. It's as fast as Matlab/Octave (faster in some cases), and in some cases actually easier to use.

It's free, has a large installed base, and the code is quite readable... all of which makes it a convenient vehicle for sharing solutions... such as how to efficiently compute World OPR or other metrics using both min |Ax-b|_2 and min |Ax-b|_1 etc.
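
As a rough illustration of the min |Ax-b|_2 case only, here is a toy least-squares solve. The 3x3 matrix and score vector are made-up data, and the dense `numpy.linalg.lstsq` call is a simplification swapped in for readability; a matrix as large and sparse as Aijv would more plausibly go through a sparse solver such as those in `scipy.sparse.linalg`. The min |Ax-b|_1 case is not covered here.

```python
import numpy as np

# Hypothetical data: A[i, j] = 1 if team j played in alliance i; b holds alliance scores.
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
b = np.array([30.0, 50.0, 40.0])

# Least-squares solve of min |Ax - b|_2; x holds one contribution estimate per team.
x, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(x)
```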


  #10   Spotlight this post!  
Unread 07-10-2016, 15:17
vScourge vScourge is offline
Videogame Developer
AKA: Adam Pletcher
FRC #4096 (Ctrl-Z)
Team Role: Coach
 
Join Date: Jan 2014
Rookie Year: 2012
Location: Champaign, IL
Posts: 31
Re: loading a COO file into Python

This only took 3.4 seconds to run on my Win 8.1 system.

Code:
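import time

time_start = time.time( )

data = [ ]

# Each line holds one "row column value" triple; parse it into a list of ints.
# The with-block closes the file; split( ) with no argument tolerates repeated whitespace.
with open( r'D:\temp\Aijv.dat', 'r' ) as f:
    for line in f:
        data.append( [ int( x ) for x in line.split( ) ] )

print( '{0:.2f} secs'.format( time.time( ) - time_start ) )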
  #11   Spotlight this post!  
Unread 10-10-2016, 14:29
Ether's Avatar
Ether Ether is offline
systems engineer (retired)
no team
 
Join Date: Nov 2009
Rookie Year: 1969
Location: US
Posts: 8,012
Re: loading a COO file into Python

Quote:
Originally Posted by vScourge View Post
This only took 3.4 seconds to run on my Win 8.1 system.
Thanks. That's quite an improvement.

It took ~5.3 seconds on my 10-year-old Pentium D Win32 machine:
Code:
>>> import time
>>> import numpy as np
>>> start = time.time( )
>>> data = [ ]
>>> for line in open( r'k:/data/Aijv big data.dat', 'r' ):
...     data.append( [ int( x ) for x in line.rstrip( ).split( ' ' ) ] )
...
>>> time.time()-start
5.296999931335449

>>> start=time.time()
>>> Aijv = np.loadtxt('k:/data/Aijv big data.dat', 'int');
>>> time.time()-start
18.858999967575073

Also:
Code:
>>> start=time.time()
>>> Ajiv = np.transpose(Aijv)
>>> time.time()-start
0.0

>>> start=time.time()
>>> datajiv = np.transpose(data)
>>> time.time()-start
3.937999963760376
Why does it take so much longer to transpose data than Aijv?


BTW, the benchmark time on my machine appears to be about 0.5 seconds. That's the total time it took a compiled Win32 app to read Aijv.dat into a transposed array.


  #12   Spotlight this post!  
Unread 10-10-2016, 14:55
Ether's Avatar
Ether Ether is offline
systems engineer (retired)
no team
 
Join Date: Nov 2009
Rookie Year: 1969
Location: US
Posts: 8,012
Re: loading a COO file into Python

Quote:
Originally Posted by virtuald View Post
Apparently pandas has something for this that outperforms numpy.loadtxt?
WOW. Pandas is FAST.

Just ran it on a slower machine and it loaded Aijv.dat in less than one second.

You are my Python guru, Dustin.



Last edited by Ether : 10-10-2016 at 15:36.
  #13   Spotlight this post!  
Unread 10-10-2016, 21:00
virtuald virtuald is offline
RobotPy Guy
AKA: Dustin Spicuzza
FRC #1418 (), FRC #1973, FRC #4796, FRC #6367 ()
Team Role: Mentor
 
Join Date: Dec 2008
Rookie Year: 2003
Location: Boston, MA
Posts: 1,035
Re: loading a COO file into Python

Quote:
Originally Posted by Ether View Post
WOW. Pandas is FAST.

Just ran it on a slower machine and it loaded Aijv.dat in less than one second.

You are my Python guru, Dustin.


Glad to help.

My guess is that because data isn't already a numpy array, np.transpose has to do a lot of work to convert it to one first, and only then do the transpose. If I'm correct, a second transpose on the result of the first transpose would be very fast, similar to the operation on Aijv.

If I understand what transpose does, it most likely just rearranges the pointers to the various axes rather than copying anything, which is why it's so fast. My understanding of how numpy arrays work is that they try really hard not to move data around; many common operations can be done by just creating or adjusting pointers into the same underlying data.
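
A small sketch that checks this guess on toy data (the 2x3 values are made up). It uses `numpy.shares_memory` and the `strides` attribute to show that transposing an existing array reuses the same buffer, while transposing a list of lists first has to build an array.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Transposing an ndarray returns a view: same buffer, strides swapped.
arr_t = np.transpose(arr)
print(arr.strides, arr_t.strides)
print(np.shares_memory(arr, arr_t))

# Transposing a list of lists first has to build a new array from the Python objects.
data = [[1, 2, 3], [4, 5, 6]]
data_t = np.transpose(data)
print(type(data_t).__name__)

# Once the result is an ndarray, a second transpose is again just a view.
data_tt = np.transpose(data_t)
print(np.shares_memory(data_t, data_tt))
```

If that holds, converting the list once with `np.array(data)` and transposing the result should pay the conversion cost only a single time.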
All times are GMT -5.
