#3   29-04-2009, 22:24
Andrew Schreiber
Joining the 900 Meme Team
FRC #0079
 
Join Date: Jan 2005
Rookie Year: 2000
Location: Misplaced Michigander
Posts: 4,062
Re: FIRST Event Data in XML format

Here is the Python script for the team rankings. It SHOULD work for all of the regionals for which the event page has data.

In the spirit of freedom, all code is licensed under the GPL:
Quote:
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Obviously, I can't enforce the license, but I ask that you make any programs that utilize portions of this code available to the community in a timely fashion.


Code:
#!/usr/bin/env python
#The above MUST be the first line in order for the script to execute directly on *nix systems.
#To run it that way you first need to make it executable:
#	chmod +x [name]

import re
import urllib2
import sys

if len(sys.argv) < 2:
	sys.exit("Must provide event code")

event_code = sys.argv[1]

def removeHTML(string):
	string = re.sub("<.*?>","",string)
	string = re.sub("</.*>\\n","",string)
	string = re.sub("\\n","",string)
	return string
try:
	url_buffer = urllib2.urlopen('http://www2.usfirst.org/2009comp/events/'+event_code+'/rankings.html')
	page_data = url_buffer.read()

	#Grab every table cell whose contents start with a digit
	page_list = list(re.findall("(<TD .*?>[0-9].*</TD>\\n)",page_data))

	if len(page_list) == 0: #This is the case with the oddly formatted pages
		#We strip out the stuff that is mucking us up
		page_data = re.sub("<p.*\\n.*style.*?>","",page_data);
		page_data = re.sub("<o:p.*/p>","",page_data);
		#and parse again
		page_list = list(re.findall("<td .*?>\\n.*\\n.*</td>",page_data))
		if len(page_list) == 0:#if it is still 0 there is something else wrong and we need to report a bug
			sys.exit("This is not a valid event code. If you believe this to be invalid please report a bug.")

	print "<Event>"
	for i in range(0,len(page_list)/9):
		print "\t<Ranking>"
		print "\t\t<Rank>"+removeHTML(page_list[9*i])+"</Rank>"
		print "\t\t<Team_Number>"+removeHTML(page_list[9*i+1])+"</Team_Number>"
		print "\t\t<Wins>"+removeHTML(page_list[9*i+2])+"</Wins>"
		print "\t\t<Losses>"+removeHTML(page_list[9*i+3])+"</Losses>"
		print "\t\t<Ties>"+removeHTML(page_list[9*i+4])+"</Ties>"
		print "\t\t<Plays>"+removeHTML(page_list[9*i+5])+"</Plays>"
		print "\t\t<QS>"+removeHTML(page_list[9*i+6])+"</QS>"
		print "\t\t<RS>"+removeHTML(page_list[9*i+7])+"</RS>"
		print "\t\t<MP>"+removeHTML(page_list[9*i+8])+"</MP>"
		print "\t</Ranking>"
	print "</Event>"
except Exception:
	#Most likely the URL fetch or the parse failed, e.g. an unrecognized event code
	sys.exit("This is not a valid event code. If you believe this to be invalid please report a bug.")
Attached is the output of this script when run with glr as the event code. (Really Brandon, no XML format allowed?)
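
If you want to pull the data into another program, here is a minimal sketch of reading the generated XML with xml.etree.ElementTree (ships with Python 2.5+). The rankings.py and RankingExample.txt names are just assumptions; substitute whatever you saved the script and its output as.

Code:
#!/usr/bin/env python
#Minimal sketch of consuming the XML produced by the script above.
#Assumes the output was saved to a file first, e.g.:
#	./rankings.py glr > RankingExample.txt

import xml.etree.ElementTree as ET

tree = ET.parse('RankingExample.txt')

#Walk each <Ranking> element under the <Event> root
for ranking in tree.getroot().findall('Ranking'):
	rank   = ranking.findtext('Rank')
	team   = ranking.findtext('Team_Number')
	wins   = ranking.findtext('Wins')
	losses = ranking.findtext('Losses')
	ties   = ranking.findtext('Ties')
	print "%s. Team %s (%s-%s-%s)" % (rank, team, wins, losses, ties)
The remaining tags (Plays, QS, RS, MP) can be read the same way with findtext.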

I make no claims as to its efficiency; this is my first foray into Python.
Attached Files
File Type: txt RankingExample.txt (7.7 KB, 84 views)
Last edited by Andrew Schreiber : 29-04-2009 at 22:29.