Here is the Python script that pulls the team rankings for an event and prints them as XML. It SHOULD work for every regional whose event page has ranking data.
In the spirit of freedom, all code is licensed under the GPL:
Quote:
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Obviously, I can't enforce the license, but I ask that you make any programs that use portions of this code available to the community in a timely fashion.
Code:
#!/usr/bin/env python
# The above MUST be the first line in order to execute this directly on *nix systems.
# To run it that way, the file needs execute permission:
#   chmod +x [name]
# Written for Python 2 (urllib2 and print statements).
import re
import sys
import urllib2

INVALID_MSG = ("This is not a valid event code. "
               "If you believe this to be invalid please report a bug.")

# Column order of the rankings table: each team's row is 9 consecutive cells.
FIELDS = ["Rank", "Team_Number", "Wins", "Losses", "Ties",
          "Plays", "QS", "RS", "MP"]

if len(sys.argv) < 2:
    sys.exit("Must provide event code")
event_code = sys.argv[1]

def removeHTML(string):
    """Strip tags and newlines from one table cell, leaving its text."""
    string = re.sub("<.*?>", "", string)
    string = re.sub("</.*>\\n", "", string)
    string = re.sub("\\n", "", string)
    return string

try:
    url_buffer = urllib2.urlopen('http://www2.usfirst.org/2009comp/events/'
                                 + event_code + '/rankings.html')
    page_data = url_buffer.read()
except urllib2.URLError:
    # Covers HTTPError (e.g. a 404 for an unknown event) and network failures.
    # A bare "except:" would also swallow real bugs and SystemExit.
    sys.exit(INVALID_MSG)

# Most pages put each cell on its own line: <TD ...>value</TD>
page_list = re.findall("(<TD .*?>[0-9].*</TD>\\n)", page_data)
if len(page_list) == 0:  # This is the case with the oddly formatted pages
    # We strip out the stuff that is mucking us up...
    page_data = re.sub("<p.*\\n.*style.*?>", "", page_data)
    page_data = re.sub("<o:p.*/p>", "", page_data)
    # ...and parse again
    page_list = re.findall("<td .*?>\\n.*\\n.*</td>", page_data)
if len(page_list) == 0:  # If it is still 0, something else is wrong
    sys.exit(INVALID_MSG)

print "<Event>"
for i in range(len(page_list) // 9):
    print "\t<Ranking>"
    # Emit the 9 cells of row i, tagged in FIELDS order.
    for offset, field in enumerate(FIELDS):
        cell = removeHTML(page_list[9 * i + offset])
        print "\t\t<" + field + ">" + cell + "</" + field + ">"
    print "\t</Ranking>"
print "</Event>"
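The core of the script is one regex-and-slice step: grab every numeric table cell, strip its tags, and read the cells nine at a time. Below is a minimal sketch of just that step as a reusable function, so it can be tried without fetching a page. It assumes the one-cell-per-line layout matched by the first regex and skips the fallback for the oddly formatted pages. The names `parse_rankings` and `strip_tags` are hypothetical, and the sample HTML and its numbers are invented for illustration, not real event data. It only uses single-argument `print()` calls, so it runs under Python 2 and 3.

```python
import re

# Column order assumed to match the rankings table (9 cells per team).
FIELDS = ["Rank", "Team_Number", "Wins", "Losses", "Ties",
          "Plays", "QS", "RS", "MP"]


def strip_tags(cell):
    """Remove HTML tags and newlines from one table cell."""
    return re.sub(r"\n", "", re.sub(r"<.*?>", "", cell))


def parse_rankings(page_data):
    """Return one dict per team, keyed by FIELDS.

    Assumes one <TD ...>value</TD> cell per line, as the script's first
    regex expects; the oddly formatted pages are not handled here.
    """
    cells = re.findall(r"(<TD .*?>[0-9].*</TD>\n)", page_data)
    rows = []
    # Integer division drops any trailing partial row, as the script does.
    for i in range(len(cells) // 9):
        row = cells[9 * i:9 * i + 9]
        rows.append(dict(zip(FIELDS, [strip_tags(c) for c in row])))
    return rows


# Hypothetical sample page: two made-up teams, not real results.
values = [
    ["1", "111", "6", "1", "0", "7", "12", "45", "35"],
    ["2", "222", "5", "2", "0", "7", "10", "40", "30"],
]
sample_page = "".join(
    '<TD align="center">' + v + "</TD>\n" for row in values for v in row
)

for team in parse_rankings(sample_page):
    print(team["Rank"] + " " + team["Team_Number"])
```

With the sample above this prints `1 111` and `2 222`. Returning dicts rather than printing directly keeps the parsing separate from the XML output, which makes it easier to test on saved pages.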
Attached is the output of the ranking script when run with glr as the event code. (Really Brandon, no XML format allowed?)
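Because the output is plain XML, it can be read back with the standard library's `xml.etree.ElementTree`. A minimal sketch, assuming only the tag names the script prints; `load_rankings` and the sample values are hypothetical (the numbers are made up). Values are kept as strings, since this sketch does not assume every column is an integer.

```python
import xml.etree.ElementTree as ET

# Made-up fragment in the script's output format (one <Ranking> shown).
SAMPLE_OUTPUT = """<Event>
    <Ranking>
        <Rank>1</Rank>
        <Team_Number>111</Team_Number>
        <Wins>6</Wins>
        <Losses>1</Losses>
        <Ties>0</Ties>
        <Plays>7</Plays>
        <QS>12</QS>
        <RS>45</RS>
        <MP>35</MP>
    </Ranking>
</Event>"""


def load_rankings(xml_text):
    """Turn the script's <Event> output into a list of {tag: text} dicts."""
    root = ET.fromstring(xml_text)
    rankings = []
    for ranking in root.findall("Ranking"):
        # Strip whitespace the oddly formatted pages may leave in a cell.
        rankings.append(
            dict((child.tag, (child.text or "").strip()) for child in ranking)
        )
    return rankings


for team in load_rankings(SAMPLE_OUTPUT):
    print(team["Team_Number"] + " is ranked " + team["Rank"])
```

For the sample this prints `111 is ranked 1`.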
I make no claims as to efficiency; this is my first foray into Python.