#1
29-04-2009, 17:11
Andrew Schreiber
Joining the 900 Meme Team
FRC #0079
Join Date: Jan 2005
Rookie Year: 2000
Location: Misplaced Michigander
Posts: 4,068
FIRST Event Data in XML format

Good Afternoon,

A recent project required me to parse the team history and event pages from the USFirst.org website. Then another project forced me to redo the same task. I reused most of the code, but this became tiresome: handling everything as text strings has major drawbacks, foremost among them that I have to use regular expressions for everything. As a result, I decided to create some scripts that scrape the site and return XML data. For example, one of the scripts pulls the ranking data from an event. The following is a small example from the Lansing event. (I truncated the results; the actual output contains all the teams.)

Code:
<Event>
        <Ranking>
                <Rank>1</Rank>
                <Team_Number>67</Team_Number>
                <Wins>12</Wins>
                <Losses>0</Losses>
                <Ties>0</Ties>
                <Plays>12</Plays>
                <QS>24.00</QS>
                <RS>51.75</RS>
                <MP>117</MP>
        </Ranking>
        <Ranking>
                <Rank>2</Rank>
                <Team_Number>1</Team_Number>
                <Wins>10</Wins>
                <Losses>2</Losses>
                <Ties>0</Ties>
                <Plays>12</Plays>
                <QS>20.00</QS>
                <RS>46.83</RS>
                <MP>95</MP>
        </Ranking>
</Event>
My primary question is: would the FIRST community be interested in these scripts? If so, which pages would you like to see, so I can prioritize writing them? They are being written in Python, but the heavy lifting is all done by regular expressions, so they should be adaptable to any language.
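To show why XML is easier to work with than raw page text, here is a minimal sketch of how a consumer might read the ranking output above. It assumes Python 3 and the standard-library ElementTree parser; `load_rankings` is a hypothetical helper name, and the sample is just the two truncated Lansing rows.

```python
import xml.etree.ElementTree as ET

# The two truncated Lansing rows from the example above.
SAMPLE = """<Event>
    <Ranking>
        <Rank>1</Rank><Team_Number>67</Team_Number>
        <Wins>12</Wins><Losses>0</Losses><Ties>0</Ties><Plays>12</Plays>
        <QS>24.00</QS><RS>51.75</RS><MP>117</MP>
    </Ranking>
    <Ranking>
        <Rank>2</Rank><Team_Number>1</Team_Number>
        <Wins>10</Wins><Losses>2</Losses><Ties>0</Ties><Plays>12</Plays>
        <QS>20.00</QS><RS>46.83</RS><MP>95</MP>
    </Ranking>
</Event>"""


def load_rankings(xml_text):
    """Return one dict per <Ranking>, keyed by child tag name (values stay strings)."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in ranking}
        for ranking in root.findall("Ranking")
    ]


rankings = load_rankings(SAMPLE)
for row in rankings:
    print(row["Rank"], row["Team_Number"], row["Wins"], row["Losses"])
```

No regular expressions are needed on the consuming side; each field is looked up by name.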
Last edited by Andrew Schreiber : 29-04-2009 at 17:15. Reason: My Grammar was bad, I tried to make it less bad.
#2
29-04-2009, 22:05
Unsung FIRST Hero
Greg Marra
[automate(a) for a in tasks_to_do]
FRC #5507 (Robotic Eagles)
Team Role: Mentor
Join Date: Oct 2004
Rookie Year: 2005
Location: San Francisco, CA
Posts: 2,031

Quote:
Originally Posted by Andrew Schreiber
My primary question is: would the FIRST community be interested in these scripts? If so, which pages would you like to see, so I can prioritize writing them? They are being written in Python, but the heavy lifting is all done by regular expressions, so they should be adaptable to any language.
I say open source whenever you can!
#3
29-04-2009, 22:24
Andrew Schreiber

Here is the Python script for the team rankings. It SHOULD work for every regional whose event page has data.

In the spirit of freedom, all code is licensed under the GPL:
Quote:
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Obviously, I can't enforce the license, but I ask that you make any programs that use portions of this code available to the community in a timely fashion.


Code:
#!/usr/bin/env python
# The line above MUST be the first line for the script to be executable on *nix systems.
# You also need execute permission on the file:
#   chmod +x [name]
# Written for Python 2 (urllib2, print statements).

import re
import urllib2
import sys

if len(sys.argv) < 2:
	sys.exit("Must provide event code")

event_code = sys.argv[1]

def removeHTML(string):
	# Strip the tags, then the newlines, leaving only the cell text.
	string = re.sub("<.*?>","",string)
	string = re.sub("</.*>\\n","",string)
	string = re.sub("\\n","",string)
	return string

try:
	url_buffer = urllib2.urlopen('http://www2.usfirst.org/2009comp/events/'+event_code+'/rankings.html')
	page_data = url_buffer.read()
except urllib2.URLError:
	# Catch only fetch failures, so that parsing bugs are not misreported as a bad event code.
	sys.exit("This is not a valid event code. If you believe this to be invalid please report a bug.")

# Each match is one table cell whose content starts with a digit.
page_list = re.findall("(<TD .*?>[0-9].*</TD>\\n)",page_data)

if len(page_list) == 0: # This is the case with the oddly formatted pages
	# We strip out the stuff that is mucking us up
	page_data = re.sub("<p.*\\n.*style.*?>","",page_data)
	page_data = re.sub("<o:p.*/p>","",page_data)
	# and parse again
	page_list = re.findall("<td .*?>\\n.*\\n.*</td>",page_data)
	if len(page_list) == 0: # If it is still 0, something else is wrong and we need to report a bug
		sys.exit("This is not a valid event code. If you believe this to be invalid please report a bug.")

# The table has nine columns, so every nine cells form one ranking row.
print "<Event>"
for i in range(0,len(page_list)//9):
	print "\t<Ranking>"
	print "\t\t<Rank>"+removeHTML(page_list[9*i])+"</Rank>"
	print "\t\t<Team_Number>"+removeHTML(page_list[9*i+1])+"</Team_Number>"
	print "\t\t<Wins>"+removeHTML(page_list[9*i+2])+"</Wins>"
	print "\t\t<Losses>"+removeHTML(page_list[9*i+3])+"</Losses>"
	print "\t\t<Ties>"+removeHTML(page_list[9*i+4])+"</Ties>"
	print "\t\t<Plays>"+removeHTML(page_list[9*i+5])+"</Plays>"
	print "\t\t<QS>"+removeHTML(page_list[9*i+6])+"</QS>"
	print "\t\t<RS>"+removeHTML(page_list[9*i+7])+"</RS>"
	print "\t\t<MP>"+removeHTML(page_list[9*i+8])+"</MP>"
	print "\t</Ranking>"
print "</Event>"
Attached is the output of this script when run with glr as the event code. (Really Brandon, no XML format allowed?)

I make no claims as to its efficiency; this is my first foray into Python.
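The script above is written for Python 2 and needs the live site. For anyone who wants to try the core idea offline, here is a rough Python 3 sketch of just the grouping step: pull the numeric table cells out with a regular expression, then treat every nine cells as one ranking row. The sample markup is made up (the real pages may differ), `cells_to_xml` is a hypothetical name, and the XML is built with ElementTree instead of string concatenation, which is a substitution on my part rather than what the script does.

```python
import re
import xml.etree.ElementTree as ET

# Column order used by the script above.
FIELDS = ["Rank", "Team_Number", "Wins", "Losses", "Ties", "Plays", "QS", "RS", "MP"]

# Made-up stand-in for one row of a rankings page; the real markup may differ.
SAMPLE_HTML = "".join(
    '<TD align="center">%s</TD>\n' % value
    for value in ["1", "67", "12", "0", "0", "12", "24.00", "51.75", "117"]
)


def cells_to_xml(page_data):
    """Group numeric table cells into rows of nine and build an <Event> tree."""
    cells = re.findall(r"<TD .*?>([0-9].*?)</TD>", page_data)
    event = ET.Element("Event")
    # Step through complete rows only; a trailing partial row is ignored.
    for start in range(0, len(cells) - len(FIELDS) + 1, len(FIELDS)):
        ranking = ET.SubElement(event, "Ranking")
        for name, value in zip(FIELDS, cells[start:start + len(FIELDS)]):
            ET.SubElement(ranking, name).text = value.strip()
    return event


event = cells_to_xml(SAMPLE_HTML)
print(ET.tostring(event, encoding="unicode"))
```

Building the tree with ElementTree means characters such as & or < in a cell are escaped on output, so the result stays well-formed XML.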
Attached Files
File Type: txt RankingExample.txt (7.7 KB, 85 views)
Last edited by Andrew Schreiber : 29-04-2009 at 22:29.
#4
30-04-2009, 14:14
Andrew Schreiber

An update: the corrections I made for Dallas and Connecticut did not work. I will try to fix those later tonight.

As a consolation prize, I tossed together a quick (read: simple and not pretty) page that scripts can grab XML data from, though I would prefer that you run the script on your own machines. If you want a one-off piece of data, feel free to use it.

http://schreiaj.ath.cx/share/FRC_Parsers/ranking.php

The page takes two arguments. Event_Code is the event code used by FIRST; these can be found on frclinks.com. HTML_Display is either true or false: a true value escapes the page so that the XML tags show up in the browser; otherwise they will not. HTML_Display is optional, but without an Event_Code the page will not load anything.

An example is

http://schreiaj.ath.cx/share/FRC_Par...Event_Code=GLR

It loads the Lansing District event for display in the browser. If you have any questions, feel free to ask.
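To illustrate what HTML_Display does, here is a small Python 3 sketch of the escaping idea. This is not the code behind the PHP page, only an assumption about its behavior: `render` is a hypothetical function, and wrapping the result in a pre element is my own choice.

```python
import html


def render(xml_text, html_display=False):
    """Return the XML as-is, or escaped so a browser shows the tags literally."""
    if html_display:
        # html.escape turns < > & (and quotes) into entities the browser prints as text.
        return "<pre>" + html.escape(xml_text) + "</pre>"
    return xml_text


print(render("<Rank>1</Rank>", html_display=True))
print(render("<Rank>1</Rank>"))
```

A script that parses the XML wants the unescaped form, which is why false (or blank) is the sensible default for machine use.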

I will be making the updated script available as soon as possible. Sorry about that.

EDIT: The Championship division rankings do work. FRClinks has the wrong code for them; the code is the full name of the division, i.e., Newton is Newton.
Last edited by Andrew Schreiber : 30-04-2009 at 14:21.
#5
30-04-2009, 14:41
Greg Marra

Quote:
Originally Posted by Andrew Schreiber
EDIT: The Championship division rankings do work. FRClinks has the wrong code for them; the code is the full name of the division, i.e., Newton is Newton.
FIRST's system is inconsistent about the Divisions. They also had their match results posted differently than other events this season. I am not sure if this is indicative of an overall change in the system, or just Divisions being weird.
#6
30-04-2009, 21:31
rsisk
The GURU Channel
AKA: Richard Sisk
FRC #2493 (Robokong)
Team Role: Mentor
Join Date: Jan 2008
Rookie Year: 2007
Location: Riverside, CA
Posts: 2,749

After parsing FIRST web pages myself for the Regional Twitter Accounts, I have a newfound respect for what the team at The Blue Alliance has done to gather data.

It seemed like every week was a scramble to adapt to something new. And when I found out that Einstein's data was not posted in real time, I projected the NASA feed on a wall at home and posted match scores by hand.

What a joy it would be if FIRST offered some way to get to this information besides parsing their web sites. Not 100% sure what I would expect, maybe a web service that made the data available?

The FRCFMS Twitter feed came close to offering some data in a real-time feed format, and maybe that is the answer. But now it is tough to go back and scrape all that data from Twitter pages if you didn't get it during the real-time feed.
#7
30-04-2009, 22:12
Andrew Schreiber

Quote:
Originally Posted by rsisk
What a joy it would be if FIRST offered some way to get to this information besides parsing their web sites. Not 100% sure what I would expect, maybe a web service that made the data available?

The FRCFMS Twitter feed came close to offering some data in a real-time feed format, and maybe that is the answer. But now it is tough to go back and scrape all that data from Twitter pages if you didn't get it during the real-time feed.
Richard, I will make the entire FRCFMS feed (as much as Twitter still lets me grab) available as soon as I get time. I have a script written to do it; if someone would like to run it, I can give instructions.

Yes, FIRST would make all of our lives simpler if they would find a standard and stick to it. Either give us an API we can make calls to (published well before kickoff), or at least keep a standardized page layout and don't change it without warning us and telling us about the changes. One of the additional reasons for this project is to give us a STANDARD way of accessing data.

If anyone would like to offer assistance feel free to shoot me a PM.
#8
01-05-2009, 17:12
Unsung FIRST Hero
Nate Smith
FRC Key Volunteer Trainer
AKA: CrazyNate
no team
Join Date: Jun 2001
Rookie Year: 1998
Location: Old Town, Maine
Posts: 1,029

Quote:
Originally Posted by Andrew Schreiber
Yes, FIRST would make all of our lives simpler if they would find a standard and stick to it. Either give us an API we can make calls to (published well before kickoff), or at least keep a standardized page layout and don't change it without warning us and telling us about the changes. One of the additional reasons for this project is to give us a STANDARD way of accessing data.
On my to-do list depending on how things go....can't say much more than that right now...I might be able to share more info in PM, if you ask nicely...
__________________
Nate Smith
nsmith@smythsoft.com
12 seasons, 4 teams, and more time logged behind the scorekeeper's table than I care to remember...
returning for 2011? only time will tell...
#9
02-05-2009, 13:01
Andrew Schreiber

Nate, I was just grumbling. I'm already providing XML information for a couple of the pages and am working on the others.

As an update:

http://schreiaj.ath.cx/share/FRC_Par...alschedule.php will provide the qualification schedules for the regionals that are not bizarre.

http://schreiaj.ath.cx/share/FRC_Parsers/ranking.php will provide the qualification ranking data.

Both pages take the following options:

Event_Code - Event code from frclinks.com. Since I now use frclinks to find the pages, you must use the exact codes given there.

Year - 2008 and 2009 are currently supported.

HTML_Display [true,false] - Whether to escape the tags so that they display in the browser. If you are parsing the XML in a script, I suggest leaving this false (or blank). If you plan to copy and paste the XML from the browser, use true.
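As a rough illustration of how a script might put these options together, here is a Python 3 sketch that only builds the request URL and does not fetch anything. It assumes the options are passed as ordinary query-string parameters, which the example URL earlier in the thread suggests; `build_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Ranking page URL as given in this thread.
BASE_URL = "http://schreiaj.ath.cx/share/FRC_Parsers/ranking.php"


def build_url(event_code, year=None, html_display=False, base_url=BASE_URL):
    """Build a request URL from the three documented options."""
    params = {"Event_Code": event_code}  # required; the page loads nothing without it
    if year is not None:
        params["Year"] = str(year)
    if html_display:
        params["HTML_Display"] = "true"  # omitted when false, i.e. left blank
    return base_url + "?" + urlencode(params)


print(build_url("GLR", year=2009))
```

Using urlencode rather than gluing strings together keeps the URL valid if an event code ever contains a space or other special character.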

Currently I am working on parsing the team history pages and will post that as soon as I am done.
#10
02-05-2009, 13:09
Greg Marra

Quote:
Originally Posted by Andrew Schreiber
http://schreiaj.ath.cx/share/FRC_Par...alschedule.php will provide the qualification schedules for the regionals that are not bizarre.
I believe that bizarre page formatting comes from the pages being opened in Microsoft Word, edited, and saved again. Some result pages are full of Word HTML markup.
#11
02-05-2009, 13:19
Andrew Schreiber

Quote:
Originally Posted by Greg Marra
I believe that bizarre page formatting comes from the pages being opened in Microsoft Word, edited, and saved again. Some result pages are full of Word HTML markup.

Don't get me started on Word and HTML.

On an unrelated note, in the spirit of open source, all the code is available at http://schreiaj.ath.cx/share/FRC_Parsers/ and the versions I am currently working on are at http://schreiaj.ath.cx/share/FRC_Parsers/Parsers_Beta/
#12
03-05-2009, 04:18
Nibbles
Interstellar Hitchhiker
AKA: Austin Wright
FRC #0498 (Cobra Commanders)
Team Role: Alumni
Join Date: Jan 2008
Rookie Year: 2003
Location: Arizona
Posts: 103

Where did you find frclinks.com? That's a nifty idea.

I don't know how well one parsing method holds up against the others. A regex will work as long as they don't add a new table to the document and don't add non-numerical data. An HTML parser will work too, and will also properly handle entities like &lt;, but it will break on any change in structure (though fixing that is a simple parameter change telling it the new path to the data). I just use the DOM and SimpleXML parsers in PHP; Python (eewww, Python) must have something similar.
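Python does have something similar in its standard library. As a minimal sketch (Python 3, with a made-up sample row and a hypothetical `CellCollector` class), html.parser can collect table cells regardless of tag case or attributes, and it decodes entities along the way:

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, whatever its attributes or case."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True, so entities are decoded
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":  # the parser lowercases tag names
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


# Made-up sample: mixed-case tags, Word-style nested markup, and an entity.
SAMPLE = '<table><tr><TD class="x"><p>67</p></TD><td>A &lt; B</td></tr></table>'

collector = CellCollector()
collector.feed(SAMPLE)
collector.close()
cells = [cell.strip() for cell in collector.cells]
print(cells)  # ['67', 'A < B']
```

The nested p element that would trip up a cell-matching regex is simply ignored here, which is the trade-off described above: tolerant of markup noise, but still tied to the cells living inside td elements.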

I have an initiative to standardize how FIRST data is published: the XML Interchange format. An example that mixes the rankings and schedule:
Code:
<event season="2009" code="GLR">
        <team number="67" game:rank="1" game:win="12" game:lost="0" game:tie="0" game:plays="12" game:qs="24.00" game:rs="51.75" game:mp="117" />
        ...

        <match type="qualification" number="1" time="11:45">
            <alliance name="red">
                <team position="1" number="1940"/>
                <team position="2" number="216"/>
                <team position="3" number="123"/>
            </alliance>
            <alliance name="blue">
                <team position="1" number="1896"/>
                <team position="2" number="468"/>
                <team position="3" number="894"/>
            </alliance>
        </match>
        ...
</event>
Where game is some XML namespace (if you like).
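For what it's worth, a namespace-aware parser only accepts the game: prefix if it is declared on some enclosing element. Here is a short Python 3 sketch of reading a trimmed copy of the example with ElementTree; the namespace URI is purely hypothetical, since none is defined in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the post does not define one.
GAME_NS = "http://example.org/frc/game"

# Trimmed copy of the example above, with the "game" prefix declared.
SAMPLE = f"""<event xmlns:game="{GAME_NS}" season="2009" code="GLR">
    <team number="67" game:rank="1" game:win="12" game:lost="0" game:tie="0"/>
    <match type="qualification" number="1" time="11:45">
        <alliance name="red">
            <team position="1" number="1940"/>
            <team position="2" number="216"/>
            <team position="3" number="123"/>
        </alliance>
        <alliance name="blue">
            <team position="1" number="1896"/>
            <team position="2" number="468"/>
            <team position="3" number="894"/>
        </alliance>
    </match>
</event>"""

root = ET.fromstring(SAMPLE)

# Namespaced attributes are keyed as "{uri}name" in ElementTree.
ranks = {
    team.get("number"): int(team.get(f"{{{GAME_NS}}}rank"))
    for team in root.findall("team")  # direct children only, i.e. the ranking rows
}

# Map each match number to its alliances and their team numbers.
schedule = {
    match.get("number"): {
        alliance.get("name"): [t.get("number") for t in alliance.findall("team")]
        for alliance in match.findall("alliance")
    }
    for match in root.findall("match")
}
print(ranks, schedule)
```

Because ranking teams sit directly under the event element and scheduled teams sit under an alliance, the two uses of the team element never get mixed up.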

As for licensing, as a rule of thumb, if the code is shorter than the license would be, I put it in the public domain.
__________________
Help standardize match data! Use the XML interchange format. (Specification page)
AAA_awright on Freenode IRC chat. (Join us at ##FRC on chat.freenode.net, or in your browser)

Last edited by Nibbles : 03-05-2009 at 04:37.
All times are GMT -5.