FIRST Event Data in XML format
Good Afternoon,
A recent project required me to parse the team history and event pages from the USFirst.org website. Then another project forced me to redo the same task. I reused most of the code, but it became tiresome: handling everything as text strings has major drawbacks, foremost among them that I have to use regular expressions for everything. As a result, I decided to create some scripts that scrape the site and return XML data for various things. For example, one of the scripts pulls the ranking data from an event. The following is a small example from the Lansing event. (I truncated the results; the actual output contains all the teams.) Code:
<Event> |
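To illustrate the idea of returning XML instead of raw strings, here is a minimal sketch of turning already-scraped ranking rows into an XML document. Only the `<Event>` root comes from the example above; the `Team`, `Rank`, and `Record` names, the `code` attribute, and the sample row are assumptions made up for illustration, not the script's actual output.

```python
import xml.etree.ElementTree as ET


def rankings_to_xml(event_code, rows):
    """Build an XML string from (rank, team_number, record) tuples.

    Element and attribute names below <Event> are hypothetical.
    """
    root = ET.Element("Event", {"code": event_code})
    for rank, team_number, record in rows:
        team = ET.SubElement(root, "Team", {"number": str(team_number)})
        ET.SubElement(team, "Rank").text = str(rank)
        ET.SubElement(team, "Record").text = record
    return ET.tostring(root, encoding="unicode")


# Made-up sample row; real data would come from the scraped event page.
xml_text = rankings_to_xml("GLR", [(1, 9998, "10-2-0")])
print(xml_text)
```

A consumer can then use any XML parser to query the result rather than writing another regular expression.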
Re: FIRST Event Data in XML format
1 Attachment(s)
Here is the Python script for the team rankings. It SHOULD work for all of the regionals for which the event page has data.
In the spirit of freedom, all code is licensed under the GPL.
Code:
#!/usr/bin/env python
I make no claims as to its efficiency; this is my first foray into Python. |
Re: FIRST Event Data in XML format
Update to this: the corrections I made for Dallas and Connecticut did not work. I will try to fix them later tonight.
As a consolation prize, I tossed together a quick (read: simple and not pretty) page you can grab XML data from using scripts, though I would prefer you run the script on your own machines. If you want a one-off piece of data, feel free to use it. http://schreiaj.ath.cx/share/FRC_Parsers/ranking.php
The page takes a couple of arguments:
Event_Code - The event code used by FIRST; these can be found on frclinks.com. Without an Event_Code the page will not load anything.
HTML_Display [true,false] - Optional. A true value encodes the page so that the XML tags show up in the browser; otherwise they will not.
An example is http://schreiaj.ath.cx/share/FRC_Par...Event_Code=GLR which loads the Lansing District event for display in the browser.
If you have any questions, feel free to ask. I will make the updated script available as soon as possible. Sorry about that.
EDIT: The Championship division rankings do work. FRClinks has the wrong code for them; the code is the full name of the division, i.e., the code for Newton is Newton. |
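For anyone calling the page from a script, here is a minimal sketch of building the request URL. The base URL and the parameter names come from the post above; that the arguments go in a query string and that the values are the lowercase literals true/false are assumptions, and `build_ranking_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base URL as given in the post above.
RANKING_URL = "http://schreiaj.ath.cx/share/FRC_Parsers/ranking.php"


def build_ranking_url(event_code, html_display=None):
    """Return a ranking.php URL for one event.

    Event_Code is required; HTML_Display is optional and is only
    added to the query string when explicitly given.
    """
    if not event_code:
        raise ValueError("Event_Code is required")
    params = {"Event_Code": event_code}
    if html_display is not None:
        # Assumed spelling of the boolean values.
        params["HTML_Display"] = "true" if html_display else "false"
    return RANKING_URL + "?" + urlencode(params)


print(build_ranking_url("GLR"))
print(build_ranking_url("Newton", html_display=True))
```

The resulting string can be passed to `urllib.request.urlopen` or any other HTTP client.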
Re: FIRST Event Data in XML format
After parsing FIRST web pages myself for the Regional Twitter Accounts, I have a newfound respect for what the team at The Blue Alliance has done to gather data :)
It seemed like every week was a scramble to adapt to something new. And when I found out that Einstein's data was not posted in real time, I projected the NASA feed on a wall at home and posted match scores by hand. What a joy it would be if FIRST offered some way to get this information besides parsing their web sites. I am not 100% sure what I would expect; maybe a web service that made the data available? The FMSFRC Twitter feed came close to offering data in a real-time feed format, and maybe that is the answer. But now it is tough to go back and scrape all that data from Twitter pages if you didn't capture it during the real-time feed. |
Re: FIRST Event Data in XML format
Yes, FIRST would make all of our lives simpler if they would find a standard and stick to it: either give us an API we can make calls to (published well before kickoff), or at least keep a standardized page layout and don't change it without warning us and telling us about the changes. One of the additional reasons for this project is so that we have a STANDARD way of accessing data. If anyone would like to offer assistance, feel free to shoot me a PM. |
Re: FIRST Event Data in XML format
Nate, I was just grumbling. I'm already providing XML information for a couple of the pages and am working on the others.
As an update:
http://schreiaj.ath.cx/share/FRC_Par...alschedule.php will provide the qualification schedules for the regionals that are not bizarre.
http://schreiaj.ath.cx/share/FRC_Parsers/ranking.php will provide the qualification ranking data.
Both pages take the following options:
Event_Code - Event code from frclinks.com. Since I now use frclinks to find the pages, the exact codes given there are what need to be used.
Year - 2008 and 2009 are currently supported.
HTML_Display [true,false] - Decides whether to escape the tags so that they display in the browser. If you are parsing the XML in a script, I suggest leaving this false (or blank). If you plan on copying and pasting the XML from the browser, use true.
Currently I am working on parsing the team history pages and will post that as soon as I am done. |
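To show why HTML_Display matters to a script, here is a small sketch of the difference between raw and escaped output. The sample XML is made up, and using `html.escape` to stand in for what the page does when HTML_Display is true is an assumption; `to_parseable` is a hypothetical helper.

```python
import html
import xml.etree.ElementTree as ET

# Made-up sample; the real element names below <Event> are not shown here.
raw_xml = '<Event><Team number="9998"><Rank>1</Rank></Team></Event>'

# Roughly what an escaped-for-browser response would look like (assumed).
displayed = html.escape(raw_xml)


def to_parseable(text):
    """Undo HTML escaping if the text looks escaped, else return it as-is."""
    if text.lstrip().startswith("&lt;"):
        return html.unescape(text)
    return text


# The escaped form is not XML, so it must be unescaped before parsing.
root = ET.fromstring(to_parseable(displayed))
print(root.find("Team/Rank").text)
```

Leaving HTML_Display false (or blank) avoids the extra unescaping step entirely.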
Re: FIRST Event Data in XML format
Don't get me started on Word and HTML :mad: On an unrelated note, in the spirit of open source, all the code is available at http://schreiaj.ath.cx/share/FRC_Parsers/ and the versions I am currently working on are at http://schreiaj.ath.cx/share/FRC_Parsers/Parsers_Beta/ |
Re: FIRST Event Data in XML format
Where did you find frclinks.com? That's a nifty idea.
I don't know how well one parsing method works compared with the others. A regex will work as long as they don't add a new table to the document and don't add non-numerical data. An HTML parser will too, and will also properly handle entities like &lt;, but any change in structure will break it (though that is a simple parameter change telling it the new path to the data). I just use the DOM and SimpleXML parsers in PHP; Python (eewww, Python) must have something similar. I have an initiative to standardize how FIRST data is published: XML Interchange format. An example that mixes the rankings and schedule: Code:
<event season="2009" code="GLR">
As for licensing, as a rule of thumb, if the code is shorter than the license would be, I put it in the public domain. |
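On the Python side, the standard library's `html.parser` is one rough equivalent. The sketch below contrasts it with a regex on a made-up table fragment (the real FIRST page layout is not shown in this thread, so the markup, team numbers, and names are all hypothetical): the parser decodes entities, while the regex returns them raw.

```python
import re
from html.parser import HTMLParser

# Hypothetical fragment, not the actual FIRST markup.
SAMPLE = """
<table>
<tr><td>1</td><td>9998</td><td>Alpha &amp; Beta</td></tr>
<tr><td>2</td><td>9999</td><td>Gamma &lt;G&gt;</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp;, &lt;, ...
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_rows(html_text):
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# The regex sees the raw entity text; the parser returns decoded text.
regex_cells = re.findall(r"<td>(.*?)</td>", SAMPLE)
print(regex_cells[2])
print(parse_rows(SAMPLE)[0][2])
```

Both approaches still depend on the table structure staying put, which is the point made above.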
Copyright © Chief Delphi