FRC Event Data in XML Format
I would like to announce yet another way to browse the FIRST website. I have often wished that I could have a Windows Gadget or a Dashboard widget that I could configure to scroll the schedule, the rankings, or the match results. The problem has always been getting the data. I assume that Gameday has the data, and I know that TBA also has a wonderful API, but I am not sure how often their data is updated. As a result, I decided to pull data directly from the FIRST site and give it back out as XML. Enter http://frcfeed.appspot.com/ (also built on the Google App Engine). The goal of this site is to enable us to develop all sorts of applications without first having to deal with the hassle that is FIRST's website (anyone who has ever looked at the website's HTML will agree that a lot of it is ugly).
This site will return XML. For example, the awards for the 2009 Greater Toronto regional can be accessed at http://frcfeed.appspot.com/2009/Grea...Toronto/awards and, upon viewing the source, you will see:
Code:
<?xml version="1.0" encoding="UTF-8"?>
Results and Schedule require either qualification or elimination to be specified; the other two don't need it, though adding it won't hurt anything.
A word on the event name or code: FIRST specifies a bunch of codes for events (e.g., dt for Detroit). I hate remembering those, and I assume that everyone hates remembering those. I decided that I would rather specify an event by its official name (removing the words regional, mi district, division, and field to save typing). However, I realize that some people would prefer to use a code, so this supports BOTH: if you specify something it doesn't recognize as an event name, it will assume you know what you are doing and treat it as an event code. It will return nothing if you get the wrong name or code. Currently it defaults to 2010 for the year if you don't specify a year; there is no data for these events yet, so you should probably specify 2009 as the year. Enjoy!
ps: This is not to step on the toes of any of the other great services out there; TBA is amazing and I have heard great things about Gameday. If they offer a similar service, consider this an alternative. |
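The name-or-code fallback described in the post could look roughly like the following. This is only a sketch under stated assumptions: the lookup table, the function names, and the sample input "Detroit Regional" are hypothetical, and only the "dt" code for Detroit and the list of dropped words come from the post itself.

```python
import re

# Words stripped from official event names, per the post above.
# The multi-word phrase comes first so "mi district" is removed as a unit.
DROPPED_WORDS = ("mi district", "regional", "division", "field")

# Hypothetical name -> code table; only "dt" for Detroit comes from the post.
KNOWN_EVENTS = {"detroit": "dt"}


def normalize_name(name):
    """Lowercase the name and drop the filler words listed above."""
    cleaned = name.lower()
    for word in DROPPED_WORDS:
        cleaned = re.sub(r"\b" + re.escape(word) + r"\b", " ", cleaned)
    return " ".join(cleaned.split())


def resolve_event(name_or_code):
    """Return an event code; unrecognized input is passed through as a code."""
    key = normalize_name(name_or_code)
    if key in KNOWN_EVENTS:
        return KNOWN_EVENTS[key]
    return name_or_code.strip()
```

With this table, `resolve_event("Detroit Regional")` and `resolve_event("Detroit")` both resolve through the name lookup, while something like `resolve_event("ct")` is not recognized as a name and is handed on unchanged as a code.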
Re: FRC Event Data in XML Format
Good job, Andrew. I am a novice at parsing xml but would it be helpful to put each award on a new line?
It seems that Pat and Andrew have two great pieces of code out. There would be a sizeable explosion of awesomeness if these two combined. |
Re: FRC Event Data in XML Format
Quote:
What do you mean put each award on a new line? I am also looking into (as soon as I get a bit more spare time) adding the option to have the data come back as csv, xml, or json. The nice thing about json would be that you would only have to make a request to the page and then eval the results in Javascript. |
Re: FRC Event Data in XML Format
I mean put a break between awards.
Code:
<?xml version="1.0" encoding="UTF-8"?> |
Re: FRC Event Data in XML Format
It would probably aid in human reading it but for an automated parser it shouldn't make a difference.
|
Re: FRC Event Data in XML Format
I agree with Andrew on leaving out the <br>. This is all data, no formatting allowed.
Thanks for putting this together, Andrew. I can't wait to use it as a data feed for the Regional Twitter Accounts this year. Now I can spend my time adding new features rather than battling FIRST's crazy formatting for their match scores. |
Re: FRC Event Data in XML Format
There have been a couple changes to this tonight (since I am slowly becoming nocturnal in an attempt to thwart kids on my lawn on Devil's Night)
First, you will probably not notice, but I am caching results now. From your point of view it means faster page loads. For me it means less worrying about how long it takes to parse the page, because I am no longer trying to do it multiple times at once (which is just plain stupid). The downside: results are no longer IMMEDIATE; they are potentially delayed by 5 minutes. I can't see this being a problem for most applications. In all honesty, that is only 1 match delayed, and I wouldn't doubt that FIRST caches their website anyway, so updates probably only come out every N minutes. Besides, I'd rather have a slightly delayed service than one that doesn't work. Just as a point of reference, previously each page load was taking just over a second of CPU time; now it is taking a tenth of a second or less. Much more usable.
Second, you will probably notice that the results are now sorted (except Awards). Up until now they have been seemingly random (yeah, I know they aren't, but they looked like they were arranged by a monkey and it bothered me). Now Schedule and Results are sorted by Match Number and Rankings are sorted by Rank. Hope these work for you.
Upcoming changes (read as, after I carve a pumpkin using a dremel)
Any other requests? |
Re: FRC Event Data in XML Format
Any chance of getting an option to bypass the cache? For some applications a 5 minute delay is a lifetime
|
Re: FRC Event Data in XML Format
Really? I can lower the cache time to maybe a match length but would prefer to still use the cache due to the performance increase.
I suppose in the worst case at 5 minutes you are 2 matches behind, if we go with a match length the absolute worst you can do is one match behind. Would a 2 minute cache be more acceptable? I may also try making the caching system smarter, obviously for regionals that are already completed the cached version is fine for a lot longer than for regionals that are running at the moment. Either that or take a peek at the cached version (if it exists) and if a match should have occurred (based on the original schedule) discard the cached version... I don't know. I will try to lower that number as much as I can. Who knows, this whole thing could crash and burn 1st week since that is the first time it will really be tested under real load. |
Re: FRC Event Data in XML Format
Last year I was able to get Twitter messages out on the Regional Twitter Accounts within a couple of minutes of results being posted on the FIRST site. Adding a 5 minute delay to that would make the decision to use a consistent XML interface versus the headache of parsing the FIRST site directly a bit tougher.
How about distributing the parsing load as a way to improve performance? |
Re: FRC Event Data in XML Format
Looking at actual match data though, Michigan events were running on a 6-8 minute cycle time last year, and a quick glance at other events shows this is pretty much normal. For now I will set the cache to expire every minute; this should mean we get match results quickly and without hammering too hard on anyone's machines. If I am consistently near my limits I will adjust the cache accordingly, but it will NOT go to more than 50% of the average cycle time (for last year this would be roughly 4 minutes). Would this be an acceptable solution?
Also, this is a heads up, the XML will change slightly, spaces in the tags shouldn't be there, I was really tired when I wrote that. I will also be putting an enclosing <Event> tag around everything. (if anyone has suggestions on what the XML should look like please tell me now) |
Re: FRC Event Data in XML Format
Seems like a reasonable solution.
So, if I understand correctly, you will parse a particular page once per minute? I assume you can also specify a time range when the page should be parsed? That way you only parse the pages where change is expected. |
Re: FRC Event Data in XML Format
Currently, no I can't specify a time range, I could probably set up a way of flagging events as completed and set the cache on those regionals to infinity (ie, as long as google keeps it in cache)
Currently the process the page takes is:
1. Figure out what they want.
2. Check to see if the data is in the cache.
3. If it is, serve the cached data.
4. If it isn't, go grab the HTML and parse the data (this request gets to be the lucky one that takes a longer time).
5. Put the new data in the cache and tell the cache to dump it in N seconds.
6. Serve the new data.
Data is only cached when someone requests it. I am assuming that if someone REALLY wants to display data from 2007 they can wait a little bit for it. The theory I am operating on is that if one person wants the data, there are probably other people who will request it shortly (i.e., the event is running). Obviously someone could just be curious about old match results, but I am not too worried about that. |
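The request flow described in the post above is the usual check-then-fill cache pattern. Here is a minimal sketch of it, assuming a plain in-process dictionary as a stand-in for whatever cache the service actually uses; `get_cached`, `fetch_and_parse`, and the injectable `now` clock are hypothetical names chosen for illustration.

```python
import time

# key -> (expiry timestamp, parsed data); a stand-in for the real cache.
_cache = {}


def get_cached(key, fetch_and_parse, ttl_seconds=60, now=time.time):
    """Serve cached data while it is fresh; otherwise fetch, store, and serve."""
    current = now()
    entry = _cache.get(key)
    if entry is not None and entry[0] > current:
        return entry[1]  # cache hit: no fetching or parsing needed
    # Cache miss or expired entry: this request pays for the slow path.
    data = fetch_and_parse(key)
    _cache[key] = (current + ttl_seconds, data)
    return data
```

Nothing is refreshed in the background here: an entry is only rebuilt when a request arrives after it has expired, which matches the "data is only cached when someone requests it" behaviour described in the post.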
Re: FRC Event Data in XML Format
OK, I see what you are doing; I was thinking in terms of my poll-driven parsing model. How about different cache rates for different request types? Match results should be as close to real time as possible, standings less so, and awards should have the longest cache time.
It's almost like you need an application to manage the cache times to get the best trade off between performance and responsiveness. For example, crank down the cache time when matches are running for an event, then crank them up to infinity after the event. |
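The per-type cache times suggested in this post could be expressed as a small policy function. This is a sketch only: the 60 second figure for results echoes the one minute cache mentioned earlier in the thread, while the other numbers and the `event_running` flag are placeholders invented for illustration.

```python
# Placeholder lifetimes in seconds; only the 60 for results echoes the thread.
TTL_WHILE_RUNNING = {"results": 60, "rankings": 120, "awards": 600}


def cache_ttl_seconds(what, event_running):
    """Pick a cache lifetime; None means keep the entry as long as the cache allows."""
    if not event_running:
        return None  # a finished event will not change, so never expire it
    return TTL_WHILE_RUNNING.get(what, 60)
```

A caller would pass the returned value to whatever cache it uses and treat `None` as "no expiry", which corresponds to cranking the cache time up to infinity after the event.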
Re: FRC Event Data in XML Format
Andrew, just a suggestion: it looks like the Last-Modified HTTP header is set correctly for the usfirst.org pages you're looking at. Thus, if your script can handle it, you should be able to do an HTTP HEAD request, look at the Last-Modified field, and check that against the last time you parsed the page to determine if the data has changed and needs to be re-parsed.
|
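The HEAD request idea above might be sketched as follows, using only the Python standard library. The function names are hypothetical, the last parse time is assumed to be stored as a POSIX timestamp, and whether a given server reports Last-Modified reliably has to be checked against the real pages.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def fetch_last_modified(url, timeout=10):
    """Send a HEAD request and return the raw Last-Modified header, or None."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Last-Modified")


def needs_reparse(last_modified_header, last_parsed_epoch):
    """True when the page changed after the last parse, or when we cannot tell."""
    if last_modified_header is None or last_parsed_epoch is None:
        return True
    try:
        modified_at = parsedate_to_datetime(last_modified_header)
    except (TypeError, ValueError):
        return True  # unparseable header: re-parse to be safe
    return modified_at.timestamp() > last_parsed_epoch
```

The decision logic is kept separate from the network call so it can be exercised without touching the remote site; a caller would feed `fetch_last_modified(url)` into `needs_reparse` and only download and parse the full page when it returns `True`.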
Re: FRC Event Data in XML Format
Minor changes;
Set the cache time to 1 minute. I really like Dave's idea and will look at implementing that next time I get a chance to really muck with it. I also defaulted the year to 2009, come January I need to remember to default it to 2010. There is also now some text at the landing page telling you the options. This might get a little nicer over time but is by no means a priority. |
Re: FRC Event Data in XML Format
Quote:
|
Re: FRC Event Data in XML Format
http://frcfeed.appspot.com/2009/ct/q...ations/results
Code:
Traceback (most recent call last):
Would it be possible to produce JSON instead of XML? It's easier to parse in many situations. [s]Also, how frequently is this data refreshed? Do you generate it live off FIRST's servers, or are you caching it?[/s] I see there's a one minute cache time. Thanks! -Greg |
Re: FRC Event Data in XML Format
Quote:
Here is my gift for you http://frcfeed.appspot.com/2009/detroit/rankings/json The request system has been made a little more robust. It still doesn't report what you are missing but it will dump you to the instruction page. You can request it in xml, json, or human. Human just makes it easier to read for a human, best way is to just try it out and see yourself. It really shouldn't be used for stuff, it is mostly for debugging on my part. If you find any bugs just let me know. Just for reference, the URL is now: frcfeed.appspot.com/{year/}event/{qualification or elimination/}what{/format} |
Re: FRC Event Data in XML Format
Quote:
|
Re: FRC Event Data in XML Format
If their server supports it, you could turn the cache down to one or two minutes and send an If-Modified-Since header, in which case the server will tell you if nothing has changed and not send a response. In addition, for games in years past, why not increase the cache to an hour or even a day? The score probably won't change.
A few notes on syntax, looking at /2009/Greater Toronto/awards:
Spaces are not allowed in tag names; the part after the space is parsed as an attribute without a value.
There is a leading non-breaking space (&nbsp;) in many of the tag values (Home town, name, team). &nbsp; is not a predefined entity, so you will need a DTD declaration if you want to define it. Or just replace them with a standard space. It looks like FIRST doesn't use any entity other than &nbsp;, but you will need to be careful if in the future they do, since that has to get parsed correctly.
The ampersands need to be escaped, too, into "&amp;".
As for URL parsing, are you using a regular expression?
Code:
(/(\d{4}))?(/(.+))(/(qualification|elimination))?(/(awards|schedule|rankings|results))(/(json|xml|human))? |
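A sketch of how a pattern like the one above could be used from Python follows. It is a variation, not the service's actual code: the groups are named for readability, and the event group is made lazy (`.+?`), because a greedy `.+` would absorb `/qualification` into the event name for a path such as `/2009/ct/qualification/results`. The names `REQUEST_PATTERN` and `parse_request` are hypothetical.

```python
import re

# Variation on the pattern in the post: named groups and a lazy event match.
REQUEST_PATTERN = re.compile(
    r"(?:/(?P<year>\d{4}))?"
    r"/(?P<event>.+?)"
    r"(?:/(?P<when>qualification|elimination))?"
    r"/(?P<what>awards|schedule|rankings|results)"
    r"(?:/(?P<format>json|xml|human))?"
    r"/?"
)


def parse_request(path):
    """Split a request path into its parts, or return None if it does not fit."""
    match = REQUEST_PATTERN.fullmatch(path)
    return match.groupdict() if match else None
```

Optional parts that are absent come back as `None`, so a caller can fill in defaults (for example the default year) after parsing instead of inside the pattern.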
Re: FRC Event Data in XML Format
Quote:
I will get to doing the escaping stuff today. I'll replace the spaces in the tag names with a _ and pull out the &nbsp; characters. I will also take care of ampersands and quotes while I am at it. Thanks for your comments. |
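The clean-up steps discussed in the last two posts (underscores in tag names, dropping non-breaking spaces, escaping ampersands and quotes) can be sketched with the standard library's `xml.sax.saxutils.escape`. The helper names are hypothetical, and the sketch does not validate every XML naming rule (for instance, names starting with a digit).

```python
import re
from xml.sax.saxutils import escape


def clean_tag_name(label):
    """Turn a label such as 'Home town' into a name without spaces."""
    return re.sub(r"\s+", "_", label.strip())


def clean_text(value):
    """Replace non-breaking spaces, trim, and escape &, <, > and quotes."""
    text = value.replace("&nbsp;", " ").replace("\u00a0", " ").strip()
    return escape(text, {'"': "&quot;", "'": "&apos;"})


def element(label, value):
    """Build one simple element from a scraped label and value."""
    tag = clean_tag_name(label)
    return "<%s>%s</%s>" % (tag, clean_text(value), tag)
```

Using an XML library to build the document would handle the escaping automatically; the string version here is only meant to make the individual steps visible.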
Re: FRC Event Data in XML Format
Further proving I have no life, I have made the changes to the XML document to replace the spaces with underscores.
Just for fun I also added the ability to slice the results. Somewhere after specifying the event you can add two numbers with a - in between (1-8 for example) as an element in the request and it will only return elements 1-8. This doesn't work on Awards, but that wouldn't make any sense anyway. For example, this new feature can be seen at http://frcfeed.appspot.com/dt/rankings/1-8/human/ which will return the top 8 teams in Detroit. Enjoy. Edit: You don't have to specify the beginning OR the end; you can use -8 to represent the top 8, or you could use 30- to return the bottom ten. Anyone who has ever used Python slices should be familiar with this concept. |
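One way to read the range syntax above is sketched below. The post does not pin down the indexing, so this sketch assumes 1-based, inclusive bounds (so that 1-8 and -8 both give the top 8); the real service may follow Python's zero-based slices instead, in which case the open-ended cases would differ by one. `apply_range` is a hypothetical name.

```python
import re

RANGE_PATTERN = re.compile(r"(\d*)-(\d*)")


def apply_range(rows, spec):
    """Return the rows selected by 'first-last', '-last', or 'first-'."""
    match = RANGE_PATTERN.fullmatch(spec)
    if match is None:
        return rows  # not a range element: leave the rows untouched
    first, last = match.groups()
    # Assumption: bounds are 1-based and inclusive.
    start = max(int(first) - 1, 0) if first else 0
    stop = int(last) if last else len(rows)
    return rows[start:stop]
```

Under this reading, `apply_range(rankings, "-8")` and `apply_range(rankings, "1-8")` select the same eight rows, and an open-ended spec such as `"30-"` runs from the 30th row to the end of the list.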
Re: FRC Event Data in XML Format
I like the "human" format idea (a lot) for quick data reference. But, since the data is already in an XML format, would it be possible to make the "human" page list the data in a table (with the XML tags as column titles, and such) instead of just the plain-text XML? It's not a crucial addition, but an idea nonetheless.
|
Re: FRC Event Data in XML Format
It also occurs to me that releasing the source code to this project would probably be highly beneficial for the FIRST community.
Any chance of a Google Code project so that people can check the code out themselves and play with it? Hopefully you'll even get improvements contributed back! [edit] Also, any chance of getting "attending teams" on here? [/edit] |
Re: FRC Event Data in XML Format
Quote:
I will take a look at grabbing attending teams. Just throwing the parser at it didn't work but I think I might be able to make it work by tweaking some of the options but I haven't done that yet. Again, I'll let you know. As for making the data tabular, using FIRST's own site is probably a better use for that. |
Re: FRC Event Data in XML Format
The source code is now available via Subversion at http://code.google.com/p/frcfeed/ If you are interested in making changes, let me know and we can work something out. I apologize that the code is as sloppy looking as it is; I am working on cleaning it up and adding more comments.
A word on the eval(...) statements, I felt this would be the easiest way of handling multiple formats or multiple new options. In the next couple days I will try to put a wiki page together on adding new formats and new options. http://code.google.com/p/frcfeed/wiki/HowToUse Link to Wiki on how to use. Expect this to be fleshed out more over the next couple days. If you find parts that are hard to read just let me know and I will try to reword them. |
Re: FRC Event Data in XML Format
There have been some minor updates:
The request can come in any order now; as long as you specify a What and an Event (and the When where applicable), it will do its best to figure out what you meant. Human format has been tweaked to be tabular. The XML option should be a little more standard; I am now using an XML generator instead of just string concatenation. Got rid of all evals in the Python code. I also cleaned up the code to use one function in the place of 4. Also, as a small project I decided to put together a Javascript object for getting data. I forgot how ugly Javascript's object stuff is. I have included it below.
Code:
function Event(eventCode,year){
Code:
kettering = new Event("Kettering",2009); |
Re: FRC Event Data in XML Format
Due to the fact that I continue to have little to no life I started a complete rewrite of the parser. It currently handles:
I don't have any exact numbers because I haven't figured it out yet but the new parser seems to run significantly faster. Before I wrap it up and release it I am asking for some feedback, What additional pages would you like to see? Formats? Options? |
Re: FRC Event Data in XML Format
Well, I have placed the new version out a day later than I wanted to but it is out. I added Team_List, Team_Info and Team_History as options. Team_List requires an event and an optional year (year currently defaults to the current year) Team_Info and Team_History require a team number.
So, during testing I decided to see what events had errors. The results are below. Code:
mn-rankings: Good
Running all those requests used 1% of my CPU time and 3% of my bandwidth. The good news? This is without any sort of caching, so I will not be adding any caching to this version in the foreseeable future unless I start hitting my limits. |
Re: FRC Event Data in XML Format
Was that run for 2009?
The lack of data for UT makes sense, but MD doesn't. MD might have some leading garbage data that throws the code off. Some of them are year dependent too, like EIN vs CMP. |
Re: FRC Event Data in XML Format
Quote:
If I were to switch The Blue Alliance over to pulling from FRCFeeds, I would pull down every current-week event's match results every five minutes. In a day, this makes for (8 events * 12 requests per hour * 24 hours per day = 2,304 requests). Is that OK? [I intend to run my own clone of the FRCFeeds source if I do, but I am just posing a thought question] |
Re: FRC Event Data in XML Format
Make that two of us that would do what TBA does. I do the same for Regional Twitter Accounts.
|
Re: FRC Event Data in XML Format
Quote:
Quote:
App Engine says it should handle 5 million page views per day for free; realistically I should be able to see about half that. This was a run of ~200 requests and it used ~3% of my daily bandwidth quota and ~1% of my daily CPU quota, meaning it should be able to handle both TBA and the Twitter feeds. (That being said, without doing a full test I make no promises.) As for running a local version, I will put the code up within the next few days. I have some minor changes that have to be made to run in a non-App Engine environment. |
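As a rough back-of-the-envelope check of the numbers in this exchange, the measured quota use can be scaled linearly to the proposed request volume. This is a sketch only: it assumes quota use grows in proportion to request count, takes the rounded figures from the thread at face value, and uses a hypothetical function name.

```python
def estimate_quota_percent(requests_per_day, sample_requests, sample_percent_used):
    """Linearly scale a measured quota percentage to a different request count."""
    return requests_per_day * sample_percent_used / sample_requests


# Figures quoted in the thread: 8 events polled every five minutes, all day.
per_client = 8 * 12 * 24
two_clients = 2 * per_client  # TBA plus the Twitter feeds

# The test run of about 200 requests used about 3% bandwidth and 1% CPU.
bandwidth_percent = estimate_quota_percent(two_clients, 200, 3.0)
cpu_percent = estimate_quota_percent(two_clients, 200, 1.0)

print("requests per day: %d" % two_clients)
print("bandwidth: %.1f%% of daily quota" % bandwidth_percent)
print("cpu: %.1f%% of daily quota" % cpu_percent)
```

Under these assumptions two such clients would use roughly 69% of the bandwidth quota and 23% of the CPU quota, so bandwidth rather than CPU would be the first limit to watch. The test run was made without caching, so repeated requests for the same page could cost considerably less in practice.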
Re: FRC Event Data in XML Format
Quote:
|
Re: FRC Event Data in XML Format
Quote:
Code:
CPU Time 1% 0.09 of 6.50 CPU hours
Currently there is a 16% error rate; I hope to get that number much lower in the upcoming week. About 12 of the errors are due to the fact that some of the characters on the pages "can't be decoded by the ascii codec" (that is the error I see in the logs). No idea how to fix that, but I have a hunch it will be really simple. As for the rest, I think it is just garbage HTML I have to get rid of before the parser gets a hold of it. Really FIRST, is it too hard to make code that is valid? |
Re: FRC Event Data in XML Format
Quote:
I am surprised you used 3% bandwidth so quickly if it's supposed to handle 5 million pageviews. You only requested a hundred or so pages, right? |
Re: FRC Event Data in XML Format
Quote:
Check these links for more info: http://www.amk.ca/python/howto/unicode http://eric.themoritzfamily.com/2008...s-and-unicode/ Good luck! |
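Errors like the "ascii codec" one quoted earlier usually come from handling raw bytes as if they were text. The common remedy is to decode the fetched bytes to text once, as early as possible, and encode again only when writing the response. A minimal sketch of that pattern follows; it is written in Python 3 syntax, the function names are hypothetical, and the choice of fallback encodings is an assumption rather than something known about the FIRST pages.

```python
def decode_html(raw_bytes, declared_charset=None):
    """Decode fetched page bytes to text before any parsing or string handling."""
    for charset in (declared_charset, "utf-8"):
        if not charset:
            continue
        try:
            return raw_bytes.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset: try the next candidate
    # latin-1 maps every byte to a character, so this last resort never raises.
    return raw_bytes.decode("latin-1")


def encode_output(text):
    """Encode text once, at the point where the response is written."""
    return text.encode("utf-8")
```

The `declared_charset` argument would normally come from the Content-Type header of the fetched page, when the server provides one.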
Re: FRC Event Data in XML Format
Quote:
My calls to the FIRST website take incoming bandwidth (App Engine makes a distinction), so for every call I get I am using a minuscule amount of bandwidth to receive the HTTP request and then a decent amount to grab the HTML from the FIRST website. Thanks for the tips on handling those strings. |
Re: FRC Event Data in XML Format
4 months later...
Just a heads up, the new version went live ~20 minutes ago. It does work with 2010 stuff. I tweaked the request stuff slightly; it is now standard query string stuff:
?event=[event code]&what=[qm,em,qs,awards,rankings,standings,teams]&format=[json,xml]
I also added a filters option. You can use the following syntax:
&filters=([filter-name]-[team])
You can gang multiple filters together within one filters arg or gang multiple filters args together. For example, http://frcfeed.appspot.com/?event=nj...5)&format=json will return any matches that 25 and 11 played together at New Jersey. Current options are: won, lost, played, isRed, isBlue. Filters within a filters arg are ANDed together and multiple filters args are ORed together. The filters for any of 11's matches or any of 25's matches would be &filters=(played-11)&filters=(played-25). Hope it is useful. |
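The AND-within, OR-across behaviour of the filters could be implemented along these lines. This is a sketch under several assumptions: the match record layout (`red`, `blue`, `winner`) is invented for illustration, the exact way multiple filters are written inside one arg is not shown in the post (the code simply picks up every parenthesized `(name-team)` pair it finds), and all function names are hypothetical.

```python
import re
from urllib.parse import parse_qs

FILTER_PATTERN = re.compile(r"\((\w+)-(\d+)\)")


def _played(match, team):
    return team in match["red"] or team in match["blue"]


def _won(match, team):
    # "winner" is assumed to be "red", "blue", or "tie".
    return team in match.get(match["winner"], ())


def _lost(match, team):
    return match["winner"] in ("red", "blue") and _played(match, team) and not _won(match, team)


CHECKS = {
    "played": _played,
    "won": _won,
    "lost": _lost,
    "isRed": lambda match, team: team in match["red"],
    "isBlue": lambda match, team: team in match["blue"],
}


def parse_filters(query_string):
    """Return one list of (name, team) pairs per filters arg in the query."""
    groups = []
    for arg in parse_qs(query_string).get("filters", []):
        pairs = [(name, int(team)) for name, team in FILTER_PATTERN.findall(arg)
                 if name in CHECKS]
        if pairs:
            groups.append(pairs)
    return groups


def keep_match(match, groups):
    """AND the filters inside each group, then OR the groups together."""
    if not groups:
        return True
    return any(all(CHECKS[name](match, team) for name, team in group)
               for group in groups)
```

A caller would run `parse_filters` once per request and then keep only the matches for which `keep_match` returns `True`; with no filters args at all, every match is kept.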
Re: FRC Event Data in XML Format
Andrew have you updated this for 2011?
I tried the event code TN (for a rookie regional) and it came up with an error. Are you only supporting xml/json or do other formats work? Also can you add a what that is only team numbers and not the names? |
Re: FRC Event Data in XML Format
I'm working on a project that could really, really benefit from this feed. Any word on if this will be working this season, or should I start hashing out a solution from the Twitter feed? (Pun intended)
|
Re: FRC Event Data in XML Format
Quote:
Quote:
Those links should always contain the latest working versions of those functions but none of those have changed since I initially created them. (FRCHelper.py occasionally gets a new filter or two but chances are you won't need those). If you need help using any of the stuff or have features you really would like added just let me know. |
Re: FRC Event Data in XML Format
<sarcasm>Oh great. I get to learn Python.</sarcasm> :P
I have until the first week of regionals to finish the web app, but I'll probably go ahead and download the source to use on my server. Also, thank you for creating this. It's a huge time-saver for me. :) |
Copyright © Chief Delphi