XML Interchange format
At the very least, this is what I will be using as a data backup mechanism (moving away from SQL dumps). I hope other people find the time to implement it and post their data in this format as well.
It is generic: it covers not just FRC but FTC and FLL as well. It allows for team number or information changes, since it keeps data per season rather than once for all of history. It allows score components to be posted and attributed to multiple teams. It is flexible enough to allow for more than two alliances, or varying numbers of teams per alliance. For many events I have both the score and penalty components posted, and for some events I have more fine-grained data: scored points, bonus, and penalties, all separate (these will appear in the dump when I get around to importing the score components into my database). The dump of all the data I have will appear at http://team498.org/scouting/data/latest.frc.xml
Technical mumble follows.
The specification for the format appears at http://docs.google.com/Doc?id=dcz67k4q_38f2rp7sfc. If anyone is interested in helping engineer the details of the format, PM me or visit me on IRC (see my footer) and I can add you as an editor to the document. You should have good problem-solving skills, think about the different situations the format might be used in, know XML and its related technologies (XQuery, XPath, XSLT, namespaces), and be familiar with RFC 2119 and when to use each keyword. Basically, be able to think: is there a reason a certain feature should be required, is there any reason that someone might need to leave an attribute out, how should parsers deal with missing data, etc.
When I talked with other people about this, there was some concern over the selection of XML as the data format. While XML is certainly not appropriate for everything (including RPC calls, IMO), it really stands out for FIRST data because it is human readable, it can be queried with XQuery (if you want to do statistics, for example), it can easily be transformed into other formats (you could generate static HTML pages using XSLT), and it has namespace support, so it can be extended.
Using namespaces, you could link to game videos (<v:video href="http://firstvideoarchive.com/2008Archive/index.php?dir=Arizona/&file=az_qf2m3.wmv"/> for example) or record game data specific to one season: you can add an s2008:laps="4" or s2007:lift="498:12 330:4" attribute to each team element under each match to record the laps a robot made or whether it lifted any other robots. You could write a program that stores competition data using a native XML database, e.g. http://xml.apache.org/xindice/ or http://exist.sourceforge.net/ (I think it might be slow, but who knows, it might be innovative). |
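For anyone wondering how a parser would pick up the season-specific attributes, here is a rough Python sketch. The namespace URIs, the `match`/`alliance` nesting and the `number` attribute are made up for illustration (the spec may define them differently), and reading "498:12 330:4" as team:value pairs is only a guess at the intended meaning.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the real spec may assign different ones.
S2008 = "http://example.org/frc/2008"
S2007 = "http://example.org/frc/2007"

# Hypothetical element layout, using the attribute examples from the post.
SAMPLE = f"""
<match xmlns:s2008="{S2008}" xmlns:s2007="{S2007}">
  <alliance name="red">
    <team number="498" s2008:laps="4"/>
    <team number="330" s2007:lift="498:12 330:4"/>
  </alliance>
</match>
"""

def season_attributes(team, namespace):
    """Return {local_name: value} for the attributes of `team` in `namespace`."""
    prefix = "{" + namespace + "}"  # ElementTree stores keys as "{uri}local"
    return {k[len(prefix):]: v for k, v in team.attrib.items() if k.startswith(prefix)}

def parse_lift(value):
    """Split a value like '498:12 330:4' into {498: 12, 330: 4} (assumed meaning)."""
    pairs = (item.split(":") for item in value.split())
    return {int(team): int(amount) for team, amount in pairs}

root = ET.fromstring(SAMPLE)
teams = {t.get("number"): t for t in root.iter("team")}
laps = season_attributes(teams["498"], S2008)
lift = parse_lift(season_attributes(teams["330"], S2007)["lift"])
print(laps, lift)
```

A parser that does not know a season namespace can simply skip those attributes, which is the point of putting them in their own namespace.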
Re: XML Interchange format
1 Attachment(s)
I think you've done some good work so far, but there are a few rough spots. The biggest thing I've seen so far is that the document itself contains no metadata about its date of creation or the time span for which it is valid.
Another thing, which may just be a matter of preference, is that you use one-letter abbreviations in several places. If for no other reason than readability, full words seem like a better choice. If anyone is interested, I've attached a file, generated using the TBA API, which almost conforms to this spec. |
Re: XML Interchange format
I don't have a major issue with it being XML (there are better formats for this, though). My question is: why is this not two files, one for team data and one for match data? You have a lot of duplicate data in that file. I am not sure why you list all the teams at a competition at the top level and then again in each of the matches.
It is a good start; it just needs some major layout work, IMHO. |
Re: XML Interchange format
Some comments, although I have not given the document a thorough read:
We should give some serious thought to the representation of match numbers. "21" for Quarter Finals 2 Match 1 makes some sense, but what happens if there is ever a series of 8 ties and we go to game 10? The Blue Alliance represents this data poorly now. That's it for now. |
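To make the match-number concern concrete, here is a small Python sketch. The one-digit-per-field parser is an assumption about how a consumer might read "21", and the delimited "qf2-10" form is just one hypothetical alternative, not something the spec or The Blue Alliance defines.

```python
def parse_concatenated(match_number):
    """Parse '21' as (series 2, game 1), assuming exactly one digit per field."""
    if len(match_number) != 2:
        raise ValueError(f"cannot split {match_number!r} into series and game")
    return int(match_number[0]), int(match_number[1])

def structured_id(level, series, game):
    """Hypothetical alternative: keep level, series and game as separate parts."""
    return f"{level}{series}-{game}"

def parse_structured_id(match_id):
    """Invert structured_id(); the level is the leading letters, e.g. 'qf'."""
    head, game = match_id.split("-")
    level = head.rstrip("0123456789")
    return level, int(head[len(level):]), int(game)

print(parse_concatenated("21"))                            # series 2, game 1
print(parse_structured_id(structured_id("qf", 2, 10)))     # game 10 still parses
```

With the concatenated form, series 2 game 10 becomes "210", which the fixed-width parser rejects; keeping the parts separate avoids that.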
Re: XML Interchange format
Quote:
I used abbreviations for alliance color because that is how I stored it in my database (a CHAR(1)), and the "keep names descriptive, values small" philosophy followed me into XML. I think you are right on that, however, so I'll change it. I think all lowercase works; is that a good idea? Or Title Case?
I think metadata is outside the scope of the format. There is RDF, which is aimed at adding metadata to XML documents; I think that might be the solution if metadata is really necessary.
Quote:
As for redundant data, you could figure out which teams are in each event by scanning each child <alliance> tag, but that is harder to do. Listing them also makes semantic sense: a team element under an event tag means the team was part of that event, the same way it does for the alliance tag. Similar thinking was behind adding the score="" attribute to the alliance tag: it is redundant, but it simplifies things, and it lets you record only the total score when you do not know each component.
Quote:
For elimination matches, a captain designation would be very useful. It didn't cross my mind, since I don't keep any elimination data at all (I should, because it is the only time the same teams replay each other under the same conditions, which is useful for statistics work).
Good point about the penalties. The score attribute was added as a convenience, and I don't know how programs would deal with discrepancies between the score/penalty elements and the score attribute. If there were a penalty system that awarded points to the other team/alliance, it would be hard to implement, because the data format follows the premise that data belongs to other data, e.g. a score belongs to an alliance, which is part of a match, etc. In that case, you would be faced with the issue of who owns the penalty. Off the top of my head, you would give a <penalty value="0" name="somepenalty for -10"/> to the offending side, then award <score name="somepenalty" value="10"/> to the other side.
Making the score attribute mandatory seems like a good idea. That would allow the listed components to not be comprehensive, that is, the listed score components need not add up to 100% of the score. It might follow that you could have "points" and "penalties" attributes that are also authoritative over the respective score and penalty components.
There are way too many date formats. YYYY-MM-DD is the international standard, defined in ISO 8601 (not a public document, but information about it is out there). It helped tremendously that that is what SQL uses for dates as well. As for converting between dates, well, it isn't simple anywhere, I think. I would like to get timezone data in there somewhere.
The location attribute is simply what FIRST gives teams; it isn't readily available in any other format. It shouldn't be hard to split locations by commas, I think, if you really need to parse the data, or to use a library or API like Google Local, which doesn't have a problem with mixed data. Location is mostly for human reference, so making it atomic like that isn't really necessary, and it might make it harder to display a simple location. Can you think of a situation where atomic location data is necessary?
FIRST gives match IDs to elimination matches too; I was thinking that those would be used, along with the name attribute to name them with dashes. A problem I didn't consider (again, because I don't store elimination data) is that you have multiple matches with the same ID. I think previously, to keep practice and qualification data, I just kept two separate events, but that doesn't really make much sense in a more semantic format like this one. Probably the fix is to allow each match number only once per match type. |
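On the question of how a program might deal with discrepancies between the score attribute and the listed components, one option is to treat the attribute as authoritative and only report whether the components add up. A rough Python sketch, assuming an `<alliance score="...">` element with `<score>` and `<penalty>` children whose `value` attributes are non-negative integers, and assuming penalties are subtracted (the sample numbers are invented and the sign convention is not settled in this thread):

```python
import xml.etree.ElementTree as ET

# Invented sample data; element and attribute names follow the discussion above.
SAMPLE = """
<match>
  <alliance name="red" score="34">
    <score name="points" value="44"/>
    <penalty name="somepenalty" value="10"/>
  </alliance>
  <alliance name="blue" score="20">
    <score name="points" value="12"/>
  </alliance>
</match>
"""

def check_alliance(alliance):
    """Return (authoritative score, component total, whether they agree)."""
    total = int(alliance.get("score"))  # the mandatory, authoritative attribute
    scored = sum(int(s.get("value")) for s in alliance.findall("score"))
    penalties = sum(int(p.get("value")) for p in alliance.findall("penalty"))
    component_total = scored - penalties  # assumption: penalties are subtracted
    return total, component_total, total == component_total

results = {a.get("name"): check_alliance(a)
           for a in ET.fromstring(SAMPLE).iter("alliance")}
print(results)
```

Here the blue alliance's components do not sum to its score, which a consumer would read as "components are incomplete" rather than as an error, since the attribute wins.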
Re: XML Interchange format
Quote:
Quote:
|
Re: XML Interchange format
Quote:
That brings us back to the question: why not just make the raw SQL database readable to the world, so we can run queries against it? XML just is not a database replacement when you have relational data, which is why there are duplicates. |
Re: XML Interchange format
Quote:
|
Re: XML Interchange format
Quote:
If you want a use that is any better than a text find, like "find all teams not in North America" (a query I have been interested in myself before), then even fine-grained, separate country/province/city fields are not going to cut it. It seems like a dedicated API (Google Maps comes to mind) would be the best solution if you really need to interact with the location field. There shouldn't be too much difference between a simple string and multiple atomic fields, plus the single location attribute is simpler.
I added times for each match, something I assumed I had already added but somehow had not. They are in "YYYY-MM-DD HH:mm:ss" format, in the local timezone (the FIRST-specified time).
For names of matches and alliances, what case should be used? Lowercase seems right to me; just keep things ultra-consistent: "red", "blue", "elimination", "qualification", etc.
It was brought to my attention that I was having encoding problems: MySQL was sending the data in iso-8859-1 (apparently I don't want the latin1_swedish_ci collation). I don't know much about character encoding, but I think I figured out how to set the connection encoding to UTF-8, the XML default encoding ("SET NAMES utf8"). |
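For consumers of the match times, a minimal Python sketch of reading the "YYYY-MM-DD HH:mm:ss" format follows. Since the format carries no timezone, the UTC offset has to come from somewhere else; the `utc_offset_hours` parameter and the sample timestamp are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# strptime pattern matching the "YYYY-MM-DD HH:mm:ss" format described above.
MATCH_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_match_time(text, utc_offset_hours=None):
    """Parse a match time; attach a fixed UTC offset if the caller knows one."""
    naive = datetime.strptime(text, MATCH_TIME_FORMAT)
    if utc_offset_hours is None:
        return naive  # local event time, timezone unknown
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))

# Invented example: a match at 09:30 local time at an event seven hours behind UTC.
print(parse_match_time("2008-03-14 09:30:00", -7).isoformat())
```

Without an offset the result is a naive datetime, which is fine for ordering matches within one event but not for comparing across events.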
Re: XML Interchange format
Quote:
Quote:
|
Re: XML Interchange format
Why don't we make a standard SQLite database and then allow data from a standard XML file to be uploaded to it? It could be written in Java or Python, and ta-da, it's cross-compatible magic. I fear inconsistencies will stem from duplicate data.
SQLite would work great for this. |
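As a rough idea of what such an importer could look like in Python, here is a sketch that loads event/team listings from XML into an in-memory SQLite database. The table names, column names and XML attributes are all hypothetical; nothing here is taken from the actual spec or from anyone's existing database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample; element and attribute names are assumptions.
SAMPLE = """
<season year="2008">
  <event name="Arizona">
    <team number="498"/>
    <team number="330"/>
  </event>
</season>
"""

# Hypothetical schema: a link table records which teams attended which event.
SCHEMA = """
CREATE TABLE team (number INTEGER PRIMARY KEY);
CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE event_teams (
    team_number INTEGER REFERENCES team(number),
    event_id INTEGER REFERENCES event(id),
    PRIMARY KEY (team_number, event_id)
);
"""

def import_events(conn, xml_text):
    """Insert events, teams and their links; re-importing adds no duplicates."""
    root = ET.fromstring(xml_text)
    for event in root.iter("event"):
        name = event.get("name")
        conn.execute("INSERT OR IGNORE INTO event (name) VALUES (?)", (name,))
        (event_id,) = conn.execute(
            "SELECT id FROM event WHERE name = ?", (name,)).fetchone()
        for team in event.findall("team"):
            number = int(team.get("number"))
            conn.execute("INSERT OR IGNORE INTO team (number) VALUES (?)", (number,))
            conn.execute("INSERT OR IGNORE INTO event_teams VALUES (?, ?)",
                         (number, event_id))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
import_events(conn, SAMPLE)
import_events(conn, SAMPLE)  # second import is a no-op thanks to the keys
rows = conn.execute(
    "SELECT team_number FROM event_teams ORDER BY team_number").fetchall()
print(rows)
```

The primary keys are what guard against the duplicate-data worry: the same XML file can be uploaded twice without creating inconsistent rows.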
Re: XML Interchange format
Quote:
Quote:
And listing the teams in an event isn't redundant: it shows the teams that went to an event, regardless of whether they played any matches. You have the exact same thing in a relational database: a table that links teams to an event. Code:
CREATE TABLE `team_event` ( |
Re: XML Interchange format
Where did you get that data list?
My high school team (228) is not in there... // Actually, after looking through the list, there are a lot of missing teams. 230 (Gaelhawks) was the next to come to mind. 1071 (Max) is another, along with at least a dozen other teams, many of them from Connecticut... |
Re: XML Interchange format
Quote:
Everyone, however, should be in there about half way down, even if I don't have the nicknames (I should spend some time importing those sooner or later). |
Re: XML Interchange format
Sorry, that file is not really human readable for most people. To leverage it for scouting data or anything else, a script is going to be parsing it.
I would look at a many-to-many relationship before saying it can't be done. I see this as a great transitional format, i.e. database to database. I would be more than willing to write SQLite Python scripts to both export and import data in this format. I am not trying to trash your work; you have obviously put some time into this. I am just pointing out some weaknesses. As for adding your own extra types of data, I would frown on that, as it would pull away from your standard. Let's include what we want and make revisions as a group; then in the metadata you should be able to write what version of the format is being used, and the scripts will all be happy. |
Copyright © Chief Delphi