XML Interchange format
At the very least, this is what I will be using as a data backup mechanism (moving away from SQL dumps). I hope other people gather the time to implement it and post their data in this format as well.
It is generic: it covers not just FRC but FTC and FLL as well. It allows for team number or information changes, since it keeps data per season rather than once for all of history. It allows score components to be posted and attributed to multiple teams. It is flexible enough to allow more than two alliances, or varying numbers of teams per alliance. For many events I have both the score and penalty components posted, and for some events I have more fine-grained data: scored points, bonus, and penalties, all separate (these will appear in the dump when I get around to importing the score components into my database). The dump of all the data I have will appear at http://team498.org/scouting/data/latest.frc.xml
Technical mumble follows. The specification for the format appears at http://docs.google.com/Doc?id=dcz67k4q_38f2rp7sfc. If anyone is interested in helping engineer the details of the format, PM me or visit me on IRC (see my footer) and I can add you as an editor to the document. You should have good problem-solving skills, think about the different situations the format might be used in, know about XML and its related technologies (XQuery, XPath, XSLT, namespaces), and be familiar with RFC 2119 and when to use each keyword. Basically, be able to think: is there a reason a certain feature should be required, is there any reason that someone might need to leave an attribute out, how should parsers deal with missing data, etc.
Talking with other people about this, there was some concern over the selection of XML as the data format. While XML is certainly not appropriate for everything (including RPC calls, imo), it really stands out for FIRST data because it is human readable, it can be queried with XQuery (if you want to do statistics, for example), it can be easily transformed into other formats (you could generate static HTML pages using XSL), and it has namespace support, so it can be extended.
Using namespaces, you could link to game videos (<v:video href="http://firstvideoarchive.com/2008Archive/index.php?dir=Arizona/&file=az_qf2m3.wmv"/> for example) or record game data specific to one season: you can add an s2008:laps="4" or s2007:lift="498:12 330:4" attribute to each team element under each match to record the laps a robot made or whether it lifted any other robots. You could also write a program that stores competition data using a native XML database, e.g. http://xml.apache.org/xindice/ or http://exist.sourceforge.net/ (I think it might be slow, but who knows, it might be innovative). |
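A minimal sketch of how a reader of the format might pull such a season-specific attribute out with Python's standard library. The namespace URI, the <match>/<alliance>/<team> layout, and the reading of each pair as team:value are assumptions made for illustration; the specification document is the authority on the real names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the post does not say which URI s2007 maps to.
S2007 = "http://example.org/frc/2007"

# Hypothetical element layout, loosely following the description in the post.
SAMPLE = f"""
<match xmlns:s2007="{S2007}" number="12">
  <alliance name="red">
    <team number="498" s2007:lift="498:12 330:4"/>
  </alliance>
</match>
"""

def parse_lift(value):
    """Split 'team:value team:value' into a dict of ints."""
    pairs = (item.split(":") for item in value.split())
    return {int(team): int(amount) for team, amount in pairs}

root = ET.fromstring(SAMPLE.strip())
team = root.find("./alliance/team")
# ElementTree exposes a namespaced attribute under the key '{uri}localname'.
lift = parse_lift(team.get(f"{{{S2007}}}lift"))
print(lift)  # {498: 12, 330: 4}
```

A parser that does not know the s2007 namespace can simply ignore the attribute, which is the point of putting season-specific data in its own namespace.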
Re: XML Interchange format
1 Attachment(s)
I think you've done some good work so far, but there are a few rough spots. The biggest thing I've seen so far is that the document itself contains no metadata about the date of creation, or the time span over which it was valid.
Another thing, which may just be a matter of preference, is that you use one-letter abbreviations in several places. Even if for no other reason than readability, full words seem like a better choice. If anyone is interested, I've attached a file, which almost conforms to this spec, that was generated using the TBA API. |
Re: XML Interchange format
I don't have a major issue with it being XML (there are better formats for this, though). My question is why this is not two files, one for team data and one for match data; you have a lot of duplicate data in that file. I am not sure why you list all the teams at a competition at the top level, and then again in each of the matches.
It is a good start; it just needs some major layout work, IMHO. |
Re: XML Interchange format
Some comments, although I have not given the document a thorough read:
We should give some serious thought to the representation of match numbers. "21" for Quarter Finals 2 Match 1 makes some sense, but what happens if there is ever a series of 8 ties and we go to game 10? The Blue Alliance represents this data poorly now. That's it for now. |
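To make the concern concrete, here is a small sketch comparing a concatenated match code with a delimited one. The function names and the dash-separated alternative are hypothetical, not part of the spec or of The Blue Alliance.

```python
def encode_compact(set_number, game_number):
    """Concatenate the digits, as in "21" for set 2, game 1."""
    return f"{set_number}{game_number}"

def readings(code):
    """Every (set, game) split of a compact code whose parts are not zero-padded."""
    found = []
    for cut in range(1, len(code)):
        head, tail = code[:cut], code[cut:]
        if head[0] != "0" and tail[0] != "0":
            found.append((int(head), int(tail)))
    return found

def encode_dashed(set_number, game_number):
    """Hypothetical alternative: keep the two numbers apart with a dash."""
    return f"{set_number}-{game_number}"

def decode_dashed(code):
    set_text, game_text = code.split("-")
    return int(set_text), int(game_text)

print(readings(encode_compact(2, 1)))       # [(2, 1)]
print(readings(encode_compact(1, 11)))      # [(1, 11), (11, 1)]
print(decode_dashed(encode_dashed(2, 10)))  # (2, 10)
```

With one digit per field the compact code has a single reading, but once either number reaches two digits a parser has to guess where to cut; a parser that always takes the last digit as the game would read "210" as set 21, game 0.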
Re: XML Interchange format
Quote:
I used abbreviations for alliance color because that is how I stored it in my database (a CHAR(1)), and the "keep names descriptive, values small" philosophy followed me into XML. I think you are right on that, however, so I'll change it. I think all lowercase works; is that a good idea? Or Title Case?
I think metadata is outside the scope of the format? There is RDF, which is aimed at adding metadata to XML documents; that might be the solution if it is completely necessary.
Quote:
As for redundant data, you could figure out which teams are in each event by scanning each child <alliance> tag, but that is harder to do. Listing them also makes semantic sense: a team element under an event tag means the team was part of that event, the same way it does for the alliance tag. Similar thinking was behind adding the score="" attribute to the alliance tag. It is redundant, but it simplifies things, and it lets you record only the score when you don't know each component.
Quote:
For elimination matches, a captain designation would be very useful. It didn't cross my mind because I don't keep any elimination data at all (I should, because it is the only time the same teams replay each other under the same conditions, which is useful for statistics work).
Good point about the penalties. The score attribute was added as a convenience, and I don't know how programs would deal with discrepancies between the score/penalty elements and the score attribute. If there was a penalty system that awarded points to the other team/alliance, it would be hard to implement, because the data format follows the premise that data belongs to other data, e.g. a score belongs to an alliance, which is a part of a match, etc. In that case, you would be faced with the issue of who owns the penalty. Off the top of my head, you would give a <penalty value="0" name="somepenalty for -10"/> to the offending side, then award <score name="somepenalty" value="10"/> to the other side.
Making the score attribute mandatory seems like a good idea. That would allow the listed components to not be comprehensive, that is, the listed score components need not add up to 100% of the score. It might follow that you could have "points" and "penalties" attributes that are also authoritative over the respective score and penalty components.
There are way too many date formats. YYYY-MM-DD is the international standard, defined in ISO 8601 (not a public document, but information about it is out there). It helped tremendously that this is what SQL uses for dates as well. As for converting between dates, well, it isn't simple anywhere, I think. I would like to get timezone data in there somewhere.
The location attribute is simply what FIRST gives teams; it isn't readily available in any other format. It shouldn't be hard to split locations by commas, I think, if you really need to parse the data, or to use a library or API like Google Local, which doesn't have a problem with mixed data.
Location is mostly for human reference, so making it atomic like that isn't really necessary, and it might make it harder to display a simple location. Can you think of a situation where atomic location data is necessary?
FIRST gives match IDs to elimination matches too; I was thinking that those would be used, along with the name attribute to name them with dashes. A problem I didn't consider (again because I don't store elimination data) is that you have multiple matches with the same ID. I think previously, to keep practice and qualification data, I just kept two separate events, but that doesn't really make much sense in a more semantic format like this one. Probably each match number should be allowed only once per match type. |
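One way a consumer could treat a mandatory score attribute as authoritative while still reading whatever components are listed is sketched below. The sample element, its values, and the rule that penalty values are subtracted are assumptions for illustration; the thread does not settle how penalties are signed.

```python
import xml.etree.ElementTree as ET

# Hypothetical alliance element; names follow the <score>/<penalty> examples in the post.
SAMPLE = """
<alliance name="red" score="50">
  <score name="points" value="40"/>
  <penalty name="somepenalty" value="10"/>
</alliance>
"""

def summarize(alliance):
    """Compare the authoritative score attribute with the listed components."""
    total = int(alliance.get("score"))
    points = sum(int(s.get("value")) for s in alliance.findall("score"))
    # Assumption: a penalty's value is subtracted from the alliance's own score.
    penalties = sum(int(p.get("value")) for p in alliance.findall("penalty"))
    listed = points - penalties
    # 'unlisted' is whatever part of the score the components do not explain.
    return {"score": total, "listed": listed, "unlisted": total - listed}

print(summarize(ET.fromstring(SAMPLE.strip())))
# {'score': 50, 'listed': 30, 'unlisted': 20}
```

A non-zero "unlisted" value is then not an error, just a sign that the components are incomplete, which is what making the attribute mandatory is meant to permit.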
Re: XML Interchange format
Quote:
That brings us back to the question: why not just make the raw SQL database readable to the world, so we can run queries against it? XML just is not a database replacement when you have relational data, which is why there are duplicates. |
Re: XML Interchange format
Quote:
If you want a use that is any better than a text find, like "find all teams not in North America" (a query I have been interested in myself before), then even fine-grained, separate country/province/city fields are not going to cut it. It seems like a dedicated API (Google Maps comes to mind) would be the best solution if you really need to interact with the location field. There shouldn't be too much difference between a simple string and multiple atomic fields, plus the single location attribute is simpler.
I added times for each match, something which I assumed I had added already but somehow had not. They are in "YYYY-MM-DD HH:mm:ss" format, local timezone (FIRST-specified time).
For names of matches and alliances, what case should be used? Lowercase seems to fit for me; just keep things ultra-consistent: "red", "blue", "elimination", "qualification", etc.
It was brought to my attention that I was having encoding problems: MySQL was sending the data in iso-8859-1 (apparently I don't want latin1_swedish_ci collation). I don't know much about character encoding, but I think I figured out how to set the connection encoding to UTF-8, the XML default encoding ("SET NAMES utf8"). |
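For anyone reading the match times from a script, a minimal sketch of parsing the "YYYY-MM-DD HH:mm:ss" form and normalizing names to lowercase. The sample timestamp is made up, and the result is a naive local time because the format carries no timezone.

```python
from datetime import datetime

# strptime pattern equivalent to "YYYY-MM-DD HH:mm:ss".
MATCH_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_match_time(text):
    """Return a naive datetime; the value is local (FIRST-specified) time."""
    return datetime.strptime(text, MATCH_TIME_FORMAT)

def normalize_name(text):
    """Fold alliance and match-type names to the all-lowercase convention."""
    return text.strip().lower()

when = parse_match_time("2008-03-01 14:56:00")  # hypothetical sample value
print(when.isoformat(sep=" "))                  # 2008-03-01 14:56:00
print(normalize_name(" Qualification "))        # qualification
```

Malformed values raise ValueError from strptime, so a reader can decide per match whether to skip the time or reject the file.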
Re: XML Interchange format
Why do we not make a standard SQLite database, and then allow data from a standard XML file to be uploaded to it? It could be written in Java or Python and, ta-da, it's cross-compatible magic. I fear inconsistencies will stem from duplicate data.
SQLite would work great for this. |
Re: XML Interchange format
Quote:
Quote:
And listing the teams in an event isn't redundant: it shows the teams that went to an event, regardless of whether they played any matches or not. You have the exact same thing in a relational database, a table that links teams to an event. Code:
CREATE TABLE `team_event` ( |
Re: XML Interchange format
Where did you get that data list?
My high school team (228) is not in there... // Actually, after looking through the list, there are a lot of missing teams. 230 (Gaelhawks) was the next to come to mind. 1071 (Max) is another, along with at least a dozen other teams, many of them from Connecticut... |
Re: XML Interchange format
Quote:
Everyone, however, should be in there about half way down, even if I don't have the nicknames (I should spend some time importing those sooner or later). |
Re: XML Interchange format
Sorry, that file is not really human readable for most people. To leverage it for any scouting data or otherwise, a script is going to be parsing it.
I would look at a many-to-many relationship before saying it can't be done. I see this as a great transitional format, i.e. database to database. I would be more than willing to write SQLite Python scripts to both export and import data in this format. I am not trying to trash your work; you have obviously put some time into this. I am just pointing out some weaknesses. As for adding your own extra types of data, I would frown on that, as it would pull away from your standard. Let's include what we want and make revisions as a group; then in the metadata you should be able to write what version of the format is being used, and the scripts will all be happy. |
Re: XML Interchange format
Quote:
SQLite, Postgres, MySQL, relational databases in general are for storing data. They are not for transferring it. Even when backing up relational databases, do you copy the binary files? No, you export the SQL statements that can re-create the database from scratch as a backup. SQLite is not a data format, it is a database. There is a distinct difference. Quote:
Yes, XML has weaknesses. Relational data also has weaknesses. There is no one killer data format, and you have to choose what makes sense and understand it won't always be the correct choice. Again, for sharing data, XML is well suited because you can put comments in it and it is human readable. I can open it up in a web browser and inspect the data, which is a huge plus; at least one person has already done so. Namespaces are a plus: it is extensible, and elements are nested inside other elements, that is to say, there is an ancestry. It is made very clear that teams can belong to seasons, events, and matches, scores can belong to a team or a match, and so on. Such flexibility is hard with relational data, which is fairly strict. Shipping around data in a relational format confines DBAs to a straitjacket. I am re-working my own database right now (have been for two weeks on and off, and there isn't a real end in sight) because I described alliances, scores, and events in a bad way, and am really paying for it with my time. |
Re: XML Interchange format
I am wondering why you are bashing Python, as it is an excellent language for scripting, which is what this is all about. Do you really want to write your database scripts in C? I don't think so. I was speaking about writing a TurboGears or Django app, which is fully Python, CSS, and HTML (none of the PHP mumbo jumbo; I already went through that stage of my life). One of the cool things about these apps is the model.py file, which defines your database very simply and in an easy-to-read way.
If you read what I wrote, I was not wanting to ship a SQLite file around, but to have a standard model that people can create on their computer, with scripts to take your XML format and place it in a database, and to export the database to your XML format. It seems to me this could be powerful. People can write scripts against the database in whatever they want (Python makes this easy), and they can see the data in the XML format if they want during transition. As for XML as a database format... come on. Even Xindice says on their front page that it does not serve as a general database. I have played the XML database game in past years and know where it leads. I still use XML for data transmission all the time. |
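As a rough illustration of the import half of that idea, here is a sketch that loads team/event links from an XML document into SQLite. The <season>/<event>/<team> layout, the sample data, and the team_event columns are all assumptions invented for this example; they are not the spec's structure or anyone's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; "Example Regional" is a placeholder event name.
SAMPLE = """
<season year="2008">
  <event name="Example Regional">
    <team number="498"/>
    <team number="330"/>
  </event>
</season>
"""

def import_season(conn, xml_text):
    """Insert one row per (season, event, team) link; returns rows seen."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS team_event ("
        " year INTEGER NOT NULL, event TEXT NOT NULL, team INTEGER NOT NULL,"
        " PRIMARY KEY (year, event, team))"
    )
    season = ET.fromstring(xml_text.strip())
    year = int(season.get("year"))
    rows = [
        (year, event.get("name"), int(team.get("number")))
        for event in season.findall("event")
        for team in event.findall("team")
    ]
    # OR IGNORE makes re-importing the same file harmless.
    conn.executemany("INSERT OR IGNORE INTO team_event VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def teams_at(conn, year, event):
    """Team numbers recorded for one event, in ascending order."""
    cursor = conn.execute(
        "SELECT team FROM team_event WHERE year = ? AND event = ? ORDER BY team",
        (year, event),
    )
    return [row[0] for row in cursor]

conn = sqlite3.connect(":memory:")
import_season(conn, SAMPLE)
print(teams_at(conn, 2008, "Example Regional"))  # [330, 498]
```

The export direction would be the reverse: query the table and build elements, so that each side keeps its own database layout and only the XML has to be agreed on.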
Re: XML Interchange format
Quote:
Nobody is saying you can't have a SQLite/Python parser, but ultimately not everybody is going to use the same language framework. Once this format is documented and stable, you can write your Python scripts, I can write my .NET apps, and some guy over there can write his Ruby scripts or Java apps or PHP site. I don't think anyone is intending to use this format for long-term storage (not that somebody won't ;)), but for inter-team or even intra-team data transportation this could be a very useful format.
Quote:
You could still include a format version, but it would only be the base format (i.e. not specifying the 5% usages). If you don't allow this, people will either branch the format and create many similar but incompatible formats, or the format will become a mess, with 95% of the spec being for the 5% use cases (insert generic Windows bloat joke here). |
Re: XML Interchange format
I'm not very savvy in this field, but this whole thing seems interesting. I'm currently working on creating a regional intraweb that teams can connect to.
At first I was going to implement one of the other solutions already floating around, but if this'll be ready before Week 1, I'll be happy in trying it out. Of course, I'll need help integrating it into a CMS/Website. Keep up the good work! :) |
Re: XML Interchange format
Quote:
Why not create a new thread about this intraweb? Also, I can't imagine it will be very accessible if Wi-Fi won't be allowed in the pits. |
Re: XML Interchange format
I know, I've been reading the Wi-Fi threads. I truly hope FIRST doesn't cripple us in that area.
Well, like I said, I don't know what all this thing is about. :D Maybe if you can help me understand how this works, I might be able to write up a simple interface to use for my site. |
Re: XML Interchange format
Quote:
First off, it is an interchange format, used for taking data that one person logs and moving it into another place that doesn't have the data. To read the data, you need an XML parser that can parse the data and let you programmatically read it from a tree-like structure (the DOM, an Ent I like to call it ;) and do what you want with it (insert it into a relational database, etc). To create the data, you can reverse the process: create nodes in the DOM and ask the library to create XML. At minimum you have to be able to write UTF-8 characters and serialize the & " > < characters to their XML entities, &amp; &quot; &gt; &lt; respectively, then print the XML tags and serialized names/values. Of course, the data has to be in the structure defined in the specification document in my first post in the thread, otherwise it wouldn't be exchangeable. |
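Both routes described above (escaping by hand, or letting a library serialize the tree) can be sketched with Python's standard library. The <team> element and its attribute values here are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Route 1: escape by hand. escape() handles & < > itself; the quote
# mapping is passed in explicitly so the text is safe inside "..." too.
raw = 'a & "b" <c>'
escaped = escape(raw, {'"': "&quot;"})
print(escaped)  # a &amp; &quot;b&quot; &lt;c&gt;

# Route 2: build nodes and let the library serialize and escape.
# Hypothetical element and attribute names.
team = ET.Element("team", {"number": "498", "name": raw})
xml_bytes = ET.tostring(team, encoding="utf-8")

# Reading the bytes back returns the original, unescaped value.
round_trip = ET.fromstring(xml_bytes).get("name")
print(round_trip == raw)  # True
```

Route 2 is usually the safer choice, since the library also takes care of encoding the output as UTF-8.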
Re: XML Interchange format
Erm, thanks? :) Still a bit foreign to me, but now I think I understand the basic concept. I'll work with it for a bit. It'll keep my mind off the depressing fact that I'll be teamless this season.
|
Re: XML Interchange format
Ok, still lost, and probably going to need somebody to walk me through getting this to work on a PHP site.
|
Re: XML Interchange format
Quote:
Accessing a season of data would be pretty simple: foreach($root->team as $team) will take you through all the teams in a season or event. I am working on a scouting program (a better version; I really hate the way data is stored and edited in the last version I have posted) that will be making more use of it. If you want to chat about it, see my footer for more info. |
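For readers not on PHP, the same walk over direct <team> children looks like this in Python. The document layout and sample values are hypothetical and only mirror the shape the loop above implies.

```python
import xml.etree.ElementTree as ET

# Hypothetical season document with teams at both season and event level.
SAMPLE = """
<season year="2008">
  <team number="330"/>
  <team number="498"/>
  <event name="Example Regional">
    <team number="498"/>
  </event>
</season>
"""

root = ET.fromstring(SAMPLE.strip())

# findall("team") matches direct children only, like $root->team in SimpleXML.
season_teams = [team.get("number") for team in root.findall("team")]

# The same call on an <event> element lists the teams at that event.
event_teams = {
    event.get("name"): [team.get("number") for team in event.findall("team")]
    for event in root.findall("event")
}

print(season_teams)  # ['330', '498']
print(event_teams)   # {'Example Regional': ['498']}
```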
Copyright © Chief Delphi