Chief Delphi

Nibbles 14-09-2008 17:36

XML Interchange format
 
At the very least, this is what I will be using as a data backup mechanism (moving away from SQL dumps). I hope other people gather the time to implement it and post their data in this format as well.

It is generic and covers not just FRC but FTC and FLL as well. It handles changes to team numbers or team information, since it keeps data per season rather than once for all of history. It allows score components to be posted and attributed to multiple teams, and it is flexible enough to allow more than two alliances or varying numbers of teams per alliance. For many events I have both the score and penalty components posted, and for some events I have more fine-grained data: scored points, bonus, and penalties, all separate (these will appear in the dump once I get around to importing the score components into my database).

The dump of all the data I have will appear at http://team498.org/scouting/data/latest.frc.xml


Technical mumble follows.

The specification for the format appears at http://docs.google.com/Doc?id=dcz67k4q_38f2rp7sfc. If anyone is interested in helping engineer the details of the format, PM me or visit me on IRC (see my footer) and I can add you as an editor to the document. You should have good problem-solving skills, think about the different situations the format might be used in, know about XML and its related technologies (XQuery, XPath, XSLT, namespaces), and be familiar with RFC 2119 and when to use each keyword. Basically, be able to think: is there a reason a certain feature should be required, is there any reason that someone might need to leave an attribute out, how should parsers deal with missing data, etc.

When I talked with other people about this, there was some concern over the selection of XML as the data format. While XML is certainly not appropriate for everything (including RPC calls, imo), it really stands out for FIRST data: it is human readable, it can be queried with XQuery (if you want to do statistics, for example), it can be easily transformed into other formats (you could generate static HTML pages using XSL), and it has namespace support, so it can be extended. Using namespaces, you could link to game videos (<v:video href="http://firstvideoarchive.com/2008Archive/index.php?dir=Arizona/&amp;file=az_qf2m3.wmv"/> for example) or record game data specific to one season: you can add a s2008:laps="4" or s2007:lift="498:12 330:4" attribute to each team element under each match, to record the laps a robot made or whether it lifted any other robots.

You could write a program that stores competition data using a native XML database, e.g. http://xml.apache.org/xindice/ or http://exist.sourceforge.net/
(I think it might be slow, but who knows, it might be innovative).
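
A rough sketch of how a program might read such a season-specific attribute, using Python's standard xml.etree.ElementTree. The namespace URI, the <match>/<alliance>/<team> nesting, and the "number" attribute are assumptions made up for illustration; the spec may name things differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the thread does not define one for "s2008".
S2008 = "http://example.org/frc/season/2008"

# Assumed layout: team elements under an alliance under a match.
SAMPLE = f"""
<match xmlns:s2008="{S2008}">
  <alliance color="red">
    <team number="498" s2008:laps="4"/>
    <team number="330"/>
  </alliance>
</match>
"""

def laps_by_team(xml_text):
    """Map team number -> laps for teams that carry the s2008:laps attribute."""
    root = ET.fromstring(xml_text)
    laps = {}
    for team in root.iter("team"):
        # ElementTree exposes namespaced attributes in "{uri}local" form.
        value = team.get(f"{{{S2008}}}laps")
        if value is not None:
            laps[team.get("number")] = int(value)
    return laps

print(laps_by_team(SAMPLE))
```

A parser that does not know the s2008 namespace simply never asks for that attribute, which is the point of extending the format this way.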

EHaskins 14-09-2008 20:04

Re: XML Interchange format
 
1 Attachment(s)
I think you've done some good work so far, but there are a few rough spots. The biggest thing I've seen so far is that the document itself contains no meta data about the date of creation, or the time span where it was valid.

Another thing, which may just be a matter of preference, is that you use one-letter abbreviations in several places. Even if for no other reason than readability, full words seem like a better choice.


If anyone is interested, I've attached a file, which almost conforms to this spec, that was generated using the TBA API.

comphappy 14-09-2008 20:24

Re: XML Interchange format
 
I don't have a major issue with it being XML (there are better formats for this, though). My question is why this is not two files, one for team data and one for match data; you have a lot of duplicate data in that file. I am not sure why you list all the teams at a competition at the top level and then again in each of the matches.
It is a good start, it just needs some major layout work IMHO.

Greg Marra 14-09-2008 20:46

Re: XML Interchange format
 
Some comments, although I have not given the document a thorough read:
  • Avoid abbreviations where possible. "Red" and "Blue" are better than "R" and "B". You're thinking very FIRST-centric, think abstractly.
  • I'd like to see alliance/team have an optional "captain" attribute.
  • Calculating scores from penalties is nice, but a luxury. Can we have a "final score" attribute that is required, and "unpenalized score" and "penalties" attributes that are optional? What happens if in the future penalties are assessed by giving the opposing team points?
  • The "competition" attribute should have canonical names. "FRC" "FTC" "FLL" "VEX" "BEST" "OCCRA" spring to mind.
  • What is done for dates in other XML formats? RSS does this: "<pubDate>Fri, 05 Sep 2008 00:51:30 -0400</pubDate>", so is that an easy format to generate/parse compared to YYYY-MM-DD?
  • Location is "e.g. "Phoenix, AZ, USA"". We're throwing away resolution with that. <location><street /><city /><state /><country /></location>?

We should give some serious thought to the representation of match numbers. "21" for Quarter Finals 2 Match 1 makes some sense, but what happens if there is ever a series of 8 ties and we go to game 10? The Blue Alliance represents this data poorly now.

That's it for now.
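
On the date question above, here is a small sketch of parsing both styles with Python's standard library, to give a feel for the effort involved. It only compares the two notations and does not say which one the spec should adopt.

```python
from datetime import date
from email.utils import parsedate_to_datetime

def parse_rss_date(text):
    """Parse an RFC 822 style timestamp, as used by RSS <pubDate>."""
    return parsedate_to_datetime(text)

def parse_iso_date(text):
    """Parse a YYYY-MM-DD calendar date (ISO 8601)."""
    return date.fromisoformat(text)

rss = parse_rss_date("Fri, 05 Sep 2008 00:51:30 -0400")
iso = parse_iso_date("2008-09-05")

print(rss.isoformat())
print(iso.isoformat())
print(rss.date() == iso)
```

Both are one call in Python; the RSS form carries a time and UTC offset, while YYYY-MM-DD is date-only and sorts correctly as a plain string.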

Nibbles 14-09-2008 23:19

Re: XML Interchange format
 
Quote:

Originally Posted by EHaskins (Post 765483)
I think you've done some good work so far, but there are a few rough spots. The biggest thing I've seen so far is that the document itself contains no meta data about the date of creation, or the time span where it was valid.

Another thing, which may just be a matter of preference, is that you use one-letter abbreviations in several places. Even if for no other reason than readability, full words seem like a better choice.


If anyone is interested, I've attached a file, which almost conforms to this spec, that was generated using the TBA API.

Very good work there :)

I used abbreviations for alliance color because that is how I stored it in my database (a CHAR(1)), and the "keep names descriptive, values small" philosophy followed me into XML. I think you are right on that, however, so I'll change it. I think all lowercase works; is that a good idea? Or Title Case?

I think meta data is outside of the scope of the format? There is RDF, which is aimed at adding metadata to XML documents, I think that might be the solution if that is completely necessary.

Quote:

Originally Posted by comphappy (Post 765489)
...My question is why this is not two files, one for team data and one for match data; you have a lot of duplicate data in that file. I am not sure why you list all the teams at a competition at the top level and then again in each of the matches.
It is a good start, it just needs some major layout work IMHO.

The idea is to have a file format that can store everything in one location. It is flexible enough to keep teams and matches in separate files, or to publish data for only one event, for example. But when you are trying to store multiple events or seasons of data, I can't think of a reason to do that when you could just merge them.

As for redundant data, you could figure out which teams are in each event by scanning each child <alliance> tag, but that is harder to do. Listing them also makes semantic sense: a team element under an event tag means the team was part of that event, the same way it does under the alliance tag. Similar thinking was behind adding the score="" attribute to the alliance tag. It is redundant, but it simplifies things, and it lets you record only the final score when you do not know each component.
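
To illustrate the difference between the two team lists, a minimal Python sketch follows. The <event>/<match>/<alliance>/<team> nesting and the "name" and "number" attributes are assumed for the example and may not match the spec exactly; team 1000 is a made-up entry.

```python
import xml.etree.ElementTree as ET

# Assumed layout: teams listed directly under the event, and again under
# each alliance of each match.
SAMPLE = """
<event name="Arizona Regional">
  <team number="498"/>
  <team number="330"/>
  <team number="1000"/>
  <match number="1">
    <alliance color="red"><team number="498"/></alliance>
    <alliance color="blue"><team number="330"/></alliance>
  </match>
</event>
"""

def registered_teams(event):
    """Teams listed directly under the event element."""
    return {t.get("number") for t in event.findall("team")}

def teams_that_played(event):
    """Teams found by scanning every alliance of every match."""
    return {t.get("number") for t in event.findall("match/alliance/team")}

event = ET.fromstring(SAMPLE)
# Teams that attended but do not appear in any match.
print(sorted(registered_teams(event) - teams_that_played(event)))
```

The event-level list is a single lookup, and it can also hold teams that never appear in a match, which the scan alone could not recover.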

Quote:

Originally Posted by Greg Marra (Post 765493)
Some comments, although I have not given the document a thorough read:
  • Avoid abbreviations where possible. "Red" and "Blue" are better than "R" and "B". You're thinking very FIRST-centric, think abstractly.
  • I'd like to see alliance/team have an optional "captain" attribute.
  • Calculating scores from penalties is nice, but a luxury. Can we have a "final score" attribute that is required, and "unpenalized score" and "penalties" attributes that are optional? What happens if in the future penalties are assessed by giving the opposing team points?
  • The "competition" attribute should have canonical names. "FRC" "FTC" "FLL" "VEX" "BEST" "OCCRA" spring to mind.
  • What is done for dates in other XML formats? RSS does this: "<pubDate>Fri, 05 Sep 2008 00:51:30 -0400</pubDate>", so is that an easy format to generate/parse compared to YYYY-MM-DD?
  • Location is "e.g. "Phoenix, AZ, USA"". We're throwing away resolution with that. <location><street /><city /><state /><country /></location>?

We should give some serious thought to the representation of match numbers. "21" for Quarter Finals 2 Match 1 makes some sense, but what happens if there is ever a series of 8 ties and we go to game 10? The Blue Alliance represents this data poorly now.

That's it for now.

Abbreviations, that is a good point too.

For elimination matches, a captain designation would be very useful. It didn't cross my mind, I don't keep any elimination data at all (I should be, because it is the only time the same teams replay each other under the same conditions, useful for statistics work).

Good point about the penalties. The score attribute was added as a convenience, and I don't know how programs would deal with discrepancies between the score/penalty elements and score attribute. If there was a penalty system that awarded points to the other team/alliance, it would be hard to implement because the data format follows the premise that data belongs to other data, e.g. a score belongs to an alliance, which is a part of a match, etc. In that case, you would be faced with the issue of who owns the penalty. Off the top of my head, you would give a <penalty value="0" name="somepenalty for -10"/> to the offending side, then award <score name="somepenalty" value="10"/> to the other side.

Making the score attribute mandatory seems like a good idea. That would allow the listed components to not be comprehensive, that is, the listed score components are not 100% of the score. It might follow that you could have "points" and "penalties" attributes that are also authoritative over the respective score and penalty components.
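
One way a program could deal with discrepancies is to report them and leave the decision to the caller. The sketch below assumes <score> and <penalty> children with "value" attributes under <alliance>, and assumes penalties subtract from the total; the component names are invented, and none of this is confirmed by the spec.

```python
import xml.etree.ElementTree as ET

def check_alliance(alliance):
    """Return (declared, computed); either is None when not available."""
    declared = alliance.get("score")
    declared = int(declared) if declared is not None else None
    scores = [int(s.get("value")) for s in alliance.findall("score")]
    penalties = [int(p.get("value")) for p in alliance.findall("penalty")]
    # Assumption: total = sum of score components minus sum of penalties.
    computed = sum(scores) - sum(penalties) if (scores or penalties) else None
    return declared, computed

alliance = ET.fromstring(
    '<alliance color="red" score="38">'
    '<score name="hurdles" value="48"/>'
    '<penalty name="lane violation" value="10"/>'
    '</alliance>'
)
declared, computed = check_alliance(alliance)
print(declared, computed, declared == computed)
```

If the score attribute is authoritative and the components need not be comprehensive, a mismatch would be a warning at most, not a parse error.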

There are way too many date formats. YYYY-MM-DD is the international standard, defined in ISO 8601 (not a public document, but information about it is out there). It helped tremendously that that is what SQL uses for dates as well. As for converting between date formats, well, it isn't simple anywhere, I think. I would like to get timezone data in there somewhere.

The location attribute is simply what FIRST gives teams, it isn't readily available in any other format. It shouldn't be hard to split locations by commas I think, if you really need to parse the data, or use a library or API like Google Local which doesn't have a problem with mixed data. Location is for human reference mostly, so making it atomic like that isn't really necessary, and it might make it harder to display a simple location. Can you think of a situation where atomic location data is necessary?
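
Splitting by commas could look something like the sketch below. It assumes every location has exactly the "City, State, Country" shape of the example "Phoenix, AZ, USA", which real data may not always follow, so anything else is returned as unparsed.

```python
def split_location(location):
    """Split 'City, State, Country' into a dict, or return None if it does not fit."""
    parts = [p.strip() for p in location.split(",")]
    # Assumption: exactly three comma-separated parts, in this order.
    keys = ("city", "state", "country")
    if len(parts) != len(keys):
        return None
    return dict(zip(keys, parts))

print(split_location("Phoenix, AZ, USA"))
print(split_location("Somewhere"))
```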

FIRST gives match IDs to elimination matches too. I was thinking that those would be used, along with the name attribute to name them with dashes. A problem I didn't consider (again because I don't store elimination data) is that you have multiple matches with the same ID. I think previously, to keep practice and qualification data, I just kept two separate events, but that doesn't really make much sense in a more semantic format like this one. Probably the answer is to allow each match number only once per match type.

Greg Marra 15-09-2008 01:06

Re: XML Interchange format
 
Quote:

Originally Posted by Nibbles (Post 765509)
Making the score attribute mandatory seems like a good idea.

Not mandatory, as some matches won't have been played yet. But it should be present if there are penalized or unpenalized scores.

Quote:

Can you think of a situation where atomic location data is necessary?

"Find all teams from Connecticut." It's niche, but useful? I guess a string search can find that too.

comphappy 15-09-2008 02:00

Re: XML Interchange format
 
Quote:

Originally Posted by Nibbles (Post 765509)
As for redundant data, you could figure out which teams are in each event by scanning each child <alliance> tag, but that is harder to do. It also makes semantic sense, a team element under an event tag means the team was part of that event, the same way it does for the alliance tag. Similar thinking was behind adding the score="" attribute to the alliance tag, it is redundant, but it simplifies it down, and it allows you to not know each component, and have only the score.

You should figure out whether this is raw data or processed data. If it is raw data, then out with the duplicates; the processing to regenerate them is tiny, and raw data is just that, raw. If it is not, then there really is no reason not to break it into other files, just as a database has multiple tables.

That brings us back to the question: why not just make the raw SQL database readable to the world, so we can run queries against it? XML just is not a database replacement when you have relational data, which is why there are duplicates.

Greg Marra 15-09-2008 13:26

Re: XML Interchange format
 
Quote:

Originally Posted by comphappy (Post 765527)
That brings us back to the question: why not just make the raw SQL database readable to the world, so we can run queries against it? XML just is not a database replacement when you have relational data, which is why there are duplicates.

This is just an interchange format. I wish FIRST would serve data in some standard like this so we didn't need to screen scrape it off their site.

Nibbles 16-09-2008 00:34

Re: XML Interchange format
 
Quote:

Originally Posted by Greg Marra (Post 765522)
Not mandatory, as some matches won't have been played yet. But it should be present if there are penalized or unpenalized scores.

"Find all teams from Connecticut." It's niche, but useful? I guess a string search can find that too.

I hadn't considered it being used as a format before matches are played, but that could very well be useful, if it is used as the format for a native XML database or something. How would you specify a match as unplayed? An attribute for that might get redundant: you don't usually look at the data during an event, but before or after, so you would have to deal with a bunch of unplayed="unplayed" attributes. Is leaving out the score attribute enough to imply it is scheduled and not played yet?

If you want something any better than a text find, like "find all teams not in North America" (a query I have been interested in myself before), then even fine-grained, separate country/province/city fields are not going to cut it. It seems like a dedicated API (Google Maps comes to mind) would be the best solution if you really need to interact with the location field. There shouldn't be too much difference between a simple string and multiple atomic fields, plus the single location attribute is simpler.

I added times for each match, something which I assumed I added but did not somehow. It is in "YYYY-MM-DD HH:mm:ss" format, local timezone (FIRST-specified time).
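
Since the times are local with no offset in the string, a reader has to supply the offset from somewhere else. A sketch in Python, where the sample time and the UTC offset argument are assumptions for illustration (the format itself does not carry them):

```python
from datetime import datetime, timedelta, timezone

def parse_match_time(text, utc_offset_hours):
    """Parse 'YYYY-MM-DD HH:mm:ss' and attach a caller-supplied UTC offset."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))

# Hypothetical match time at an event held at UTC-7.
t = parse_match_time("2008-03-07 09:30:00", -7)
print(t.isoformat())
print(t.astimezone(timezone.utc).isoformat())
```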

For names of matches and alliances, what case should be used? Lowercase seems to fit with me, just keep things ultra-consistent. "red" "blue" "elimination" "qualification" etc.

It was brought to my attention that I was having encoding problems, MySQL was sending the data in iso-8859-1 (apparently I don't want latin1_swedish_ci collation). I don't know much about character encoding, but I think I figured out how to set the connection encoding to UTF-8, the XML default encoding ("SET NAMES utf8").

Greg Marra 16-09-2008 09:26

Re: XML Interchange format
 
Quote:

Originally Posted by Nibbles (Post 765741)
I hadn't considered it to be used as a format for before matches, but that could very well be useful, if it is used as the format for a native XML database or something. How would you specify a match as unplayed? An attribute that specifies that might get redundant, since you don't usually look at the data during an event, but before or after, that would mean having to deal with a bunch of unplayed="unplayed" attributes. Is leaving out the score attribute enough to imply it is scheduled and not played yet?

For TBA, we set scores to -1 to indicate "unknown". I don't think this is a good strategy for the XML format. Maybe a "status" attribute? "future" "playing" "finished"? I am not sure entirely here how much information is too much.

Quote:

For names of matches and alliances, what case should be used? Lowercase seems to fit with me, just keep things ultra-consistent. "red" "blue" "elimination" "qualification" etc.

Lowercase seems best.

comphappy 16-09-2008 18:32

Re: XML Interchange format
 
Why do we not make a standard SQLite database, and then allow data from a standard XML file to be uploaded to it? It could be written in Java or Python and, ta-da, it's cross-compatible magic. I fear inconsistencies will stem from duplicate data.

SQLite would work great for this.

Nibbles 16-09-2008 21:02

Re: XML Interchange format
 
Quote:

Originally Posted by Greg Marra (Post 765783)
For TBA, we set scores to -1 to indicate "unknown". I don't think this is a good strategy for the XML format. Maybe a "status" attribute? "future" "playing" "finished"? I am not sure entirely here how much information is too much.
...

We are not confined to integers here. Wouldn't a literal "?" work for a played match with an unknown score? It would mean an extra sanity check when parsing the document, though. Then again, any support for unknown scores needs an extra if/then somewhere. If no support for unknown scores is added, some languages would convert a "?" to 0 or some other legal value without throwing an error (atoi in C, and PHP, at the very least).
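
The extra check is small. A Python sketch, assuming a missing attribute means "not played yet" and "?" means "played, score unknown" (both conventions are still under discussion here, not settled), with both mapped to None:

```python
def parse_score(text):
    """Return the score as an int, or None when it is missing or '?'."""
    if text is None or text.strip() == "?":
        return None
    # Unlike C's atoi, int() raises ValueError on junk instead of returning 0.
    return int(text)

print(parse_score("42"))
print(parse_score("?"))
print(parse_score(None))
```

A caller that needs to tell the two cases apart would have to check for the missing attribute before calling this.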

Quote:

Originally Posted by comphappy (Post 765853)
Why do we not make a standard SQLite database, and then allow data from a standard XML file to be uploaded to it? It could be written in Java or Python and, ta-da, it's cross-compatible magic. I fear inconsistencies will stem from duplicate data.

SQLite would work great for this.

This isn't intended as a data storage mechanism; I would suggest rereading why I went with XML back in my first post. A relational database like SQLite also doesn't meet two of the requirements I am addressing: a human-readable format and an extensible format. XML would allow you to add your own data, say how many laps a team made, to each team element under an alliance element. To do the same in SQLite while keeping the data relational you would have to create a new table, which is how I am planning to do it for my database (there are other ways you could store the data, but they wouldn't be strictly relational).

And listing the teams in an event isn't redundant: it shows the teams that went to an event, regardless of if they played any matches or not. You have the exact same thing in a relational database, a table that links teams to an event.
Code:

-- Link table: one row per team attending an event (MySQL syntax)
CREATE TABLE `team_event` (
  `teamnum` int(10) unsigned NOT NULL,
  `eventid` int(10) unsigned NOT NULL,
  PRIMARY KEY  (`teamnum`,`eventid`)
);

Even in a relational database, you have to put in some redundant data from time to time simply to speed things up. It is much faster to keep a column that tracks the number of private messages you have, and update it every time you receive or delete a message, than it is to count every message in your inbox every time you want to know.
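
For comparison, the same link table can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table is re-declared in SQLite's dialect rather than MySQL's, and the event IDs and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite version of the team_event link table.
conn.execute(
    "CREATE TABLE team_event ("
    " teamnum INTEGER NOT NULL,"
    " eventid INTEGER NOT NULL,"
    " PRIMARY KEY (teamnum, eventid))"
)
# Hypothetical attendance rows: (team number, event id).
conn.executemany(
    "INSERT INTO team_event (teamnum, eventid) VALUES (?, ?)",
    [(498, 1), (330, 1), (498, 2)],
)

def teams_at(event_id):
    """List the teams that attended an event, whether or not they played."""
    rows = conn.execute(
        "SELECT teamnum FROM team_event WHERE eventid = ? ORDER BY teamnum",
        (event_id,),
    )
    return [row[0] for row in rows]

print(teams_at(1))
```

The team elements listed under an event in the XML play the same role as the rows of this table.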

artdutra04 16-09-2008 22:09

Re: XML Interchange format
 
Where did you get that data list?

My high school team (228) is not in there...

// Actually, after looking through the list, there are a lot of missing teams. 230 (Gaelhawks) was the next to come to mind. 1071 (Max) is another, along with at least a dozen other teams, many of them from Connecticut...

Nibbles 16-09-2008 22:28

Re: XML Interchange format
 
Quote:

Originally Posted by artdutra04 (Post 765889)
Where did you get that data list?

My high school team (228) is not in there...
...

You should be in the 2008 season team list. I don't have the complete 2007 season team list (the first one in the file); for that season I only have the teams that went to the championship or the AZ regional. Hopefully I will get that data later on from somewhere.

Everyone, however, should be in there about half way down, even if I don't have the nicknames (I should spend some time importing those sooner or later).

comphappy 17-09-2008 00:36

Re: XML Interchange format
 
Sorry, that file is not really human readable for most people. To leverage it for any scouting data or otherwise, a script is going to be parsing it.

I would look at a many-to-many relationship before saying it can't be done.

I see this as a great transitional format, i.e. database to database. I would be more than willing to write SQLite Python scripts to both export and import data in this format.

I am not trying to trash your work; you have obviously put some time into this. I am just pointing out some weaknesses. As for adding your own extra types of data, I would frown on that, as it would pull away from your standard. Let's include what we want and make revisions as a group. Then the metadata should state which version of the format is being used, and the scripts will all be happy.
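
A minimal sketch of what such an import script could look like, with a version check up front. The root element name, the "version" and "id" attributes, and the version label "1" are all hypothetical; the format as posted does not define a version field yet.

```python
import sqlite3
import xml.etree.ElementTree as ET

SUPPORTED_VERSIONS = {"1"}  # hypothetical version labels

def import_events(xml_text, conn):
    """Load event/team links into SQLite; return the number of team entries read."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported format version: {version!r}")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS team_event ("
        " teamnum INTEGER NOT NULL, eventid TEXT NOT NULL,"
        " PRIMARY KEY (teamnum, eventid))"
    )
    count = 0
    for event in root.iter("event"):
        # Only teams listed directly under the event, not those inside matches.
        for team in event.findall("team"):
            conn.execute(
                "INSERT OR IGNORE INTO team_event VALUES (?, ?)",
                (int(team.get("number")), event.get("id")),
            )
            count += 1
    conn.commit()
    return count

SAMPLE = """
<scouting version="1">
  <event id="az2008"><team number="498"/><team number="330"/></event>
</scouting>
"""
conn = sqlite3.connect(":memory:")
print(import_events(SAMPLE, conn))
```

Rejecting unknown versions early keeps an old script from silently misreading a revised layout.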


All times are GMT -5.