
Best Data Format?


timothyb89
22-11-2010, 02:57
There's been some interest in making some of the raw data we use in our database available for everyone to use independently, and I'm currently in the process of making that happen... so to anyone who might like to use this, what format would be most accessible / easiest to integrate / etc?

I've already started adding support for JSON (mostly because of its language/platform friendliness and ease of use), but I'm sure there are other formats that would work better. XML is toward the top of the list, but from personal experience I know it's a pain to parse and may complicate things for anyone wishing to use it. Exporting as an Excel spreadsheet / PDF document isn't currently at the top of the list, but can certainly be bumped up if there's enough interest.

You can see a first taste of the JSON access here; other formats will be accessible in a similar way: http://glass.frcdb.net/team/1/json
It should work with any team number, and more exported data (event-wide, etc.) should be available soon using a similar URL scheme.

Note that this only works on our test site, http://glass.frcdb.net/. Feel free to comment on or offer suggestions for other new things as well: we're adding support for more years ('09 is almost ready) as well as per-event (and soon per-team) wikis and other features to help document FRC.

Thanks for any suggestions!

Ether
22-11-2010, 09:23
Quoting timothyb89: "There's been some interest in making some of the raw data we use in our database available for everyone to use independently, and I'm currently in the process of making that happen... so to anyone who might like to use this, what format would be most accessible / easiest to integrate / etc?"

Break the data up into the appropriate relational tables, and export each table as a CSV text file. This universal format can be easily used by anyone.
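As a rough sketch of this suggestion, assuming the data has already been loaded into a relational database (SQLite is used here purely for illustration; the thread doesn't say FRCDB uses it), each table can be dumped to its own CSV document with Python's standard `csv` and `sqlite3` modules. The function name `export_tables_to_csv` is hypothetical.

```python
import csv
import io
import sqlite3


def export_tables_to_csv(conn: sqlite3.Connection) -> dict[str, str]:
    """Return {table_name: csv_text}, one CSV document per table."""
    out = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        buf = io.StringIO()
        writer = csv.writer(buf)  # default QUOTE_MINIMAL quotes fields containing commas
        # Header row built from the column names reported by the query.
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor)
        out[table] = buf.getvalue()
    return out


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE teams (number INTEGER, nickname TEXT, location TEXT)")
    demo.execute("INSERT INTO teams VALUES (1, 'The Juggernauts', 'Pontiac, MI USA')")
    for name, text in export_tables_to_csv(demo).items():
        print(name)
        print(text)
```

Writing each file through the `csv` module, rather than joining strings by hand, is what keeps a value like `Pontiac, MI USA` in one column: it comes out wrapped in double quotes.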

DonRotolo
22-11-2010, 18:00
I second Ether's comment. CSV can trivially be imported into Excel, Access, parsed into XML or just perused manually if necessary.

I think the goal should be to select a format that is as universally accessible as possible, while maintaining the data value.
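To illustrate how little work the "parsed into XML" route takes, here is a minimal sketch using only the Python standard library. The element names `rows` and `row` are arbitrary choices for this example (not the standard XML interchange format mentioned later in the thread), and it assumes every CSV header is a valid XML tag name and every row has as many fields as the header.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str, root_tag: str = "rows", row_tag: str = "row") -> str:
    """Turn CSV with a header row into a flat XML document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element(root_tag)
    for record in reader:
        row = ET.SubElement(root, row_tag)
        for field, value in record.items():
            # Each column becomes a child element named after its header.
            ET.SubElement(row, field).text = value
    # encoding="unicode" returns a str (no XML declaration); text is escaped for us.
    return ET.tostring(root, encoding="unicode")


print(csv_to_xml("number,nickname\n1,The Juggernauts\n"))
```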

timothyb89
05-12-2010, 04:05
I had partially dismissed CSV/TSV/PSV/etc. before because I wasn't able to export our data tree to it directly. But it's probably the most accessible format we can offer, so I think I'll need to find a way :P

I don't think it could be done in the same manner as JSON or other tree-based formats (XML, etc.), so it'd probably be multiple files of lists, like:

teams.csv

# number, nickname, location, rookie season, other stats...
1,The Juggernauts,"Pontiac, MI USA",1997,...
4,Team 4 ELEMENT,"Van Nuys, CA USA",1997,...


events.csv

# name, short name, start date, end date, other stats
Ann Arbor FIRST Robotics District Competition,ann-arbor,1268377200000,1268463600000,...


ann-arbor-matches.csv

# number, red 1, red 2, red 3, blue 1, blue 2, blue 3, red score, blue score
1,68,1998,49,2591,2619,862,1,5
2,2611,1684,3415,2627,3302,1940,0,0


...plus more for standings, awards, and the like. Hopefully that should be easy enough for everyone to parse, and we'll have that up and running in a few days!
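A consumer could read files laid out like the examples above with a small helper such as the one below. This is a sketch under a few assumptions: the first line is a `#`-prefixed list of column names (which is not standard CSV, so it is parsed by hand), fields containing commas such as `"Pontiac, MI USA"` are wrapped in double quotes, and the event start/end values are Unix epoch milliseconds (they appear to be, but the post doesn't say). `read_commented_csv` and `ms_to_utc` are hypothetical names.

```python
import csv
import io
from datetime import datetime, timezone


def read_commented_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV whose first line is a '# name, name, ...' header comment."""
    lines = text.strip().splitlines()
    # Header: drop the leading '#', split on commas, trim surrounding spaces.
    header = [name.strip() for name in lines[0].lstrip("#").split(",")]
    # Data rows go through the csv module so quoted commas are respected.
    reader = csv.reader(io.StringIO("\n".join(lines[1:])))
    return [dict(zip(header, row)) for row in reader]


def ms_to_utc(ms: str) -> datetime:
    """Convert an epoch-milliseconds string to an aware UTC datetime."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)


teams = read_commented_csv(
    '# number, nickname, location, rookie season\n'
    '1,The Juggernauts,"Pontiac, MI USA",1997\n'
)
print(teams[0]["location"])
print(ms_to_utc("1268377200000").isoformat())
```

Every value comes back as a string; converting team numbers and scores to integers is left to the caller.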
We've also mostly finished JSON exporting for events; e.g., http://glass.frcdb.net/event/colorado/json returns basically the same data as the normal event page (a somewhat more readable example of event JSON is at http://lhrobotics.pastebin.com/JCsc6YQ7). With any luck, the two current choices should make things pretty accessible!

Ether
05-12-2010, 09:25
Quoting timothyb89: "I don't think it could be done in the same manner as JSON or other tree-based formats (XML, etc) so it'd probably [be] multiple files of lists"

Yes, that's what I meant by "Break the data up into the appropriate relational tables (http://www.wisegeek.com/what-is-a-relational-database.htm), and export each table as a CSV text file"

Nibbles
02-01-2011, 21:37
I've proposed a standard XML format, which you can find linked in my footer:
Help standardize match data! Use the XML interchange format. (http://www.chiefdelphi.com/forums/showthread.php?t=69174) (Specification page (http://frcdb.redjacket.ws/eventdata.html))

This sort of data is very well suited to the relational data model. In my database, I have tables for each season, each event, each team in each season, each match, each alliance in each match (with its score), and each team in each match. I don't use many unique IDs; multi-column primary keys work fine. For instance, the team-match primary key is (event, match, alliance, position), e.g. (300, 2, "R", 2) means event ID 300, second match, red alliance, second position in the alliance.
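A multi-column primary key like the team-match one described above might look like this in SQL. This is an illustrative sketch in SQLite via Python's `sqlite3`, not the poster's actual schema: the column names are assumptions (`match_number` avoids SQLite's `MATCH` keyword), and the team numbers in the usage lines are placeholders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE team_match (
    event_id     INTEGER NOT NULL,
    match_number INTEGER NOT NULL,
    alliance     TEXT    NOT NULL CHECK (alliance IN ('R', 'B')),
    position     INTEGER NOT NULL CHECK (position BETWEEN 1 AND 3),
    team         INTEGER NOT NULL,
    -- No surrogate ID: these four columns together identify one robot slot.
    PRIMARY KEY (event_id, match_number, alliance, position)
)
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_slot(conn: sqlite3.Connection, event_id: int, match_number: int,
             alliance: str, position: int, team: int) -> None:
    # Raises sqlite3.IntegrityError if the slot is already taken.
    conn.execute("INSERT INTO team_match VALUES (?, ?, ?, ?, ?)",
                 (event_id, match_number, alliance, position, team))


conn = make_db()
add_slot(conn, 300, 2, "R", 2, 1234)  # placeholder team number
```

Inserting a second row for `(300, 2, "R", 2)` fails with `sqlite3.IntegrityError`, so the database itself enforces that each slot holds exactly one team.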

I might look at representing the data points with an RDF data model, a model that makes statements about resources (for instance, (X alliance) scored (x points), or (Y team) has a penalty "crossed the line"). It's still rather hard to use, but could be very cool for analytics.
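The core idea of RDF is that every fact is a (subject, predicate, object) triple that can be matched by pattern. The toy store below shows only that idea in plain Python; it is not RDF proper (no URIs, no serialization format, no RDF library), and the subject and predicate strings in the example are hypothetical.

```python
from collections.abc import Iterable

# A triple is (subject, predicate, object), the basic statement shape in RDF.
Triple = tuple[str, str, object]


class TripleStore:
    """Tiny in-memory stand-in for an RDF store."""

    def __init__(self, triples: Iterable[Triple] = ()) -> None:
        self.triples: list[Triple] = list(triples)

    def add(self, subject: str, predicate: str, obj: object) -> None:
        self.triples.append((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None) -> list[Triple]:
        """Return every triple matching the pattern; None acts as a wildcard."""
        return [
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]


store = TripleStore()
store.add("event300/match2/red", "scored", 5)
store.add("team:1", "hasPenalty", "crossed the line")
print(store.query(predicate="scored"))
```

Analytics then become pattern queries, e.g. "every subject with a `hasPenalty` statement", without designing a table for each kind of fact up front.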

If you have data you just want to dump out, use CSV or JSON first; those are the easiest to parse and can be converted into other formats with very little scripting.
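As an example of "very little scripting", this sketch turns a JSON array of flat objects into CSV with the standard library. It assumes the JSON is a list of objects whose values are scalars (nested objects would need flattening first); the key `rookie` in the sample input is made up.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    # Union of keys in first-seen order, so records with missing fields still fit.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    # Missing keys are written as empty fields (DictWriter's default restval="").
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


print(json_records_to_csv('[{"number": 1, "nickname": "The Juggernauts"}, {"number": 4, "rookie": 1997}]'))
```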

timothyb89
05-01-2011, 10:46
Well, our JSON interface is working nicely, and currently looks something like this:

http://frcdb.net/event/whatever/json (single event, includes match data and standings)
http://frcdb.net/team/####/json
http://frcdb.net/json/teams (all teams)
http://frcdb.net/json/events (all events)


This works fairly well, at least in initial testing with the Android app we're working on, except for trouble with database speed. It could be tested now, save for some hosting issues that I'm working on resolving (the current server is down due to hardware problems...).
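A client for the URL scheme listed above could be as small as the sketch below. It assumes `whatever` in the event URL is the event's short name (as with `colorado` and `ann-arbor` elsewhere in the thread), decodes responses as generic JSON without assuming any field names, and may no longer reach a live server. The helper names are hypothetical.

```python
import json
import urllib.request

BASE = "http://frcdb.net"

# Fixed endpoints from the list above.
ALL_TEAMS_URL = f"{BASE}/json/teams"
ALL_EVENTS_URL = f"{BASE}/json/events"


def team_url(number: int) -> str:
    return f"{BASE}/team/{number}/json"


def event_url(short_name: str) -> str:
    # Assumption: the path segment is the event's short name.
    return f"{BASE}/event/{short_name}/json"


def fetch_json(url: str, timeout: float = 10.0):
    """Download a URL and decode the body as JSON (schema not assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(team_url(1))
print(event_url("colorado"))
```

Keeping URL construction separate from fetching makes it easy to swap in the test host (`http://glass.frcdb.net`) by changing `BASE`.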

Anyway, it's a good thing someone's thought to make a standard XML format! We'd been planning to offer XML, but it lost some priority when we decided it was too hard to parse; a standard format should make this a ton easier.

CSV is still in the works, but converting our data to a relational model tends to be more time-consuming than simply dumping it out :P Apparently that's what we get for storing everything with mass serialization instead of in an SQL database like we should have...
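The conversion described here mostly means walking the serialized tree and copying each parent's key into its child rows. The sketch below assumes a hypothetical nested event structure (the keys `short_name`, `matches`, `red`, `blue`, `red_score`, `blue_score` are invented for illustration, not FRCDB's real fields) and produces rows in the `events.csv` / `ann-arbor-matches.csv` shape proposed earlier in the thread.

```python
def flatten_event(event: dict) -> tuple[list[tuple], list[tuple]]:
    """Split one nested event record into an events row and match rows."""
    event_row = (event["short_name"], event["name"])
    match_rows = []
    for match in event["matches"]:
        # Copying the parent key (event short name) into each child row is what
        # links the two tables once they are written out as separate CSV files.
        match_rows.append((event["short_name"], match["number"],
                           *match["red"], *match["blue"],
                           match["red_score"], match["blue_score"]))
    return [event_row], match_rows


event = {
    "short_name": "ann-arbor",
    "name": "Ann Arbor FIRST Robotics District Competition",
    "matches": [{"number": 1, "red": [68, 1998, 49], "blue": [2591, 2619, 862],
                 "red_score": 1, "blue_score": 5}],
}
events, matches = flatten_event(event)
print(events)
print(matches)
```

The resulting row lists can go straight into `csv.writer(...).writerows(...)`, one file per table.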