[TBA]: TBATV v4 Development Log

I am going to post notes in this thread as we work on developing TBATV v4 on Google App Engine. They’re not really intended for outside readers, but nothing in them is secret, just perhaps poorly explained. Feel free to chime in if you have any development insights as we go :slight_smile:

Things are up and running on the dev server, but hitting timeout deadlines is a problem. Generally, I think datastore ops take a long time because they are blocking remote procedure calls that have to leave the server actually executing the code. The way around this is to group datastore ops together and execute them all at once, which is a bit more complicated to code but almost certainly better form.

In particular, I am getting punked by getting match results. With 100 matches on an event page, and 6 MatchTeam objects to deal with per Match, this means we are creating hundreds of objects to do a match result update. I am not sure what the best way around this is - probably some very aggressive transaction bucketing.
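
To make the batching idea concrete, here is a rough sketch that counts round trips. `FakeDatastore`, `chunked`, and `put_in_batches` are hypothetical stand-ins written for illustration, not App Engine APIs, and the batch size of 100 is an arbitrary assumption. In the real `db` API, `db.put()` accepts a list of model instances, so the same grouping should carry over.

```python
class FakeDatastore(object):
    """In-memory stand-in that counts round trips (not the App Engine API)."""

    def __init__(self):
        self.rpc_count = 0
        self.stored = []

    def put(self, entities):
        # One call is one simulated round trip, however many entities it carries.
        self.rpc_count += 1
        self.stored.extend(entities)


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_in_batches(datastore, entities, batch_size=100):
    """Write entities in groups instead of making one call per entity."""
    for batch in chunked(entities, batch_size):
        datastore.put(batch)


# 100 matches with 6 MatchTeam records each, as on a large event page.
match_teams = [("match%d" % m, "slot%d" % s)
               for m in range(100) for s in range(6)]

# Naive approach: one round trip per record.
one_by_one = FakeDatastore()
for entity in match_teams:
    one_by_one.put([entity])

# Batched approach: one round trip per group of records.
batched = FakeDatastore()
put_in_batches(batched, match_teams, batch_size=100)
```

With these numbers the naive loop makes 600 simulated round trips and the batched version makes 6, which is the kind of saving the bucketing would be aiming for.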

I assume you’re using Python?

I’ve run scripts before on App Engine that have created a few hundred objects in one go, and they seem to work okay (in the cloud, at least; it totally kills the local development server whose datastore implementation just can’t compete with Bigtable).

I don’t have any experience with this, but I’m pretty sure you can pass an optional RPC object argument that can specify a callback, to both the URL fetcher and the datastore, to make them act asynchronously. That way, you should be able to run operations simultaneously.

Yup, we’re using Python. I’ve actually found the cloud to be slower than the local SDK. Maybe there have been improvements in the SDK, but the roundtrip time on the synchronous RPCs in the cloud adds up.

Making datastore tasks asynchronous and non-blocking is very attractive. I’ll have to read up on RPCs. Thanks for the pointer!

I discovered that App Engine has a feature called “key_names”. Instead of having App Engine assign a numeric ID to a Model in the datastore, you can specify a string to use instead. Later, you can call Model.get_by_key_name(key_name) instead of Model.all().filter('property =', value).get(), which is faster because it is a direct lookup rather than a query.

We are going to use these for Models with obvious canonical names. For instance, Teams will have key_names like ‘frc177’, and Events will have key_names like ‘2010ct’. Matches get key_names like ‘2010ct_sf2m1’. Since we can guess the key_name for a Model we expect to exist, we can find them faster.
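
A minimal sketch of how those key_names could be built consistently. The helper names are hypothetical, and the short level codes ("qm", "sf") and the set/match layout are inferred from the examples above rather than from any final spec.

```python
def team_key_name(team_number):
    """Build a Team key_name such as 'frc177'."""
    return "frc%d" % team_number


def event_key_name(year, event_short):
    """Build an Event key_name such as '2010ct'."""
    return "%d%s" % (year, event_short.lower())


def match_key_name(year, event_short, comp_level, match_number, set_number=None):
    """Build a Match key_name such as '2010ct_qm10' or '2010ct_sf2m1'.

    comp_level is a short code like 'qm' or 'sf' (an assumption based on
    the examples). set_number is only given for elimination matches.
    """
    event_part = event_key_name(year, event_short)
    if set_number is None:
        return "%s_%s%d" % (event_part, comp_level, match_number)
    return "%s_%s%dm%d" % (event_part, comp_level, set_number, match_number)
```

Something like `Match.get_by_key_name(match_key_name(2010, "CT", "sf", 1, set_number=2))` would then fetch semifinal 2, match 1 without running a query.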

Neat.

I have not had time to work on the project much lately because of a trip to Boston, but I figured I would post our Models. Maybe people can give some feedback on the schema we’re moving forward with presently.

from google.appengine.ext import db

class Team(db.Model):
    """
    Teams represent FIRST Robotics Competition teams.
    key_name is like 'frc177'
    """
    team_number = db.IntegerProperty(required=True)
    name = db.StringProperty()
    nickname = db.StringProperty()
    address = db.PostalAddressProperty() # If we can scrape this.
    website = db.LinkProperty()
    first_tpid = db.IntegerProperty() #from USFIRST. FIRST team ID number. -greg 5/20/2010


class Event(db.Model):
    """
    Events represent FIRST Robotics Competition events, both official and unofficial.
    key_name is like '2010ct'
    """
    name = db.StringProperty()
    event_type = db.StringProperty() # From USFIRST
    short_name = db.StringProperty() # Should not contain "Regional" or "Division", like "Hartford"
    event_short = db.StringProperty(required=True) # Smaller abbreviation like "CT"
    year = db.IntegerProperty(required=True)
    start_date = db.DateTimeProperty()
    end_date = db.DateTimeProperty()
    venue = db.StringProperty()
    venue_address = db.PostalAddressProperty() # We can scrape this.
    location = db.StringProperty()
    official = db.BooleanProperty(default=False) # Is the event FIRST-official?
    first_eid = db.StringProperty() #from USFIRST
    website = db.StringProperty()


class EventTeam(db.Model):
    """
    EventTeam serves as a join model between Events and Teams, indicating that
    a team will compete or has competed in an Event.
    """
    event = db.ReferenceProperty(Event,
                                 collection_name='teams')
    team = db.ReferenceProperty(Team,
                                collection_name='events')

class Match(db.Model):
    """
    Matches represent individual matches at Events.
    Matches have many Videos.
    Matches have many Alliances.
    key_name is like 2010ct_qm10 or 2010ct_sf1m2
    """
    event = db.ReferenceProperty(Event,
                                 collection_name='matches',
                                 required=True)
    time = db.DateTimeProperty()
    comp_level = db.StringProperty(required=True, choices=set(["Qualifications", "Quarterfinals", "Semifinals", "Finals"])) # This choices set should probably become a global Constant somewhere. How do you do that in Python properly? -greg 5/20/2010
    set_number = db.IntegerProperty(required=True)
    match_number = db.IntegerProperty(required=True)


class MatchTeam(db.Model):
    """
    A join class between Teams and Matches. Serves to store alliance information
    Based on code from: http://code.google.com/appengine/articles/modeling.html
    """
    match = db.ReferenceProperty(Match,
                                 collection_name='teams',
                                 required=True)
    team = db.ReferenceProperty(Team,
                                collection_name='matches',
                                required=True)
    alliance = db.StringProperty(choices=set(["red", "blue"]),
                                 required=True)
    substitute = db.BooleanProperty(default=False) #indicate the team was a substitute on the Alliance


class MatchScore(db.Model):
    """
    A one to many relationship class that stores alliance scores for each Match
    """
    match = db.ReferenceProperty(Match,
                                 collection_name='scores',
                                 required=True)
    alliance = db.StringProperty(choices=set(["red", "blue"]),
                                 required=True)
    score = db.IntegerProperty()


class TBAVideo(db.Model):
    """
    Store information related to videos of Matches hosted on
    The Blue Alliance.
    """
    match = db.ReferenceProperty(Match,
                                 collection_name='tba_videos',
                                 required=True)
    location = db.StringProperty()

class YoutubeVideo(db.Model):
    """
    Store information related to videos of Matches hosted on YouTube.
    """
    match = db.ReferenceProperty(Match,
                                 collection_name='youtube_videos',
                                 required=True)
    youtube_id = db.StringProperty()
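
On the question in the comp_level comment about a global constant: Python has no enforced constants, but the usual convention is an uppercase module-level name, and a frozenset keeps the value from being modified by accident. A minimal sketch follows; the names and the idea of a separate constants module are assumptions for illustration.

```python
# Hypothetical constants module, e.g. constants.py.
# Uppercase names signal "do not reassign"; frozenset makes the values immutable.
COMP_LEVELS = frozenset(["Qualifications", "Quarterfinals", "Semifinals", "Finals"])
ALLIANCE_COLORS = frozenset(["red", "blue"])


def is_valid_comp_level(value):
    """Return True if value is one of the known competition levels."""
    return value in COMP_LEVELS
```

The models could then import these names and pass them along, e.g. `choices=COMP_LEVELS`, so the list lives in one place.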

I am not sure if this is the right thread for the Blue Alliance “wish list”, but I will throw this in here anyway.

I would love a column on the page that is W/L/T when I have searched for a particular team. Yes, technically I can just search through the scores, but a W/L/T would save me a lot of time when doing pre-scouting for the championship.

Brainstorm - The Blue Alliance v4 is a better thread for ideas like this. Please keep them coming in that thread! :slight_smile:

I put in some more work this afternoon, and the Datafeed system is to the point that we can get all Events, Teams, and Matches from 2010 regionals posted on the FIRST website. We’ll be importing this data from the TBAv3 SQL rather than re-scraping from FIRST, but this system is important moving forward for 2011.

There are still issues with how long datastore actions take. Some of these will be improved by using Model.get_or_insert() instead of separately looking things up before trying to modify them. This will particularly improve Match insertion, as we need to create MatchTeam and MatchScore objects to accompany each match.

Memcached will play a large role in reducing CPU usage as well. By memcaching certain requests like “give me the HTML for Connecticut 2010’s matches”, we can dodge hitting the datastore at all, instead making a single memcached call. We’ll expire these memcached objects when we make updates to the underlying data they hold (such as getting new Match results), but this should be particularly useful for page views (fast!) and API calls from other apps (repetitive!).
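
The cache-then-expire pattern described above could look roughly like this. `FakeMemcache` is an in-memory stand-in so the sketch runs anywhere; its `get`, `set`, and `delete` mirror the names of the App Engine memcache calls, but the cache key format and the render function are hypothetical.

```python
class FakeMemcache(object):
    """Dictionary-backed stand-in for a memcache client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None on a cache miss

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


cache = FakeMemcache()
render_calls = []  # records each expensive render, for illustration


def render_event_matches(event_key_name):
    """Placeholder for the expensive datastore reads plus HTML rendering."""
    render_calls.append(event_key_name)
    return "<ul><li>matches for %s</li></ul>" % event_key_name


def get_event_matches_html(event_key_name):
    """Serve from cache when possible; render and store on a miss."""
    cache_key = "event_matches_html:%s" % event_key_name
    html = cache.get(cache_key)
    if html is None:
        html = render_event_matches(event_key_name)
        cache.set(cache_key, html)
    return html


def on_match_results_updated(event_key_name):
    """Expire the cached page when the underlying match data changes."""
    cache.delete("event_matches_html:%s" % event_key_name)
```

Repeated page views for "2010ct" would hit the render path once, and again only after `on_match_results_updated("2010ct")` has expired the entry.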

I’m still not 100% certain that the (Match, MatchTeam, MatchScore) object model is what we will ultimately go with. It creates nine objects per Match, which is a bit expensive. It’s also expensive to make sure that we don’t have straggling objects if Teams were erroneously attached to matches. It really does provide nice flexibility going forward though.

Moving forward, the next tasks are to build out mock pages for Events, Teams, and Matches. This will make sure that our object model contains everything we think will go in the final pages. We’re hoping to do a redesign of the site, so we’re not going to put a lot of effort into the visual design at this point. Bare HTML will do.

As we get things more towards “works without elaborate setup” we’ll push the code into either a Google Code or GitHub repository (any opinions here?). We hope the community will help suggest performance improvements or even commit patches!

Instead of memcaching the HTML, you could memcache the data and fill in a Django template with it. To me that seems like the cleanest way of putting data into HTML files without mixing Python code in with HTML code.

This is probably what we’ll end up doing. I don’t think rendering templates is very expensive compared to everything else, so that should be clean and easy.

Here is a screenshot of AppStats showing the problem with the Match, MatchTeam, MatchScore object model. All of the database gets and puts to insert a Match add up to a lot of CPU time.

I wonder if storing everything in Match using a red1, red2, red3, blue1, blue2, blue3 system or a red_teams, blue_teams list reference property is a better idea. I’d really love for these relation objects to be faster, but when we render a full event worth of matches, it takes almost 20 seconds.

Does anyone know if it is possible to non-lazily query a bunch of objects’ ReferenceProperties all at once? Like say, “I want all the Matches, and their MatchTeam objects, and the Team objects on the other end of those”?





I am going to change the Match model to be less flexible so we can speed up datastore performance.

I am thinking a Match will have Match.teams, which will be a ListProperty that stores the Teams in the match. Separately, we will store Match.alliances, which will contain a dictionary shaped like {"red": ["frc177", "frc195", "frc125"], "blue": ["frc433", "frc190", "frc222"]}. The teams property will basically be an index to let us quickly search by team, and the alliances property will store the actual structure of the alliances.

We’ll add another property to Matches called “game”, where we write down which FRC game was being played. This way, if they change the game structure in the future (or we get data on past games), we’ll easily be able to adjust Controllers to handle it without having to muck with the model.

New concept:


class Match(db.Model):
    """
    Matches represent individual matches at Events.
    Matches have many Videos.
    key_name is like 2010ct_qm10 or 2010ct_sf1m2
    """
    event = db.ReferenceProperty(Event,
                                 required=True)
    time = db.DateTimeProperty()
    comp_level = db.StringProperty(required=True, choices=set(["Qualifications", "Quarterfinals", "Semifinals", "Finals"])) # This choices set should probably become a global Constant somewhere. How do you do that in Python properly? -greg 5/20/2010
    set_number = db.IntegerProperty(required=True)
    match_number = db.IntegerProperty(required=True)
    teams = db.ListProperty(db.Key) # Keys of the Teams in the match (ListProperty cannot hold Model instances). Primarily for indexing and searching
    alliances = db.StringProperty() # Store a Dictionary as a JSON string
    scores = db.StringProperty() # Store a Dictionary as a JSON string

This will reduce the number of Datastore lookups to display a Match (assuming we’re not interested in more than the team number) by just under an order of magnitude. I think that’s a good thing!
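
Here is a small sketch of the JSON round trip for the alliances property and of deriving the flat teams index from it. The helper names are hypothetical. It uses the standard-library `json` module; on the Python 2.5 runtime, `simplejson` offers the same `dumps`/`loads` calls.

```python
import json


def encode_alliances(alliances):
    """Serialize the alliances dictionary for storage in a string property."""
    return json.dumps(alliances, sort_keys=True)


def decode_alliances(alliances_json):
    """Turn the stored JSON string back into a dictionary."""
    return json.loads(alliances_json)


def teams_index(alliances):
    """Flatten the alliances into one list of team key_names for searching."""
    teams = []
    for color in sorted(alliances):  # sorted so the order is deterministic
        teams.extend(alliances[color])
    return teams


alliances = {
    "red": ["frc177", "frc195", "frc125"],
    "blue": ["frc433", "frc190", "frc222"],
}
stored = encode_alliances(alliances)   # what would go in Match.alliances
restored = decode_alliances(stored)    # what a controller would work with
index = teams_index(alliances)         # what would back Match.teams
```

Keeping the structure in one JSON string means displaying a match needs only the Match entity itself, while the flat list keeps "all matches for frc177" searchable.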

Similarly, instead of having a bunch of EventTeam objects, we could make teams a ListProperty of an Event. Getting all of the Teams at an Event would then require just finding the Event, instead of finding all of the EventTeams. This switches from a many-to-many relationship to many one-to-many relationships.

I haven’t had much time to work on development, but I think these new ideas will remove some of the major roadblocks that existed.

Follow [@tba_dev](http://twitter.com/tba_dev) to get the latest on stuff being committed to our GitHub! :]

:confused: Will IRI and other big off-season events ever be added to The Blue Alliance? I know matches from these events have been recorded before… but not now?
Just wondering.