The vision system has a few different modes. One mode, the one that most directly applies to the challenge, returns the locations and bounding boxes of blobs of color. The code most likely takes these bounding boxes, sorts them by similarity to the target color, and plots a path depending on the best blob's X position or something like that. It would not be difficult to A) narrow the range of accepted colors or B) apply another level of filtering that estimates the height of the blob off the playing field, to filter out humans or robots (tall ones at least).
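To make that guess concrete, here is a rough Python sketch of the sort-and-filter idea. None of this is from the actual vision code: the `Blob` fields, the function names, and the thresholds are made up for illustration. It also uses the blob's height in pixels as a crude stand-in for "height off the playing field", which would really depend on where the camera is mounted and how it is aimed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Color = Tuple[int, int, int]  # mean RGB of a blob, 0-255 per channel


@dataclass
class Blob:
    """Hypothetical bounding box as a blob-tracking mode might report it."""
    x: int        # left edge, pixels
    y: int        # top edge, pixels (0 = top of image)
    w: int        # width, pixels
    h: int        # height, pixels
    color: Color  # mean color inside the box

    @property
    def center_x(self) -> float:
        return self.x + self.w / 2.0


def color_distance(a: Color, b: Color) -> int:
    """Squared RGB distance; smaller means more similar."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def pick_target(blobs: Sequence[Blob], target: Color,
                max_dist: int, max_height_px: int) -> Optional[Blob]:
    """Return the best-matching blob, or None if nothing passes the filters.

    A) max_dist narrows the accepted color range.
    B) max_height_px rejects tall blobs (assumed to be people or robots).
    """
    candidates = [
        b for b in blobs
        if color_distance(b.color, target) <= max_dist and b.h <= max_height_px
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda b: color_distance(b.color, target))


def steering(blob: Blob, image_width: int) -> float:
    """Map the blob's X position to -1.0 (hard left) .. +1.0 (hard right)."""
    half = image_width / 2.0
    return (blob.center_x - half) / half


blobs = [
    Blob(10, 50, 20, 20, (250, 10, 10)),   # small, close to target color
    Blob(100, 0, 30, 120, (255, 0, 0)),    # tall, e.g. someone's red shirt
    Blob(60, 60, 20, 20, (0, 0, 255)),     # wrong color entirely
]
target = pick_target(blobs, (255, 0, 0), max_dist=1000, max_height_px=40)
if target is not None:
    print(steering(target, image_width=160))
```

With the height filter on, the tall blob is thrown out even though its color is a perfect match, so the robot steers toward the small blob on the left instead.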
All I am waiting for is someone to pick up one of the serial CompactFlash interfaces that are always in Circuit Cellar, write the frame captures to it, and show a post-match movie.