Team 900 mentor here
No argument. There’s a good bit of science there, but probably just as much trial and error…
How many different architectures did you try before you decided on this?
We picked the overall architecture of a cascade of CNNs pretty early on. Initially we tried just a single CNN but couldn’t find a balance between speed and accuracy that worked for us. We used LBP and Haar cascade classifiers last year, so we at least understood the concept.
The architecture of each individual network continues to be tweaked the more we learn about them. The nets in the paper we borrowed from were too powerful - they’re doing face detection, we’re just looking for gray blobs so our nets needed to be simpler for a) performance and b) to prevent overfitting. So part of the effort was shrinking them down to a useful size and then tweaking learning rates and other parameters to get the most out of them.
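For anyone curious, the cascade idea boils down to something like this (a minimal sketch, not our actual code - the function names and thresholds here are hypothetical):

```python
def cascade_detect(windows, nets, thresholds):
    """Run candidate windows through progressively bigger nets.

    Each stage cheaply rejects most windows; only survivors reach
    the larger, slower nets, so the average cost per frame stays low.
    """
    survivors = windows
    for net, thresh in zip(nets, thresholds):
        # Keep only windows this stage scores above its threshold.
        survivors = [w for w in survivors if net(w) >= thresh]
        if not survivors:
            break
    return survivors
```

The early stages see every candidate window, so they have to be tiny and fast; the final stage only sees the handful of windows that made it through, so it can afford to be more accurate.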
What were the specs of the computer you trained on? (And how long did it take?)
To add to Marshall’s info:
The nets we use aren’t that complex. We did train on Marshall’s monster Titan X machine, and that was nice and fast so we could iterate quickly. On the other extreme, some of the smaller nets could be trained overnight on a laptop running CPU code. That gave us some flexibility to play outside of the lab and then do a full run on the big system the next day.
Converting the individual input images to the database format used by Caffe was a big bottleneck, as was not formatting our Linux drives with enough inodes. We had millions of 24x24 training images, and preprocessing them a few ways led to a file system with 10s of millions of small files. We ended up running out of inodes, which meant that even though we had disk space free the file system couldn’t create new files. We’ll know better next year.
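If you want to avoid the same surprise, you can check inode headroom before kicking off a big preprocessing run. A quick sketch (hypothetical helper names, and `os.statvfs` is POSIX-only):

```python
import os

def inode_headroom(path):
    """Return (free_inodes, total_inodes) for the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_ffree, st.f_files

def safe_to_write(path, n_files_needed, margin=1.1):
    """True if the filesystem can likely hold n_files_needed more files,
    with a little margin to spare."""
    free, _ = inode_headroom(path)
    return free > n_files_needed * margin
```

The shell equivalent is just `df -i`, but doing it in code lets the preprocessing script bail out early instead of dying halfway through.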
Did you find a discrepancy between boulder detection in your space as opposed to at competition?
Yes, or even in different locations in the lab. Gray boulders on off-white floors were our nemesis. Plus the boulders are reflective, so the harsh red and blue lighting led to some really interesting color variation. I should post some of the stills - we came to appreciate that gray isn’t just gray, no matter what your brain thinks it knows.
We initially grabbed a lot of data using the chroma-key process described in our data acquisition paper. That got us a good baseline to work with.
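The core of the chroma-key trick is just masking pixels near the key color so the boulder can be lifted out of the frame and composited onto other backgrounds. A stripped-down sketch (the real process is in the data acquisition paper; this helper and its tolerance value are just illustrative):

```python
import numpy as np

def chroma_key_mask(img, key_rgb, tol=40):
    """Boolean mask of background pixels within tol of the key color.

    img is an HxWx3 uint8 array; key_rgb is the screen color, e.g.
    (0, 255, 0) for a green screen. Foreground = ~mask.
    """
    diff = np.abs(img.astype(int) - np.array(key_rgb))
    return (diff < tol).all(axis=-1)
```

Once you have the foreground mask, you can paste the boulder pixels over arbitrary background images to generate labeled positives cheaply.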
At that point, we captured videos from random places around our school and saw what didn’t work. We used the imageclipper tool (see our GitHub repo) to manually generate additional images of the boulders, and then used some tricks to multiply that data (adding noise, random rotations and brightness variations, etc.). After a few iterations of this process we had a reasonable amount of boulder data to work with.
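The augmentation tricks are nothing fancy. Something along these lines gets you several perturbed copies per hand-clipped chip (a simplified sketch - I’m using 90-degree rotations here for brevity, where arbitrary-angle rotations work too):

```python
import numpy as np

def augment(img, rng):
    """Produce a randomly perturbed copy of a training chip."""
    out = img.astype(np.float32)
    out += rng.normal(0, 5, out.shape)          # additive pixel noise
    out *= rng.uniform(0.8, 1.2)                # brightness variation
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # random rotation
    return np.clip(out, 0, 255).astype(np.uint8)
```

Running each clipped image through this a handful of times with different random seeds multiplies the training set without any extra manual labeling.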
The other issue was false positives - detecting boulders that aren’t actually there. Our initial set of negative (non-boulder) images was just random subsets of images we knew didn’t have boulders in them. Once we had the system up and running, we could run the detection code on full videos we knew didn’t have boulders in them. Luckily that’s pretty much any random video not specifically related to the 2016 FRC game. We captured images of anything detected in these videos and used them as additional negative images - basically retraining the net on things the last iteration got wrong.
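This is the classic hard-negative mining loop. The whole idea fits in a few lines (hypothetical names; the `detector` stands in for whatever the current trained cascade is):

```python
def mine_hard_negatives(detector, boulder_free_frames, negatives):
    """Collect false positives from footage known to contain no boulders.

    Any detection on boulder-free video is, by definition, a mistake,
    so every hit becomes a new negative example for the next round
    of training.
    """
    for frame in boulder_free_frames:
        for window in detector(frame):
            negatives.append(window)
    return negatives
```

Each retrain-and-mine iteration targets exactly the mistakes the previous model made, which is why it pays off so much faster than adding more random negatives.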
Both helped accuracy a lot, with the downside that they generated a lot of data.
Moving forward, I suggest taking a look at the winner of ILSVRC 2015 (MSRA’s deep residual networks).
MSRA sounds like some sort of antibiotic resistant flesh-eating bacteria
I’d love to have the resources to run a 150-layer deep network on the robot.
But yeah, there’s lots of really cool new stuff out there and only so much time to keep up. I’d love to get some time to try out something like YOLO, single-shot multibox (SSD) or Faster R-CNN and see if they can scale down to an embedded system. Running one net per input frame should be more efficient than running thousands of per-window evaluations, if we can get the complexity down to something that’ll fit on an embedded GPU.
Excellent work 900. I always look forward to these every year.
Cool! Glad to know people are reading. We have a lot of fun working with these projects.