paper: ZebraVision 4.0 Neural Networks

Thread created automatically to discuss a document in CD-Media.

ZebraVision 4.0 Neural Networks
by: arallen

Document describing the structure, implementation and utilization of neural networks for tracking game objects on the field in real time. We used the Caffe library from Berkeley Vision, the OpenCV library, and the DIGITS software from NVIDIA to create the neural network. This specific network was developed to track boulders for the FRC 2015-2016 season but can be applied to track virtually any object with proper data collection and training.

Authors: Alexander Allen, Benjamin Decker

Paper by Adobe Research Team:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Li_A_Convolutional_Neural_2015_CVPR_paper.pdf

Demonstration:
https://youtu.be/v9gof9Rafks

GitHub:
https://github.com/FRC900/2016VisionCode

Caffe Library:
http://caffe.berkeleyvision.org/

NVIDIA DIGITS:
https://developer.nvidia.com/digits

Zebravision4.0NeuralNets.pdf (1.09 MB)

This is probably the best work we did all season and I’m very proud of our team for this and especially of the students involved. Thank you!!!

Based on my experience with deep learning, as well as that of many of my friends, it is really hit or miss whether a model will work or not.

How many different architectures did you try before you decided on this?

What were the specs of the computer you trained on? (And how long did it take?)

Did you find a discrepancy between boulder detection in your space as opposed to at competition?

Moving forward, I suggest taking a look at the winner of ILSVRC 2015.

Excellent work 900. I always look forward to these every year.

I’ll answer these as best as I can but I’ll see about getting a student to add clarity.

We tried and are trying a lot of things. I doubt we will ever find “the perfect model”. It’s an ever-evolving system but we’re getting a lot better at generalizing it and making it repeatable, which is good.

The computer we train on is a 12-core Xeon 2680v3 with 32GB of DDR4. Most of the training is done on an Nvidia Titan X. We've run into problems with SSD speeds though, and the computer will see a storage upgrade in the coming months to help speed things up further. It's a zippy machine, though. Our training can take anywhere from 1 to 8 hours depending on a bunch of factors (though we've had a few sessions take 12-24 hours, I believe). I'll see if I can get a student to be more specific about this. I own the system but the students use it for this. I use it for running VMs inside of sandboxes so I can take them to pieces and experiment for work (I am a Solutions Architect for a big IT company).

We did find discrepancies and we’ve done our best to minimize them. We explain the methodology we use for capturing game piece images in the paper. It’s a chromakey solution that we add digital noise to. It’s surprisingly good once you get it tuned in.
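For anyone curious what that looks like in practice, here's a rough OpenCV sketch of the general chroma-key-plus-noise idea. The hue thresholds, noise level, and function names are illustrative guesses, not the values from the paper or the 2016VisionCode repo:

```python
import cv2
import numpy as np

def chroma_key_composite(fg_bgr, bg_bgr, noise_sigma=8.0):
    """Cut a green-screened object out of fg_bgr, paste it onto bg_bgr,
    and add noise so the composite looks less synthetic.  Thresholds are
    illustrative, not the values from the paper."""
    hsv = cv2.cvtColor(fg_bgr, cv2.COLOR_BGR2HSV)
    # Strongly green pixels (hue roughly 40-80 on OpenCV's 0-179 scale) are background.
    green = cv2.inRange(hsv, (40, 60, 60), (80, 255, 255))
    mask = cv2.bitwise_not(green)              # 255 where the game piece is
    mask = cv2.medianBlur(mask, 5)             # clean up speckle along the key edge

    bg = cv2.resize(bg_bgr, (fg_bgr.shape[1], fg_bgr.shape[0]))
    composite = np.where(mask[..., None] > 0, fg_bgr, bg)

    # Digital noise so the net can't key on a perfectly clean chroma edge.
    noise = np.random.normal(0.0, noise_sigma, composite.shape)
    return np.clip(composite.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```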

We’ll definitely take a look at it. I think from an architecture standpoint the next big leap for us will be to roll this into ROS on the Jetson to provide a better wrapper for all of it.

Thanks! Glad to see some positivity. :smiley:

Thank you for sharing!

Hello,

Thanks for your interest,

As far as the architecture goes, we probably went through hundreds during the course of the competition season. The main problems we were trying to minimize while changing the network were the time it took to process one iteration of the network and over-fitting (finding only balls that look exactly like the training data). As our data set grew we often had to make small tweaks to the network to better optimize it. By the end of the competition season we had started to automate the process of changing the network parameters, training the network, testing it on a separate set of data, then reporting back.
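To give a flavor of what that automation can look like, here's a minimal sketch that sweeps the base learning rate, trains with Caffe's command-line tool, scores the final snapshot on a held-out test set, and reports the last line of Caffe's output. The file names, parameter range, and solver settings are placeholders, not the team's actual tooling:

```python
import subprocess

# Placeholder solver template; the real net/solver definitions live in the
# team's 2016VisionCode repo and the paper.
SOLVER_TEMPLATE = """net: "train_val.prototxt"
base_lr: {lr}
lr_policy: "fixed"
momentum: 0.9
max_iter: 10000
test_iter: 100
test_interval: 1000
snapshot_prefix: "sweep_{tag}"
solver_mode: GPU
"""

for lr in [0.01, 0.001, 0.0001]:
    tag = str(lr).replace(".", "p")
    with open("solver_%s.prototxt" % tag, "w") as f:
        f.write(SOLVER_TEMPLATE.format(lr=lr, tag=tag))
    # Train with Caffe's standard command-line tool...
    subprocess.run(["caffe", "train", "--solver=solver_%s.prototxt" % tag], check=True)
    # ...then score the final snapshot against the held-out TEST-phase LMDB.
    test = subprocess.run(
        ["caffe", "test",
         "--model=train_val.prototxt",
         "--weights=sweep_%s_iter_10000.caffemodel" % tag,
         "--iterations=100"],
        capture_output=True, text=True)
    # Caffe prints test accuracy/loss to stderr; keep the tail as the "report".
    print(lr, test.stderr.strip().splitlines()[-1])
```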

We did find discrepancies between the performance in our lab and the performance at a competition. Primarily this came from changes in lighting (we had theorized that the networks were heavily dependent on the lighting of the ball and were working to fix that later in the season) and the fact that at a competition, the background of the image is much more dynamic than in a lab tracking a ball against a wall or floor.

We will certainly be looking to improve the efficiency of our process as the next season approaches and we may get to see this working live on the field!

Team 900 mentor here

No argument. There’s a good bit of science there, but probably just as much trial and error…

How many different architectures did you try before you decided on this?

We picked the overall architecture of a cascade of CNNs pretty early on. Initially we tried just a single CNN but couldn't find a balance between speed and accuracy that worked for us. We used LBP and Haar cascade classifiers last year so we at least understood the concept.

The architecture of each individual network continues to be tweaked the more we learn about them. The nets in the paper we borrowed from were too powerful - they're doing face detection, while we're just looking for gray blobs, so our nets needed to be simpler both a) for performance and b) to prevent overfitting. So part of the effort was shrinking them down to a useful size and then tweaking learning rates and other parameters to get the most out of them.
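To illustrate how a cascade like that gets evaluated, here's a rough pycaffe sketch: a tiny net scores every candidate window cheaply, and only the survivors get re-scored by a bigger, slower net. The file names, patch sizes, and thresholds are placeholders, not the actual ZebraVision nets:

```python
import caffe
import numpy as np

# Placeholder prototxt/caffemodel names; not the actual ZebraVision files.
caffe.set_mode_gpu()
d12 = caffe.Net("d12_deploy.prototxt", "d12.caffemodel", caffe.TEST)  # tiny, fast net
d24 = caffe.Net("d24_deploy.prototxt", "d24.caffemodel", caffe.TEST)  # larger, slower net

def score(net, patches):
    """Run a batch of HxWx3 float32 patches through a net and return P(boulder),
    assuming a two-class softmax output where column 1 is the positive class."""
    batch = np.stack([p.transpose(2, 0, 1) for p in patches])   # NHWC -> NCHW
    blob = net.blobs[net.inputs[0]]
    blob.reshape(*batch.shape)
    blob.data[...] = batch
    net.forward()
    return net.blobs[net.outputs[0]].data[:, 1].copy()

def cascade(windows, t12=0.5, t24=0.9):
    """windows: list of (patch12, patch24) crops of the same candidate location.
    The cheap net rejects most windows; only survivors pay for the big net."""
    p12 = score(d12, [w[0] for w in windows])
    survivors = [w for w, p in zip(windows, p12) if p > t12]
    if not survivors:
        return []
    p24 = score(d24, [w[1] for w in survivors])
    return [w for w, p in zip(survivors, p24) if p > t24]
```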

What were the specs of the computer you trained on? (And how long did it take?)

To add to Marshall’s info :

The nets we use aren’t that complex. We did train on Marshall’s monster Titan X machine, and that was nice and fast so we could iterate quickly. On the other extreme, some of the smaller nets could be trained overnight on a laptop running CPU code. That gave us some flexibility to play outside of the lab and then do a full run on the big system the next day.

Converting the individual input images to the database format used by Caffe was a big bottleneck. So was not formatting our Linux drives with enough inodes. We had millions of 24x24 training images, and preprocessing them a few ways led to a file system with tens of millions of small files. We ended up running out of inodes, which meant that even though we had disk space free, the file system couldn't create new files. We'll know better next year.
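For reference, packing images into Caffe's LMDB format from Python looks roughly like the sketch below (the paths, key format, and map size are illustrative); one big LMDB also happens to avoid the small-file/inode problem entirely:

```python
import caffe
import cv2
import lmdb

def images_to_lmdb(paths_and_labels, db_path, map_size=1 << 34):
    """Pack (image_path, label) pairs into a Caffe LMDB.  One big database
    instead of millions of tiny files also sidesteps the inode problem."""
    env = lmdb.open(db_path, map_size=map_size)
    with env.begin(write=True) as txn:
        for i, (path, label) in enumerate(paths_and_labels):
            img = cv2.resize(cv2.imread(path), (24, 24))                     # BGR uint8, 24x24
            datum = caffe.io.array_to_datum(img.transpose(2, 0, 1), label)   # Caffe wants CxHxW
            txn.put(b"%08d" % i, datum.SerializeToString())
    env.close()
```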

Did you find a discrepancy between boulder detection in your space as opposed to at competition?

Yes, or even in different locations in the lab. Gray boulders on off-white floors were our nemesis. Plus the boulders are reflective so the harsh red and blue lighting led to some really interesting color variation. I should post some of the stills - we really came to appreciate that gray isn’t really just gray, no matter what your brain thinks it knows.

We initially grabbed a lot of data using the chroma-key process described in our data acquisition paper. That got us a good baseline to work with.

At that point, we captured videos from random places around our school and saw what didn’t work. We used the imageclipper tool (see our github repo) to manually generate additional images of the boulders, and then used some tricks to multiply that data (adding noise, random rotations and brightness variations, etc). After a few iterations of this process we had a reasonable amount of boulder data to work with.
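A rough sketch of that multiplication step, assuming hand-clipped boulder crops as input (the rotation, brightness, and noise ranges are guesses, not what the team actually used):

```python
import cv2
import numpy as np

def multiply(img, n=10, rng=None):
    """Produce n jittered copies of one hand-clipped boulder crop.  The
    rotation/brightness/noise ranges are guesses, not the paper's values."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    out = []
    for _ in range(n):
        # Small random rotation about the center of the crop.
        M = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-15, 15), 1.0)
        aug = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REFLECT)
        # Random brightness/contrast, then mild sensor-style noise.
        aug = cv2.convertScaleAbs(aug, alpha=rng.uniform(0.7, 1.3), beta=rng.uniform(-20, 20))
        aug = np.clip(aug.astype(np.float32) + rng.normal(0, 6, aug.shape), 0, 255).astype(np.uint8)
        out.append(aug)
    return out
```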

The other issue was false positives - detecting boulders that aren't actually there. Our initial set of negative (non-boulder) images was just random subsets of images we knew didn't have boulders in them. Once we had the system up and running, we could run the detection code on full videos we knew didn't have boulders in them. Luckily that's pretty much any random video not specifically related to the 2016 FRC game. We captured images of anything detected in these videos and used them as additional negative images - basically retraining the net on things the last iteration got wrong.
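The hard-negative loop itself can be as simple as the sketch below, where detect_boulders is a stand-in for the cascade detector and every detection in a boulder-free video gets saved as a new negative training image:

```python
import os
import cv2

def mine_hard_negatives(video_path, detect_boulders, out_dir, size=(24, 24)):
    """Run the detector over a video known to contain no boulders and save every
    detection as a new negative image.  `detect_boulders` stands in for the
    cascade detector and should return a list of (x, y, w, h) boxes per frame."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for (x, y, w, h) in detect_boulders(frame):
            # By construction, anything detected here is a false positive.
            patch = cv2.resize(frame[y:y + h, x:x + w], size)
            cv2.imwrite(os.path.join(out_dir, "neg_%06d_%04d.png" % (frame_idx, saved)), patch)
            saved += 1
        frame_idx += 1
    cap.release()
    return saved
```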

Both helped accuracy a lot, with the downside being that it generated a lot of data.

Moving forward, I suggest taking a look at the winner of ILSVRC 2015.

MSRA sounds like some sort of antibiotic resistant flesh-eating bacteria :slight_smile:

I’d love to have the resources to run a 150-layer deep network on the robot.

But yeah, there’s lots of really cool new things out there and only so much time to keep up. I’d love to get some time to try out something like YOLO, single-shot multibox or Faster-RCNN and see if it can scale down to an embedded system. Running one net per input frame should be more efficient than thousands if we can get the complexity down to something that’ll fit on an embedded GPU.

Excellent work 900. I always look forward to these every year.

Cool! Glad to know people are reading. We have a lot of fun working with these projects.

Machine learning is really interesting to me, but unfortunately this whitepaper goes over my head. Is there a good starting point that you can recommend?

Andrew Ng’s course on Coursera is an excellent start.

This is also a really good resource : http://cs231n.github.io/. Pretty sure our students put it in the paper but I wanted to make sure it didn’t get lost.

I have a question for you guys. Did you ever actually use this in competition? I find that this wouldn’t be that useful in actual competition unless you could track the ball, auto-rotate, and intake it with the press of a button. Were you able to implement that?

Still, very impressive as always! I look forward to seeing what you do every year!

  • Drew

Sadly not, but we are very close to it, and the purpose of our work has been to set ourselves up for the future, not necessarily for the current game. We’re very focused on making the process repeatable, learning from it, and figuring out how to simplify and improve the implementation.

This is really the next evolution of our work last year with tracking/retrieving the recycling bins.

Our hope is to program and complete a full “cycle” with our current robot once school is back in session and all of our students are back. We’ve even got a trick up our sleeve for tracking robot pose thanks to our friends over at Kauai Labs that should make this all a lot easier than it may at first seem.

That would be really awesome if you completed that goal. I would love to see a follow up video if you ever get that working!

Thanks for the quick and informative response.

  • Drew