Detecting gears with deep learning

Using deep learning to find gears! This model was trained for around 3 hours with a dataset of only ~420 images:

Short video: https://www.youtube.com/watch?v=ia-qFr3SDM8

I’ll continue adding more images to the dataset (and possibly images of fuel).

Nice, similar to something we’re looking at. What localization method did you use? I’m guessing either R-CNN or YOLO v2?

Neural networks are so last year. :wink:

I kid! Great job with this. I hope you guys share your code and take some time to write up your process after the season is over.

I actually used SSD. It’s supposedly more accurate than YOLO with similar speed.

While cool, it looks like gear detection won’t be that useful for our robot. I’m planning on uploading the data, pre-trained model, and instructions on how to run it sometime this week.

That’s awesome. I’ve been trying to do this all year, but still haven’t gotten enough data.

What I did was take a couple of videos (13 so far) of the gears while varying the angle, lighting, etc. Then I converted the videos into individual frames and annotated them in MATLAB. Took me two afternoons to get a bit over 400 annotated images.
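
If you’d rather skip MATLAB for the frame-extraction step, the same thing is easy with Python and OpenCV. Here’s a rough sketch of the idea (not my actual script; the paths and frame-skip interval are placeholders):

```python
# Hedged sketch: split a video into frames for annotation.
# Paths and the frame-skip interval are made-up examples.
import os
import cv2

def extract_frames(video_path, out_dir, every_n=10):
    """Save every Nth frame of a video as a JPEG; returns the number saved."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:  # skip near-duplicate consecutive frames
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# e.g. extract_frames("gear_video_01.mp4", "frames/video01")
```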

I feel like an idiot for not thinking of the hat. That should have been my first instinct. In all seriousness though, that’s amazing. Did you use any backend for the training, like Theano or TensorFlow? And what computer did you use for training?

I used Caffe and trained on a GTX 1080. Seeing how it only took 3 hours to get pretty decent results, I bet the same training could be accomplished on a laptop in a day.
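
For anyone curious what "training with Caffe" actually involves: once you have a solver definition, kicking it off is nearly a one-liner. A minimal sketch using pycaffe (the solver.prototxt path is a placeholder for whatever your SSD setup generates):

```python
# Hedged sketch: starting Caffe training from Python on GPU 0.
# "solver.prototxt" is a placeholder for your own solver definition.
import caffe

caffe.set_mode_gpu()
caffe.set_device(0)                           # train on the first GPU (e.g. the GTX 1080)
solver = caffe.get_solver("solver.prototxt")  # loads the net named in the solver
solver.solve()                                # runs until max_iter set in the solver
```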

I have a GTX 1080 in my laptop, so I’ll try a single-shot model with Caffe later today. I’ll update you if I get any results.

Just curious, why would you need to detect gears autonomously?

Well, why not? First, it’s a really fun challenge for programmers. It’s also a great learning experience for anyone interested in neural nets and machine learning. But specifically for the game, it depends on your team. One use would be to automatically intake a gear. If you are on the other side of the field, visibility becomes a major issue, especially if you are directly behind the airship. If you can track a gear, then you can press a button and have the robot intake it.

Our robot doesn’t really have any use for this, but I thought it would be fun to do nonetheless.

Can you please upload the source code soon? We want to experiment with it since we’re in our off-season.

Sorry I didn’t see this sooner. Anyway, here are the data and code:

https://github.com/Team334/gear-data

https://github.com/Team334/gear-detector

Was this done in DIGITS with the DetectNet network? Also, how did your single-shot test go?

So I don’t know what OP did, but I’m guessing he used DIGITS. As for my single-shot test, it performed very well on a local machine. Training took roughly 4 hours on a Titan X. However, my end goal was to export the model and run it on a phone on the robot, which we were already using to detect the tape. The phone was not powerful enough to sustain the network, and the battery drained very quickly. The best way, in my opinion, is to run it on an onboard Jetson.

If I understand correctly, the single-shot detection you mentioned is just taking one frame and getting the bounding boxes, which can be done with DIGITS’ object-detection training?

May I ask what phone you were using? From what I know, to increase processing speed you can prune your model by eliminating neurons that aren’t needed (those with low-magnitude weights). Now, I don’t know exactly how to do this, nor whether you can do it in DIGITS. I haven’t been able to work on anything like this recently, but I’ll certainly look into it when I get time at the robotics lab.
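
To illustrate the pruning idea (as far as I know this isn’t built into DIGITS), magnitude pruning amounts to zeroing out the smallest weights in a layer. A minimal NumPy sketch, with a made-up layer shape:

```python
# Hedged sketch of magnitude-based weight pruning.
# The 256x128 layer shape is a made-up example.
import numpy as np

def prune_by_magnitude(weights, fraction=0.5):
    """Zero out the smallest-magnitude `fraction` of weights; return (pruned, mask)."""
    threshold = np.quantile(np.abs(weights), fraction)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

w = np.random.randn(256, 128)                  # stand-in for one layer's weights
pruned, mask = prune_by_magnitude(w, fraction=0.7)
print(f"kept {mask.mean():.0%} of weights")    # ~30% survive
```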

I’m intrigued by the idea of using machine learning to detect objects. Can someone walk me through the process of creating and training neural networks in C++? I’ll most likely be working on an NVIDIA Jetson TX1.

This is the best resource: https://github.com/dusty-nv/jetson-inference (the “Hello AI World” guide to deploying deep-learning inference networks and deep vision primitives with TensorRT on NVIDIA Jetson). Scroll down to find the tutorial, specifically the object-detection part. Here are some pointers before you immerse yourself:

  1. The tutorial is built around NVIDIA’s DIGITS, but there are other implementations.
  2. Training takes place on a desktop with a powerful GPU, driven by NVIDIA’s CUDA framework, which is what lets the training math run on the GPU cores.
  3. Deploying the code is a matter of downloading the model and putting it on the Jetson, which is detailed in the tutorial. If I remember correctly, it’s in C++ (see the sketch after this list for roughly what it looks like).
  4. You’ll need to take hundreds to thousands of pictures and label them to train the network, which is super grueling. Initially, just test it out with the provided datasets.
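
To give a feel for point 3, here’s roughly what deployment looks like. This uses the repo’s Python bindings and one of its pretrained networks; the tutorial’s C++ version is analogous, and the camera/display URIs here are assumptions for a typical setup:

```python
# Hedged sketch: live object detection on a Jetson with jetson-inference.
# Camera and display URIs are assumptions for a typical setup.
from jetson_inference import detectNet
from jetson_utils import videoSource, videoOutput

net = detectNet("ssd-mobilenet-v2", threshold=0.5)  # pretrained model; swap in your own once trained
camera = videoSource("/dev/video0")                 # assumed V4L2 camera
display = videoOutput("display://0")                # assumed local display

while display.IsStreaming():
    img = camera.Capture()
    if img is None:                   # capture timed out
        continue
    detections = net.Detect(img)      # draws boxes on img and returns detection objects
    display.Render(img)
    display.SetStatus(f"{len(detections)} objects | {net.GetNetworkFPS():.0f} FPS")
```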

Would a program work for this? I could write a video-feed decomposer that takes a video with the gear in a variety of positions and decomposes it into thousands of frames.