I actually used SSD. It’s supposedly more accurate than YOLO with similar speed.
While cool, it looks like gear detection won’t be that useful for our robot. I’m planning on uploading the data, pre-trained model, and instructions on how to run it sometime this week.
What I did was take a number of videos (13 so far) of the gears while varying the angle, lighting, etc. Then I converted the videos into individual frames and annotated them in MATLAB. It took me two afternoons to get a bit over 400 annotated images.
I feel like an idiot for not thinking of the hat. That should have been my first instinct. In all seriousness though, that’s amazing. Did you use any backend for the training, like Theano or TensorFlow? And what computer did you use for training?
I used Caffe and trained on a GTX 1080. Seeing how it only took 3 hours to get pretty decent results, I bet the same training could be accomplished on a laptop in a day.
Well, why not? First, it’s a really fun challenge for programmers. It’s also a great learning experience for anyone interested in neural nets and machine learning. As for the game specifically, it depends on your team. One use would be to automatically intake a gear. If you are on the other side of the field, visibility becomes a major issue, especially if you are directly behind the airship. If you can track a gear, then you can press a button and have the robot intake the gear.
So I don’t know what OP did, but I’m guessing he used DIGITS. As for my single-shot test, it performed very well on a local machine; training took roughly 4 hours on a Titan X. However, my end goal was to export the model and have it run on a phone on the robot, which we were using to detect the tape. The phone was not powerful enough to sustain the network and its battery drained very quickly. The best way, in my opinion, is to run it on a Jetson onboard the robot.
If I understand correctly, the single-shot detection you mentioned just takes one frame and produces the bounding boxes, which can be done with DIGITS’ object detection training?
May I ask what phone you were using? From what I know, you can increase processing speed by pruning your model, i.e., eliminating neurons or connections that aren’t really needed (the ones with low weights). I don’t know exactly how to decide what to cut, nor whether you can do this in DIGITS. I haven’t been able to work on anything like this recently, but I’ll certainly look into it when I get time at the robotics lab.
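To make the pruning idea concrete, here’s a rough NumPy sketch of magnitude-based pruning, i.e., zeroing out the smallest weights in a layer. This is just the general technique; I’m not claiming DIGITS or Caffe does it this way, and the layer and threshold below are made up.

```python
import numpy as np

def magnitude_prune(weights, keep_fraction=0.7):
    """Zero out the smallest-magnitude entries, keeping roughly `keep_fraction` of them."""
    # Threshold below which a weight counts as "not needed"
    threshold = np.percentile(np.abs(weights), (1.0 - keep_fraction) * 100.0)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

# Hypothetical example: prune a random 4x4 weight matrix, keeping the largest ~70% of weights
layer_weights = np.random.randn(4, 4)
print(magnitude_prune(layer_weights, keep_fraction=0.7))
```

The speedup in practice depends on whether your framework can actually skip the zeroed weights, so treat this as the concept rather than a guaranteed optimization.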
I’m intrigued by the idea of using machine learning to detect objects. Can someone walk me through the process of creating and training neural networks in C++? I’m most likely going to be working on an NVIDIA Jetson TX1.
The tutorial is for NVIDIA’s DIGITS, but there are other implementations.
Training takes place on a desktop with a powerful GPU, which is utilized via NVIDIA’s CUDA framework; CUDA is basically what lets the training run on the GPU cores.
Deploying is a matter of downloading the trained model and putting it on the Jetson, which is detailed in the tutorial. If I remember correctly, the deployment code is in C++.
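If you want to sanity-check the exported model on the desktop before moving it to the Jetson, a minimal Python/pycaffe sketch looks something like this (the file names are placeholders for whatever DIGITS lets you download; the actual on-robot code in the tutorial is C++):

```python
import caffe

# Placeholders for the files exported from DIGITS
MODEL_DEF = 'deploy.prototxt'
MODEL_WEIGHTS = 'snapshot.caffemodel'

caffe.set_device(0)    # first GPU
caffe.set_mode_gpu()   # or caffe.set_mode_cpu() on a machine without CUDA

net = caffe.Net(MODEL_DEF, MODEL_WEIGHTS, caffe.TEST)

# Load an image and reshape it to the network's expected input
image = caffe.io.load_image('gear.jpg')      # HxWx3 float image in [0, 1]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))  # HWC -> CHW

net.blobs['data'].data[...] = transformer.preprocess('data', image)
output = net.forward()
print(output.keys())   # for a detection model, one of these output blobs holds the boxes
```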
You’ll need to take hundreds to thousands of pictures and label them to train the network, which is super grueling. Initially, just test it out with the provided example datasets.
Would a program work for this? I could write a video-feed decomposer that takes a video with the gear in a variety of positions and decomposes it into thousands of frames.
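Something like that should work, and it’s essentially what was described upthread. A rough OpenCV sketch (the video path, output folder, and frame skip are just made-up examples):

```python
import cv2
import os

def decompose_video(video_path, out_dir, every_nth=5):
    """Save every Nth frame of a video as a JPEG for later annotation."""
    os.makedirs(out_dir, exist_ok=True)   # exist_ok requires Python 3
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % every_nth == 0:
            cv2.imwrite(os.path.join(out_dir, 'frame_%06d.jpg' % frame_idx), frame)
            saved += 1
        frame_idx += 1
    cap.release()
    return saved

# Hypothetical usage: keep every 5th frame of one gear video
print(decompose_video('gear_video_01.mp4', 'frames', every_nth=5))
```

Skipping frames keeps consecutive, nearly identical images from flooding the dataset; you still have to do the annotation by hand afterwards.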