Vision rings or yolo for auto alignment?


This year, our team (Iron Panthers - 5026) has focused its energy on making live video streaming low latency, so that we can adapt to changing conditions and perform well in both sandstorm and teleop. We do this with a plain gstreamer feed running on a Jetson and feeding gstreamer on the DS (using PS Eye USB cameras). Under 100 ms of latency with 2 cameras in parallel. Works great!
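For anyone who wants to try a similar setup, here is a minimal sketch of what a low-latency sender/receiver pair could look like. It is not our actual pipeline: the device path, host address, port, resolution and bitrate are placeholders, and it uses the software `x264enc` encoder rather than a Jetson hardware encoder, purely so the example stays generic. The Python helpers only build pipeline description strings.

```python
def sender_pipeline(device, host, port, width=320, height=240, fps=30):
    """Build a pipeline description for one USB camera (runs on the robot side).

    All values are illustrative assumptions, not a tuned configuration.
    """
    elements = [
        f"v4l2src device={device}",
        f"video/x-raw,width={width},height={height},framerate={fps}/1",
        "videoconvert",
        # zerolatency disables encoder look-ahead; bitrate is in kbit/s
        "x264enc tune=zerolatency speed-preset=ultrafast bitrate=1000",
        "rtph264pay config-interval=1 pt=96",
        # sync=false: push frames out as soon as they are ready
        f"udpsink host={host} port={port} sync=false",
    ]
    return " ! ".join(elements)


def receiver_pipeline(port):
    """Build the matching pipeline description for the driver station side."""
    elements = [
        f'udpsrc port={port} caps="application/x-rtp,media=video,'
        'encoding-name=H264,payload=96"',
        "rtph264depay",
        "avdec_h264",
        "videoconvert",
        # sync=false: display immediately instead of waiting on timestamps
        "autovideosink sync=false",
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    # Hypothetical address and port; one sender per camera, each on its own port.
    print("gst-launch-1.0 " + sender_pipeline("/dev/video0", "10.50.26.5", 5800))
    print("gst-launch-1.0 " + receiver_pipeline(5800))
```

Running two cameras in parallel would just mean two sender/receiver pairs on different ports.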

At our first competition (CVR) we decided to record the resulting POV match videos, with a plan to use these for some vision alignment work. As the team worked on classic vision alignment approaches, we thought it might be interesting to also try training something like yolo to recognize important field elements using the POV videos. Who knows, maybe this would work well?

Thus came about the use of - a very nice SaaS service to label datasets and kick off training (no, I have no affiliation with them) - we extracted close to 400 images from various POV videos and labelled key things. We started with: hatch covers, red robots, blue robots and cargo.
After training with around 50 images from this dataset, it looked promising, so the team got busy. We also got more ambitious and labelled close to 400 images in total from various POV videos, adding things like alignmentmarks, hole and stickyvelcro to the labels above.

After training a network on these, we then tried running it on video from SVR (all offline).
For this, we copied the trained model and used this code to generate labelled video. The resulting detections are shown here:

Pretty cool. If anybody has experience running yolo on any of the Jetsons, we would love to get some insights as to whether yolo is likely to be fast enough on the Jetson or TX1 to run live and publish to NetworkTables, so we could then do auto alignment as an off-season project. My guess is that tiny-yolo might even work well enough, but that is a separate training exercise for another day.
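To make the "detect, then align" idea concrete, here is a rough sketch of the step between a detector's output and something an alignment routine could use: pick the most confident detection of one label and convert its bounding-box center to a yaw angle with a simple pinhole model. The detection dictionary format, the label name and the 60 degree field of view are assumptions for illustration. Actually publishing the result to NetworkTables is left out.

```python
import math


def bbox_center_to_yaw_deg(x_min, x_max, image_width, horizontal_fov_deg):
    """Yaw from the camera axis to the box center; positive means to the right.

    Assumes an undistorted pinhole camera with the optical axis at image center.
    """
    center_x = (x_min + x_max) / 2.0
    # Focal length in pixels, derived from the horizontal field of view
    focal_px = (image_width / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((center_x - image_width / 2.0) / focal_px))


def best_detection(detections, label, min_confidence=0.5):
    """Return the most confident detection with the given label, or None."""
    candidates = [
        d for d in detections
        if d["label"] == label and d["confidence"] >= min_confidence
    ]
    return max(candidates, key=lambda d: d["confidence"], default=None)


if __name__ == "__main__":
    # Hypothetical detector output for one 640-pixel-wide frame
    frame_detections = [
        {"label": "cargo", "confidence": 0.91, "x_min": 100, "x_max": 180},
        {"label": "alignmentmarks", "confidence": 0.62, "x_min": 400, "x_max": 460},
        {"label": "alignmentmarks", "confidence": 0.84, "x_min": 300, "x_max": 360},
    ]
    target = best_detection(frame_detections, "alignmentmarks")
    if target is not None:
        yaw = bbox_center_to_yaw_deg(target["x_min"], target["x_max"], 640, 60.0)
        print(f"yaw to target: {yaw:.2f} deg")
```

Any jitter in the box edges feeds straight into the yaw estimate, which is worth keeping in mind when comparing this against a retroreflective-tape pipeline.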

We’ll be in our pit at Houston (Newton) if anybody wants to come by and discuss.


Anything on the Jetson should be using TensorRT; that’ll accelerate it by a lot. I’d recommend SSD instead of YOLO: higher mAP and a lot more resources (I think there are reference implementations for TensorRT).

I’d still recommend classic vision (retro tape + opencv) for anything where you need precision - neural networks are less consistent and aren’t going to give you accurate bounding box locations.

If you’re looking to start doing more machine learning stuff, I’d highly recommend learning from the ground up, so to speak. That’s starting with things like implementing basic feedforward nets and testing activation functions and stuff like that. That way, when you encounter issues and need to troubleshoot more complex nets like object detection, you’ll have a better basis and foundational knowledge to debug things - I see a lot of teams try to use neural networks but encounter significant difficulty in the debugging/tuning process.
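As an example of the kind of "from the ground up" exercise I mean, here is a tiny feedforward net in plain Python with hand-picked weights (nothing is trained here, and the weights are chosen purely for illustration). With ReLU in the hidden layer it computes XOR exactly; swapping in sigmoid with the same weights does not, which is a small taste of how much the activation function matters.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def identity(x):
    return x


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b), row by row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked weights: XOR(a, b) = relu(a + b) - 2 * relu(a + b - 1)
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUT_W = [[1.0, -2.0]]
OUT_B = [0.0]


def xor_net(a, b, activation=relu):
    """Forward pass: 2 inputs -> 2 hidden units -> 1 linear output."""
    hidden = dense([a, b], HIDDEN_W, HIDDEN_B, activation)
    return dense(hidden, OUT_W, OUT_B, identity)[0]


if __name__ == "__main__":
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, "relu:", xor_net(a, b), "sigmoid:", round(xor_net(a, b, sigmoid), 3))
```

A natural next step is to replace the fixed weights with random ones and write the backpropagation update yourself.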

I’d also recommend way more images than 400, but your results are pretty great for a preliminary model.

Looking forward to seeing what you guys come up with!


We use gstreamer on the Jetson this year as well. I also attempted to train a network to detect the orange balls. I’ll connect with you in your pit. I’d love to hear more about it.


Do you mind sharing your labeled dataset on


Here you go:
qadimages.tar (8.9 MB)

In turn, if anybody is able to get a trained network running from these… would be great if you could share.




Some benchmarks on a TX2:

I’d expect ~1/2 perf on a TX1, give or take? So as always, it depends on what you mean by “fast enough”.
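To put numbers on "fast enough", a back-of-the-envelope helper like the one below can turn a measured TX2 frame rate into a rough TX1 estimate and a per-frame time. The 0.5 scale factor is just my guess from above, and the 20 fps input in the usage lines is a made-up placeholder, not a benchmark result.

```python
def estimate_tx1_fps(tx2_fps, scale=0.5):
    """Rough TX1 frame rate from a TX2 measurement (scale is a guess)."""
    return tx2_fps * scale


def frame_time_ms(fps):
    """Time spent on each frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def meets_budget(fps, max_frame_time_ms):
    """True if one inference fits within the time you are willing to wait."""
    return frame_time_ms(fps) <= max_frame_time_ms


if __name__ == "__main__":
    tx2_fps = 20.0  # placeholder; substitute a real measurement
    tx1_fps = estimate_tx1_fps(tx2_fps)
    print(f"TX1 estimate: {tx1_fps:.1f} fps, {frame_time_ms(tx1_fps):.0f} ms per frame")
    print("fits a 150 ms budget:", meets_budget(tx1_fps, 150.0))
```

Inference time is only part of the delay; camera capture and the network hop to the robot controller add to it.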

And a repo for getting YOLO running using TensorRT, which might give improved performance: