We have started to explore detecting Cargo (red and blue balls) using our Jetson TX2. So far it looks really promising. I am starting this topic to share my experience, and hopefully learn from others.
I may post the steps we took, if there is interest from the community. But to get started, here are the resources that were critical to getting it working:
The first thing to know is that we are trying to do Object Detection (not Image Classification).
The jetson-inference repo on GitHub has been our main resource. There are links at the bottom to YouTube videos, and I found those to be the most helpful.
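For anyone finding this thread later: the Python API in that repo makes the basic detection loop very short. Here’s a rough sketch along the lines of the repo’s detectnet example (the camera URI and threshold are just example values, not necessarily what we’re running):

```python
import jetson.inference
import jetson.utils

# Load a pretrained SSD-Mobilenet-v2 detector (a custom ONNX model can be swapped in later)
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)

camera = jetson.utils.videoSource("csi://0")        # or "/dev/video0" for a USB camera
display = jetson.utils.videoOutput("display://0")   # local preview window

while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)
    for d in detections:
        # Class, confidence, and the bounding box in pixels
        print(d.ClassID, round(d.Confidence, 2), d.Left, d.Top, d.Right, d.Bottom)
    display.Render(img)
    display.SetStatus("Cargo Detection | {:.0f} FPS".format(net.GetNetworkFPS()))
```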
We tried training a model with Azure and exporting it as an ONNX model, but it didn’t seem to be compatible. If anyone else has had luck with this, or any other cloud provider, please share. In the meantime, we are training the model on our Jetson TX2.
We tried using the Supervisely tool (as recommended by WPILib for use with Axon), but the VOC dataset we exported did not have the structure expected by the training tools. Instead of spending time troubleshooting, we just switched to CVAT.
I have found that Roboflow has a good selection of export formats. I switched from Supervisely because I needed an easy way to export my dataset. They even have Python code to download a dataset directly into a Jupyter notebook, or you can just grab it as a zip.
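For reference, the download snippet Roboflow gives you is only a few lines. Something like this (the API key, workspace, project, and version are placeholders for your own):

```python
# pip install roboflow
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")                       # placeholder key
project = rf.workspace("your-workspace").project("cargo")   # placeholder names
dataset = project.version(1).download("voc")                # "voc", "yolov5", etc.

print(dataset.location)  # folder the dataset was downloaded into
```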
I was even able to import my Supervisely dataset, and it only needed minimal tweaking of the bounding boxes.
We bought the TX2 in 2018 and never used it. I think it was around $400. Yes, it is FRC legal. There are very few rules for “custom circuits” beyond wiring safety.
We are planning to use a Jetson Nano on the robot. We have a couple on order, but since they are out of stock at all the usual outlets, we had to resort to a roughly $120 “kit” on eBay.
Same issue here. We’re following a very similar course of action to yours with the jetson-inference repo. We purchased a Jetson Nano 2GB in the fall for R&D and now want to have it on the competition robot, but we can’t find any Nanos in stock anywhere except for sketchy eBay listings and overpriced kits.
Crossing our fingers it comes back in stock somewhere soon, but we will most likely resort to a kit as well.
How is detecting cargo at a distance going for you guys? We’ve been struggling to reliably detect cargo beyond about 2.5 m.
I’ll be testing out some other ideas for increasing the effective distance this week, but I’m wondering if you have already figured out a solution to this sort of issue!
We’re seeing similar results, but it might be good enough for our purpose.
Apparently, with the defaults, the model is trained and detection runs at 300x300 pixels, which explains why distant objects become difficult to see. I’m currently working on training our model at 512x512 to see how it performs.
Ha! Seems like we’re right on the same course as each other!
I tried modifying the training config for 512x512 yesterday afternoon and it yielded significantly better results. Detection at 12-15 feet was working much better, but my concern now with 512x512 is the reduced performance: ~15 fps is what I was getting without an output preview.
Pretty similar. The 300x300 network itself was taking about 32 ms per image including the CPU ops (the CUDA ops alone were ~20 ms), but actual FPS was between 25-30 without preview.
The preview tool I have written seemed to drop the rate to about 20 fps at 300x300, even though I have it running in a separate thread and rate-limited so it only writes an image to the MJPEG stream every few frames.
I am streaming the preview with CameraServer so I can view it on the driver station. I’m running on the same thread and not dropping any frames. With preview I’m getting about 25 fps at 300x300.
We’ve been doing pretty well with our Jetson Nano 2GB. At first we tried to use Axon to train a model with data labeled on supervise.ly, but then noticed that Axon was training both the red and blue balls with the same tag, even though we’d labeled red and blue separately in our training data.
After this, we started just training our models through supervise.ly by connecting a Windows machine with a graphics card via WSL. We tried SSD MobileNetV2, but ended up settling on YOLO v3 Tiny due to its better performance.
I don’t remember the exact numbers off the top of my head, but inference was running at ~18 FPS (with image preview) on a 640x480 input image. Detection was pretty good as well, out to 20+ feet.
Our current roadblock is actually sending the data back to the roboRIO now. Ideally we want to send the detected objects over NetworkTables, but we can’t find good documentation about using NetworkTables in C++ on a coprocessor. Has anyone dealt with this issue before?
We’re currently using a 1280x720 input image and getting 25 fps with our whole pipeline. I would use a lower-resolution image, but the 720p mode gives us the right field of view (FOV) with the camera we’re using. We found that the input image size doesn’t matter much for detection speed, since the pipeline scales the image down anyway. The FOV has the biggest impact on how far away objects can be detected, because a larger FOV makes objects appear smaller, so they fall below the detectable size at a closer distance.
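To put rough numbers on the FOV point, here’s a quick pinhole-camera estimate of how many pixels a ~24 cm cargo ball covers once the frame is scaled down to a 300 px wide network input (the FOVs are just example values):

```python
import math

def pixels_on_target(obj_diameter_m, distance_m, hfov_deg, net_width_px=300):
    """Approximate width, in network-input pixels, of an object of the given
    diameter at the given distance (simple pinhole-camera model)."""
    focal_px = net_width_px / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))
    return focal_px * obj_diameter_m / distance_m

for hfov in (60, 90):  # example horizontal FOVs
    print(f"{hfov} deg HFOV: {pixels_on_target(0.24, 3.0, hfov):.0f} px at 3 m")

# Roughly 21 px at 60 degrees vs 12 px at 90 degrees -- the wider lens makes the
# ball much smaller to the network at the same distance.
```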
I have used NetworkTables on a coprocessor with Java and Python, but not C++. If I were going to give C++ a try, I would probably start with this test in the ntcore code. Good luck!
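In case the Python version helps as a reference point while you sort out the C++ side, a coprocessor client with pynetworktables is roughly this (the table and key names are made up):

```python
# pip install pynetworktables
from networktables import NetworkTables

# Connect to the roboRIO as a client (use your team's roboRIO address)
NetworkTables.initialize(server="10.TE.AM.2")

table = NetworkTables.getTable("ML")  # arbitrary table name

# Publish each frame's detections as flat arrays (one box shown here)
table.putNumberArray("boxes", [120.0, 80.0, 200.0, 160.0])  # left, top, right, bottom
table.putStringArray("labels", ["red_cargo"])
table.putNumber("fps", 25.0)
```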
Sounds like an interesting thing to try! How bad is the delay from the Jetson to the dashboard for you with CameraServer?
We’re trying to figure out if it’ll be possible for the camera feed to be usable as a reference for the drivers from time to time during the match, but currently the delay makes this impossible. I’m also concerned about the network rate limiting during matches making the delay even worse.
We’re scaling the image down to 640x360 for streaming and the delay is not bad. The bandwidth limit during matches could certainly be a problem if you have other cameras streaming as well. We’re not planning to use the stream during a match.
If you’re interested, I posted relevant bits of the code to this thread yesterday. I was able to use CvSource.isEnabled to skip processing the image for streaming if nobody is watching.
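For anyone who doesn’t want to dig through the earlier post, a minimal sketch of the same idea using robotpy-cscore directly is below (CameraServer.putVideo wires up roughly this same CvSource + MJPEG server pair for you). The names and the grab_annotated_frame() helper are made up:

```python
import time
import cv2
from cscore import CvSource, MjpegServer, VideoMode

# MJPEG stream the dashboard can subscribe to
source = CvSource("preview", VideoMode.PixelFormat.kMJPEG, 640, 360, 15)
server = MjpegServer("preview_server", 1181)
server.setSource(source)

def grab_annotated_frame():
    """Hypothetical helper: returns the latest BGR frame with detection boxes drawn."""
    ...

while True:
    frame = grab_annotated_frame()
    if frame is not None and source.isEnabled():
        # Only pay for the resize/encode when someone is actually viewing the stream
        source.putFrame(cv2.resize(frame, (640, 360)))
    time.sleep(1.0 / 15)  # rate-limit the preview so it doesn't steal time from inference
```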
CameraServer vs. GStreamer: any idea on the latency differences? I heard you can get under 100 ms with GStreamer, and if you build the GStreamer pipeline with omx I believe you can do H.264, which will save bandwidth. I’ve only tried GStreamer on a Raspberry Pi though (and MJPEG).
I don’t know about GStreamer, but I like using the WPILib libraries as a default for FRC because they work together, and it’s fewer libraries for the kids to learn and understand. For example, with CameraServer you don’t have to do anything special to get a view of your camera in Shuffleboard. I only deviate from that path if the purpose-built components don’t perform or don’t provide the needed functionality.
We are using a Jetson Nano and went through something similar to y’all. We first used Supervisely to make our dataset, then gave it to Roboflow and exported it to Colab. From there we trained three models: the first was MobileNetV2, then a Faster R-CNN model, and lastly a YOLOv5 model. We exported the YOLOv5 model to an ONNX file, but we had problems getting TensorRT (via the jetson-inference repo) to accept that ONNX file. We didn’t know what parameters to give detectnet for the input and output layers, so we gave values that sort of worked (as in, they passed that section of the build), but we don’t know how to actually find what names to give it. Does anyone have any suggestions for fixing the compatibility issue between a YOLOv5 ONNX file and TensorRT with jetson-inference?
I have only used MobileNetV2, so I am not much help. From what I can tell, YOLO is not supported by jetson-inference.
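For what it’s worth, if you end up falling back to the SSD-Mobilenet path, the jetson-inference docs use input_0 / scores / boxes as the layer names for models exported with its onnx_export.py. Loading one from Python looks roughly like this (the model path is a placeholder, and these names will not match a YOLOv5 export):

```python
import jetson.inference

# Layer names below are the ones jetson-inference's onnx_export.py produces for
# retrained SSD-Mobilenet models; a YOLOv5 ONNX file uses a different structure.
net = jetson.inference.detectNet(argv=[
    "--model=models/cargo/ssd-mobilenet.onnx",   # placeholder path
    "--labels=models/cargo/labels.txt",
    "--input-blob=input_0",
    "--output-cvg=scores",
    "--output-bbox=boxes",
    "--threshold=0.5",
])
```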
The Nvidia Developer forum is a great resource. If you post in the right place with a specific question, you will probably get a response from an Nvidia employee within 24 hours.