Using a Raspberry Pi for camera tracking

I was thinking of using a Raspberry Pi as an on-board computer for camera tracking because it is cheap and lightweight. Has anyone tried this, and is it a good idea?

It’s a good idea and has been a success. :slight_smile:

I believe the folks on team 340 (GRR) did camera vision tracking on the Raspberry Pi this year. If you look on the “Summer of FIRST Project” thread, they mention the project.


I know it is possible and a good idea. Don’t quote me on it but I believe 118 used the Raspberry Pi’s enhanced I/O equivalent, a BeagleBone, for the vision tracking on their 2012 robot.

3173 used the Pi and had vision tracking software available, but due to a lack of test time it never made it onto our competition bot. I believe that since the season ended our programmers have made it work quite successfully. I will try to get one of our programmers to post something about this.

The most impressive vision tracking I have seen in person was Aperture’s (3142 I believe). I know they have a white paper up on it, I will post a link when I find it. They used the Kinect with a Pi I believe.

EDIT: Found the paper

Thank you all, this was very helpful.

This is what I know… as I spoke with their developer on this:

They used a BeagleBoard running embedded Linux, with Ethernet and USB interfaces.

They fetched the camera images with HTTP GET calls through the libcurl library, processed them using OpenCV, and finally sent UDP packets across to the cRIO as input for the control loops.

One thing we didn’t discuss, and which I may want to talk about at some point, is the danger of sending UDP packets when the robot is not listening for them. This can flood the buffers and corrupt TCP/IP, causing the driver station to lose its connection. The fix we tried is to open the listener immediately on its own thread (task) that starts at power-up. This should work because the camera takes about 30 seconds to power on, which is much longer than the cRIO takes to power up and start listening.
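To make the "listener on its own thread" idea concrete, here is a minimal sketch in Python rather than the WindRiver C++ that runs on the cRIO. The class name `UdpListener`, the loopback address, and the "keep only the newest packet" policy are all assumptions for illustration, not anyone's actual robot code.

```python
import socket
import threading


class UdpListener:
    """Drain UDP packets on a background thread, keeping only the newest one."""

    def __init__(self, host="127.0.0.1", port=0):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.bind((host, port))   # port 0 lets the OS pick a free port
        self._sock.settimeout(0.1)      # wake up regularly to check for stop
        self.port = self._sock.getsockname()[1]
        self._lock = threading.Lock()
        self._latest = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        # Call this as early as possible so packets never pile up unread.
        self._thread.start()

    def _run(self):
        while not self._stop.is_set():
            try:
                data, _addr = self._sock.recvfrom(1024)
            except socket.timeout:
                continue
            with self._lock:
                self._latest = data     # older packets are simply discarded

    def latest(self):
        with self._lock:
            return self._latest

    def stop(self):
        self._stop.set()
        self._thread.join()
        self._sock.close()
```

The control loop would call `latest()` whenever it wants the most recent vision result, so it never blocks on the network.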

Oh yes and we both use WindRiver c++ :wink:

You can also check out this paper:
I think if you’re going to use the Kinect, you should use its depth sensor rather than just using it as an IR camera. The depth sensing it does is incredibly powerful (though it has some quirks too).

We were going to use a Raspberry Pi for vision tracking this year, but our robot couldn’t really aim, so it was pointless for us. We did still get it working before ship, though.

On a hardware level, we powered it by splicing the USB cable it comes with to a spare 12V-to-5V power converter we had. We installed the Arch Linux distribution for ARM, and sent the data back over FRC’s own NetworkTables.

Basically, we used the Python bindings for OpenCV, and we just loaded the MJPEG file with opencv.VideoCapture(“”). Then we did some math: we found our position by using the angles of elevation to all the goals we could see to find distances, and then “triangulating” ourselves.
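The math described above can be sketched roughly as follows. This is not the team's code: the function names, the assumption that both goals sit on a flat wall at y = 0, and the two-circle intersection used for the “triangulating” step are all my own choices for illustration.

```python
import math


def distance_from_elevation(target_height, camera_height, elevation_deg):
    """Horizontal distance to a target seen at the given angle of elevation."""
    return (target_height - camera_height) / math.tan(math.radians(elevation_deg))


def position_from_two_goals(x1, d1, x2, d2):
    """Locate the robot from horizontal distances to two goals on the wall y = 0.

    The goals sit at (x1, 0) and (x2, 0); the robot is assumed to be at y >= 0.
    Subtracting the two circle equations gives x directly, then y follows.
    """
    x = (d1**2 - d2**2 + x2**2 - x1**2) / (2.0 * (x2 - x1))
    y_squared = d1**2 - (x - x1)**2
    y = math.sqrt(max(y_squared, 0.0))  # clamp small negatives from noisy input
    return x, y
```

With more than two visible goals you would have more equations than unknowns, so some form of averaging or least-squares fit would be needed; that part is left out here.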

Lessons learned:
Keep a spare SD card with everything you need installed. Our Raspberry Pi inexplicably stopped working at some point and needed a reinstall, which we wouldn’t have been able to do at a regional.
Do everything in one language; don’t mix and match Python and Java.
Figure out if we actually need vision tracking before we build it. (Hopefully not a problem for you.)

All in all, we won’t be using it next year. I think we’ll put the classmate on the robot, so it has an external battery.

This is a great link, and I’d like to highlight something you said in it:

Your team’s vision system really inspired us to take another look at vision too though. Using the dashboard to do the processing helps in so many ways. The biggest I think is that you can “see” what the algorithm is doing at all times. When we wanted to see what our Kinect code is doing, we had to drag a monitor, keyboard, mouse, power inverter all onto the field. It was kind of a nightmare.

In our experience, seeing what the algorithm is doing is really important, for example when tuning the thresholds dynamically. We also wanted to capture some raw video and test and tweak against the footage offline, both to fix bugs in the algorithm code and to improve it by eliminating more false positives.

I think the ability to see the algorithm is one valid argument to the question “I was wondering if anyone has tried doing that or if it’s a good idea or not.”

The only drawback of dashboard processing is the bandwidth that MJPEG uses. At 640x480 with default settings it costs about 11-13 Mbps. This season we are capped at 7 Mbps, and anything above 5 Mbps starts to introduce lag (as written in the FMS white paper). We are looking into an H.264 solution that needs about 1.2 Mbps in good lighting and up to 5 Mbps in poor lighting at full 640x480 quality. This should yield roughly 5 ms of latency, which should be plenty fast for closed-loop processing. If more teams start to use vision next season, we should all encourage each other to use less bandwidth so that controls stay responsive (i.e. everybody wins).
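For a feel of what those numbers mean per frame, here is a small back-of-the-envelope sketch. The 30 fps figure and the helper names are assumptions of mine; the post only gives the bitrates and the caps.

```python
def stream_mbps(frame_bytes, fps):
    """Bandwidth in megabits per second for a given frame size and frame rate."""
    return frame_bytes * 8 * fps / 1e6


def max_fps_under_cap(frame_bytes, cap_mbps):
    """Highest frame rate that keeps the stream at or below the cap."""
    return cap_mbps * 1e6 / (frame_bytes * 8)


# Assumption: 12 Mbps at 30 fps works out to 50,000 bytes per JPEG frame.
frame_bytes = 50_000
print(stream_mbps(frame_bytes, 30))        # bandwidth at the assumed 30 fps
print(max_fps_under_cap(frame_bytes, 5))   # frame rate that fits under 5 Mbps
```

So under these assumptions you would either have to drop to roughly 12 fps or shrink each frame to stay below the point where lag starts.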

That’s not necessarily true. Take a look at the images in 341’s whitepaper. It was quite an ah-ha moment for us when we realized that their 640x480 images were the same size as our 320x240 images from that year.

I think you alluded to it when you said “poor lighting”, but that’s not how I’d look at it. It’s lighting optimized for the task required. There’s been a lot of discussion about using the hold exposure setting to get the correct conditions.

One technique that can be used to “optimise the lighting conditions” and to help minimise bandwidth is as follows:

Throw as much light at the retroreflective tape as is reasonably possible. This could be done by using multiple concentric LED rings. (We are using three.)
This will result in the reflected light being substantially brighter than the surrounding area. In fact, and hopefully so, it will saturate the camera’s detector in the reflected light region.
Now, reduce the exposure (time), and lock it, to the minimum amount that still generates a useful, but not quite saturated, image of the target. Doing this will also reduce the amount of signal coming from anywhere else that is not the target, and practically eliminate those parts of the image. What remains in the image is not much more than the target itself.
When the camera compresses that image, the amount of data sent is minimal, and thus reduces the bandwidth required to send the images across the network.
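The compression effect in the last step can be illustrated with a rough sketch. Note the assumptions: it uses zlib from the Python standard library as a stand-in for the camera's JPEG encoder, the frames are synthetic 320x240 grayscale buffers, and the "busy" frame is pure random noise, which is a worst case rather than a real scene.

```python
import random
import zlib

WIDTH, HEIGHT = 320, 240


def dark_frame_with_target():
    """Mostly black frame with one saturated rectangle where the tape would be."""
    frame = bytearray(WIDTH * HEIGHT)            # every pixel starts at 0
    for row in range(100, 140):
        start = row * WIDTH + 120
        frame[start:start + 80] = b"\xff" * 80   # 80-pixel-wide bright band
    return bytes(frame)


def busy_frame(seed=0):
    """Frame full of random detail, standing in for a normally exposed scene."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))


dark_size = len(zlib.compress(dark_frame_with_target()))
busy_size = len(zlib.compress(busy_frame()))
print("dark frame:", dark_size, "bytes; busy frame:", busy_size, "bytes")
```

The exact numbers will differ with a real JPEG encoder, but the direction is the same: large uniform dark regions cost very little to send.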

Ah-ha… I think I get what you are saying… that is to use a fast exposure to get better compression for darker images that are optimal for target processing. That makes sense, but would limit the use of the camera to just targeting. The context I posted was for a dual use-case of being able to view images with video levels balanced and obtaining targeting information from that.

Thanks for this link. I was actually looking for it again. There is something in there I’d like to quote again here because it is really great advice.

From Jared341:

Changing the default camera settings is the most important thing you can do in order to obtain reliable tracking and stay underneath the bandwidth cap.

In particular, there are six settings to pay attention to:

  1. Resolution. The smaller you go, the less bandwidth you use but the fewer pixels you will have on the target. If you make all of the other changes here, you should be able to stay at 640x480.

  2. Frames per second. “Unlimited” results in a 25 to 30 fps rate under ideal circumstances. Depending on how you use the camera in a control loop, this may be overkill. Experiment with different caps.

  3. White balance. You do NOT want automatic white balance enabled! Failing to do so makes your code more susceptible to being thrown off by background lighting in the arena. All of our Axis cameras have a white balance “hold” setting - use it.

  4. Exposure time/priority. You want a very dark image, except for the illuminated regions of the reflective tape. Set the exposure time to something very short. Put the camera in a bright scene (e.g. hold up a white frisbee a foot or two in front of the lens) and then do a “hold” on exposure priority. Experiment with different settings. You want virtually all black except for a very bright reflection off of the tape. This is for two purposes: 1) it makes vision processing much easier (fewer false detections), 2) it conserves bandwidth, since dark areas of the image are very compact after JPEG compression. The camera doesn’t know what you are looking for, so it will try to send you the entire scene as well as it can. But if it can’t see the “background” very well, you are “tricking” the camera into only giving you the part you need!

  5. Compression. As the WPI whitepaper says, this makes a huge difference in bandwidth. Use a minimum of 30, but you may be able to get away with more (we are using 50 this year). Experiment with it.

  6. Brightness. You can do a lot of fine tuning of the darkness of the image with the brightness slider.
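Once the image is virtually all black apart from the tape, finding the target can be as simple as a brightness threshold. Below is a minimal sketch under that assumption. It works on a plain list of rows of 0-255 values to stay self-contained; real code would more likely run OpenCV on the camera frame, and the function name and default threshold of 200 are hypothetical.

```python
def bright_bounding_box(image, threshold=200):
    """Return (left, top, right, bottom) of all pixels >= threshold, or None.

    `image` is a list of rows, each row a list of grayscale values (0-255).
    """
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None          # nothing bright enough: no target in view
    return min(xs), min(ys), max(xs), max(ys)


# Tiny synthetic frame: black except for a bright 3x2 patch.
frame = [[0] * 8 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4, 5):
        frame[y][x] = 255
print(bright_bounding_box(frame))
```

With a normally exposed image this naive approach would pick up lights and windows too, which is why the dark exposure matters so much.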

I would say use an O-Droid device, such as the X2 or U2. 1706 used one and was running the vision code at >25 fps. It is much more powerful than the Raspberry Pi and the same size. We powered it through the 5V port on the power distribution board. It needed a heat sink, so we 3D printed a case for it that allowed us to install a fan, and wired that into the power board too.

Team 3946 successfully used a Raspberry Pi to get distance and angle from center data from the camera over a TCP Socket Connection in its own thread for use in an (almost successful) auto-aim system.

Our code is available on github:

Robot Side (Java): This class establishes a TCP Socket Client in a new thread and attempts to connect to the Raspberry Pi. From there you can design a subsystem and/or command to make the calls for new data:
Raspberry Pi Side (Python):
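For readers who just want the general shape of such a link, here is a minimal sketch of a Pi-side TCP server that hands out distance and angle readings. It is not Team 3946's code: the one-line "distance,angle" text format, the function names, and the loopback address are all assumptions for illustration.

```python
import socket
import threading


def format_reading(distance, angle):
    """Encode one reading as an ASCII line, e.g. b"3.25,-1.50\\n"."""
    return "{:.2f},{:.2f}\n".format(distance, angle).encode("ascii")


def parse_reading(line):
    """Decode a line produced by format_reading back into two floats."""
    distance, angle = line.decode("ascii").strip().split(",")
    return float(distance), float(angle)


def serve_one_client(server_sock, readings):
    """Accept a single client and send it every reading, one per line."""
    conn, _addr = server_sock.accept()
    with conn:
        for distance, angle in readings:
            conn.sendall(format_reading(distance, angle))


def start_server(readings, host="127.0.0.1", port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind((host, port))
    server_sock.listen(1)
    thread = threading.Thread(
        target=serve_one_client, args=(server_sock, readings), daemon=True
    )
    thread.start()
    return server_sock, thread
```

On the robot side, the client thread would connect to the Pi's address, read one line at a time, and run the equivalent of `parse_reading` on each.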