Has anyone done Stereo vision for FIRST?

My off season focus is computer vision, and one topic I hope to get to is localization, i.e. figuring out where the robot is based on cues in the environment. One tool that could be used as part of the solution is stereo vision. There’s only one problem. (Well, there might be more than one, but one at a time…) Somehow, I have to get two time-synchronized images from two different cameras.

That could be an issue. I think if the robot is moving, the motion will be enough that two images taken “at the same time” won’t really be at the same time, and the difference will be enough to completely throw off the distance measurements.
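To put a rough number on it (my assumptions, not measurements), even one frame period of timing offset at typical FRC speeds moves the robot farther than a typical stereo baseline:

```python
# Back-of-the-envelope check on the sync problem: how far does the robot
# move between two "simultaneous" frames that are actually offset in time?
# Speed and offset below are illustrative assumptions, not measured values.

def sync_error_m(speed_mps: float, offset_s: float) -> float:
    """Distance traveled during the camera sync offset."""
    return speed_mps * offset_s

# A robot at ~3 m/s with frames offset by one 30 fps frame period (~33 ms)
error = sync_error_m(3.0, 1 / 30)
print(f"{error * 100:.1f} cm")  # 10.0 cm
```

A 10 cm shift is bigger than the baseline of most stereo rigs, so unsynced frames really would swamp the disparity signal.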

One option I did see from a thread here was the ZED stereolabs camera, but it’s a bit pricey for FIRST. It barely fits into the newly revised cost limits, and it looks like it needs some pricey support systems to go with it. So, I would rather find some other solution.

And so, before I go too far down a particular road, I’ll start by asking if there are any giants whose shoulders I can stand on. If you have already done it, I would be very happy to try and imitate your solutions.


I believe 971 did stereo vision on their Stronghold robot. They said it wasn’t easy, but they got it working. You should reach out to them.


We (2702) tried our hand at 6DoF position/pose stereo vision this year with our Axis F34 camera system, but we weren’t able to get the frames synced. If you want to try stereo vision, look for cameras with global, syncable shutters. We haven’t tried it, but the cheapest and easiest way to do this appears to be the Raspberry Pi Compute Module (it breaks out two camera connectors) plus appropriate cameras. I’d take a look at this blog article for more info. Have fun! Stereo vision is amazing stuff, and I can’t wait for an enterprising team to “limelight” (is it OK to verb that yet?) a stereo vision system.

I’d recommend just using an off the shelf stereo camera solution. The frame synchronization really isn’t fun to DIY - ZED works fine and has nice libraries (though you’ll need a Jetson).

I’ve done a lot of work with localization and 3D camera stuff in FRC - I’d be happy to answer any questions you have.

How are you planning on using stereo to help with localization?


@marshall has been summoned


We used two Real Sense cameras from Intel this year (one facing front and one back). They allowed us to get a relatively accurate distance to the target, or any other pixel we chose. From that we could basically determine our relative position and angle to the target. This information was used to generate continuously updating motion profiles for autonomous alignment. I don’t remember if the camera uses stereo vision or time of flight to determine distance, but it was fairly easy to set up and read from our Jetson.


I have no tech knowledge in this area, but would a 3D camera that feeds a single image with side-by-side frames work? 'Cause there are lots of available solutions for that, such as the Lenovo Mirage, Vuze XR, etc.

I don’t claim to be an expert, but I imagine this setup would become problematic due to the warping introduced by the extremely wide-angle lenses on a 360-degree camera. You could try preprocessing the images to flatten them out before applying the stereo algorithm, but I’m still just speculating.

I bought a cheap ($90) camera that’s easy to experiment with, and it works.

Kayeton Technology model KYT-U100-960R1 Stereo USB2.0 camera module with non distortion lens

So what are the problems?

Takes many weeks to get it from China.

The two images are slightly different colors: one seems a bit red and the other a bit green. I complained, but the company did nothing; I don’t see why they couldn’t get the two cameras to match, so I gave it a bad review on Amazon. Microsoft cranks out thousands of those HD-3000 LifeCams, and all of ours produce the same image.

The minimum resolution is pretty high. On the Raspberry Pi, FRCVision with a modest Java GRIP pipeline maxes out two of the four CPU cores. I haven’t had time for much experimenting, but I think the distance calculation that I have may max out another core. At that high a resolution you may have to step up to an ODROID, and it doesn’t have nearly as vast a support network as the RPi does.

I hadn’t paid attention to the Intel RealSense (D435) so I looked at its Amazon entry just now. Price is good ($180) and it comes in a case. If you aren’t processing the image on an RPi or roboRIO but just get back some sort of distance value then that’s a light load on the CPU. Of course, the target then has to be isolated somehow in the camera. I have no idea how that works but it sounds like fun to try.

The way I understand it*, you can poll the RealSense’s internal processor for the distance to any selected pixel from the color camera. When we used it this year, we used traditional HSV filtering to identify the target using the color camera. From there, we got the distance to a sample of points along the width of the target and used some geometry to turn those values into our position and angle relative to the target.

* I’m not a programming mentor, though, so some of this may be slightly wrong
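A hedged sketch of that sort of geometry (my reconstruction, not the team's actual code): given the measured distance and horizontal bearing to each edge of a target, project both edges into the camera frame, then recover the target's center and skew angle.

```python
import math

# Each edge measurement is a (distance, bearing) pair: distance from the
# depth camera, bearing from the edge's pixel column and the camera FOV.
# Frame convention (an assumption): x forward, y to the left.

def edge_to_xy(distance_m, bearing_rad):
    """Project one target edge into camera-frame coordinates."""
    return (distance_m * math.cos(bearing_rad),
            distance_m * math.sin(bearing_rad))

def target_pose(d_left, b_left, d_right, b_right):
    """Target center (x, y) and skew angle from the two edge readings."""
    xl, yl = edge_to_xy(d_left, b_left)
    xr, yr = edge_to_xy(d_right, b_right)
    center = ((xl + xr) / 2, (yl + yr) / 2)
    # Zero when the target squarely faces the camera.
    skew = math.atan2(xr - xl, yl - yr)
    return center, skew

# Head-on example: edges 1 m apart, both 2 m downrange.
center, skew = target_pose(math.hypot(2, 0.5), math.atan2(0.5, 2),
                           math.hypot(2, -0.5), math.atan2(-0.5, 2))
print(center, skew)  # ~(2.0, 0.0) and skew ~0
```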

I was looking into this, and there seem to be some open libraries out there. It takes a lot of power to do 3D calculation.

I looked into using an Nvidia Jetson Nano or TX2. CUDA offers a lot of performance if you’re willing to do some programming with it, and it’s pretty easy to learn.

I have some weird software I wrote for some webcams and a Jetson in Python, based on this document my team wrote: Stereo Vision for FRC Robots ver 0.1.pdf (94.1 KB)
I will try to find a more recent version if possible, because this one has some errors, but it should be easy to implement.

I’m hoping the stereo vision provides a more accurate distance estimate to landmarks than other methods, like solvePnP. As much as anything else, I just want to implement some of the algorithms. As an off season project, the goal isn’t necessarily to have the best solution, but to learn how to do some things, and teach it to students.
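For reference, the core relationship I'd be implementing is depth = focal length × baseline / disparity. A quick sketch with made-up but plausible numbers shows why baseline and sub-pixel matching accuracy matter at field distances:

```python
# Core stereo triangulation: Z = f * B / d, with f in pixels, B in meters,
# and disparity d in pixels. Numbers are illustrative, not real hardware.

def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    return f_px * baseline_m / disparity_px

# f = 700 px with a 6 cm baseline: a target 3 m away shows only 14 px of
# disparity, so a one-pixel matching error shifts the estimate by ~0.2 m.
print(depth_from_disparity(700.0, 0.06, 14.0))  # 3.0
print(depth_from_disparity(700.0, 0.06, 13.0))  # ~3.23
```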

Like I said earlier, the ZED is a bit pricier than I want to go. If the Raspberry Pi Compute Module really supports two cameras, and those cameras are close to synchronized, that sounds very promising. Have to look into that. The Intel RealSense also seems promising: it’s a bit cheaper than the ZED, and probably more accurate than anything I could do with a Pi camera.

Thank you to all who have given suggestions. Much research to be done.

How it works, at least with the ZED, is you get an RGB image along with a separate matrix of the same size with depth information. So at coordinate x,y, you can get the RGB color value and then using that same x,y coordinate, look up the distance of that pixel from the depth matrix. I’m pretty sure most other RGB-D cameras work the same way.
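In code, that lookup is just parallel indexing into two aligned arrays. A minimal sketch with fake data (the real SDKs hand you both arrays already registered):

```python
import numpy as np

# Fake stand-ins for the aligned RGB image and depth matrix an RGB-D
# camera SDK would return. Shapes and values are made up for illustration.
h, w = 4, 6
rgb = np.zeros((h, w, 3), dtype=np.uint8)       # color image
depth = np.full((h, w), 2.5, dtype=np.float32)  # depth in meters per pixel
depth[1, 3] = 1.2                               # pretend this pixel is a closer target

x, y = 3, 1               # pixel of interest (column, row)
color = rgb[y, x]         # RGB triple at (x, y)
dist_m = depth[y, x]      # metric distance to that same pixel
print(float(dist_m))      # 1.2 (within float32 precision)
```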

This lookup scheme is really convenient, but it brings up the obvious question of “which x,y?”. To figure that out, you’re doing some sort of other processing on the image. For example, this year for tracking retro tape, we did the normal HSV filtering, contour extraction, and feature comparison that a normal non-depth camera would. The difference was we could also use the depth information (found from the screen coords of each contour) to do some filtering based on how big the feature should be given the actual distance to the target. And then once we found the correct targets, we converted from screen x,y plus depth info to world x,y,z coords.
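That size-versus-depth check could be sketched like this under a simple pinhole model (the focal length and target width here are my assumptions, not anyone's real numbers):

```python
# Under a pinhole model, a target of known real width should appear about
# f_px * width_m / depth_m pixels wide. Contours whose pixel width badly
# disagrees with their measured depth can be rejected as false positives.

F_PX = 700.0          # assumed focal length in pixels
TARGET_WIDTH_M = 0.5  # assumed real-world target width

def plausible(contour_px_width: float, depth_m: float, tol: float = 0.3) -> bool:
    """True if the contour's width is within tol of the size depth predicts."""
    expected = F_PX * TARGET_WIDTH_M / depth_m
    return abs(contour_px_width - expected) / expected <= tol

print(plausible(175.0, 2.0))  # True: ~175 px expected at 2 m
print(plausible(40.0, 2.0))   # False: far too small for something 2 m away
```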

That means you need some CPU power to make this work. A pi + realsense might be enough, but you’ll have to try it out.

Also note there is a discount on the ZED for FRC teams which I think is still alive. I forget the details, but it might save a hundred bucks off the price? With their new version you get not only RGB+D but also visual odometry fused with IMU data…

Would anyone know anything about the accuracy of stereo vision if they used it? Also would anyone have experience with integrating the data with some other pose estimation?

@marshall shared with all of FRC in a teased picture that he posted during the week 2 FUN roundup, that team 900 had used stereo vision along with deep machine learning, AI and probably holographic topology generation to read the color wheel this year. They did share a white paper describing some of their technology (although they kept the true technologies involving stereo vision a secret).

I have heard rumors that Team 900 is working on quadocular vision systems to be able to go beyond simple distance measurements (3rd dimension) but to actually measure things in the 4th and 5th physical dimensions as well. I expect that these techniques will be released in a white paper soon so that FRC can finally leave the Flatland era once and for all.

I’ve also heard that they have figured out a way to use various Fourier and Laplace transform techniques to not only enhance the images spatially, but also to enhance them temporally so that their vision system can actually see into the future.


If you want depth information then I’d recommend the Intel RealSense D435. It’s easier to use and less expensive than the Zed, and on an FRC field it gives similar results.

With that being said, I did full-field localization on our robot this year and I didn’t need a Zed or D435. Instead I wrote an EKF and fused pose estimates from the T265 VSLAM camera, retroreflective tape PnP pose estimates, and encoder odometry.
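As a toy illustration of that predict/update structure (not the actual robot filter, which is multi-state and latency-compensated), a 1-D Kalman filter fusing encoder odometry with a vision pose measurement looks like this; all noise values are made up:

```python
# Predict with encoder odometry (uncertainty grows), then correct with a
# vision position measurement (uncertainty shrinks). 1-D toy example only.

def predict(x, p, delta_odom, q=0.01):
    """Drive the state forward by the encoder delta; variance p grows by q."""
    return x + delta_odom, p + q

def update(x, p, z_vision, r=0.04):
    """Fuse a vision measurement z with measurement variance r."""
    k = p / (p + r)                      # Kalman gain
    return x + k * (z_vision - x), (1 - k) * p

x, p = 0.0, 1.0
x, p = predict(x, p, 0.5)   # encoders say we drove 0.5 m
x, p = update(x, p, 0.6)    # vision says we're at 0.6 m
print(round(x, 3))          # 0.596: pulled most of the way toward vision
```

With a large prior variance, the gain is near 1 and the estimate lands close to the vision measurement; as the filter converges, p shrinks and odometry dominates between vision updates.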


Ehh… you get what you pay for? Definitely not “similar” results in our experience and we’ve got a lot of experience:

That doesn’t include the ones currently at the lab, including the T265s and various duplicates… and the one in flight to us for some early adopter testing.

You’ll notice I have a collection of these at my house and not at the lab where they would be on robots and there are good reasons for that. The RealSense line is great for hobby projects and tinkering but it has not proven itself to be robust for our use in FRC at least from our experience. We’ve had issues with the Zed but overall, we’ve found the data to be much more reliable, robust, and the hardware has been mostly bulletproof.

Needless to say, The Zebracorns will have a white paper coming out about the work we did this year, and for all of the EKF fans, we’ll address those… and particle filters… and probably some other stuff. Right now, the students are focused mostly on school and on planning out what we can for future work, including those papers, so some patience is required. In the interim, our white paper about running neural networks on the best processor in FRC is out there and available to all.

Edit: One more comment, with the exception of the T265, the RealSense line isn’t 100% Stereoscopic vision, it’s using a color camera for at least one lens and an IR depth map for at least one of the others, which is part of our reliability issues with it.


I can only comment on the quality of the depth information from a “do the two cameras work about the same for VSLAM” perspective. In that regard I didn’t see much of a difference between the Zed and the D435 (and disclaimer, I’ve only compared with a Zed that I don’t own.)

Did you experience something different or is your use-case for these cameras different from mine (i.e. you do something other than plugging them into off-the-shelf VSLAM algos)?

As for whitepapers, I’m also planning on releasing one about my experiences with localization in FRC; the gist is that you don’t really need the VSLAM and can get away with just fusing encoders and pose measurements from the retroreflective tape. I might also insert something about how the T265 doesn’t always work so well… I was planning to do it much earlier, but I’ve been working on contributing that EKF, a UKF, and latency-compensated sensor fusion “wrappers” to WPILib, with premade models that will do all of the encoder and vision fusion for you.

Well we’ve done a bunch of experimentation. Some of which has ended up on robots and some of it has never left the incubation phase. VSLAM using the Zed worked well but it wasn’t great and the new Zed 2, with its integrated IMU, has a lot more potential for that. The Zed Mini also has some possibilities for that but we’ve put it aside in favor of the Zed 2 for now.

More commonly, we are reliant on the Zed to provide us with accurate depth to target information for specific targets we are interested in and it’s proven very reliable for that in our experience while the RealSense cameras have shown much more variability in the results we’ve seen.

All of which is to say, you might be on to something with setting aside VSLAM algorithms in favor of other options and I’m really disappointed we didn’t get to play test our code on a field this year but we will still have something to show for it in a few months.