Has anyone done Stereo vision for FIRST?

My off season focus is computer vision, and one topic I hope to get to is localization, i.e. figuring out where the robot is based on cues in the environment. One tool that could be used as part of the solution is stereo vision. There’s only one problem. (Well, there might be more than one, but one at a time…) Somehow, I have to get two time-synchronized images from two different cameras.

That could be an issue. I think if the robot is moving, the motion will be enough that two images taken “at the same time” won’t really be at the same time, and the difference will be enough to completely throw off the distance measurements.

One option I did see from a thread here was the ZED stereolabs camera, but it’s a bit pricey for FIRST. It barely fits into the newly revised cost limits, and it looks like it needs some pricey support systems to go with it. So, I would rather find some other solution.

And so, before I go too far down a particular road, I’ll start by asking if there are any giants whose shoulders I can stand on. If you have already done it, I would be very happy to try and imitate your solutions.

I believe 971 did stereo vision for their Stronghold robot. They said it wasn’t easy, but they got it working. You should reach out to them.

We (2702) tried our hand at 6-DOF position/pose stereo vision this year with our Axis F34 camera system, but we weren’t able to get the frames synced. If you want to try stereo vision, look for cameras with global, syncable shutters. We haven’t tried it, but the cheapest and easiest way to do this appears to be the raspi compute module (it has 2 camera connectors for breakout) and appropriate cameras. I’d take a look at this blog article for more info. Have fun! Stereo vision is amazing stuff, and I can’t wait for an enterprising team to “limelight” (is it ok to verb that yet?) a stereo vision system
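To illustrate why software-only syncing of two plain USB webcams falls short, here’s a rough Python/OpenCV sketch using the grab/retrieve split; the device indices are placeholders, and this only bounds the gap between the two grab calls, it doesn’t actually synchronize the exposures:

```python
import time
import cv2

# Two ordinary UVC webcams; plain USB cameras have no hardware sync,
# so the best you can do in software is issue the two grabs back to back.
cam_left = cv2.VideoCapture(0)   # device indices are placeholders
cam_right = cv2.VideoCapture(1)

while True:
    # grab() latches a frame quickly; retrieve() does the slower decode afterwards,
    # so the two captures land as close together as this loop allows.
    t0 = time.monotonic()
    cam_left.grab()
    cam_right.grab()
    t1 = time.monotonic()

    ok_left, left = cam_left.retrieve()
    ok_right, right = cam_right.retrieve()
    if not (ok_left and ok_right):
        break

    # This only bounds how far apart the grab calls were, not the actual
    # exposure times; each camera still free-runs on its own clock.
    print("grab window: %.1f ms" % ((t1 - t0) * 1000.0))
```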

I’d recommend just using an off the shelf stereo camera solution. The frame synchronization really isn’t fun to DIY - ZED works fine and has nice libraries (though you’ll need a Jetson).

I’ve done a lot of work with localization and 3D camera stuff in FRC - I’d be happy to answer any questions you have.

How are you planning on using stereo to help with localization?

@marshall has been summoned

We used two Intel RealSense cameras this year (one facing front and one back). They allowed us to get a relatively accurate distance to the target, or any other pixel we chose. From that we could basically determine our relative position and angle to the target. This information was used to generate continuously updating motion profiles for autonomous alignment. I don’t remember if the camera uses stereo vision or time of flight to determine distance, but it was fairly easy to set up and read from our Jetson.
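The basic depth-at-a-pixel pattern looks roughly like this with Intel’s pyrealsense2 Python library (a sketch with placeholder stream settings and pixel coordinates, not our actual Jetson code):

```python
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

# Align depth onto the color image so the same (x, y) indexes both.
align = rs.align(rs.stream.color)

frames = align.process(pipeline.wait_for_frames())
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()

color = np.asanyarray(color_frame.get_data())

# Distance in meters to whatever pixel you care about (here: image center).
x, y = 320, 240
print(depth_frame.get_distance(x, y))

pipeline.stop()
```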

I have no tech knowledge in this area, but would a 3D camera feeding a single image with side-by-side frames work? 'Cause there are lots of available solutions for that, such as the Lenovo Mirage, Vuze XR, etc.

I don’t claim to be an expert, but I imagine this setup would become problematic due to the warping of the images caused by the extremely wide-angle lenses on a 360-degree camera. You could try preprocessing the images before applying the stereo algorithm in order to flatten them out, but I’m still just speculating.
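As a rough sketch of what “flattening” could mean, OpenCV’s standard undistortion looks like this; the intrinsics and distortion coefficients below are made-up placeholders that would come from a calibration run, and a true 360-degree lens would need the cv2.fisheye or omnidirectional model instead:

```python
import cv2
import numpy as np

# Placeholder intrinsics; in practice these come from a cv2.calibrateCamera()
# run on checkerboard images of the actual lens.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.30, 0.10, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

img = cv2.imread("left_raw.png")
h, w = img.shape[:2]

# New camera matrix for the undistorted view (alpha=0 crops away invalid edge pixels).
new_K, _roi = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), alpha=0)
flat = cv2.undistort(img, K, dist, None, new_K)
cv2.imwrite("left_flat.png", flat)
```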

I bought a cheap ($90) unit that is easy to experiment with, and it works.
http://www.kayetoncctv.com/synchronization-1-3mp-hd-960p-stereo-3d-vr-dual-lens-usb2-0-camera-module-for-biometric-retina-retina-analyze/

Kayeton Technology model KYT-U100-960R1 Stereo USB2.0 camera module with non distortion lens

So what are the problems?

Takes many weeks to get it from China.

The two images are slightly different colors - one seems a bit red and the other a bit green. I complained, but the company did nothing. I don’t see why they couldn’t get the two cameras to match. I gave it a bad review on Amazon. Microsoft cranks out thousands of those HD-3000 LifeCams, and all of ours produce the same image.

The minimum resolution is pretty high. The Raspberry Pi running FRCVision with a modest Java GRIP pipeline maxes out two of the four CPU cores. I haven’t had time for much experimenting, but I think the distance calculation I have may max out another core. At that high a resolution you may have to step up to an ODROID, and its support network isn’t as vast as the RPi’s.

I hadn’t paid attention to the Intel RealSense (D435), so I looked at its Amazon entry just now. The price is good ($180) and it comes in a case. If you aren’t processing the image on an RPi or roboRIO but just getting back some sort of distance value, that’s a light load on the CPU. Of course, the target then has to be isolated somehow in the camera. I have no idea how that works, but it sounds like fun to try.

The way I understand it*, you can poll the RealSense’s internal processor for the distance to any selected pixel from the color camera. When we used it this year, we used traditional HSV filtering to identify the target using the color camera. From there, we got the distance to a sample of points along the width of the target and used some geometry to turn those values into our position and angle relative to the target.

* I’m not a programming mentor though, so some of this may be slightly wrong
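The geometry step could look something like the sketch below (not our actual code; the image width, FOV, and target width are assumed values you’d swap for your own):

```python
import math

# Assumed camera parameters (placeholders): 640 px wide color stream and the
# roughly 69 deg horizontal FOV often quoted for the D435 color camera.
IMAGE_WIDTH_PX = 640
HORIZONTAL_FOV_DEG = 69.0
FOCAL_PX = (IMAGE_WIDTH_PX / 2) / math.tan(math.radians(HORIZONTAL_FOV_DEG / 2))

TARGET_WIDTH_M = 0.99  # hypothetical known physical width of the vision target

def target_pose(center_px, d_center, d_left, d_right):
    """Rough relative pose from depth samples along the target.

    center_px: x pixel of the target centroid
    d_center:  depth (m) at the centroid
    d_left/d_right: depth (m) at the left/right edges of the target
    Returns (bearing_rad, skew_rad, forward_m, lateral_m).
    """
    # Bearing: angle from the camera axis to the target centroid.
    bearing = math.atan2(center_px - IMAGE_WIDTH_PX / 2, FOCAL_PX)

    # Skew: how much the target face is rotated relative to us.
    # Depth difference across a target of known width is roughly width * sin(skew).
    skew = math.asin(max(-1.0, min(1.0, (d_right - d_left) / TARGET_WIDTH_M)))

    # Camera-relative position of the target centroid.
    forward = d_center * math.cos(bearing)
    lateral = d_center * math.sin(bearing)
    return bearing, skew, forward, lateral
```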

I was looking into this, and there seem to be some open libraries out there. It takes a lot of computing power to do the 3D calculations.

I looked into using an NVIDIA Jetson Nano or TX2. CUDA offers a lot of performance if you’re willing to do some programming with it, and it’s pretty easy to learn.

I have some weird software I wrote for some webcams and a Jetson in Python, based on this document my team wrote: Stereo Vision for FRC Robots ver 0.1.pdf (94.1 KB)
I will try to find a more recent version if possible, because this one has some errors, but it should be easy to implement.
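In the meantime, the core of a webcam stereo pipeline in Python/OpenCV looks roughly like this; it’s a generic sketch rather than the code from that document, and it assumes the two frames are already rectified and that the focal length and baseline are known from calibration:

```python
import cv2
import numpy as np

# Assumes the two frames are already rectified (e.g. via cv2.stereoRectify after calibration).
left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,   # must be a multiple of 16
    blockSize=9,
)
# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# Depth from disparity: Z = f * B / d, with focal length f in pixels and baseline B in meters.
FOCAL_PX = 700.0   # placeholder, from calibration
BASELINE_M = 0.06  # placeholder camera separation
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]

# Depth (m) at some pixel of interest, e.g. a target centroid found elsewhere.
print(depth[240, 320])
```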

I’m hoping the stereo vision provides a more accurate distance estimate to landmarks than other methods, like solvePnP. As much as anything else, I just want to implement some of the algorithms. As an off season project, the goal isn’t necessarily to have the best solution, but to learn how to do some things, and teach it to students.
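For reference, the solvePnP route I’d be comparing against is roughly the sketch below, with made-up target corner coordinates and placeholder intrinsics:

```python
import cv2
import numpy as np

# Hypothetical physical corner coordinates of a vision target (meters, target frame).
object_pts = np.array([
    [-0.25,  0.15, 0.0],
    [ 0.25,  0.15, 0.0],
    [ 0.25, -0.15, 0.0],
    [-0.25, -0.15, 0.0],
], dtype=np.float64)

# Matching pixel coordinates of those corners, found by contour/corner detection.
image_pts = np.array([
    [300.0, 200.0],
    [420.0, 205.0],
    [418.0, 280.0],
    [302.0, 275.0],
], dtype=np.float64)

# Camera intrinsics from calibration (values here are placeholders).
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)
# tvec is the target's position in the camera frame; its norm is the distance
# estimate that a stereo depth measurement could be compared against.
print(ok, np.linalg.norm(tvec))
```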

Like I said earlier, the ZED is a bit pricier than I want to go. If the Raspberry Pi compute module really supports two cameras, and those cameras are close to synchronized, that sounds very promising. I’ll have to look into that. The Intel RealSense also seems promising. It’s a bit cheaper than the ZED, and probably more accurate than anything I could do with a Pi camera.

Thank you to all who have given suggestions. Much research to be done.

How it works, at least with the ZED, is you get an RGB image along with a separate matrix of the same size with depth information. So at coordinate x,y, you can get the RGB color value and then using that same x,y coordinate, look up the distance of that pixel from the depth matrix. I’m pretty sure most other RGB-D cameras work the same way.

This is really convenient, but it brings up the obvious question of “which x,y?”. To figure that out, you’re doing some sort of other processing on the image. For example, this year for tracking retro tape, we did the normal HSV filtering, contour extraction, and feature comparison that a normal non-depth camera would. The difference was we could also use the depth information (looked up from the screen coords of each contour) to do some filtering based on how big the feature should be given the actual distance to the target. And then once we found the correct targets, we converted from screen x,y + depth info to world x,y,z coords.
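Put together, that workflow looks roughly like the sketch below (generic OpenCV/numpy, not our actual code; it assumes the depth matrix is aligned to the color image and that the color camera intrinsics are known):

```python
import cv2
import numpy as np

def find_targets(bgr, depth_m, fx, fy, cx, cy,
                 expected_area_m2=0.02, tolerance=0.5):
    """Find retro-tape blobs, reject ones whose physical size is wrong,
    and return camera-frame (x, y, z) points for the survivors.

    bgr:      color image
    depth_m:  depth in meters, same resolution as bgr (aligned)
    fx, fy, cx, cy: pinhole intrinsics of the color camera
    """
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (55, 100, 100), (95, 255, 255))  # green-ish tape, tune per lighting
    # OpenCV 4 return signature (contours, hierarchy).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    results = []
    for c in contours:
        m = cv2.moments(c)
        if m["m00"] == 0:
            continue
        u = int(m["m10"] / m["m00"])
        v = int(m["m01"] / m["m00"])
        z = float(depth_m[v, u])
        if z <= 0:
            continue
        # Physical-size filter: pixel area scales with (f / z)^2, so recover meters^2.
        area_m2 = cv2.contourArea(c) * (z / fx) * (z / fy)
        if abs(area_m2 - expected_area_m2) > tolerance * expected_area_m2:
            continue
        # Back-project screen coords + depth into camera-frame x, y, z.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        results.append((x, y, z))
    return results
```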

That means you need some CPU power to make this work. A pi + realsense might be enough, but you’ll have to try it out.

Also note there is a discount on the ZED for FRC teams which I think is still alive. I forget the details, but it might save a hundred bucks off the price? With their new version you get not only RGB+D but also visual odometry fused with IMU data…