Visual Odometry

I’m very interested in using visual odometry to supplement wheel/accelerometer/gyro odometry as an off-season project. I am looking into running RTAB-Map on a coprocessor (possibly an NVIDIA Jetson TK1 or TX1, but I haven’t made up my mind) with a stereo camera or an RGB-D camera like a Kinect.
I have some questions about this approach:
Has anyone ever tried this in FRC? I know that Team 900 has used ROS before, but I haven’t seen any mention of visual odometry.
For anyone with experience with visual odometry, what are the major challenges/drawbacks?
I’ve seen robots using RTAB-Map with a LIDAR scanner as well, and I will certainly try that if it becomes necessary, but I have my doubts about the consistency of a LIDAR sensor on an FRC field.
If anyone has advice on this or is interested at all, let me know!


We don’t mention everything that we work on.

That being said, we’ve looked at this. You’ll have better luck with the TX2 and the ZED than with other solutions. The TK1 won’t cut it, unfortunately.


We toyed with it, and we got ROS working as well, but that’s another story (we hit issues with closed-loop feedback in ROS using Talons).

The main reasons we decided not to, for us:

The FRC field walls are clear, short, and mostly featureless.
It’s a lot of processing for real-time feedback, and there seem to be easier things to focus on: reflective tape, game pieces, etc.

So why focus on the field when it’s the game piece or a specific field element that matters? The obvious answer is navigation, but for us the improvement in field navigation just wasn’t worth the effort and complexity.

You can also use visual-inertial odometry (VIO), which can either simplify or greatly complicate the problem depending on what you want to do. You would use a combination of camera and IMU data for localization, probably combined with a Kalman filter. I don’t have any experience using VIO on FRC robots, or with an FRC control system for that matter, but I do have some experience with VIO in ROS for an autonomous drone. It can be useful when you already have the sensor data available anyway, which I did for leveling and flight control. Just something to consider.
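To make the Kalman-filter idea above concrete, here is a minimal 1-D sketch: a gyro rate drives the prediction step and an absolute heading from a camera drives the correction step. All class/variable names and noise values are hypothetical, chosen only to show the fusion structure, not tuned for any real robot.

```python
# 1-D Kalman filter sketch: fuse gyro rate (prediction) with an absolute
# camera heading (correction). Illustrative only; values are made up.

class HeadingFilter:
    def __init__(self, q=0.01, r=0.25):
        self.theta = 0.0  # heading estimate (rad)
        self.p = 1.0      # estimate variance
        self.q = q        # process noise added per gyro step
        self.r = r        # camera measurement noise

    def predict(self, gyro_rate, dt):
        # Integrate the gyro; uncertainty grows each step.
        self.theta += gyro_rate * dt
        self.p += self.q

    def correct(self, camera_heading):
        # Blend in the absolute camera measurement.
        k = self.p / (self.p + self.r)  # Kalman gain
        self.theta += k * (camera_heading - self.theta)
        self.p *= (1.0 - k)
        return self.theta

f = HeadingFilter()
for _ in range(50):            # 50 gyro steps at 0.1 rad/s, 50 Hz
    f.predict(0.1, dt=0.02)
est = f.correct(0.12)          # one camera fix pulls the estimate toward 0.12
```

The same structure generalizes to a full pose EKF; the point is that the gyro keeps the estimate fresh between (slower, noisier) camera fixes.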

We used visual odometry this year with mixed results. If you’re going to do it, I would recommend the ZED + TX2 as marshall suggested. Though the ZED is expensive, it has some great libraries (C++ and Python) that will do the math for you and spit out X, Y, Z coordinates. Note that for SLAM you’re more on your own, and if you expect to get depth maps and odometry at the same time, be prepared to use threading. If you use Python, you can easily post data to NetworkTables using the pynetworktables library.
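Here is a rough sketch of the threading pattern mentioned above: odometry is grabbed on a background thread so a slower depth/vision loop in the main thread doesn’t stall pose updates. `FakeZed` is a stand-in for the real camera SDK calls; only the threading pattern is the point.

```python
# Background odometry thread pattern. FakeZed stands in for real SDK calls.
import threading
import time

class FakeZed:
    def get_pose(self):
        return (1.0, 2.0, 0.0)  # x, y, z in meters (stub value)

class OdometryThread(threading.Thread):
    def __init__(self, cam, hz=50):
        super().__init__(daemon=True)
        self.cam, self.period = cam, 1.0 / hz
        self.lock = threading.Lock()
        self.pose = (0.0, 0.0, 0.0)
        self.running = True

    def run(self):
        while self.running:
            p = self.cam.get_pose()
            with self.lock:            # protect the shared pose
                self.pose = p
            time.sleep(self.period)

    def latest(self):
        with self.lock:
            return self.pose

odo = OdometryThread(FakeZed())
odo.start()
time.sleep(0.1)                        # main thread would do depth work here
x, y, z = odo.latest()
odo.running = False
```

The lock matters: without it, the main loop can read a half-updated pose tuple replacement on some interpreters/SDK bindings.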

So how did it perform? In our testing, as long as the ZED didn’t rotate, it only drifted by around 1 inch every 20 feet. Rotation of the ZED, other moving objects, reflections from Lexan, and a lack of nearby features all contribute to decreased accuracy, but it’s still pretty good overall.

Visual odometry will also force your control loops to become a lot more complicated. This was our first year with closed-loop autonomous: we had one PID between current position (from the ZED) and target position (from splines), and a second PID for robot orientation (using a gyro). Surprisingly, these two PID loops fought one another, because the ZED did not sit directly on top of our robot’s axis of rotation. Despite some fancy math, we were never able to completely correct for this.
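For anyone hitting the same mounting-offset problem: if the camera sits at a known offset from the robot’s center of rotation, rotation makes the camera translate even when the center doesn’t, and subtracting the rotated offset recovers the center. A small 2-D sketch (hypothetical function and numbers, not this team’s actual code):

```python
# Correct a camera pose back to the robot's center of rotation.
# (ox, oy) is the camera's mounting offset in the robot frame.
import math

def robot_center_from_camera(cam_x, cam_y, heading, ox, oy):
    # Rotate the mounting offset into the field frame, then subtract it.
    cx = cam_x - (ox * math.cos(heading) - oy * math.sin(heading))
    cy = cam_y - (ox * math.sin(heading) + oy * math.cos(heading))
    return cx, cy

# Camera mounted 0.3 m forward of center; robot spins in place to 90 deg.
# The camera reads (0, 0.3) even though the robot center never moved.
x, y = robot_center_from_camera(0.0, 0.3, math.pi / 2, 0.3, 0.0)
# x and y come back as (0, 0): the center didn't move.
```

If the offset correction is applied before the position PID sees the pose, the translation and rotation loops stop fighting over rotation-induced camera motion.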

Finally, I should mention latency. I don’t have exact numbers, but I do know our robot tended to (slightly) spiral into its target position. This wouldn’t happen on a non-swerve robot, and the effect can be negated by using a Kalman filter with other sensors or simply by moving slower.
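One standard way to handle that latency, sketched below: keep a short timestamped history of odometry poses, and when a delayed vision measurement arrives, compute the error against the pose the robot had at capture time, then shift the present estimate by that error. Hypothetical 1-D example, not anyone’s actual implementation:

```python
# Latency compensation via a timestamped pose buffer (1-D for clarity).
from collections import deque

class LatencyCompensator:
    def __init__(self, maxlen=100):
        self.history = deque(maxlen=maxlen)   # (timestamp, pose) pairs

    def record(self, t, pose):
        self.history.append((t, pose))

    def correct(self, capture_time, measured_pose, current_pose):
        # Find the recorded pose closest to the image capture time.
        past_t, past_pose = min(self.history,
                                key=lambda e: abs(e[0] - capture_time))
        error = measured_pose - past_pose     # how wrong we were back then
        return current_pose + error           # shift the present estimate

lc = LatencyCompensator()
for i in range(10):
    lc.record(t=i * 0.02, pose=i * 0.1)       # odometry: 0.1 m per step
# A frame captured at t=0.10 (odometry said 0.5) shows we were really at 0.45.
fixed = lc.correct(capture_time=0.10, measured_pose=0.45, current_pose=0.9)
```

This avoids the spiral because the correction is applied against where the robot was when the image was taken, not where it is now.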

I just leased a new car. The blind-spot detection is amazing; it will not let my wife do her tailgating routine. The whole driver-assistance package, and its integration with cameras, is impressive. I have seen the future of auto nav, and LIDAR is not part of it. The foundation is RADAR. A complete RADAR system is now available on a chip, and there are sensor-fusion boards that take the point cloud, vision, IMU, and GPS and can do amazing things.

This year the demands of auton pushed the limits of robot-frame-of-reference navigation. As marshall mentioned, latency at the velocities we were moving this year killed our vision solution. We had a student work on the vision system as an off-season project; he delivered a working product only to have it killed by latency.

Development boards are available that meet the cost constraints of FIRST, and they come with libraries and demo programs. However, unless you are a manufacturer you cannot get access to the secret-sauce IP (the licensing-agreement problem), and developing the fusion algorithms from scratch is a formidable task. I believe it’s just a matter of time before RADAR takes FIRST auto nav to the next level.

Thanks for the feedback! I’m glad I’m not the first person to try this.
It seems that the biggest problem is going to be latency. However, I think we can work around that by using visual odometry to correct periodically, while still using inertial/wheel odometry for real-time control.
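That scheme can be sketched in a few lines: wheel odometry updates the estimate every control loop, and an occasional visual-odometry fix nudges the estimate back toward the camera’s reading. Everything here is hypothetical and 1-D, just to show the shape of the idea:

```python
# Fast wheel odometry every loop, periodic slow vision correction.
class OdometryWithVisionReset:
    def __init__(self):
        self.pose = 0.0

    def wheel_update(self, delta):
        self.pose += delta                    # fast, runs every control loop

    def vision_fix(self, vision_pose, alpha=0.8):
        # Periodic: move the estimate most of the way toward vision.
        self.pose += alpha * (vision_pose - self.pose)

est = OdometryWithVisionReset()
for _ in range(100):
    est.wheel_update(0.01)                    # wheels claim 1.0 m of travel
est.vision_fix(0.9)                           # camera says 0.9 m; trust it
```

The blend factor `alpha` trades off vision noise against wheel drift; a Kalman filter is the principled way to pick it, but a fixed value is often good enough to start.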

I just returned from the IEEE International Conference on Robotics and Automation (ICRA) 2019. In terms of navigation and mapping, there is a solid consensus that for robotics, and for autonomous vehicle applications for that matter, the most performant approach at this time, and for the foreseeable future, is a combination of stereo visual odometry, an IMU, and Simultaneous Localization and Mapping (SLAM).
LIDARs are becoming a “ho-hum… are you still doing that?” kind of thing. When used, it is mostly as a secondary sensor.
The Jetson TX2 is often mentioned as a good stereo VO/SLAM processor.

I’ve just started playing around with the Jetson Nano, a smaller version of the TX2, and I’ve built Kaya, the Jetson Nano reference robot.

I’ve got driving and object detection working, and this weekend I’m going to get mapping working. I would recommend this platform: it’s called Isaac, and it’s built on ROS but has a somewhat easier learning curve.


Hello - what is your current progress with the Jetson Nano and, say, the D435 camera?

Our team has been testing a Jetson Nano + RealSense T265. The T265 has a built-in visual-odometry-based pose stream, and also a way to integrate wheel encoders into that pose stream.

Is your team willing to post any results of your tests with the t265?

After much testing, we found that the RealSense T265, even without wheel odometry, could provide a rough estimate of the robot’s pose for an entire match (error always smaller than 2 m / 10° after 5 min of testing). It is also accurate enough for trajectory following in auto. What I noticed is that the pose starts to drift from the very start, but the camera is able to correct itself based on its internal map.

This year we have also implemented a ball-detection algorithm using the RealSense D435, similar to this paper. However, we would like to see on Friday whether it works consistently on the field.

I’m very interested to see any results you are willing to share as you get them.

You mean the D435 or the T265?
This year I am using three cameras with a Jetson Nano: a RealSense D435 for ball detection, a Microsoft LifeCam with an LED light for target detection, and a T265 for localization. At the start of the match, we have the T265 calibrate itself to field coordinates based on the solvePnP results from the LifeCam.
I will open up a showcase thread with video later; our source code is also public, though quite messy and undocumented:

I was mainly interested in the results of your visual odometry, I think it would be an awesome off-season project for our team.

Looks like someone else is doing something similar to us! I wrote a roboRIO JNI wrapper for the T265, along with an extended Kalman filter that runs on the RIO and fuses the T265, regular retroreflective vision pose estimates, and encoder odometry. It’s a little overkill, but that’s what makes it fun. Our code is here (sorry about the mess :slight_smile:).

I intend to release the T265 JNI wrapper that runs on the roboRIO as a vendordep that makes the whole thing plug and play. Keep on the lookout this post-season.

We can also do power cell detection, although I went with LIDAR for that. It’s not that useful, so I doubt it’ll make it onto the robot this year.

I am super surprised that we do almost the exact same thing.

Instead of fusing all three, what I did was let the T265 fuse the wheel odometry and disable T265 pose jumps. Based on the vision pose estimate, we calculate the transformation between the T265 coordinate frame and the field frame, and we recalculate it every time we shoot.

LIDAR and RGB-D cameras also share a lot of similarities.
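For reference, the T265-frame-to-field-frame transformation mentioned above can be computed from a single moment where both the T265 pose and a vision (solvePnP-style) field pose are known. A hedged 2-D sketch, with poses as (x, y, heading) and all values made up for illustration:

```python
# Compute the rigid transform from the T265's odometry frame to the field
# frame from one simultaneous (t265_pose, field_pose) pair, then apply it.
import math

def frame_transform(t265_pose, field_pose):
    tx, ty, tth = t265_pose
    fx, fy, fth = field_pose
    dth = fth - tth                     # rotation between the two frames
    c, s = math.cos(dth), math.sin(dth)
    # Translation that maps the rotated T265 position onto the field position.
    ox = fx - (c * tx - s * ty)
    oy = fy - (s * tx + c * ty)
    return ox, oy, dth

def apply(transform, pose):
    ox, oy, dth = transform
    x, y, th = pose
    c, s = math.cos(dth), math.sin(dth)
    return (ox + c * x - s * y, oy + s * x + c * y, th + dth)

# T265 says (1, 0, 0); vision says we're really at (2, 3, 90 deg).
T = frame_transform((1.0, 0.0, 0.0), (2.0, 3.0, math.pi / 2))
field_pose = apply(T, (1.0, 0.0, 0.0))   # maps back to (2, 3, 90 deg)
```

Recomputing `T` at every good vision fix (as described above) keeps the T265’s slow drift from accumulating in field coordinates.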