What has been your experience with VSLAM in FRC?

Hey Everyone,

We have been thinking about making a VSLAM system for robot localization as a summer project. I want to get some feedback on the project.

The goal:

The goal is to create a system accurate enough to aim a turret throughout a match. We hope this system removes the reliance on tracking retroreflective targets, so that we always have the robot’s position and never have to worry about locking onto the wrong target.
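For context, the aiming side is really just a bearing calculation from the estimated field pose to the goal. A rough sketch of what I mean (the goal coordinates and class/method names here are placeholders, not code we’ve written):

```java
import edu.wpi.first.math.geometry.Pose2d;
import edu.wpi.first.math.geometry.Rotation2d;
import edu.wpi.first.math.geometry.Translation2d;

public class TurretAim {
  // Placeholder goal location in field coordinates (meters).
  private static final Translation2d GOAL = new Translation2d(8.23, 4.11);

  // Turret angle relative to the robot chassis, given a field-relative pose estimate.
  public static Rotation2d turretAngle(Pose2d robotPose) {
    Translation2d toGoal = GOAL.minus(robotPose.getTranslation());
    double fieldBearing = Math.atan2(toGoal.getY(), toGoal.getX());
    return new Rotation2d(fieldBearing).minus(robotPose.getRotation());
  }
}
```

As long as the pose estimate stays accurate through contact and across the whole field, that is all the turret needs.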

My thoughts:

I think that VSLAM is the best option for this type of localization; VIO would have too much drift, and lidar-based SLAM seems like it would have issues with the polycarbonate on the field. 4915 successfully combined Intel’s T265 SLAM camera with wheel odometry data (https://www.chiefdelphi.com/uploads/short-url/wy264PCiOgLa5oxdi9QR4dbtaYS.pdf), so VSLAM is clearly possible.

The issue is that SLAM is a very computationally expensive task, which means that a CPU-based SLAM system basically requires an x86 processor. Putting a NUC on an FRC robot does not seem practical because it’s large, power-hungry, and expensive. I talked to Kudan, a company that develops SLAM software, about this project. They thought it would be better to record a video on inspection/practice day, use the video to generate a point map of the field in the pits, and then upload the map to the robot to run a localization algorithm. That process seems fragile and unnecessarily complicated to me.

The last option, which seems like the best one, is GPU-accelerated SLAM. That would let us use a Jetson SBC, which is practical to put on a robot. It looks like Nvidia’s Isaac Elbrus SLAM is GPU-accelerated, efficient, and accurate.

Our hardware:

We have a Jetson Nano and an Intel D435. The D435 isn’t ideal on its own because Elbrus likes having an IMU to fill in when visual tracking is lost, but it is probably fine. However, a Jetson Nano will probably be a massive compute bottleneck: Nvidia shows an AGX Xavier running Elbrus at 91 fps and says 30 fps is the minimum for good tracking, and a Jetson Nano has no chance of reaching that. So it seems like to run SLAM we need to get a Xavier NX.

I’m excited to see other teams’ thoughts and experiences on the subject.


We hope to have experience with this next year :smiley:

Last year we did some initial trials with an RPi 4 and a Jetson Nano for game-piece tracking and navigation.

But the only things that eventually made it onto the robot were (1) aligning the robot for climbing based on image processing of the lines on the carpet … (RPi 4) and (2) pose estimation using a Limelight and the reflective tape.

Everything else we tried to develop ran too slowly or wasn’t stable enough to be operational.


While I don’t have any experience actually doing it, I can offer my $0.02:

It is very computationally expensive, but I wouldn’t tie that 1-to-1 to a particular architecture. I’d lean more on what you said later - an appropriate selection of computing hardware (GPU + CPU + form factor + power supply + connectivity) is what’s important.

As a general rule, the opportunity to collect bespoke data about the environment with reduced noise is not something you want to pass up.

Assuming your end goal is increasing robot performance, I’d start with the assumption that what you really care about is the Localization part of VSLAM. The Visual, Mapping, and Simultaneous aspects are all just byproducts of one particular set of algorithms that get you to Localization.

If there’s any way to shortcut some of the simultaneous-mapping aspects to get better localization from t=0, I’d say that’s well worth doing. Especially since t=[0,15] is arguably when good localization matters most to robot performance.

Seems like a reasonable hardware choice, and I know at least two other teams that have attempted putting one on the field (though without an appreciable increase in robot performance attributable specifically to the sensor).

The big question in my mind: a historical roadblock for IR depth sensors (including 1D and 2D lidar) is the polycarb on the field perimeter - it’s hard to get a reliable return off it.

The little bit of research I’ve done in this space was specifically on monocular visual SLAM: Monocular SLAM for Visual Odometry: A Full Approach to the Delayed Inverse-Depth Feature Initialization Method

My reasoning: to make this approachable for the greatest number of teams, you’d want a turnkey solution. That means using hardware they already have, like Raspberry Pis (plus a single camera) and Limelights. However, I never got anywhere near far enough to know whether running it at ~10 FPS or more was a reasonable goal.

In all these efforts, the maximum performance robot will definitely use all available sensors - a SLAM system, plus encoders, IMU, etc. A Kalman filter (or something similar) will fuse these estimates together.

However, the real “but was it useful” answer will be in the Kalman filter’s standard-deviation matrices. Knowing how much information the SLAM system injects above and beyond the other sensors will help gauge usefulness.
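To put numbers on that (this is the standard Kalman-filter measurement update, nothing FRC-specific), the gain applied to a SLAM pose measurement is

$$
K = P H^\top \left( H P H^\top + R \right)^{-1}
$$

where $R$ is the measurement covariance you assign to the SLAM pose (the squares of those std devs) and $P$ is the filter’s current state covariance. If the SLAM std devs have to be set so large that $K$ stays near zero, the SLAM estimate is barely moving the fused pose and isn’t buying you much over the encoders and IMU.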


There are a lot of small mini PCs that draw 25 W and under.

Some are Ryzen-based (but those tend to be expensive), and there are plenty of Intel options.

Thanks for the response!

This is a really good point. I agree that talking about computers based on their architecture is a gross oversimplification. However, most CPU-based SLAM implementations are CPU-based because the algorithm is not parallelizable, which also means they are largely single-threaded. There are certainly ARM computers with great single-threaded performance, like Apple’s M1. But the ARM SBCs I have seen, like the RPi 4, use ARM for its power efficiency, not to compete with x86 SBCs on single-threaded performance. So I don’t know of any ARM computer that can reasonably run CPU-based SLAM. Actually, that’s not completely true: Oculus has really accurate SLAM running, in parallel with rendering a game, on a Snapdragon processor. But they have also hired some of the leaders in SLAM development to build a SLAM system specifically for their application and hardware.

This is also a really good point. Even if we do run full SLAM, I agree that some out-of-competition mapping to generate higher-resolution, lower-noise maps is a good idea. I am just worried about relying completely on those maps. What if a lighting change between when the map was generated and when the match is played causes the localization algorithm to lose tracking?

I was thinking of using the D435’s stereo global-shutter cameras and just turning off its IR projector and RGB camera. I don’t think RGB-D has any advantage over stereo for SLAM, and even if it otherwise would, the polycarbonate would certainly cause issues.

I agree with the goal of making this approachable to as many teams as possible. I think the best first step is to get a working system, without worrying too much about approachability, and then simplify based on what seems least necessary. I also think expensive computing hardware is a larger barrier to entry than getting a stereo global-shutter camera. Some teams will already have one (like our D435), and they can be bought for a reasonable price (like the Arducam 1MP*2 Wide Angle Stereo Camera for Raspberry Pi, Jetson Nano and Xavier NX, Dual OV9281 Monochrome Global Shutter Camera Module).

That is so true. I am amazed at how 4915’s Kalman filter fused individually inaccurate position estimates into something that looks usable. Given that it was running on a T265, which is not known for being good at SLAM, it is possible that Elbrus + Jetson Nano + stereo global shutter + wheel odometry is enough. It may be a while before I can test this system, but I will definitely post any progress.

Paging @pietroglyph if he still checks Chief Delphi. He knows a thing or two about VSLAM. His previous posts probably have something.

I did a project on this during the first part of the year with decent results. I would recommend looking into this algorithm, as it worked fairly well for me and already has support for fiducials, which may be coming in future years.

Have you thought about using the pose estimator that is built into WPILib? It tracks field position based on odometry and a starting pose, and it can take a pose-estimate update from whatever SLAM method you want to use (actually, it doesn’t really need the AM part of SLAM since the map is known ahead of time, so it’s really only a localisation problem). It even has the option to supply processing delay times so that a pose estimate can be applied to a past position and wound forward with odometry. It also supports the whole probabilistic approach to localisation, with confidence standard deviations from your localisation method to handle false readings and noisy sensors. I wouldn’t be re-inventing the wheel when some very smart WPILib programmer has already done the hard yards.
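To make that concrete, here is a rough sketch of how the pieces hook together (this assumes a recent WPILib differential-drive API - the constructor arguments vary between WPILib versions, and every number below is a placeholder, not a tuned value):

```java
import edu.wpi.first.math.VecBuilder;
import edu.wpi.first.math.estimator.DifferentialDrivePoseEstimator;
import edu.wpi.first.math.geometry.Pose2d;
import edu.wpi.first.math.geometry.Rotation2d;
import edu.wpi.first.math.kinematics.DifferentialDriveKinematics;

public class Localizer {
  // Placeholder track width in meters.
  private final DifferentialDriveKinematics kinematics = new DifferentialDriveKinematics(0.6);

  // Std devs are (x, y, heading); smaller means "trust this source more".
  private final DifferentialDrivePoseEstimator estimator =
      new DifferentialDrivePoseEstimator(
          kinematics,
          new Rotation2d(),
          0.0, 0.0,                           // initial left/right encoder distances (m)
          new Pose2d(),                       // known starting pose
          VecBuilder.fill(0.02, 0.02, 0.01),  // odometry/state std devs
          VecBuilder.fill(0.5, 0.5, 0.5));    // SLAM/vision measurement std devs

  // Call every loop with the latest gyro and encoder readings.
  public Pose2d update(Rotation2d gyroAngle, double leftMeters, double rightMeters) {
    return estimator.update(gyroAngle, leftMeters, rightMeters);
  }

  // Call whenever the coprocessor reports a SLAM pose, passing the image capture
  // time so the filter applies it in the past and replays odometry forward.
  public void addSlamPose(Pose2d slamPose, double captureTimestampSeconds) {
    estimator.addVisionMeasurement(slamPose, captureTimestampSeconds);
  }
}
```

The timestamp on addVisionMeasurement is what handles the processing delay mentioned above, and the measurement std devs are where you encode how much you trust the SLAM output relative to odometry.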

I know the top-end teams in Australia are already doing this with pose estimates from Limelight angles combined with gyro headings.
If you came up with a full-blown SLAM solution in a box with multiple inputs - multiple cameras or 360-degree cameras, lidars, and ultrasonics - sort of like the Limelight guys did with vision processing, you would have one hell of a saleable product. There are a lot of teams branching into swerve drive and SLAM, and this is definitely the next growth area in FRC robotics. We just need to watch out that the robot AI doesn’t start to kill off the driver because it thinks it can do a better job of driving.

I can’t wait. Every FRC team, small or large, will have localisation once FIRST starts putting these things around the field.

See Using ROS for inspiration in FRC

TL;DR: 90% of our shots at Champs were aimed solely using localization from 2 LIDARs and wheel odometry. We are able to see the polycarb walls when the laser doesn’t hit them at too shallow an angle, which is enough. It requires way less processing power, and there is no chance of grabbing features outside the field that will change from event to event, or even throughout an event.


LIDAR targeting worked great this year because the goal was relatively large. What we observed was that if our bot was being heavily defended, its position would slip by a few centimeters. It would quickly regain its position once it was able to move again. I think this can be improved by accounting for accelerometer data in our odometry.

If the goal is smaller in a future shooting game, I would want to augment it with an additional sensor. The LIDARs would provide a very close estimate of the goal’s position, and a vision sensor (a Limelight or an object-detection system) would refine it down to the millimeter range.

For the purpose of autonomous robot navigation, I don’t think you can top LIDARs for this style of game. By the end of the season, we got it to be very robust. It never lost its true position for more than a second, even throughout our worst defense matches.

Also, don’t underestimate the value of needing no field calibration. I was surprised how little time we had to calibrate before the competition began (this was my first FRC season). Another thing to contend with is that practice fields are rarely the same as the real field. Translating our starting waypoints and localization settings was super easy compared to working with 3D maps. No calibration means no frantic competition mistakes.

The Jetson Xavier NX can easily handle AMCL, sitting at ~20% usage of a single core. That leaves plenty of headroom for neural nets and path planners, which take up much more CPU and GPU.

