Software Update #2: Finding the North Star
Let’s talk about AprilTags.
We gained a lot of valuable experience last year with full field localization using reflective tape (more info here), and we’re very excited about the opportunities that AprilTags provide with superior precision, robustness, and ease of use.
I also want to note before getting into the details that the code for this project is not designed to be “plug and play” in the same way as many of our recent projects. See the important note in the “Code Links” section below for more explanation.
We’ve opted to develop a custom solution for AprilTag tracking, which we’re calling Northstar. This project began in fall of 2022, and we have a couple of reasons for taking this route rather than using a prebuilt system like PhotonVision or Limelight.
- We used this project during the offseason as a training opportunity, bringing in some of our newest members. We were fortunate enough to have the time and resources to take this extra step without making sacrifices in other areas.
- A custom solution gives us a greater ability to optimize for our specific choice of hardware, both in terms of performance and reliability. We can quickly debug any issues we encounter or experiment with different approaches to improve performance.
Northstar is written in Python using OpenCV’s ArUco module. Image capture and MJPEG decoding is handled by a GStreamer pipeline. Ultimately, our plan is to have four cameras on the robot processed by four Northstar instances running on multiple coprocessors. Given this requirement, we wanted to make it easy to configure and manage multiple instances. A traditional web interface makes it challenging to keep track of each device’s configuration. Instead, Northstar relies on NetworkTables as its primary interface:
An MJPEG stream is available through a browser when necessary. All of the camera settings can be tuned on the fly, but the real benefit of using NT is that the default configuration is stored by the robot code and pushed to the coprocessors on startup. After retuning one camera, we just adjust the constant, restart the code, and every instance will be using the correct configuration.
Calibration is also managed through the NT interface, with topics used to start calibration and capture frames. The final calibration data is device specific and is therefore saved locally as a JSON file. Northstar uses a ChArUco board for calibration instead of the traditional checkerboard. This has the advantage of not requiring the full board to be visible, making calibration much more robust to disturbances like odd lighting (or a hand covering the board). Here’s an example of ChArUco tracking with Northstar:
We’re testing with a 2MP Arducam OV2311, which has a maximum resolution of 1600x1200 at 50fps. This camera is both global shutter and grayscale. Receiving a grayscale image allows us to remove the step from the pipeline where we convert from a color image. Interestingly, we still receive three channels from the camera but OpenCV’s ArUco module seems to deal with it correctly while having less of a performance impact than an explicit conversion. For testing, we have mounted one of these cameras to our swerve base:
Here’s a video of what the tag tracking looks like from the camera’s perspective:
We confirmed through early testing that having a global shutter camera greatly increases our ability to detect tags while moving (in combination with low exposure settings). Here’s an example of tracking tags with the robot spinning at full speed:
Below is a frame captured from the fastest part of the spin:
The tags have very little blur, and the image is not skewed from top to bottom like a rolling shutter would produce. The pose estimate from this frame is nearly as accurate as if the robot was stationary.
Coprocessors & Performance
From the start, we knew that there would always be tradeoffs between range, framerate, and accuracy. Those tradeoffs are affected both by the choice of hardware and the tracking parameters. We opted to start by maximizing resolution, because it increases our maximum range and decreases noise at low range. This is why we specifically chose a camera with a resolution of 1600x1200 (“1080p”) rather than the more common 720p variants.
Of course, running a camera at 1080p requires a powerful coprocessor. We began our testing with a Le Potato board. This was able to achieve about 10-15 fps at 1600x1200 under normal conditions. While perhaps usable with appropriate sensor fusion (pose estimation) on the RIO, it was far from ideal.
We have now switched to running on an Orange Pi 5, which is significantly more powerful. On the Orange Pi, Northstar can achieve around 45-50fps at 1600x1200. The maximum range for reliable tracking is about 30ft, with occasional detections up to ~35ft.
A nice consequence of our choice to use OpenCV’s ArUco module is that we have a lot of parameters to adjust in order to increase range or performance. However, we are currently running with the default parameters. For anyone curious, this means our AprilTag decimation setting is at zero. Overall, we feel that these default parameters provide a good balance:
- At low range, the noise in the data is very low. Within a meter of the target (as may be the case while scoring) the standard deviation for distance is under one centimeter.
- The maximum range is 30ft, which is already more than half the field. With four cameras, it’s likely that we will start to pick up closer tags at that distance. There are also quickly diminishing returns to having very high range because the measurement noise increases drastically.
- Having a high framerate allows us to quickly adjust our robot pose estimate while still effectively rejecting noise. See more details in the localization section below.
While we haven’t found benchmarks of other solutions at 1080p, the closest data available from @asid61 suggests that PhotonVision can achieve 22fps at 1280x720 with a maximum range of ~32 feet. Of course, this is far from an apples-to-apples comparison. We have certainly made different decisions about tradeoffs in performance vs. resolution vs. range, and we will continue to evaluate those choices throughout the season and adjust as necessary.
Localization & Pose Ambiguity
Northstar sends tag detections to the robot code using NetworkTables 4. NT4 has some nice benefits over NT3 that we’re taking advantage of, like message timestamps and queuing for subscribers. Even if multiple frames are received within a single robot code cycle, all of them will be processed (and with the correct timestamps).
For pose estimation on the RIO we’re using a custom implementation adapted from WPILib’s pose estimators. Our system is fundamentally similar, recording a history of twists for drive updates while applying vision updates with weighting based on measured standard deviations. We’ve made a couple of minor improvements:
- The drive kinematics are decoupled from the pose estimation, so our version of the class only accepts the calculated twist measuring a drive movement. This gives us more control over our swerve kinematics, like allowing us to run without a gyro in simulation (more information on that here). It’s also just our structural preference to isolate these calculations.
- When adding a vision measurement, we don’t discard older pose updates. This allows vision data to be added with out-of-order timestamps, which becomes increasingly likely with lots of cameras running at high framerates.
- Vision measurements from the same timestamp (multiple tags in the same frame) are handled cleanly. WPILib fixed the issue we were originally working around (allwpilib#4917), though we’ve also added logic to apply the vision updates in order based on their standard deviations. For each frame, applying the lowest noise measurements last effectively gives them a higher weight.
Both our pose estimator and WPILib’s classes require standard deviations for each measurement, plus state standard deviations that can be used to tune how much vision data is trusted in general. The vision measurement noise increases with distance, so we collected data for the X/Y (average) and angular standard deviations at various distances:
Pose ambiguity makes this a little tricky (if the measurement is swapping between similar poses), and the X/Y components in the field coordinate system are affected by the angle to the tag. However, this provides a decent starting point for manual tuning. We created a linear model for each dataset, then manually tuned the state standard deviations to converge quickly while rejecting noise. Our current state standard deviations are 0.005 meters for XY and 0.0005 radians for theta.
You may notice that our theta state standard deviation is very small, which means we’re trusting the current gyro measurement far more than vision data. This brings us to the issue of pose ambiguity when solving for the positions of the tags. At certain angles, there are two ambiguous tag poses; finding the correct one is critical to running a reliable localization pipeline. Here’s one example where four points can be solved as two very different poses:
Careful observers may have noticed in the previous videos that Northstar is always tracking two poses for every tag. Both poses and their errors are sent to the robot code; no filtering is done by Northstar because we need to be able to quickly retune and improve this part of the pipeline. Filtering in robot code also allows us to use AdvantageKit to test improvements in simulation using log replay.
Our primary strategy is to choose the pose with the lowest error if its error is less than 15% of the other. We chose a very strict threshold for this because we’ve seen many examples of tags with 15-40% relative ambiguity where the “better” estimate is still wrong.
For the remaining ambiguous poses, we started by always choosing the one that produced a robot pose closer to the current pose estimate. That was fine for simple cases, but started to become problematic at large distances because the vision poses became very noisy and started to overlap in range. However, we found that the reliability was significantly improved by comparing only the rotations of vision measurements to the current estimate. The robot’s gyro (a Pigeon 2) is very stable and ambiguous measurements appear to pivot the robot pose around the tag. That makes it easy to reject incorrect vision measurements because they don’t match the (much more stable) onboard gyro. This is also why our state standard deviation for the gyro is so low; we correct slowly based on vision (especially useful during startup) but mostly trust the current rotation and use it to disambiguate tag poses.
After all of that work, let’s take a look at the results. The video below shows us driving around with tags coming in and out of view. The translucent robots show individual vision updates and the solid robot is the final estimate.
For comparison, we replayed the log from that demo using AdvantageKit while ignoring all vision updates after the robot was enabled. The translucent robot shows the “pure odometry” pose and the solid robot is the same pose estimate as before that makes full use of vision data.
And here’s one last demo. This is using a simple PID controller to drive to a pose exactly one meter in front of the center tag. We turn the robot around so the camera doesn’t see the tags, offset the odometry by driving around, then flip around and immediately line up using the tag. Based on rough estimates using a tape measure, the final position of the robot is within about 1-2 cm of the target.
Several links to key parts of our code are provided below. Note that Northstar is designed for and tested only on the specific hardware we plan to use. We don’t plan to support alternate setups for Northstar, simply because we don’t have the capacity to assist anyone undertaking such an effort. Both PhotonVision and Limelight have fantastic communities of users and developers who are able to provide support.
- Northstar code (here)
- Robot-side Northstar IO implementation, reading data with an NT4 queue (here)
- AprilTag vision subsystem, producing “vision updates” for the pose estimator (here)
- Custom pose estimator class (here)