Here is the CAD for that. It is currently built, but not tested, so we should have videos of it soon.
Updated Auger Write-Up
As seen in the videos from the previous post, we made some significant updates to the flange grabber/auger prototype. After countless iterations of finding the optimal compression and exploring roller options, the prototype has gained some very effective improvements. After the unsuccessful 0.4” compression version, we had low hopes for this intake going forward. But after trying the 0.7” compression and new auger design, we were very pleased to see it work quite well. We also added a 3rd roller of 3” compliant wheels as a safety net to kick up cones that could have missed the top 2 rollers. We are also exploring the idea of using this 3rd roller to actuate up and intake cubes.
Please note the two plywood plates sticking out on the sides were a quick idea to adjust cones at odd angles, but after further thought we will likely not use this on the robot. It inhibits our ability to quickly ram into walls and collect game pieces at the Loading Zone, and being able to do this was a major reason for wanting a ground intake to begin with. Another solution we are considering for this problem is spinning the bottom 2 rollers in opposite directions to intake the tip of the cones.
Overall, the design is looking quite promising as it is intaking and vectoring cones very quickly, so we are likely going to use this as a ground intake on the robot.
*The assembly is not very robust; we only built the parts needed to establish the essential side plate dimensions.
Charge Station Testing
We began playing around with balancing on the Charge Station. This gave us a reference point for what frame dimensions we might want to use.
Starting from today the team will be transitioning away from initial intake prototypes and beginning our robot CAD. We haven’t set a hard deadline yet, but we expect to have most of the CAD done by the end of next week.
The link for the auger CAD seems to be broken, just takes me to the onshape landing page.
The design looks really promising, great work!
Just edited the message, it should be fixed now. Thanks for letting me know!
Software Update #2: Finding the North Star
Let’s talk about AprilTags.
We gained a lot of valuable experience last year with full field localization using reflective tape (more info here), and we’re very excited about the opportunities that AprilTags provide with superior precision, robustness, and ease of use.
I also want to note before getting into the details that the code for this project is not designed to be “plug and play” in the same way as many of our recent projects. See the important note in the “Code Links” section below for more explanation.
We’ve opted to develop a custom solution for AprilTag tracking, which we’re calling Northstar. This project began in fall of 2022, and we have a couple of reasons for taking this route rather than using a prebuilt system like PhotonVision or Limelight.
- We used this project during the offseason as a training opportunity, bringing in some of our newest members. We were fortunate enough to have the time and resources to take this extra step without making sacrifices in other areas.
- A custom solution gives us a greater ability to optimize for our specific choice of hardware, both in terms of performance and reliability. We can quickly debug any issues we encounter or experiment with different approaches to improve performance.
Northstar is written in Python using OpenCV’s ArUco module. Image capture and MJPEG decoding are handled by a GStreamer pipeline. Ultimately, our plan is to have four cameras on the robot processed by four Northstar instances running on multiple coprocessors. Given this requirement, we wanted to make it easy to configure and manage multiple instances. A traditional web interface makes it challenging to keep track of each device’s configuration. Instead, Northstar relies on NetworkTables as its primary interface:
An MJPEG stream is available through a browser when necessary. All of the camera settings can be tuned on the fly, but the real benefit of using NT is that the default configuration is stored by the robot code and pushed to the coprocessors on startup. After retuning one camera, we just adjust the constant, restart the code, and every instance will be using the correct configuration.
Calibration is also managed through the NT interface, with topics used to start calibration and capture frames. The final calibration data is device specific and is therefore saved locally as a JSON file. Northstar uses a ChArUco board for calibration instead of the traditional checkerboard. This has the advantage of not requiring the full board to be visible, making calibration much more robust to disturbances like odd lighting (or a hand covering the board). Here’s an example of ChArUco tracking with Northstar:
We’re testing with a 2MP Arducam OV2311, which has a maximum resolution of 1600x1200 at 50fps. This camera is both global shutter and grayscale. Receiving a grayscale image allows us to remove the step from the pipeline where we convert from a color image. Interestingly, we still receive three channels from the camera but OpenCV’s ArUco module seems to deal with it correctly while having less of a performance impact than an explicit conversion. For testing, we have mounted one of these cameras to our swerve base:
Here’s a video of what the tag tracking looks like from the camera’s perspective:
We confirmed through early testing that having a global shutter camera greatly increases our ability to detect tags while moving (in combination with low exposure settings). Here’s an example of tracking tags with the robot spinning at full speed:
Below is a frame captured from the fastest part of the spin:
The tags have very little blur, and the image is not skewed from top to bottom like a rolling shutter would produce. The pose estimate from this frame is nearly as accurate as if the robot was stationary.
Coprocessors & Performance
From the start, we knew that there would always be tradeoffs between range, framerate, and accuracy. Those tradeoffs are affected both by the choice of hardware and the tracking parameters. We opted to start by maximizing resolution, because it increases our maximum range and decreases noise at low range. This is why we specifically chose a camera with a resolution of 1600x1200 (“1080p”) rather than the more common 720p variants.
Of course, running a camera at 1080p requires a powerful coprocessor. We began our testing with a Le Potato board. This was able to achieve about 10-15 fps at 1600x1200 under normal conditions. While perhaps usable with appropriate sensor fusion (pose estimation) on the RIO, it was far from ideal.
We have now switched to running on an Orange Pi 5, which is significantly more powerful. On the Orange Pi, Northstar can achieve around 45-50fps at 1600x1200. The maximum range for reliable tracking is about 30ft, with occasional detections up to ~35ft.
A nice consequence of our choice to use OpenCV’s ArUco module is that we have a lot of parameters to adjust in order to increase range or performance. However, we are currently running with the default parameters. For anyone curious, this means our AprilTag decimation setting is at zero. Overall, we feel that these default parameters provide a good balance:
- At low range, the noise in the data is very low. Within a meter of the target (as may be the case while scoring) the standard deviation for distance is under one centimeter.
- The maximum range is 30ft, which is already more than half the field. With four cameras, it’s likely that we will start to pick up closer tags at that distance. There are also quickly diminishing returns to having very high range because the measurement noise increases drastically.
- Having a high framerate allows us to quickly adjust our robot pose estimate while still effectively rejecting noise. See more details in the localization section below.
While we haven’t found benchmarks of other solutions at 1080p, the closest data available from @asid61 suggests that PhotonVision can achieve 22fps at 1280x720 with a maximum range of ~32 feet. Of course, this is far from an apples-to-apples comparison. We have certainly made different decisions about tradeoffs in performance vs. resolution vs. range, and we will continue to evaluate those choices throughout the season and adjust as necessary.
Localization & Pose Ambiguity
Northstar sends tag detections to the robot code using NetworkTables 4. NT4 has some nice benefits over NT3 that we’re taking advantage of, like message timestamps and queuing for subscribers. Even if multiple frames are received within a single robot code cycle, all of them will be processed (and with the correct timestamps).
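As a rough illustration of the subscriber-queue idea, here’s a minimal sketch in plain Python (the class and method names are hypothetical, not Northstar’s or WPILib’s actual API). Every frame received since the last loop cycle is drained with its own timestamp, rather than only keeping the newest value:

```python
from collections import deque

class TimestampedQueue:
    """Illustrative stand-in for an NT4 subscriber queue."""

    def __init__(self):
        self._queue = deque()

    def push(self, timestamp, value):
        # Called as frames arrive from the coprocessor.
        self._queue.append((timestamp, value))

    def drain(self):
        # Called once per robot loop: returns *all* pending frames,
        # each paired with its original timestamp, then empties the queue.
        items = list(self._queue)
        self._queue.clear()
        return items
```

Even if two camera frames land within one 20ms loop cycle, both come out of `drain()` and can be applied to the pose estimate at their correct times.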
For pose estimation on the RIO we’re using a custom implementation adapted from WPILib’s pose estimators. Our system is fundamentally similar, recording a history of twists for drive updates while applying vision updates with weighting based on measured standard deviations. We’ve made a couple of minor improvements:
- The drive kinematics are decoupled from the pose estimation, so our version of the class only accepts the calculated twist measuring a drive movement. This gives us more control over our swerve kinematics, like allowing us to run without a gyro in simulation (more information on that here). It’s also just our structural preference to isolate these calculations.
- When adding a vision measurement, we don’t discard older pose updates. This allows vision data to be added with out-of-order timestamps, which becomes increasingly likely with lots of cameras running at high framerates.
- Vision measurements from the same timestamp (multiple tags in the same frame) are handled cleanly. WPILib fixed the issue we were originally working around (allwpilib#4917), though we’ve also added logic to apply the vision updates in order based on their standard deviations. For each frame, applying the lowest noise measurements last effectively gives them a higher weight.
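To make the improvements above concrete, here’s a minimal sketch of a history-replay estimator. This is illustrative only, not our actual class: the 2D `Pose`, the simplified (small-angle) twist application, and the weight-based blend are all stand-ins for the real math.

```python
import math
from dataclasses import dataclass
from operator import itemgetter

@dataclass
class Pose:
    x: float
    y: float
    theta: float

    def exp(self, dx: float, dy: float, dtheta: float) -> "Pose":
        # Apply a robot-relative twist (small-angle approximation for
        # brevity; a real pose exponential integrates along the arc).
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose(self.x + dx * c - dy * s,
                    self.y + dx * s + dy * c,
                    self.theta + dtheta)

class PoseEstimator:
    """Replays a sorted history of drive twists and vision updates."""

    def __init__(self, initial: Pose):
        self.base = initial   # pose at the start of the history window
        self.updates = []     # (timestamp, kind, payload), kept sorted

    def _insert(self, t, kind, payload):
        self.updates.append((t, kind, payload))
        self.updates.sort(key=itemgetter(0))  # tolerates out-of-order adds

    def add_drive_twist(self, t, dx, dy, dtheta):
        self._insert(t, "drive", (dx, dy, dtheta))

    def add_vision(self, t, pose: Pose, weight: float):
        # Out-of-order vision timestamps are fine: nothing is discarded,
        # and the full history is replayed on the next pose() call.
        self._insert(t, "vision", (pose, weight))

    def pose(self) -> Pose:
        est = self.base
        for _, kind, payload in self.updates:
            if kind == "drive":
                est = est.exp(*payload)
            else:
                vision, w = payload
                # Blend toward the vision pose by its weight.
                est = Pose(est.x + w * (vision.x - est.x),
                           est.y + w * (vision.y - est.y),
                           est.theta + w * (vision.theta - est.theta))
        return est
```

Because the kinematics live outside the class, only the computed twist crosses the boundary, and a vision update with an older timestamp simply lands earlier in the replayed history.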
Both our pose estimator and WPILib’s classes require standard deviations for each measurement, plus state standard deviations that can be used to tune how much vision data is trusted in general. The vision measurement noise increases with distance, so we collected data for the X/Y (average) and angular standard deviations at various distances:
Pose ambiguity makes this a little tricky (if the measurement is swapping between similar poses), and the X/Y components in the field coordinate system are affected by the angle to the tag. However, this provides a decent starting point for manual tuning. We created a linear model for each dataset, then manually tuned the state standard deviations to converge quickly while rejecting noise. Our current state standard deviations are 0.005 meters for XY and 0.0005 radians for theta.
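The shape of that tuning can be sketched as follows. The slope and intercept values below are made up for illustration (not our measured data), and the variance-based blend is a simplified stand-in for the Kalman-style weighting in the real estimator:

```python
def xy_std_dev(distance_m: float) -> float:
    # Illustrative linear fit: translational noise grows with distance.
    return 0.005 + 0.010 * distance_m   # meters

def theta_std_dev(distance_m: float) -> float:
    # Illustrative linear fit for the angular component.
    return 0.010 + 0.020 * distance_m   # radians

def vision_weight(state_std: float, meas_std: float) -> float:
    # Variance-based blend: 0 = ignore the measurement, 1 = replace state.
    state_var, meas_var = state_std ** 2, meas_std ** 2
    return state_var / (state_var + meas_var)
```

The effect is exactly what we tuned for: a tag seen up close gets a large weight and snaps the estimate quickly, while a distant (noisy) tag barely nudges it.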
You may notice that our theta state standard deviation is very small, which means we’re trusting the current gyro measurement far more than vision data. This brings us to the issue of pose ambiguity when solving for the positions of the tags. At certain angles, there are two ambiguous tag poses; finding the correct one is critical to running a reliable localization pipeline. Here’s one example where four points can be solved as two very different poses:
Careful observers may have noticed in the previous videos that Northstar is always tracking two poses for every tag. Both poses and their errors are sent to the robot code; no filtering is done by Northstar because we need to be able to quickly retune and improve this part of the pipeline. Filtering in robot code also allows us to use AdvantageKit to test improvements in simulation using log replay.
Our primary strategy is to choose the pose with the lowest error if its error is less than 15% of the other. We chose a very strict threshold for this because we’ve seen many examples of tags with 15-40% relative ambiguity where the “better” estimate is still wrong.
For the remaining ambiguous poses, we started by always choosing the one that produced a robot pose closer to the current pose estimate. That was fine for simple cases, but started to become problematic at large distances because the vision poses became very noisy and started to overlap in range. However, we found that the reliability was significantly improved by comparing only the rotations of vision measurements to the current estimate. The robot’s gyro (a Pigeon 2) is very stable and ambiguous measurements appear to pivot the robot pose around the tag. That makes it easy to reject incorrect vision measurements because they don’t match the (much more stable) onboard gyro. This is also why our state standard deviation for the gyro is so low; we correct slowly based on vision (especially useful during startup) but mostly trust the current rotation and use it to disambiguate tag poses.
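The two-stage disambiguation above can be sketched in a few lines. `choose_tag_pose` is a hypothetical helper, not our actual code; the 15% ratio mirrors the threshold described earlier:

```python
import math

def wrap_angle(a: float) -> float:
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def choose_tag_pose(pose_a, err_a, pose_b, err_b, gyro_theta, ratio=0.15):
    """Each pose is an (x, y, theta) robot pose implied by one tag solution.

    1) If one reprojection error is far smaller (below `ratio` of the
       other), trust it outright.
    2) Otherwise compare only the implied robot *rotation* against the
       gyro, which is much more stable than the translation estimate.
    """
    if err_a < ratio * err_b:
        return pose_a
    if err_b < ratio * err_a:
        return pose_b
    da = abs(wrap_angle(pose_a[2] - gyro_theta))
    db = abs(wrap_angle(pose_b[2] - gyro_theta))
    return pose_a if da <= db else pose_b
```

Comparing headings instead of full poses is what makes this robust at long range, where the two candidate translations start to overlap in their noise bands.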
After all of that work, let’s take a look at the results. The video below shows us driving around with tags coming in and out of view. The translucent robots show individual vision updates and the solid robot is the final estimate.
For comparison, we replayed the log from that demo using AdvantageKit while ignoring all vision updates after the robot was enabled. The translucent robot shows the “pure odometry” pose and the solid robot is the same pose estimate as before that makes full use of vision data.
And here’s one last demo. This is using a simple PID controller to drive to a pose exactly one meter in front of the center tag. We turn the robot around so the camera doesn’t see the tags, offset the odometry by driving around, then flip around and immediately line up using the tag. Based on rough estimates using a tape measure, the final position of the robot is within about 1-2 cm of the target.
Several links to key parts of our code are provided below. Note that Northstar is designed for and tested only on the specific hardware we plan to use. We don’t plan to support alternate setups for Northstar, simply because we don’t have the capacity to assist anyone undertaking such an effort. Both PhotonVision and Limelight have fantastic communities of users and developers who are able to provide support.
- Northstar code (here)
- Robot-side Northstar IO implementation, reading data with an NT4 queue (here)
- AprilTag vision subsystem, producing “vision updates” for the pose estimator (here)
- Custom pose estimator class (here)
This is excellent!! Photonvision was experimenting with aruco detection a while ago, but found that maximum range was reduced compared to normal AprilTags detection. These results are very promising, and I think the comparison for 1280x720 is apt - the OrangePi can only process that at around 22fps as you mentioned.
If you can, I would like to see a 1:1 test by swapping Photonvision and Northstar SD cards and checking localization accuracy at 1280x720 and 1600x1200 at the same range - say, 10ft and 20ft. I would dig a 2x performance boost.
Keep in mind that when given an already-grayscaled image, OpenCV’s ArUco module will make a copy of the image rather than just returning it right back (copying is a slightly expensive operation). Not sure how you could get around this besides submitting a PR.
Photon tried its hand at the ArUco module, but had some issues with detection distance and frame divisors that I didn’t have time to iron out before the season. Northstar looks great! I’d presume the performance would be a lot closer if we were using the ArUco module, as we saw 2-3x increases in fps when using it. Now that we see the detection distance is actually increased, I might take a second look at adding the ArUco module back in.
This looks incredible! May I ask what the processing times are looking like on the RIO side of this pipeline? What’s the loop cycle time looking like when passing in multiple poses obtained from vision measurements?
Do you use a board/map of tags for your pose estimation or do you use each tag individually?
We’ve had consistently poor luck running PhotonVision in both 2022 and 2023, including on the Orange Pi. While we would love to get more consistent benchmarks, working through the issues to get this data just isn’t a priority right now. We may revisit this in the future though.
By far the biggest impact here is the pose estimation, because the full series of drive and vision updates need to be recalculated each cycle. This is actually another improvement we made with our pose estimator; a set of many vision measurements can be inserted into the history together so the latest pose is only recalculated once per loop cycle rather than for every tag. With a 0.3 second history (already far more than enough given the camera latency), we’re seeing an impact of about 1.5ms per cycle on a RIO 1. That’s with two tags in frame, though I’m not sure there’s a strong correlation given that the pose history is replayed only once regardless of the number of tags.
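The batching optimization can be sketched like this (illustrative names, not our actual class; `replay_fn` stands in for the history-replay step):

```python
from operator import itemgetter

class BatchedPoseHistory:
    """Inserts a whole cycle's vision measurements at once and replays
    the history a single time, instead of once per tag."""

    def __init__(self, replay_fn):
        self.history = []           # (timestamp, measurement), kept sorted
        self.replay_fn = replay_fn  # folds the full history into a pose
        self.replays = 0            # instrumentation: replay count
        self._pose = None
        self._dirty = True

    def add_vision_batch(self, measurements):
        # All tags seen this cycle go in together...
        self.history.extend(measurements)
        self.history.sort(key=itemgetter(0))
        self._dirty = True          # ...but the replay is deferred.

    def pose(self):
        if self._dirty:
            self._pose = self.replay_fn(self.history)
            self.replays += 1
            self._dirty = False
        return self._pose
```

With this structure the replay cost is fixed per loop cycle, which matches our observation that the per-cycle impact doesn’t scale strongly with the number of tags in frame.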
I’m not certain what you mean by this. We use the ChArUco board for calibration, which has lots of tags in a checkerboard pattern. This is an alternative to the simple checkerboard that is often used for camera calibration. But the ChArUco board is only used for the initial calibration and not on the field. We solve for the pose of each tag on the field individually, and those poses are fused by the robot code using our pose estimator class. We aren’t doing solvePnP with multiple tags at once akin to Limelight’s “MegaTag,” though we wouldn’t rule it out in the future. Also worth noting that the final pose estimation and disambiguation will tend to find the robot pose that best aligns with all of the data available even without multi-tag solvePnP.
If you have a chance to share your experiences with the development team over discord at some point when you have the time, we would greatly appreciate it so we know what we can do better.
This is incredible! Is your driving code based off of the poses available? That video is awesome
Here’s the command: RobotCode2023/HoldPose.java at main · Mechanical-Advantage/RobotCode2023 · GitHub
We just threw this together for testing; it’s three PID controllers for X, Y, and theta with no proper profiling or acceleration limits. Ultimately we’re planning to use on-the-fly trajectory generation for this type of alignment, but that’s still a work in progress.
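For anyone curious, the “three PID controllers” idea reduces to something like the sketch below. This is a hedged illustration, not the HoldPose command itself: plain proportional control only (no I/D terms, no profiling), with made-up gains.

```python
import math

def hold_pose_output(current, target, kp_xy=2.0, kp_theta=4.0):
    """current and target are (x, y, theta) field poses; returns
    field-relative (vx, vy, omega) commands. Gains are illustrative."""
    vx = kp_xy * (target[0] - current[0])
    vy = kp_xy * (target[1] - current[1])
    # Wrap the heading error to (-pi, pi] so the robot takes the short way:
    err = math.atan2(math.sin(target[2] - current[2]),
                     math.cos(target[2] - current[2]))
    return vx, vy, kp_theta * err
```

The field-relative outputs would then go through inverse swerve kinematics; trajectory generation replaces this with a profiled path but the terminal correction looks much the same.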
Despite CAD progress being slower than desired, we have made decent headway. Over the past week we have mostly been focusing on the overall architecture of the robot, with a few key decisions being made. The frame dimensions for the robot will be around 25” by 25” with the double-jointed arm centered in the middle. For the intakes, we have changed our initial plan of using the same intake for both cubes and cones. Now, we are going to mount the flange grabber/auger on one side for the cones, and a separate single roller for the cubes.
Here is the link to the current CAD document: Onshape
The first component of the double jointed arm, the static link, is mounted to reach 22” above the frame rail. The first moving link is 27” long with the second link being 33” long. The final stage of the arm (the wrist) will include the end-effector (what is now the claw intake).
The arm will be built to be able to pass through itself, enabling it to score and intake on both sides of the robot. This will be critical for the separation of the intakes for the cones and cubes to function.
Building the robot to be able to pass through itself has created a few interesting design and packaging issues which we have been working through. These issues have delayed progress; however, the arm CAD is nearing completion, and we plan to finish it during our next meeting.
The first joint of the arm is a MaxSpline live axle (optimized to be durable and compact). The second joint will be a funky live/dead combo. The final stage will be playing into REV ION’s strengths with a relatively simple dead axle. Many other systems have been proposed within the team, but we ultimately decided on this system due to its relatively simple design and robustness.
This past week we assembled our intake prototype and mounted it on the end of an arm. Although we did not mount the arm to the robot, we were able to test the functionality of the intake pretty well.
Following what we had prototyped, we kept both sets of wheels constantly spinning inwards at 3V to hold game pieces in the intake. With the motors running at 12V, intaking the cube felt pretty touch-it-own-it, and the intake held on to the cube well.
We tested grabbing the cone at various heights. As expected, we found the best results when intaking near the wider base rather than the narrow top, although the intake was still able to hold on to the cone near the top.
Ground Cube Intake
We also prototyped a full width intake to pick up cubes off of the ground. The prototype consists of a roller, with pieces of angled polycarb behind it to guide the cube to the center of the intake, where it can be handed off to the claw intake. We plan to have this intake on one side of the robot, with the cone intake on the other, to allow us to pick up both game pieces off the ground.
Have you done any testing where the cube isn’t straight on? In our experience, it’s more of a touch-and-push-away, which is why we moved away from this sort of design.
How is your team planning on passing the arm through itself, especially with a longer second link than the first? Do you have a screenshot of the front view of the arm (rotated 90 degrees from the one in the post)?
Loving that intake. Feels very 1684 2019, and probably makes packaging easier.
This end effector design has been very good with cubes from pretty much any angle both on the horizontal and the vertical. Here’s a video of us testing approaching from the corner of the cube:
I’m honored my 2016 photoshop work made it into one of your videos. Sorry I wasn’t clear, I meant like the cube is to the side of the wheel, offset from the center of the intake. In our experience with a robot driving into it not straight on it got pushed away after touching, I didn’t mean the actual rotation of the game piece.