Hey everyone, I’m back again to post another crazy spooky thread about ridiculous concepts. I’ve been doing a lot of shower thinking lately about how fast you could get a closed loop vision system to perform with reasonable time and energy put into it. As the driver for our team, I never felt like our vision system was particularly slow, but I’m the person who’s always looking to optimize.
So I guess here’s the real question. Say you have a turret, shooter, and camera mounted on the front of said shooter. You’re using the target from 2020 and your turret can move at whatever reasonable max speed you decide. How do you optimize the parameters of the system to minimize the time between the camera seeing the target and being ready to fire, with a steady lock on the target?
Maintain target lock and shooter RPM at all times, rather than having a dedicated “aim-spoolup-shoot” sequence. Use drivetrain motion information as “feed-forward” for the turret angle, shooter RPM, and hood, with vision providing the feedback (see the sketch at the end of this post).
Use the lowest resolution possible in vision processing to reduce latency and maximize framerate.
If your processor is good enough, you can use subpixel-based pipelines to get some accuracy back from low-resolution images.
Ensure NetworkTables is running at a 10 ms update rate. If that’s not good enough, try going to something custom with pure UDP, maybe?
Limelight or (insert open-source friends here) are probably about as good as you’re gonna get on the actual vision processing… Data at 90 Hz with ~5 ms latency is realllly good for this application. From the technologies I know of, industrial vision cameras and FPGAs are probably your next major step up, but this quickly gets out of the range of “reasonable effort” or legal-cost components.
Shooter RPM recovery time between shots is also big. Two approaches: Lots of mass (to reject the dips in RPM), or lots of motors (to recover more quickly).
Finally, also don’t neglect your feeder mechanism - the faster you get balls into the shooter, the faster you’ll score… which is probably your real end-goal.
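To make the feed-forward-plus-vision point above concrete, here is a minimal sketch (names and the gain are made up; wire it to your own gyro, camera, and turret controller):

// Sketch only, not a drop-in implementation.
fun turretVelocityCommandDegPerSec(
    chassisYawRateDegPerSec: Double,  // from the gyro
    targetOffsetDeg: Double,          // horizontal offset from vision, e.g. Limelight "tx"
    kP: Double = 4.0                  // deg/s of turret speed per degree of error (tune on the robot)
): Double {
    // Feedforward: counter-rotate the turret at the chassis yaw rate so vision
    // only has to correct the small residual error.
    val feedforward = -chassisYawRateDegPerSec
    // Feedback: simple proportional correction on the vision offset.
    val feedback = kP * targetOffsetDeg
    return feedforward + feedback
}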
Choosing a fast camera is the first thing to do. The Pi Camera V2 and PS3 Eye can run at higher than 90 FPS at 320x240. If you really care about speed, you should probably go to something like a Jetson Nano. After that it’s all software. I’ve been doing some work on PhotonVision to GPU-accelerate vision processing on the Pi 3 and CM 3. When done well it has some nice performance properties (e.g. it beats Limelight by a good bit on the same hardware at higher resolutions). Be on the lookout for a post with some specific numbers and technical details soon.
There are probably 2 scenarios to consider: 1) parking and shooting from a stationary position and 2) shooting while moving.
The 3 variables in play are the turret angle, the hood setting (assuming you have an adjustable hood) and the shooter speed.
I will make the observation here that there are several types of hood adjustment mechanisms out there: motor driven, servo driven, and pneumatic. Pneumatic is fast at getting to position but has a limited number of positions. Servo driven is slow at getting to position, and most of the motor-driven ones I have seen in reveal videos also get to position rather slowly. So the hood adjustment is probably the first variable you want to get to position. I will assume here that you have two hood settings: one for close-in shots and a second for longer-range shots. So, the first thing I would do to reduce the “ready to fire” time would be to set the hood position first, hold it in one position, and adjust the shooter speed to fine-tune the shot.
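As a rough sketch of that structure (two fixed hood settings plus RPM fine-tuning from range), with completely made-up breakpoints and speeds you would replace with your own tuning data:

// Sketch: pick one of two fixed hood positions from range, then fine-tune RPM.
// The 4 m breakpoint and all RPM numbers are invented for illustration.
enum class HoodPosition { CLOSE, FAR }

fun selectHood(rangeMeters: Double): HoodPosition =
    if (rangeMeters < 4.0) HoodPosition.CLOSE else HoodPosition.FAR

fun shooterRpmFor(rangeMeters: Double, hood: HoodPosition): Double =
    when (hood) {
        // Linear interpolation between two tuned (range, RPM) points per hood setting.
        HoodPosition.CLOSE -> lerp(rangeMeters, 1.0, 2500.0, 4.0, 3200.0)
        HoodPosition.FAR -> lerp(rangeMeters, 4.0, 3800.0, 8.0, 5000.0)
    }

private fun lerp(x: Double, x0: Double, y0: Double, x1: Double, y1: Double): Double {
    val t = ((x - x0) / (x1 - x0)).coerceIn(0.0, 1.0)
    return y0 + t * (y1 - y0)
}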
In the park and shoot scenario, the turret can lock onto the target while you are still moving such that by the time you park, it is already locked on to the target. As you approach your parking position, the driver can activate the shooter wheel to spin it up to speed. Once the robot stops moving, you make the final adjustment to the shooter speed and you are ready to fire.
All of this can be done with a relatively low frame rate on the camera.
In the shoot while driving scenario, you would add a function to lead the target with the turret to account for the yaw velocity (the lead distance is likely a function of range to target, so this would be tricky to develop) and you would adjust the shooter RPM to account for the relative range closing speed (again, not a simple problem to tune). Again, you would have these adjustments calculated continuously and you would spool up your shooter wheel prior to being in range to shoot and then make continuous adjustments from there.
Since the rate of change of particular variables is of interest in this scenario, you might need a higher-frame-rate camera, but I still think that 30 fps would work to calculate closing rate as well as yaw rate such that you could calculate these adjustments. Since the hood is moving the whole time, your reference frame for the camera is going to be constantly changing, which will make the math really challenging. You might be better off using odometry and a gyro to calculate closing speed and strafing speed rather than the camera.
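One common way to structure the lead calculation is to aim at a “virtual goal” displaced opposite to your motion. A sketch, assuming you already have a field-relative robot velocity from odometry and an empirically characterized ball time of flight for the current range (import path follows recent WPILib):

import edu.wpi.first.math.geometry.Translation2d

// Sketch: shift the aim point opposite to the robot's field-relative velocity by the
// ball's time of flight, so the shot "leads" the real goal to first order.
fun virtualGoal(
    goalPosition: Translation2d,               // fixed field-relative goal location
    robotVelocityMetersPerSec: Translation2d,  // field-relative robot velocity from odometry
    timeOfFlightSec: Double                    // estimated for the current range
): Translation2d =
    goalPosition - robotVelocityMetersPerSec * timeOfFlightSec

Aiming the turret and choosing RPM/hood against this virtual goal instead of the real one folds the yaw lead and the closing-speed adjustment into a single calculation.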
I believe an SPI gyro and encoders measured by the RIO’s FPGA will be the lowest-latency options for these.
However, even with latency - at a super high level, you can build up a queue-type structure where each sensor inserts timestamped data about observations of the robot position, and with some (insert hand-wavey Kalman filter math here) you can get out estimates of robot position/pose at time X (even if X is in the future). Using that estimate, you drive commands into the turret subsystem to keep it positioned properly at all times (where “properly” accounts for the velocity of the ball at the time of ejection and keeps its ballistic trajectory going through the goal).
The linchpin to all of this: All sensors report their data against a common timestamp (probably the FPGA timestamp?). Any skew or delay in how the data was actually captured versus when the timestamp says it was captured will limit the ability of the system to accurately estimate state. So, from an architecture perspective, getting this timestamping action as correct as possible is foundational to building the powerful state estimation logic.
Having a consistent and well-characterized drivetrain will make the (insert hand-wavey Kalman/state-space buzzword math here) easier.
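A minimal version of that timestamped queue (storage and interpolation only; no Kalman filtering, and the class/method names here are made up for illustration) could look like this. Recent WPILib also ships a TimeInterpolatableBuffer class that does essentially the same job.

import java.util.TreeMap
import edu.wpi.first.math.geometry.Pose2d
import edu.wpi.first.math.geometry.Twist2d

// Sketch: a timestamped pose history that can answer "where was the robot at time t?"
// for latency compensation.
class PoseHistory(private val historySeconds: Double = 1.0) {
    private val samples = TreeMap<Double, Pose2d>()

    fun addSample(timestampSec: Double, pose: Pose2d) {
        samples[timestampSec] = pose
        // Drop anything older than the history window.
        samples.headMap(timestampSec - historySeconds).clear()
    }

    fun getPose(timestampSec: Double): Pose2d? {
        val floor = samples.floorEntry(timestampSec) ?: return samples.firstEntry()?.value
        val ceiling = samples.ceilingEntry(timestampSec) ?: return floor.value
        if (ceiling.key == floor.key) return floor.value
        // Interpolate between the two surrounding samples along the connecting twist.
        val t = (timestampSec - floor.key) / (ceiling.key - floor.key)
        val twist = floor.value.log(ceiling.value)
        return floor.value.exp(Twist2d(twist.dx * t, twist.dy * t, twist.dtheta * t))
    }
}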
We have had good luck using swerve drive motor encoders and steering motor encoders to calculate X and Y position and velocity. We use the NavX gyro and I think that would be low enough latency for these calculations as well.
My opinion is that optimizing the shooter speed of a bot that shoots stationary may well be optimizing the wrong thing. If it takes you 15 seconds to collect balls and get into position to shoot, then whether it takes 0.5 seconds or 1.5 seconds to shoot your balls is rather immaterial. Your first task should be accuracy – going from averaging 4 balls in to 5 balls in would be a far greater payoff than shaving off that 1 second. Once you are deadly accurate, then other ideas would be worth exploring.
I would think that the giant leap forward would be for the bot to stay locked on (and recover lock when lost) and simply shoot on its own whenever it has a ball, is locked on, and is in range.
You don’t really need anything more than a Limelight/Gloworm for keeping a lock on the target or even shooting while translating and rotating; there are several neat software tricks that you can use.
Keep a backlog (approx. 1 second or so) of your robot pose (from odometry) and your turret angle. Whenever new data comes in, based on the latency, you can query your robot’s pose and turret angle at the time of frame capture and calculate the field-relative pose of the target. This is the code we used this season:
if (doesTurretLimelightHaveValidTarget()) {
    // Calculate image timestamp.
    val timestamp = Timer.getFPGATimestamp().seconds - turretLimelight.latency

    // Get camera-relative goal pose at timestamp.
    val cameraToGoal = Transform2d(
        getDistanceToGoal() * cos(getAngleToGoal().value),
        getDistanceToGoal() * sin(getAngleToGoal().value), Rotation2d()
    )

    // Get field-relative goal pose at timestamp.
    val fieldToGoal = Drivetrain.getPose(timestamp) +
        Turret.getRobotToTurret(timestamp) + VisionConstants.kTurretToCamera + cameraToGoal

    // Add goal pose to GoalTracker.
    GoalTracker.addSample(timestamp, Pose2d(fieldToGoal.translation, Rotation2d()))
}
Our GoalTracker class performs some filtering to make sure the target doesn’t jump around too much. Furthermore, it “remembers” the target for about a second or so even when the Limelight doesn’t see the target (perhaps the turret is rotating 360 degrees after running into a hardstop) so that we can continue aiming. This means that the refresh rate of the vision processing doesn’t matter as much – what matters is that the data is accurate.
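For illustration, a stripped-down tracker with those two behaviors (median smoothing plus short-term memory) might look roughly like the sketch below. This is not the actual GoalTracker class, just the idea; import paths follow recent WPILib.

import edu.wpi.first.math.filter.MedianFilter
import edu.wpi.first.math.geometry.Pose2d
import edu.wpi.first.math.geometry.Rotation2d
import edu.wpi.first.math.geometry.Translation2d

// Illustration only: smooths incoming goal samples and "remembers" the last estimate
// for a short window when vision drops out (e.g. while the turret unwinds).
class SimpleGoalTracker(private val memorySeconds: Double = 1.0) {
    private val xFilter = MedianFilter(5)
    private val yFilter = MedianFilter(5)
    private var lastEstimate: Pose2d? = null
    private var lastSampleTime = Double.NEGATIVE_INFINITY

    fun addSample(timestampSec: Double, fieldToGoal: Pose2d) {
        lastEstimate = Pose2d(
            Translation2d(
                xFilter.calculate(fieldToGoal.translation.x),
                yFilter.calculate(fieldToGoal.translation.y)
            ),
            Rotation2d()
        )
        lastSampleTime = timestampSec
    }

    // Returns the remembered goal pose, or null if vision has been lost for too long.
    fun getEstimate(nowSec: Double): Pose2d? =
        if (nowSec - lastSampleTime < memorySeconds) lastEstimate else null
}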
Note that all the geometry classes required for this math are in WPILib.
Because you have the goal pose in field-coordinates, you can simply use that goal position relative to your current robot position to determine your aiming parameters (i.e. turret angle, shooter speed, etc.)
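Concretely, extracting those aiming parameters from the field-relative goal pose is only a couple of lines of WPILib geometry (a sketch; it returns a bare Pair just to keep it short):

import edu.wpi.first.math.geometry.Pose2d
import kotlin.math.atan2

// Sketch: turn a field-relative goal pose plus the current robot pose into
// (turret angle, range), which then index into your RPM/hood tables.
fun aimingParameters(robotPose: Pose2d, fieldToGoal: Pose2d): Pair<Double, Double> {
    // Express the goal in the robot's frame, then read off bearing and range.
    val goalInRobotFrame = fieldToGoal.relativeTo(robotPose)
    val turretAngleRad = atan2(goalInRobotFrame.translation.y, goalInRobotFrame.translation.x)
    val rangeMeters = goalInRobotFrame.translation.norm
    return turretAngleRad to rangeMeters
}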
If your robot is translating and rotating while you are attempting to lock onto the target, you can use that velocity data to generate an adjusted turret velocity that effectively “cancels” out your robot’s motion. Obviously this is harder than the case where the robot is stationary but should still be doable with some vector math.
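One version of that vector math, sketched for a goal expressed at (goalX, goalY) in the robot frame and robot-relative ChassisSpeeds (signs follow the usual WPILib conventions; treat this as a starting point to verify, not a finished implementation):

import edu.wpi.first.math.kinematics.ChassisSpeeds

// Sketch: turret angular velocity (relative to the chassis) that keeps the turret
// pointed at a fixed goal while the robot translates and rotates underneath it.
fun turretFeedforwardRadPerSec(speeds: ChassisSpeeds, goalX: Double, goalY: Double): Double {
    val rangeSq = goalX * goalX + goalY * goalY
    // Bearing rate induced by translating past the goal: d/dt of atan2(y, x).
    val fromTranslation =
        (goalY * speeds.vxMetersPerSecond - goalX * speeds.vyMetersPerSecond) / rangeSq
    // The goal appears to rotate the opposite way as the chassis yaws.
    val fromRotation = -speeds.omegaRadiansPerSecond
    return fromTranslation + fromRotation
}

Adding this as a velocity feedforward on top of the vision/pose-based position loop is what lets the turret “cancel” the robot’s motion instead of chasing it.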
One “problem” regarding this method is that we adjust the goal’s field-relative position based on vision data – but the goal position is technically fixed on the field! This is not a big deal because as long as our “calculated goal pose” relative to our current robot pose is accurate, we shouldn’t have issues.
The “correct” way to do this is to assume the goal pose is fixed and adjust the robot’s pose based on vision measurements. The upcoming state-space pose estimators will automatically take care of this for you:
All you need to do is provide encoder and gyro measurements (as you would for odometry) and pass in any vision measurements that may come in (at any frequency) along with the timestamp. The estimator will automatically perform the latency compensation I described earlier (i.e. keeping a backlog of poses) and adjust the robot’s pose based on your measurements. Now you can just use the fixed goal pose (from the game manual) relative to your robot pose from the estimator for all your aiming.
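For reference, the usage ends up being just a few calls. A sketch for a differential drive is below; the constructor and method signatures shown follow recent WPILib and have shifted slightly between releases, so check the API docs for your version.

import edu.wpi.first.math.estimator.DifferentialDrivePoseEstimator
import edu.wpi.first.math.geometry.Pose2d
import edu.wpi.first.math.geometry.Rotation2d
import edu.wpi.first.math.kinematics.DifferentialDriveKinematics

// Sketch of the flow described above.
val kinematics = DifferentialDriveKinematics(0.7) // track width in meters (example value)
val estimator = DifferentialDrivePoseEstimator(kinematics, Rotation2d(), 0.0, 0.0, Pose2d())

fun periodic(gyroAngle: Rotation2d, leftMeters: Double, rightMeters: Double) {
    // Every loop: fuse wheel odometry and the gyro.
    estimator.update(gyroAngle, leftMeters, rightMeters)
}

fun onVisionResult(visionRobotPose: Pose2d, captureTimestampSec: Double) {
    // Whenever a vision sample arrives (at any rate), correct the estimate at the time
    // the image was captured; the estimator handles the latency compensation internally.
    estimator.addVisionMeasurement(visionRobotPose, captureTimestampSec)
}

From there, aiming is just the fixed goal pose from the game manual relative to estimator.estimatedPosition.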
TL;DR: the camera latency will be your limit. Most cameras will have tens of milliseconds of latency; typical USB cameras have 50-100 ms of latency. Even industrial cameras, which are too expensive for FRC, can have too much latency. Also, note that there is no guarantee that the latency is less than the frame time.
I tend to agree with @gerthworm on this one. Rotating the turret into position can easily take several seconds, adjusting the hood can take a couple of seconds. Bringing a shooter up to speed can take significant fractions of a second. The 100ms of latency is not going to be the limiting factor on how quickly you will be “ready to shoot” unless you come up with a solution for those other things.
Assuming you have to wait for 0.1 seconds after you come to a stop before you have your final fix on your target, that may be the thing that dictates when you are “ready to shoot”. But the range to target that you got 0.1 seconds earlier as you were coming to a stop is probably close enough to adjust your shooter settings and start shooting before you get that final fix.
Now, in my “shooting while moving” scenario, the latency of the camera could throw you off enough to be missing your shots because you are aiming at a target that is too old. But if the latency is consistent and you tune your offsets taking that latency into account, then you probably would not miss shots because of the latency.
In my model of the world and OP’s question: I definitely disagree.
I’m assuming OP is actually asking about minimizing time from operator-button-press to gamepiece-scored. You could also interpret the post as button-press to ready-to-launch.
In both cases, the mechanical delay of motor/gear/mass systems is the main latency component (hundreds of milliseconds), not the camera (tens of ms, to use your numbers).
I didn’t intend to make this assumption, and am not sure how it would play in. The relevant number to account for is total time delay between image-capture and processed-pose-data recorded in your queue.
I’m curious how even a multi-thousand-dollar vision system with microseconds of latency is “too much” for a mechanical system which takes ~100ms to servo to position.
I suspect that we’re just talking about different definitions of OP’s “system” and that’s the root of our difference in claims. Regardless, I definitely am interested to understand what led you to your statements.
Sorry, yes, I think we are thinking about the question in different ways. I totally agree that if you think about button push to shooting, then the vision system with typical latency is not the bottleneck. I would hope that you could aim the turret and tune the hood faster than “a few seconds” (ours did, although its range of motion was small), but the mechanical parts are certainly slower than getting the lock from vision. My original reading of the OP was closer to running a PID loop, i.e. how fast can you track sudden changes, which I grant is probably not correct.
My comment that “there is no guarantee that the latency is less than the frame time” was because of the earlier post noting that you can get 90 FPS cameras. I just wanted to make sure that people don’t think that going to a fast-frame-rate camera beats the latency. Until you measure the actual latency (it is rarely published), you have to assume that you will wait ~100 ms before you get results for “now”, whether it is a 30 fps or 90 fps camera.
For everyone who thinks the vision and control system isn’t the bottleneck, here is my challenge: demonstrate this is true. We built our vision aiming testbed to minimize mechanical latency between the control system and aiming accuracy confirmation. It’s mostly 3D printed, so it can be built for little cost other than print time and stuff you already have laying around your shop. So, build one and post video of the laser spot holding position on a vision target while you push and spin the base around a flat floor. If you can do that, move the turret to a robot base and demonstrate your ability to hold target lock. If you already have a turret aiming system you think is accurate, put a vertical piece of tape from the vision target to the floor, put a laser pointer on your turret and see if the laser spot remains locked on the tape stripe.
In my many years in FRC, I’ve heard many people talking about how easy this should be, but have never seen a single video posted demonstrating successful implementation. Doing this successfully has become one of my personal “white whales”.
That is a near-impossible test to pass, depending on the width of the tape and the distance to the target. Assuming the tape is 1/2” and you’re 10 ft from the target, that’s an aim accuracy of +/- 0.1 degrees. Assuming an external disturbance of just 30 deg/s (very slow for a robot!), your robot will move 0.1 degrees in just 3 ms. No FRC control system can respond that fast. Returning to that accuracy is challenging enough, but “keeping the laser on the tape” continuously under external disturbances is unrealistic.
OK, but the question is vision OR control system. So, with your test rig, which has proven to be the bottleneck: the latency of the vision system providing an accurate pose of the target, or the control system’s ability to respond to that pose?
FWIW @ToddF I think we’re not describing the same things:
As I understand it, OP is looking for system design goals and tips for creating the “best” system to deliver gamepieces to a goal.
My thesis is that in an FRC robot that does the basics well, I believe spending time optimizing vision processing logic and computing hardware will not be as beneficial as pursuing other mechanical hardware improvements. But, I provided concrete examples of both to pursue.
Your thesis seems to be focused on the idea that designing a functional vision system to meet a particular disturbance rejection requirement is difficult. I do concur, it is. But that wasn’t the question, as I understood it at least.
build one and post video of the laser spot holding position on a vision target while you push and spin the base around a flat floor. If you can do that, move the turret to a robot base and demonstrate your ability to hold target lock.
Doing this successfully has become one of my personal “white whales”.
I’ve actually done this successfully, but with VSLAM instead of retroreflective-tape vision (warning: squeaky turret):
In this demo I’m using a library I wrote to run the VSLAM camera directly off the roboRIO; no coprocessor is required.
One thing you aren’t seeing in the video is the average error from target center; you can’t see how off-center the turret is at any given time.