Is there a black box available for all-in-one FRC localization?

What I’m talking about here is a drop-in solution that uses probabilistic localisation techniques and multi-sensor fusion: something you can throw on your robot, plug in a network cable, and have it just spit out your robot’s current coordinates and pose to NetworkTables based on the current year’s game field layout.

I know there are many things out there that do part of this process: WPILib odometry with sensor measurement updates, Limelights or PhotonVision to detect targets, and ML technologies for detecting various field objects.

I suppose what I’m looking for is what the Limelight did years ago for vision targets. Before the Limelight existed, we all set up a camera with a green ring light, wrote our own vision processing code, and hoped that it worked at the event. Then the Limelight came out, and we had a reasonably priced, drop-in solution that basically gave you exactly what you needed to reliably detect a vision target. How many people still run reflective-tape vision pipelines on their RIO now that the Limelight exists?

I feel accurate localisation that updates as the game progresses is at the same point now. We need a product that you can drop on the robot, plug in as many cameras as you want, go through some camera alignment setup process, plug in ultrasonic or laser distance-measuring devices, give it an IP address so that it can talk to NetworkTables, and put a few lines of code in your drive subsystem that dump your swerve module state info onto NetworkTables so the black box can consume it.
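
Just to make that last piece concrete, here is a minimal sketch of dumping swerve module states to NetworkTables from a drive subsystem (the "blackbox" table and "swerveStates" key names are made up for illustration):

    import edu.wpi.first.math.kinematics.SwerveModuleState;
    import edu.wpi.first.networktables.NetworkTableEntry;
    import edu.wpi.first.networktables.NetworkTableInstance;

    // An entry the hypothetical black box would subscribe to.
    private final NetworkTableEntry statesEntry =
        NetworkTableInstance.getDefault().getTable("blackbox").getEntry("swerveStates");

    // Call from periodic(): pack each module as [angleRadians, speedMetersPerSecond].
    private void publishModuleStates(SwerveModuleState[] states) {
      double[] packed = new double[states.length * 2];
      for (int i = 0; i < states.length; i++) {
        packed[2 * i] = states[i].angle.getRadians();
        packed[2 * i + 1] = states[i].speedMetersPerSecond;
      }
      statesEntry.setDoubleArray(packed);
    }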

Then have this magic black box consume all this data, do its target detection, AprilTag detection, and ML-based field element detection, and give back perfect localisation values so that products like PathPlanner can accurately follow paths at any point in the match.

Maybe even a system where the on-robot video footage and associated odometry values from hundreds of robots playing in early-season matches could be uploaded to some server in the cloud, and some magic machine learning engine could be continuously updated to give better results. Then, before you compete, you download the latest ML engine to the black box.

Isn’t this what Tesla does with its self-driving stuff? Footage from cars all over the world is constantly being supplied back to Tesla so that the image recognition of signs, roadside features, etc. is continuously improved.

Some sort of system where the video footage with odometry values could be post-processed on some cloud server.
A system where, based on the embedded odometry values and, say, a camera placed in the centre of the robot, angled 45 degrees to the left at a height of 1 metre, the image in the camera should be ABC. Then train the engine to give back those coordinates when this image is seen. You could only use early-match footage before odometry errors grew, but I feel that if people contributed their on-robot footage for even just the first 15 seconds, an ML model could be developed that relates every possible pose on the field to the expected image you would see from a camera at that location.

Surely there is a PhD in this for some university student out there who is a coding mentor for an FRC team.

I wish I was 40 years younger and had the mental capacity to develop such a solution. I know I’m not the person to develop this but I’m sure there is a person/team out there that is capable of building such a black box. I’m sure it could be very profitable as well.

6 Likes

I suspect there is at least one, and possibly multiple, intermediate steps between where we are now and what you propose. Given the high likelihood that there will be multiple AprilTags on the field, and that they will be located in “useful” places, I suggest that a “black box” system which fuses IMU, and possibly odometry, with known-position resets based on vision targets is the next logical step forward. Such things are happening at the highest levels of FRC; appliance-ing them would be a huge leap forward for a great many teams. The potential to “zero” your pose errors two or more times per cycle would be incredible.
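
As a minimal sketch of that "zeroing" step (assuming WPILib swerve odometry; visionPose is whatever field-relative pose your camera pipeline reports from a tag):

    import edu.wpi.first.math.geometry.Pose2d;
    import edu.wpi.first.math.geometry.Rotation2d;
    import edu.wpi.first.math.kinematics.SwerveDriveOdometry;
    import edu.wpi.first.math.kinematics.SwerveModulePosition;

    /** Snap odometry back onto a camera-derived field pose (e.g. from an AprilTag sighting). */
    void zeroPoseFromVision(SwerveDriveOdometry odometry, Pose2d visionPose,
                            Rotation2d gyroAngle, SwerveModulePosition[] modulePositions) {
      odometry.resetPosition(gyroAngle, modulePositions, visionPose);
    }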

Then, depending on the game, the ability to detect and auto-acquire game pieces (think balls randomly located on the floor of the field) would, I think, be more valuable than incremental improvements in positioning accuracy from ML (until the game arrives that needs sub-centimeter positioning accuracy, anyway!).

Combine those with the already demonstrated (by a few teams) ability to “shoot on the move” and you are very close to robots that can play autonomously for the entire match. And from there, probably not that far from robots that can autonomously beat human driven robots.

4 Likes

Tesla is probably not the best company to take inspiration from on this front…

7 Likes

Identifying how many degrees you are left or right of the bright thing is easy.

Identifying where you are relative to the AprilTag is an order of magnitude harder.

The ML infrastructure described is several orders of magnitude harder.

While the Limelight was likely fairly profitable, I don’t think it would be profitable to bite off a project many orders of magnitude harder and assume it can be sold for enough money to recover the R&D effort.

19 Likes

What you’re describing is eerily similar to a ZED + a Jetson, but it is so cost-prohibitive that it has almost no team support.

It should be somewhat easy to do yourself though, if you can make and tune your own pipelines and use AprilTags with it.

Also, Tesla has made several very bad decisions that limit their ability to localize, with astonishing inaccuracy as a result, and they are now walking those decisions back; as such, they should not be treated as a gold standard.

If this were productized, it would be $1k+ per unit to be profitable.

2 Likes

Ignoring the constant overpromising from a self-aggrandizing hype man… the problems facing Tesla for FSD are drastically different from the problems facing you for localization in FRC.

The first big difference is in precision - Tesla really doesn’t have to be that precise. Side to side a Tesla Model 3 is ~6’ wide and lanes in the US are between 9 and 15 feet wide. If they are off by a few inches it doesn’t matter. (Model X is still just about 6" wider than the 3…) Compare this to FRC where a few inches is the difference between hitting your target and hitting a wall.

The second is the type of environment they operate in - Tesla has to build a map and localize at the same time; in FRC you don’t need to do this. If you are localizing off of landmarks (be they known features on the field or AprilTags), their positions are already known to within some tolerance. It’s a simpler problem.

Eh, maybe. But it is largely a solved problem. The real trick is in compensating for the noise in all systems and getting more accurate predictions.

Eh, transfer learning into YOLOv8 or YOLO-NAS isn’t too hard these days; you can do it in Colab pretty reasonably. Deploying ML models is a bit harder, but PyTorch has a post about how to get 30 FPS performance on a stock Raspberry Pi 4.

Marshall can correct me - but there are some issues with getting reasonable performance out of the ZED.


Personally I’ve had decent luck with the Luxonis OAK-D cameras. In my simulations using one I can get passable precision on pose estimation, but the reality is that the locations of the tags are bad for localization purposes. If I were trying to localize on the 2023 FRC field, I’d likely want my ML system to also spit out a few landmarks aside from the AprilTags. Either that, or I need multiple cameras, and my simulation tools can’t currently support that (it’s on my todo list).

Moving out of sim into the real world is on my todo list. I’m also experimenting with the quality of results I can get out of monocular cameras, as those are far easier for most people to use (and markedly cheaper).


Localization is a hard problem. If HQ wants to see us do it, we need tags everywhere or, at the very least, visually distinct landmarks. They also need to make it worthwhile. Right now most teams don’t need it - they can use wheel odom and it’s fine.

Having improved simulation capabilities would also be great - bonus if they can output video streams that whatever vision tools we use can operate on. This was actually a space I was working on, but I haven’t been able to get adequate performance even on my fairly beefy computer (strong hunch it’s something I’m doing). I also just feel really unenthusiastic about writing against the WPILib socket interface, given that the documentation for it seems to be increasingly hard to find and idk the level of support for it. (Also, bluntly, the lack of ROI on building this stuff out…) I think building out simulation and test infrastructure is beyond most teams.

FRC PyBullet Camera Feed Tracking (Note, without the secondary camera feed I can get comfortable real time performance even with more complex models)

4 Likes

Not performance but definitely precision for the proposed solutions. You do need NVIDIA hardware though.

2 Likes

Agreed that “a solution exists”, but it requires thinking about roughly 10x the things you have to think about for a naive “turn toward reflective tape”.

True, but I’m referring to the solution OP indicated, where there was also automated collection/training of data observed on the field.

To be clear: my thesis is not that “there’s no way to do what OP describes” - there is, absolutely. Where I’m coming from is that I don’t see this as a great startup opportunity - the technical cost to deploy and maintain the solution as described will outpace what teams are willing to pay for it.

The best path to execution is likely with Limelight, building upon the ML platform they started but haven’t fully fleshed out yet. Otherwise, you’re rebuilding large chunks of their hardware infrastructure from scratch.

I know of at least one team that can navigate across the field and auto-align on the poles using AprilTags. They use PhotonVision, and it was code used in competition. 971 reportedly has a black box that checks the alignment of the AprilTags and uses the data to calibrate the robot to the field.

FIRST is using a better family of AprilTags next year. I don’t know if it will get to the plug-and-play level of the Limelight, since it is a more complex problem, but I see something close in the near future.

2 Likes

It’ll get better. 🙂

1 Like

It’s not great in low light. That’s not an issue at most FRC events, but it cannot handle low light at all. Dark colors and reflective/transparent surfaces like the side walls of the field will also be hard to detect. This is an issue for all vision-based localization unless it’s aided by additional distance measurement as well; smartphones even use extra distance sensors to help auto-detect focus distance and better control the regular camera.

It will not run without an Nvidia CUDA GPU. Even the cheapest Jetson has one, but it’s about all a single Jetson Nano can handle on its own with a desktop OS running as well. The ZED software is a processing/RAM hog, and when you have 2 or 4 GB you’re really limited. Again, in FRC that’s not so bad; you don’t really need to run much more on it.

In ROS there are nodes for localization from laser, vision, IMU, and encoder feedback. All methods are fused together to get a combined odometry estimate in world space. That can then be used in the navigation stack to drive the robot autonomously or to give you an accurate position during teleop. Marshall and the Zebracorns can probably speak to this better, but that is a digital black box… once it’s configured properly. Configuring ROS and teaching students how to use Ubuntu and its build tools is the difficult part.

1 Like

I think the all-in-one black box in FRC would be difficult because you really need to include wheel odometry in addition to other observations, and since every drivetrain is a bit different, it would be hard for a hardware device to be aware of that.

Having said that, the pose estimator classes (both diffy and swerve) are very good, and both Limelight and PhotonVision do a good job of providing field pose information.

Do a sanity check on the vision reading, update your pose estimator, and you are there. So not an all-in-one, but pretty close.
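
A rough sketch of that pattern (estimator, gyro, and getModulePositions() are placeholders for your own objects; visionPose and timestampSeconds come from Limelight or PhotonVision; the 1.0 m gate is an arbitrary example threshold):

    import edu.wpi.first.math.estimator.SwerveDrivePoseEstimator;
    import edu.wpi.first.math.geometry.Pose2d;

    // Every loop: fold in wheel odometry and gyro.
    estimator.update(gyro.getRotation2d(), getModulePositions());

    // Sanity check: only accept a vision pose that lands close to the current estimate.
    if (visionPose != null
        && visionPose.getTranslation()
            .getDistance(estimator.getEstimatedPosition().getTranslation()) < 1.0) {
      estimator.addVisionMeasurement(visionPose, timestampSeconds);
    }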

2 Likes

In the general case, yes. When you are pulling up 6" away from and square in front of the one on the {game element} you were already driving to, it’s considerably simplified.

Nah, sadly automating it is kinda involved. With a handful of labelled data, augmenting it should be fairly automatable though. There are also some efforts that can be made in simulation here. It ain’t easy though.

I’m not as certain. While Nvidia + ZED may be a solution, it certainly isn’t the only one. The primary constraint over the last few years has been the chip shortage - it is difficult to design and produce a product if you can’t find reliable sources of chips, or even to build an open source solution if you can’t expect users to reliably source hardware.

I mean, why should external parties support LL efforts? They are a closed-source, privately owned solution. I have no beef with them, but I learned long ago not to work for free, and I strongly encourage other people not to either.

2 Likes

It’s used by all halsim plugins, like the Romi and XRP. I had to go ask where the docs were, but the protocol spec is here:

At this point, all the user has to do is map the measurement sources and voltage sinks to a physics model (the framework side can’t know their robot design). One way we could make it easier is adding physics sim to all our examples (not just a few of them, as it is now), so users can modify a working setup.
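
For example, one way that mapping can look in simulationPeriodic(), assuming a single-jointed arm driven by one motor with a plain roboRIO encoder for feedback (m_motor, m_encoderSim, and m_armSim are placeholders for your own objects):

    import edu.wpi.first.wpilibj.RobotController;
    import edu.wpi.first.wpilibj.simulation.BatterySim;
    import edu.wpi.first.wpilibj.simulation.EncoderSim;
    import edu.wpi.first.wpilibj.simulation.RoboRioSim;
    import edu.wpi.first.wpilibj.simulation.SingleJointedArmSim;

    @Override
    public void simulationPeriodic() {
      // Voltage sink: feed the commanded motor output (in volts) into the physics model.
      m_armSim.setInput(m_motor.get() * RobotController.getBatteryVoltage());
      m_armSim.update(0.020);

      // Measurement source: push the simulated state back into the sensor the robot code reads.
      m_encoderSim.setDistance(m_armSim.getAngleRads());

      // Optionally model battery sag from the simulated current draw.
      RoboRioSim.setVInVoltage(
          BatterySim.calculateDefaultBatteryLoadedVoltage(m_armSim.getCurrentDrawAmps()));
    }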

Yeah, until about 2 years ago there was a link to those in some of the simulation documentation. Since it had disappeared, I got real nervous about building against a spec that might be being deprecated.

So, I did a lot of digging into WPILib sim over the summer and found it to be somewhat challenging when I put on my “I’ve a basic understanding of physics and math” hat (which, ok, isn’t hard given my long history of being publicly bad at math). Examples would certainly help.

Warning, this is going to get WAY outside the topic at hand but simulation is somewhat linked soooo.

Constructing it is fairly simple; I found most of it to be fairly straightforward. I probably would have found a builder pattern somewhat more intuitive than remembering argument order, but with decent IDE support in a well-typed language this isn’t a huge issue, and it does make errors from not initializing things show up at compile time, which is a real nice thing.

new SingleJointedArmSim(
                    DCMotor.getFalcon500(1),
                    Constants.Arm.Proximal.GEARING,
                    SingleJointedArmSim.estimateMOI(Constants.Arm.Proximal.LENGTH_METERS, Constants.Arm.Proximal.MASS_KG),
                    Constants.Arm.Proximal.LENGTH_METERS,
                    Units.degreesToRadians(Constants.Arm.Proximal.U_LIMIT_DEG),
                    Units.degreesToRadians(Constants.Arm.Proximal.L_LIMIT_DEG),
                    true
            );

This is where I start to have issues. The function signature for setState is completely opaque and requires an understanding of state space, which I know a lot of students don’t have.

        sim.setState(VecBuilder.fill(Units.degreesToRadians(Constants.Arm.Proximal.INIT_DEG), 0));

If you wanted me to propose a solution to make this clearer: build a simple wrapper as a static method on the simulation class, e.g. SingleJointedArmSim.buildInitialState(double initAngle, double initVel), which saves having to remember what the state vector looks like.
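
Something like this sketch (buildInitialState is the hypothetical helper proposed above, not current WPILib API; SingleJointedArmSim’s state vector is [angle, angular velocity] in radians and rad/s):

    import edu.wpi.first.math.VecBuilder;
    import edu.wpi.first.math.Vector;
    import edu.wpi.first.math.numbers.N2;

    /** Hypothetical helper: hides the [position, velocity] vector layout from the caller. */
    public static Vector<N2> buildInitialState(double initAngleRads, double initVelRadsPerSec) {
      return VecBuilder.fill(initAngleRads, initVelRadsPerSec);
    }

    // Usage:
    sim.setState(buildInitialState(Units.degreesToRadians(Constants.Arm.Proximal.INIT_DEG), 0.0));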

But the big piece I find missing in simulation isn’t just the mechanism and physics side, it’s the visual side. It’s hard for me to output an MJPEG stream of what a camera located at some point on the robot would see, in part because specifying the transform chain gets real hard. ROS has a lot of tooling built around this, and I’m very sad that the FRC Gazebo efforts seem to have gone nowhere.
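
To be fair, the MJPEG serving half is the easy part; assuming the rendered view lands in an OpenCV Mat somewhere, cscore will publish it like any other camera (names here are placeholders, and the rendering and transform chain are the hard parts left out):

    import edu.wpi.first.cameraserver.CameraServer;
    import edu.wpi.first.cscore.CvSource;
    import org.opencv.core.Mat;

    // Publishes a 640x480 MJPEG stream that dashboards and vision tools can consume.
    CvSource simCamera = CameraServer.putVideo("SimCamera", 640, 480);

    // Each sim tick, after rendering the robot-mounted camera's view into `frame`:
    void publishSimFrame(Mat frame) {
      simCamera.putFrame(frame);
    }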

We accidentally made the setState() overload that takes two doubles (angular position and angular velocity) package-private by omitting the access specifier (this affected one other physics sim class as well). That was fixed in 2024 beta 2.

I agree, and I don’t have a good answer for it. We’ve wanted a Field3d widget in Glass/SimGUI for a while, but we need more developers.

We used to have full Gazebo support; we removed the plugins because no one used them, and the sim was kinda jank (e.g., has Gazebo fixed wheel simulation?).

It’s stable and we have no plans to deprecate it, although we’ve switched the XRP to use a UDP protocol instead due to the processing constraints there. I’m not sure why the link disappeared from the simulation docs; it’s definitely still our recommended approach for connecting outside applications into HAL simulation.

The biggest issue with Gazebo is that they’ve promised Windows support for a decade and it’s never materialized. Even today their features page says “Make use of the Gazebo Libraries on Linux and MacOS. Windows support coming in mid to late 2019.”

1 Like

AdvantageScope has 3D field visualization. It might be worth looking into whether it’s feasible to add support for a virtual camera rendering back to an MJPG stream. @jonahb55?

Re: Gazebo on Windows (metaticket) · Issue #2901 · gazebosim/gazebo-classic · GitHub