High Performance Vision Tracking

Over the last several years, Limelight has been the leader in high-performance vision tracking, and rightfully so. Take a look at Einstein 2022: every robot had two things in common, swerve drive (a subject for a different paper) and Limelight. To be precise, only five of the six were using an LL, as the sixth robot was playing defense.
The purpose of this paper is to find out whether the same performance can be achieved with a team-built and team-coded vision system. It also provides code to get you started, along with guidance on why, how, and where you could improve even further on this code and hardware.
Please feel free to discuss this paper here. Add suggestions, critiques, and feedback, and as always, please keep it GP!

TL;DR? The answer is YES, it can match the performance.

12 Likes

Good stuff Bill!

Thanks Nick!!
I can’t wait to see teams take this and make it even better!

1 Like

Very good work!

A couple of thoughts:

You mention having a way to adjust the camera without a monitor. Could the configuration values for the pipeline be exposed through NetworkTables, with Shuffleboard or another NetworkTables client used to view the stream and adjust the values?
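Something like this sketch is what I have in mind, assuming the pipeline runs in Python with pynetworktables; the table and key names are placeholders I made up for illustration:

```python
# Sketch: read HSV threshold values from NetworkTables every loop so they can
# be adjusted live from Shuffleboard or any NT client. Table/key names
# ("Vision", "hue_min", ...) and the server address are placeholders.
import cv2
import numpy as np
from networktables import NetworkTables

NetworkTables.initialize(server="10.0.0.2")   # replace with your roboRIO address
table = NetworkTables.getTable("Vision")

# Publish defaults once so the entries show up in Shuffleboard immediately.
defaults = {"hue_min": 50, "hue_max": 90, "sat_min": 100,
            "sat_max": 255, "val_min": 100, "val_max": 255}
for key, value in defaults.items():
    table.setDefaultNumber(key, value)

def threshold(frame_bgr):
    """Threshold one frame using whatever values are currently in NT."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([table.getNumber("hue_min", 50),
                      table.getNumber("sat_min", 100),
                      table.getNumber("val_min", 100)])
    upper = np.array([table.getNumber("hue_max", 90),
                      table.getNumber("sat_max", 255),
                      table.getNumber("val_max", 255)])
    return cv2.inRange(hsv, lower, upper)
```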


Can we take inspiration from Grip and make the full pipeline available through NetworkTables so that you could even change the steps of the pipeline through it?

The processor would be externally programmable, and you could abstract the pipeline away from the actual hardware, so the same pipeline could run on a Jetson, Raspberry Pi, or Intel PC.

I think one of the problems with GRIP was that you’d have to change the code to make changes to the pipeline.

PS: The Intel mini PC with OpenCL-accelerated OpenCV gets over 200 fps for a retroreflective pipeline at 720p.
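For reference, the OpenCL path in OpenCV just means wrapping frames in cv2.UMat so the transparent API (T-API) can offload supported operations to the GPU. A rough sketch of a retroreflective threshold done that way; the threshold values and structure are illustrative, not my actual pipeline:

```python
# Rough sketch of an OpenCL-accelerated (T-API) retroreflective threshold.
# Wrapping the frame in cv2.UMat lets OpenCV dispatch supported operations
# to the GPU via OpenCL where available. Thresholds are illustrative.
import cv2
import numpy as np

cv2.ocl.setUseOpenCL(True)
print("OpenCL available:", cv2.ocl.haveOpenCL())

LOWER = np.array([55, 100, 100])   # placeholder HSV bounds for green tape
UPPER = np.array([95, 255, 255])

def find_target(frame_bgr):
    u_frame = cv2.UMat(frame_bgr)                      # upload to the GPU
    hsv = cv2.cvtColor(u_frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)
    contours, _ = cv2.findContours(mask.get(),         # download, then contour (OpenCV 4.x)
                                   cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None
```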

You asked and answered your own question. What is it you want from your readers? Maybe this?

Design a competitive product using others’ ideas such as

I’m going to be a bit irritatingly contrary, because in my mind this is NOT true:

You touched on a better characterization of LL as the standard used to measure others, and I see those others as some “better” and some “worse,” whatever that may mean (my team has been both of those, if I get to define the terms).

Sure, LL has a lot of good features (many that we would have liked but did not implement), but my team has had other features that have been equally or more effective at pointing at targets.

This past season we did change from our roboRIO or RPi running our code (the same code ran on both) for competition #1 to the LL for competition #2, and it had nothing to do with LL being a performance leader or better at finding targets. LL might run at somewhat over 50 fps while my team plods along at under 20 fps, but that is not what affected our shooting performance.

Some of your assumptions are naïve or wrong, such as “brighter is better.” I posted recently that one effective way to ignore bright windows is to turn down the bright green LEDs. That’s why LL provides that control, and why my team devised a clever and educational way (one that helped win an award) to dim the lights before there was such a thing as LL. (And too bright washes out the green target when close in.)

Our algorithms have often been more sophisticated than LL’s. As it turns out, that sophistication is usually unnecessary: we had been tuning for targeting in our terribly lit robot room with incorrectly colored game elements, and the real arenas have not been so challenging. So why not trust LL to provide the (minimum?) capability needed to play each year’s game? What more is needed?

And how does this fit in with other similar targeting systems that are largely free?

You’ve put a lot of good thought into your project, but I can’t figure out why, where you want to go with this, and whether anyone will follow you. It sounds like reinventing the wheel and underestimating what others have been doing.

I’m willing to personally work on almost any worthy project (including your proposal), but I need to see a better sales pitch for how this will significantly improve students and teams.

Reasons to explore alternatives?

  1. Cost. LL is expensive.
  2. Unavailability. PhotonVision uses an OpenGL operation to do part of its pipeline. This is only available on the Raspberry Pi 3.
  3. LL is based on the CM3 (last I read). Those are also unavailable.
  4. I once suggested adding an ML pipeline to PhotonVision and was downvoted. IMHO it’s a similar operation: instead of doing a HoughCircles, you run ML with PyTorch/TensorFlow/ONNX/OpenVINO, etc. (see the sketch after this list).
  5. Maybe a different kind of simplicity. I can run a mini PC at 12 V / 3 A that does vision, telemetry logging, dashboard serving, and possibly some sensor fusion, all in one box.
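As a concrete illustration of item 4, the classical step looks roughly like the sketch below; an ML pipeline would swap just this function for an inference call on a PyTorch/TensorFlow/ONNX model while the surrounding capture and publish code stays the same. Parameters are illustrative, not tuned:

```python
# Sketch of the classical step item 4 refers to: detect a round game piece
# with cv2.HoughCircles. An ML pipeline would replace only this function with
# a model inference call. Parameter values are illustrative, not tuned.
import cv2

def find_circles(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)                  # suppress sensor noise
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT,
                               dp=1.5, minDist=40,
                               param1=100, param2=40,
                               minRadius=10, maxRadius=120)
    return [] if circles is None else circles[0]    # rows of (x, y, radius)
```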

So it’s great to see someone else pushing the limit, exploring, and checking things on the Raspberry Pi 4 (these are occasionally available now). There is a chance of acceleration here in OpenCV with a Vulkan backend.

3 Likes

I was lucky and managed to stir up @signalfade, so I’m the one whose head is spinning; he or she knows a lot, so I learn a lot.

My impression is that LL is a

If the FRC game evolves to warrant it, I certainly don’t mind supporting the purchase of the next generation of the turnkey, blackbox LL in whatever form it may take and whoever supplies it. LL isn’t perfect, so updates in the direction I imagine would obviously be desirable for me.

With my engineering-management hat on, I’m still confused about where we are headed with this project and what my team needs to do to avoid being left behind.

Of course, with my engineer/scientist hat on, this is really exciting, and how do I get in on the ground floor?

What is the intention of @billbo911? Is it heading toward the next blackbox vision system like LL? A whitebox that teams put a few days of work into to compete well? Or a glassbox where teams with one extremely dedicated student can get a competitive advantage?

Or is this just a tremendously ambitious, fun research project?

I don’t yet see what my motivation is to spend a lot of time reviewing this document (I read it; it’s meaty; it’ll take time to review well). The crass way to phrase it is “what’s in it for me?”

Thanks @signalfade for your thoughtful answer that I had to Google so many of the words :smile:.

1 Like

After reading the paper, I think the punchline I’m walking away with is: nope, there’s nothing magical about a Limelight. With the right hardware, a smart bit of software, and enough time, teams can replicate many of its headline performance metrics at a lower out-of-pocket cost.

Thanks for the resource Bill!

Minor recommendation: upload the individual files to GitHub rather than a zip, and a PDF rather than a Word document. It’ll make it easier for folks to read through it online without having to download, unzip, and install Office.

4 Likes

First off, thanks for the summation and suggestions. I will get those uploaded immediately.

I do want to emphasize, I have absolute respect for LL! That said, your “punchline” really does encapsulate the entire project.

I simply wanted to verify if the performance was achievable in a team built project.

If you didn’t catch it in the paper, I AM NOT A CODER. So, if I can pull this off, so can any team, and they can do it better than I can! At least that is my hope.

2 Likes

Thanks for sharing your concerns and thoughts. This is exactly what I was hoping for and expecting. A healthy debate/conversation from differing points of view can only help improve the understanding of those involved and those who follow along.

From your posts I only have a couple of takeaways: 1) I am not promoting or selling any product. 2) Your description of a “fun research project” is spot on.

Ever since LL was introduced, I have been fascinated with it. @Brandon_Hjelstrom has done a marvelous job coding a product that has absolutely raised the level of competition. I spoke with him about this project when I ran into him at the World Championships this year. He agreed it was a good project to undertake so that there would be multiple high-performance options available to the FIRST community.

What my paper provides is a guide to assembling hardware into a system that matches LL performance. This hardware is NOT the only option, just an example of what works. I also provided code that is only a starting point; I hope teams will develop it way beyond what I provided. Sure, the code I provided will be sufficient for competition as it stands, but if a team wants more out of it, or something completely different, they will need to build upon it. I invite them to. It will only raise their understanding of the process and, hopefully, their competitive performance.

2 Likes

Thanks for the explanation. I was concerned that you were starting a big public or semi-private project, that maybe you were going to be the lead (not a bad idea, since you are capable, but I wanted to know that), and that the project was going to generate a substantial amount of intellectual property. I wanted to know up front how that IP was going to be managed. I have no concerns after reading your response. Thanks.

Sorry for not directly answering this question. YES, I see no reason the code cannot be modified to work exactly as you suggest. It currently holds the tuning values between sessions in text files saved to the SD card or flash memory. There is no reason these values could not be loaded into NT, read from NT, and saved to a file on the Rio or back to the SD card. In each case, the tuning values could then be modified in real time from Shuffleboard.
One of the scripts I included does basically this, but uses OpenCV sliders (trackbars) in Python to do the same.
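For anyone who has not seen that pattern, the idea in the calibration scripts is roughly the sketch below; the window name, key names, and file name here are illustrative, not necessarily what is in the released scripts:

```python
# Sketch of slider-based tuning: OpenCV trackbars adjust the HSV thresholds
# live, and the chosen values are written to a text file so a headless
# competition script can load them later. Names and keys are illustrative.
import json
import cv2

WINDOW = "tuning"
NAMES = ["hue_min", "hue_max", "sat_min", "sat_max", "val_min", "val_max"]
MAXIMUMS = [179, 179, 255, 255, 255, 255]

cv2.namedWindow(WINDOW)
for name, maximum in zip(NAMES, MAXIMUMS):
    start = maximum if name.endswith("max") else 0
    cv2.createTrackbar(name, WINDOW, start, maximum, lambda v: None)

cam = cv2.VideoCapture(0)
while True:
    ok, frame = cam.read()
    if not ok:
        break
    values = {n: cv2.getTrackbarPos(n, WINDOW) for n in NAMES}
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv,
                       (values["hue_min"], values["sat_min"], values["val_min"]),
                       (values["hue_max"], values["sat_max"], values["val_max"]))
    cv2.imshow(WINDOW, mask)
    if cv2.waitKey(1) & 0xFF == ord("s"):     # press 's' to save and exit
        with open("tuning.json", "w") as f:
            json.dump(values, f)
        break

cam.release()
cv2.destroyAllWindows()
```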

So here is another question in general: if it’s been shown that 90 fps is achievable on regular hardware without heroic measures, why does PhotonVision struggle at times to reach the camera’s FPS, even when it is much lower?

Might be worth an investigation.

Do you mean “similar” hardware (e.g. a Pi 3B)? And fairly heroic measures were required: the PhotonVision team put a lot of work into GPU acceleration to get the framerate up.

If I run PhotonVision on a Windows desktop or a Linux machine, it struggles to get past 15 fps.

If I run a very similar pipeline in Python with OpenCV, even without OpenCL, I can get the camera’s full framerate without issue.

I have no doubts about the effort it took to get PhotonVision to 90 fps on a Raspberry Pi 3.

It was tuned for the Raspberry Pi 3, and it might be worth tuning it for other platforms.

Without digging into PhotonVision, a system I am not familiar with, there are two things I can think of that might cause this. Is a USB camera being used? The other possibility is whether multithreading was used to separate the acquisition and processing threads.
Of course, manually setting the exposure incorrectly could cause this as well.
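For reference, the acquisition/processing split I am describing usually looks something like this generic sketch using cv2.VideoCapture; this is not PhotonVision’s code, just the general pattern:

```python
# Generic sketch of splitting acquisition and processing into two threads.
# The capture thread continuously overwrites `latest`, so the processing
# loop never waits on the camera. Not taken from PhotonVision's code.
import threading
import cv2

class ThreadedCamera:
    def __init__(self, index=0):
        self.cap = cv2.VideoCapture(index)
        self.lock = threading.Lock()
        self.latest = None
        self.running = True
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        while self.running:
            ok, frame = self.cap.read()     # blocks until the camera delivers
            if ok:
                with self.lock:
                    self.latest = frame

    def read(self):
        """Return a copy of the most recent frame, or None if none yet."""
        with self.lock:
            return None if self.latest is None else self.latest.copy()

    def stop(self):
        self.running = False
        self.cap.release()
```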

1 Like

I would love to see a “high performance” vision solution which is not the LL (for cost and also pedagogical reasons). My team does not actually use an LL; I have been meaning to play with PhotonVision but have not had the time.

Looking at your code, I have some reservations about whether you have actually met the “goals” of your test. In the main loop, you call cam.read() to get the next frame. However, that routine does not block, so it will return a frame immediately, but you have no guarantee that it is a new frame. So, you can think you are running at 1000 frames/sec, but you are really only getting 10fps of unique frames (for example).

Also, your example code is really only a simulation, and as such it does not really prove that you can match the LL. Your contour selection, etc., seems pretty primitive, but I suspect that would not really matter in terms of speed. The big missing part (that I can think of) is that you don’t output the images in any way. I guess you could run a targeting solution “blind”, but I think that would have some serious downsides. “All” current solutions send a marked-up image to the driver station, and I know that can be a load on the CPU.

100% correct. As mentioned in the supplement, there are alternative ways to actually guarantee you are only processing a new frame. I HIGHLY encourage readers to take what I have provided and improve upon it. As for the actual frame rate achieved, the acquisition loop configures the camera to free-run at its maximum possible frame rate; for the PiCamera v2, that is 90 FPS.
So, my challenge to you is: prove me wrong. Rewrite the code to validate exactly what the acquisition rate is. I left enough snippets in the code to make that process simple and easy to implement.
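One way to do that validation, sketched under the assumption of a threaded reader; the class and variable names are mine for illustration and not taken verbatim from the released scripts:

```python
# Sketch of measuring the *unique* frame rate: the capture thread increments
# a counter for every real frame, and the measuring loop only counts frames
# whose counter has changed. Names are illustrative, not from the paper.
import threading
import time
import cv2

class CountingCamera:
    def __init__(self, index=0):
        self.cap = cv2.VideoCapture(index)
        self.frame = None
        self.count = 0                      # increments once per real frame
        self.lock = threading.Lock()
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        while True:
            ok, frame = self.cap.read()
            if ok:
                with self.lock:
                    self.frame, self.count = frame, self.count + 1

    def read(self):
        with self.lock:
            return self.count, self.frame

cam = CountingCamera()
last_count, unique = 0, 0
start = time.time()
while time.time() - start < 5.0:            # measure for five seconds
    count, frame = cam.read()
    if frame is not None and count != last_count:
        unique += 1
        last_count = count
print("unique frames per second:", unique / 5.0)
```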

Agreed. For the actual targeting code used for competition, I intentionally left the display code out. I find it unnecessary: it uses CPU cycles and builds in delays. So, if you want the display, add it back to the code. Take a look at the two calibration scripts; they both display the image. If you enable the frame-rate readout code, you will see there is still plenty of CPU headroom even while displaying the images. Granted, this is a local image, not a streamed one, so write the code if you want to stream the marked-up image.
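If someone does want the stream, the usual route on a coprocessor is robotpy-cscore: draw on the frame, then hand a downscaled copy to an MJPEG server that Shuffleboard or a browser can view. A minimal sketch, assuming the cscore package is installed; the resolution, frame rate, and port are illustrative choices, not values from my scripts:

```python
# Minimal sketch of streaming a marked-up frame with robotpy-cscore.
# Downscaling before putFrame keeps the CPU and bandwidth cost down.
import cscore
import cv2

stream = cscore.CvSource("marked_up", cscore.VideoMode.PixelFormat.kMJPEG,
                         320, 240, 15)
server = cscore.MjpegServer("driver_view", 1181)   # view at http://<pi-address>:1181
server.setSource(stream)

def publish(frame_bgr, target_contour):
    """Draw the selected contour and push a small copy to the stream."""
    annotated = frame_bgr.copy()
    if target_contour is not None:
        cv2.drawContours(annotated, [target_contour], -1, (0, 0, 255), 2)
    stream.putFrame(cv2.resize(annotated, (320, 240)))
```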

I never intended the code I provided to be a complete solution. In fact, I said multiple times that teams should expand on it. If you want specific features, PLEASE feel free to add them!

1 Like

Slight edits to the scripts have been made to ensure 90 FPS is obtained. Please download again to get these changes.

So, has anyone found a good solution for cloning one Beelink mini PC to another, you know, to have a backup?