My team is currently deciding whether to use a Custom Vision Pipeline or PhotonVision this upcoming season to detect AprilTags. What are the advantages and disadvantages of each? PhotonVision seems to be easier to set up (and I have already successfully gotten it to run on a Pi 4 I have at my house), but would there be any advantages to using a custom solution?
What features are missing in PhotonVision that you would like to obtain by creating a Custom Vision Pipeline?
Custom could potentially be more performant.
Custom provides a good learning experience, though your time is most likely better spent elsewhere unless you have the people and the time (and if you're asking this question now, you don't have the time)
Custom might allow you to do crazy cool stuff (looking at you 971) (PRs welcome)
Among other things.
I still think PhotonVision or Limelight is a much better use of a programmer's time and has most of the features you really want.
Disclaimer: I do PhotonVision things sometimes
I think the 2 major reasons you’d go with your own custom vision pipeline are:
- Source control
- It’s better / more performant than existing solutions
In what ways, and why, would it be more performant than other solutions?
Sorry I was a little unclear. I was saying the reason to go with your own would be IF it performs better. No sense going with your own if it’s worse.
Something to think about is that even if your custom pipeline performs better than an existing solution like Photon or Limelight, and that's a big if, you'll find much less support available if things go wrong.
Personally I prefer to be risk averse on stuff like this; it's just not worth gambling a season on unless you are 100% confident in your custom solution.
Something to consider is the amount of people-hours you have available on your team. It is a limited and bounded supply. Do you have time to code everything, bring the less experienced along, have a kick-butt auton AND make your own pipeline?
Unless you have already identified a deficit and have ideas on how to address it, this seems like an off-season project.
I believe this is possible with PV and LL as well; it's just your job to regularly export your settings and put them in a folder that gets committed to git.
My team, with me as the lead mentor on vision, is seriously thinking about this too.
For the last ~5 years, we have used a custom solution for processing the retroreflective targets. Pretty much every year, someone asked “why don’t we use a Limelight?”. The answers have been (not necessarily in order):
- Cost - $400 for a LL when we have all the existing pieces is pretty steep
- Learning - we generally have the students and have taught new ones how to process images
- Customization - each year we have gone beyond just detecting the color blobs and used the specific geometry and layout to try to refine the detection and location. (Can’t say we always succeeded, but we tried.)
This year, with AprilTags and freely available PhotonVision, it seems like the tradeoffs have changed. We are not going to implement our own AprilTag detector, so we would just be calling the standard library and computing pose within our custom framework. The algorithms will be pretty much identical to PV, and are not going to be changing (much) year to year, so the “learning” is going to boil down to understanding the behavior of AprilTags. So we are seriously thinking about going with PV.
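For anyone curious what “just calling the standard library” looks like, here is a minimal sketch using WPILib's wrapper around the AprilTag library on a coprocessor-style loop. The camera resolution, tag size, tag family, and intrinsics (fx, fy, cx, cy) below are placeholders; you would substitute your own calibration and the current season's tag family.

```java
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

import edu.wpi.first.apriltag.AprilTagDetection;
import edu.wpi.first.apriltag.AprilTagDetector;
import edu.wpi.first.apriltag.AprilTagPoseEstimator;
import edu.wpi.first.cameraserver.CameraServer;
import edu.wpi.first.cscore.CvSink;
import edu.wpi.first.cscore.UsbCamera;
import edu.wpi.first.math.geometry.Transform3d;

public class SimpleTagPipeline {
  public static void main(String[] args) {
    UsbCamera camera = CameraServer.startAutomaticCapture();
    camera.setResolution(640, 480); // placeholder resolution
    CvSink sink = CameraServer.getVideo();

    AprilTagDetector detector = new AprilTagDetector();
    detector.addFamily("tag16h5"); // assumption: 2023 family; use the current season's family

    // Placeholder config: tag size in meters, then fx, fy, cx, cy from your own calibration.
    AprilTagPoseEstimator estimator = new AprilTagPoseEstimator(
        new AprilTagPoseEstimator.Config(0.1524, 670.0, 670.0, 320.0, 240.0));

    Mat frame = new Mat();
    Mat gray = new Mat();
    while (!Thread.interrupted()) {
      if (sink.grabFrame(frame) == 0) {
        continue; // frame grab timed out; skip this iteration
      }
      // The detector wants an 8-bit grayscale image.
      Imgproc.cvtColor(frame, gray, Imgproc.COLOR_BGR2GRAY);

      for (AprilTagDetection detection : detector.detect(gray)) {
        // Camera-to-tag transform; turning this into a robot pose is where the
        // custom framework (or PhotonVision) does the rest of the work.
        Transform3d cameraToTag = estimator.estimate(detection);
        System.out.printf("tag %d: %s%n", detection.getId(), cameraToTag);
      }
    }
  }
}
```

The detection call itself is the easy part; as noted later in this thread, the camera tuning, threading, and pose handling around it are where the real time goes.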
Fast version: probably not if you are using an FRC-typical coprocessor
I do a lot of professional machine vision video-handling projects, some embedded and some on major platforms using mostly gstreamer, OpenCV, and WebRTC. However, this will be my first time interacting with PhotonVision (PV), which I have tentatively selected for my team to move forward with.
I can think of two reasons why someone in a different position might consider going in a different direction. Even in that case I would do it as an alternative or bake-off.
- Coupling of the vision system and IMU (maybe odometry). You can use a high-rate IMU to correct or guide imagery fed to your fiducial (AprilTag) detection algorithm. I am not sure if PV has this feature or cares to have it. There are other features like optical-flow fallbacks and tracking through occlusion that I am not sure PV has implemented. (A rough sketch of the simpler, pose-level side of this fusion appears after this list.)
- Using the roboRIO, an unsupported coprocessor, or a different algorithm to do localization (NN, SLAM, LIDAR, etc.)
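To make the fusion point above concrete, here is a rough sketch of pose-level fusion using WPILib's DifferentialDrivePoseEstimator: wheel odometry plus gyro every loop, with lower-rate vision poses mixed in. This is only the coarse version of the idea, not the tighter image-level IMU coupling described above, and the track width, sensor accessors, and vision-pose source are placeholders.

```java
import java.util.Optional;

import edu.wpi.first.math.estimator.DifferentialDrivePoseEstimator;
import edu.wpi.first.math.geometry.Pose2d;
import edu.wpi.first.math.geometry.Rotation2d;
import edu.wpi.first.math.kinematics.DifferentialDriveKinematics;
import edu.wpi.first.wpilibj.Timer;

public class FusedLocalizer {
  private final DifferentialDriveKinematics kinematics =
      new DifferentialDriveKinematics(0.69); // placeholder track width in meters

  private final DifferentialDrivePoseEstimator estimator =
      new DifferentialDrivePoseEstimator(
          kinematics, new Rotation2d(), 0.0, 0.0, new Pose2d());

  /** Call every robot loop with the latest gyro angle and wheel distances. */
  public Pose2d update(Rotation2d gyroAngle, double leftMeters, double rightMeters) {
    return estimator.update(gyroAngle, leftMeters, rightMeters);
  }

  /** Call whenever the vision pipeline produces a robot pose (placeholder source). */
  public void acceptVisionPose(Optional<Pose2d> visionPose, double captureTimestampSeconds) {
    // The timestamp should be when the image was captured, not when the result
    // arrived, so the estimator can roll the measurement back into its history.
    visionPose.ifPresent(pose -> estimator.addVisionMeasurement(pose, captureTimestampSeconds));
  }

  /** Example wiring in a periodic method (values here are placeholders). */
  public void periodicExample() {
    update(new Rotation2d(), 0.0, 0.0); // replace with real gyro/encoder readings
    acceptVisionPose(Optional.empty(), Timer.getFPGATimestamp());
  }
}
```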
Most of the rationale for not using something else has already been stated, but I will add two points of my own. Video handling is easy to start and a pain to get right. There are a lot of camera settings that impact results; if you ever wrestle with unstable AGC, you might find it oscillates you right to the bottom of the river. Then there is optimizing the pipeline, where you have to keep buffers in the right type of memory and keep it from getting into some wonky state (more of a media-handling problem). The other major rationale is that PV is open source, which makes me want to contribute or fork rather than replace.
True, but I guess I didn’t fully explain what I meant; I was trying to keep it simple with a short answer. I just meant you control all steps of the code and the source code itself. You don’t have to wait for PV to update, and if they break something you aren’t stuck waiting on their fix. You could obviously fix PV yourself since it’s mostly open source, but “effort” and whatnot.
It’s a double-edged sword: you gain some things but you also lose some things. I personally would stick with PV/LL.