Disclaimer: I’m not an expert on this topic, so feel free to respond with any corrections or observations of your own. I’m also not recommending a specific camera, because the right choice is very use-case specific.
With the prevalence of AprilTags in current games, a lot of teams are finding “DIY” solutions like PhotonVision easier than hoping the Limelight restocks. This usually means teams have to decide which camera(s) to use, and there are quite a few common misconceptions that should be addressed.
Frame Rate
Frame rate is obviously important, but it’s slightly overhyped. A reasonably high frame rate (roughly 30–45 fps) is good to have, but pushing it higher yields diminishing returns, because processing time and latency end up dominating the pipeline.
Latency
USB cameras typically have latencies of 30–100 ms. Latency is a natural part of every vision system, and grows with resolution and with the time the camera and pipeline spend processing each frame. This is usually something you can design around in your code, but it’s important to consider because higher-resolution cameras are typically higher latency.
It’s worth noting as well that NetworkTables has its own latency (I believe it refreshes at around 10 Hz by default), so you can pick up anywhere from roughly 1–99 ms of extra delay from NetworkTables alone.
While you shouldn’t pick a camera based just on latency, it’s important to consider that higher-resolution cameras may introduce further latency, which is a drawback.
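One common way to design around all this latency is to timestamp each vision measurement with its capture time rather than its arrival time, so a pose estimator fuses it at the moment the frame was actually taken. A minimal sketch of the idea in plain Python (the latency numbers here are illustrative assumptions, not measurements of any specific camera):

```python
# Illustrative latency budget (all numbers are assumptions for this example):
CAMERA_LATENCY_S = 0.050   # capture + processing latency of the vision pipeline
NT_PERIOD_S = 0.100        # NetworkTables refresh period at ~10 Hz

def capture_timestamp(arrival_time_s, pipeline_latency_s=CAMERA_LATENCY_S,
                      nt_delay_s=NT_PERIOD_S / 2):
    """Estimate when the frame was actually captured.

    arrival_time_s: when the robot code received the measurement.
    nt_delay_s: expected NetworkTables transport delay (half the refresh
    period on average, up to a full period in the worst case).
    """
    return arrival_time_s - pipeline_latency_s - nt_delay_s

# A pose estimator can then apply the vision measurement at its capture
# time instead of pretending it describes the present:
now = 12.000  # seconds, some arbitrary clock reading
print(round(capture_timestamp(now), 3))  # 11.9
```

The exact numbers don’t matter; the point is that the delays add up, so the measurement should be treated as a snapshot of the past.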
Motion Blur
In the environments robots move in, motion blur is a huge problem. Motion blur is the result of slow shutter speeds/long exposure times. A camera capable of high shutter speeds is essential for reducing motion blur, or a global-shutter camera can be used (although they can be expensive and annoying to get working on different coprocessors). Global shutters have been shown to be more effective in dynamic systems; see this study. They can also be expensive to run (both in terms of latency and current draw; I’ve heard of global-shutter cameras browning out Raspberry Pis).
AprilTags are high-contrast black-and-white markers, and detection relies on finding the straight edges between the black and white cells of the tag. Motion blur smears these edges until they are basically impossible to extract, which stops the tag from being detected.
Motion blur can still be worked around, however, and teams limited by budget should not invest in a global-shutter camera when many of the issues of motion blur can be resolved with clever design.
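To put rough numbers on this, you can estimate the length of the blur streak in pixels from robot speed, exposure time, and the camera’s focal length in pixels, using a simple pinhole-camera model. All the values below are assumptions chosen for illustration:

```python
def blur_pixels(speed_mps, exposure_s, focal_px, distance_m):
    """Approximate length of the motion-blur streak in pixels.

    A point moving sideways at speed_mps sweeps speed * exposure metres
    during the exposure; projecting that onto the sensor with a pinhole
    model gives focal_px * swept_distance / distance_m pixels of blur.
    """
    return focal_px * (speed_mps * exposure_s) / distance_m

# Robot strafing at 3 m/s, 20 ms exposure, ~600 px focal length, tag 2 m away:
print(round(blur_pixels(3.0, 0.020, 600, 2.0), 1))  # 18.0

# Drop the exposure to 2 ms and the streak shrinks tenfold:
print(round(blur_pixels(3.0, 0.002, 600, 2.0), 1))  # 1.8
```

When a single tag cell is only a handful of pixels wide, an 18-pixel smear destroys the edges the detector needs, which is why shorter exposures (or a global shutter) matter far more here than raw frame rate.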
Resolution (and Acutance)
Interestingly, it’s been found repeatedly that resolution isn’t as important as “video sharpness”.
Video sharpness is correlated with resolution, but also depends on acutance, a property intrinsic to each camera. This paper shows that the PiCam is more accurate and produces better pose estimates in all cases than the more expensive Logitech C270 (both cameras were run at the same FPS, and the PiCam had a lower resolution). This comes down to perceived image sharpness: a sharper image makes the tag’s edges easier to extract.
This is a problem when picking cameras, because it’s hard to know in advance which one will be sharper: acutance almost never appears on a spec sheet, so about the only way to compare is an empirical side-by-side test.
The best we can do is aim for a middle-ground resolution (1080p at the highest, 480p at the lowest).
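If you do want a rough side-by-side comparison, a crude sharpness proxy is the mean absolute second difference (a 1-D Laplacian) along image rows: hard edges produce large second differences, smeared edges small ones. This is only meaningful for comparing the same scene, at the same framing, shot by different cameras. A toy sketch on tiny synthetic grayscale images:

```python
def sharpness_proxy(image):
    """Mean absolute second difference along each row of a grayscale image.

    image: list of rows of pixel intensities (0-255). Higher values mean
    harder edges. Only useful as a RELATIVE metric between shots of the
    same scene, not as an absolute camera rating.
    """
    total, count = 0, 0
    for row in image:
        for a, b, c in zip(row, row[1:], row[2:]):
            total += abs(a - 2 * b + c)
            count += 1
    return total / count

# A crisp black-to-white edge vs. the same edge smeared over several pixels:
sharp  = [[0, 0, 0, 255, 255, 255]] * 4
blurry = [[0, 40, 110, 180, 230, 255]] * 4
print(sharpness_proxy(sharp) > sharpness_proxy(blurry))  # True
```

Real comparisons would use a proper image library and a standard test chart, but even this toy metric illustrates that sharpness is measurable in relative terms even though it’s missing from spec sheets.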
So what should I go for in a camera?
All the previous info suggests we should be looking for a 30–45 fps camera with lower latency and a resolution of around 720p. Any qualitative information or data on the “sharpness” of the video stream should be taken into account when deciding on a camera. High shutter speeds are generally ideal.
Be wary of coprocessor/camera interactions, and make sure the camera is appropriate to your use case. A team looking to attach a camera to a turret to “lock on” to a target should look into a global shutter, whereas a simple “turn to target” command doesn’t really need one.
TL;DR: 30–45 fps, 480–1080p, lower exposure if possible.