Are you attempting to write “general” camera tracking code? With or without human adjustment or compensation? Is this for autonomous mode?
There are many situations you could be working toward, each with different constraints. You may find that you’re best served by a different solution for each mode. One potential example: in autonomous mode, you have a more or less known starting position, and the rules prohibit opponent interference. If you can shoot reliably from there with relatively simple camera aiming, then you have “figured out” autonomous scoring without camera depth perception.
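To give a sense of how little software “relatively simple camera aiming” might need, here is a rough sketch in Python. It assumes the vision code already reports the target’s horizontal pixel position, and it never estimates distance: the shooter speed is a single preset tuned by hand for the known starting spot. Every name and number in it (`autonomous_step`, `Command`, the gain, the image width, the preset speed) is made up for illustration and is not taken from any real robot or library.

```python
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from a real robot.
IMAGE_WIDTH_PX = 320        # assumed camera image width
KP = 0.01                   # assumed proportional gain (turn command per pixel of error)
DEADBAND_PX = 5             # assumed tolerance for "aimed well enough"
PRESET_SHOOTER_SPEED = 0.8  # assumed speed, hand-tuned for the known starting spot


@dataclass
class Command:
    turn: float           # -1.0 (full left) to 1.0 (full right)
    shooter_speed: float  # 0.0 to 1.0
    fire: bool


def autonomous_step(target_x_px):
    """One control-loop iteration: aim left/right only, no distance estimate.

    target_x_px is the target's horizontal pixel position, or None if unseen.
    """
    if target_x_px is None:
        # No target in view: hold still, keep the shooter spun up, do not fire.
        return Command(0.0, PRESET_SHOOTER_SPEED, False)

    # Positive error means the target is right of image center.
    error_px = target_x_px - IMAGE_WIDTH_PX / 2
    if abs(error_px) <= DEADBAND_PX:
        # Close enough to centered: stop turning and shoot at the preset speed.
        return Command(0.0, PRESET_SHOOTER_SPEED, True)

    # Proportional turn toward the target, clamped to the motor command range.
    turn = max(-1.0, min(1.0, KP * error_px))
    return Command(turn, PRESET_SHOOTER_SPEED, False)
```

Called once per loop, this turns the robot until the target is centered and then fires. Whether a single preset speed is good enough depends entirely on how repeatable your starting position and shooter are, which is something to check on the practice field.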
Teams in Aim High often reported shooting from particular “sweet spots”, small areas on the field from which the robot was tuned to score most reliably. Obviously that was a different game, with a different size hoop and a very different game piece. But this is the kind of thing one might need to look into if, for whatever reason, camera depth perception isn’t reliable enough to work within your desired constraints.
Do keep in mind that this post is coming from the perspective of a member of a team that doesn’t have the expertise to do complex software control loops, so my instinct is to look for as many ways to eliminate a need for software as possible. Take this post as a reminder that there may be more than one way to solve the problem of reliably making shots with your particular design - not just camera depth perception. Whether or not you give that up, and when, is one of the many, many challenges of this particular game.