You'll need to know the distance to the target to make it work.
Pretend the diagram above is a top view of the camera. O is the lens, Y1 is the camera sensor, y1 is how many pixels off center the target appears on the sensor, and f is the focal length of the camera. Those are all known locations and values.
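One detail that trips people up: y1 is in pixels, so f has to be in pixels too for the ratio y1 / f to mean anything. Here's a rough sketch of getting f in pixels from the image width and horizontal field of view, assuming an ideal pinhole camera with no lens distortion. The function names and the example camera numbers are hypothetical, not from any particular camera or library.

```python
import math


def focal_length_px(image_width_px: int, horizontal_fov_deg: float) -> float:
    """Focal length in pixels for an ideal pinhole camera (assumption).

    Half the image width subtends half the horizontal field of view,
    so tan(hfov / 2) = (width / 2) / f.
    """
    half_fov = math.radians(horizontal_fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_fov)


def pixels_off_center(pixel_x: float, image_width_px: int) -> float:
    """Signed offset y1 of a pixel column from the image center.

    Assumes pixel_x is measured from the left edge of the image.
    """
    return pixel_x - image_width_px / 2.0


if __name__ == "__main__":
    # Hypothetical camera: 640 px wide, 60 degree horizontal FOV.
    f = focal_length_px(640, 60.0)
    print(f"f = {f:.1f} px")
    print(f"y1 for column 370 = {pixels_off_center(370, 640)} px")
```

For that hypothetical 640-pixel-wide, 60° camera this works out to f of roughly 554 pixels. If you've calibrated your camera, use the measured focal length instead.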
From there it's a simple similar-triangles calculation, as long as you know x3: x1 / x3 = y1 / f, so x1 = x3 * y1 / f (with f in pixels, the same units as y1).
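Written out as code, the similar-triangles step is one line. This is a minimal sketch, assuming y1 and f are both in pixels and x3 is in whatever real-world unit you want x1 back in (inches, meters, etc.). The function name is just for illustration.

```python
def lateral_offset(x3: float, y1_px: float, f_px: float) -> float:
    """Similar triangles from the diagram: x1 / x3 = y1 / f.

    x3 and the returned x1 share units (e.g. inches). y1_px and f_px
    must both be in pixels so their ratio is dimensionless. The sign
    of the result follows the sign of y1_px (left vs. right of center).
    """
    return x3 * y1_px / f_px


if __name__ == "__main__":
    # Hypothetical numbers: f = 500 px, target 50 px right of center, 100 in away.
    print(lateral_offset(100.0, 50.0, 500.0))
```

With those made-up numbers the target is 10 inches off to the side.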
The problem is that every point along the green line outside the camera lands on the same point Q on the sensor. So without knowing the distance x3 you can't figure out how big x1 is (P could be anywhere along the green line).
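To see the ambiguity concretely, here's a small sketch that projects several points lying on one ray through the lens onto the sensor, again assuming an ideal pinhole camera and a hypothetical focal length. They all come out at the same y1, which is exactly why y1 alone can't tell you x1.

```python
def project_to_sensor(x1: float, x3: float, f_px: float) -> float:
    """Ideal pinhole projection (top view).

    A point P that is x3 in front of the lens O and x1 to the side
    lands y1 = f * x1 / x3 pixels off center on the sensor.
    """
    return f_px * x1 / x3


if __name__ == "__main__":
    f_px = 500.0  # hypothetical focal length in pixels
    # Every P below sits on the same ray through O (x1 is always x3 / 5),
    # so all of them hit the same sensor point Q.
    for x3 in (50.0, 100.0, 200.0):
        x1 = x3 / 5
        y1 = project_to_sensor(x1, x3, f_px)
        print(f"x3={x3:6.1f}  x1={x1:5.1f}  ->  y1={y1:.1f} px")
```

All three points report y1 = 100.0 px even though x1 ranges from 10 to 40.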
If you can determine x3 some other way (ultrasonic, or some trig using the distance between the two vision targets), x1 is easy. But any error in x3 carries straight into x1: x1 scales linearly with x3, so a 5% error in x3 is a 5% error in x1. Testing is in order.
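One way to do the "trig using the two vision targets" is to run the same similar triangles backwards: if you know the real spacing between the targets, their spacing in pixels gives you x3. This is a rough sketch under some loud assumptions: both targets are at about the same depth, the pair is roughly square to the camera, and the 10-inch spacing and other numbers are hypothetical (use your field's actual dimensions). If the targets are viewed at an angle, their apparent spacing shrinks and this overestimates x3.

```python
def distance_from_target_pair(separation_real: float,
                              separation_px: float,
                              f_px: float) -> float:
    """Estimate x3 from two targets a known real distance apart.

    Same similar triangles as before: separation_real / x3 =
    separation_px / f. Assumes both targets are at roughly the same
    depth and the pair faces the camera (no skew).
    """
    return f_px * separation_real / separation_px


def lateral_offset(x3: float, y1_px: float, f_px: float) -> float:
    """x1 = x3 * y1 / f from the similar triangles in the diagram."""
    return x3 * y1_px / f_px


if __name__ == "__main__":
    f_px = 500.0          # hypothetical focal length in pixels
    separation_in = 10.0  # hypothetical real spacing between the two targets
    x3 = distance_from_target_pair(separation_in, 50.0, f_px)  # 100.0
    x1 = lateral_offset(x3, 50.0, f_px)                         # 10.0
    # Overestimate x3 by 5% and x1 comes out about 5% high as well.
    x1_bad = lateral_offset(x3 * 1.05, 50.0, f_px)
    print(x3, x1, x1_bad)
```

The last line is the point about error: since x1 is just x3 times a constant (y1 / f), whatever percentage you're off on distance, you're off by the same percentage on x1.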