The 2012 Vision whitepaper explains on page 9 how to compute the distance from the target. We understand everything except the 56-pixel-wide target rectangle. Here is the actual quote:

The target measures 2 ft wide, and in the example images used earlier, the target rectangle measures 56 pixels when the camera resolution is 320x240. This means that the blue rectangle width is 2*320/56 or 11.4 ft. Half of the width is 5.7 ft, and the camera used is the M1011, so the view angle is ~47˚, making Θ equal to 23.5˚.
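The arithmetic in that quote can be reproduced step by step. This is just a sketch of the paper's example numbers; the variable names are mine:

```python
import math

# Numbers from the whitepaper's example (M1011 camera, 320x240 image):
target_width_ft = 2.0    # known real-world width of the target
image_width_px = 320     # horizontal camera resolution
target_width_px = 56     # measured width of the target in the image
view_angle_deg = 47.0    # approximate horizontal field of view of the M1011

# Width of the full camera field of view at the target's distance:
fov_width_ft = target_width_ft * image_width_px / target_width_px  # ~11.4 ft
half_width_ft = fov_width_ft / 2                                   # ~5.7 ft
theta = math.radians(view_angle_deg / 2)                           # 23.5 degrees

# The half-width and the half-angle form a right triangle with the distance:
distance_ft = half_width_ft / math.tan(theta)
print(round(fov_width_ft, 1), round(half_width_ft, 1), round(distance_ft, 1))
```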

My question is this, where did they get the target rectangle width in pixels (56 in their case)? After all, that isn’t a fixed number, it will change accordingly based on how far/close the robot is to the target.

The pixel width will vary with how far you are from the target. What you know is how wide the target actually is (i.e., 2 ft). Using the math in the paper, you can convert the pixel width into how far away you are from the target.
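That conversion can be packaged as a small function. This is a hedged sketch, not code from the paper; the function name is mine and the defaults match the paper's example setup:

```python
import math

def distance_to_target(target_px, target_ft=2.0, image_px=320, fov_deg=47.0):
    """Distance (ft) from the camera to the target, given the target's
    measured width in pixels. Defaults match the whitepaper's example."""
    fov_width_ft = target_ft * image_px / target_px
    return (fov_width_ft / 2) / math.tan(math.radians(fov_deg / 2))

# The farther away you are, the fewer pixels the target spans:
print(round(distance_to_target(56), 1))  # the paper's example
print(round(distance_to_target(28), 1))  # half the pixel width -> twice the distance
```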

Think about it from your standpoint as a human: the farther you are from something, the smaller it looks, right? With the camera and the math in the paper, you can translate the “how small it looks” directly into “how far away” it is.

The “56” in the paper is just an example. They are saying “suppose the camera shows the width of the rectangle is 56 pixels; then that means …”

We understand that it was used as an example, but how do we find that number? You have to have a way to find the exact number (you can’t use examples for the real thing). How does the camera “show” that the width of the rectangle is 56 pixels? If I step 10 feet farther back, the target rectangle will obviously NOT measure 56 pixels anymore. How do you find out what the NEW width is?

The width was measured from the particle that is believed to be the target. It was actually the width of the particle's bounding box. Those numbers come from the particle analysis report function.
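To illustrate what "width of the bounding box" means without the NI Vision library, here is a plain-Python stand-in that scans a binary mask for the extent of the detected blob. This is not the particle analysis report API, just a sketch of the measurement it gives you:

```python
def bounding_box_width(mask):
    """Width in pixels of the bounding box of the 'on' pixels in a binary
    mask (a list of rows of 0/1). A stand-in for the width that a particle
    analysis report returns for each detected particle."""
    cols = [x for row in mask for x, v in enumerate(row) if v]
    return max(cols) - min(cols) + 1 if cols else 0

# A toy image with a 4-pixel-wide blob (the "particle"):
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(bounding_box_width(mask))  # 4
```

At each camera frame you would re-run the particle analysis on the thresholded image, so the measured width updates automatically as the robot moves.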