What I’m about to say here may complicate your problem. But, I know you are dealing with a difficult problem and I’m assuming you have some very good professional help, so I thought I would add a few comments on what you may be missing. You probably don’t want to do this now, but take notes in case you want to do this as a later refinement.
All that is mentioned here regarding the image differencing is quite valid. However, your problem of the image sizes being different is easily solved by padding the images with black pixels so they are the same size. This has some implications for your math, but it would normalize the images and remove your concerns about area.
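To illustrate the padding idea, here is a minimal sketch in NumPy. All of the shapes and pixel values are made up for the example; the only point is zero-padding both frames to a common size before differencing.

```python
import numpy as np

def pad_to_match(a, b):
    """Pad both grayscale images with zeros (black) to the max of their shapes."""
    h = max(a.shape[0], b.shape[0])
    w = max(a.shape[1], b.shape[1])
    pa = np.zeros((h, w), dtype=a.dtype)
    pb = np.zeros((h, w), dtype=b.dtype)
    pa[:a.shape[0], :a.shape[1]] = a
    pb[:b.shape[0], :b.shape[1]] = b
    return pa, pb

# Two frames of different sizes (illustrative values)
img1 = np.full((100, 120), 50, dtype=np.uint8)
img2 = np.full((90, 130), 60, dtype=np.uint8)

p1, p2 = pad_to_match(img1, img2)           # both now 100 x 130
# Difference in a signed type so the subtraction can't wrap around
diff = np.abs(p1.astype(np.int16) - p2.astype(np.int16))
```

Note the padded border shows up as a large difference wherever only one image has real data, which is part of the "implications to your math" mentioned above.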
However, I think you are focusing on diff-ing the images while missing your real problem: position tracking. While you can use the motion of the floor as an indicator of motion, there are still going to be errors, noise, etc. Theoretically, you could image the entire floor of the pool, then take the image you capture and do a correlation over the whole pool image to find your location. But that’s a huge amount of time, a huge amount of data to manage, and you probably can’t do it on the hardware that’s been mentioned.
To Greg’s point above and Jared’s in the Optical Flow thread, you basically do feature extraction, establish geometries between the features, and determine the difference. That tells you how much the camera moved. The SURF algorithm that Jared mentions is a very robust generalized feature extractor that usually gives useful information. It’s just a matter of deciding which parts of SURF’s output are what you want. Once you do, you’ll have the pattern of features. Then, you need to determine how to find the pattern (that’s highly application dependent…but could be as simple as a Hough Transform if you are lucky), and determine how far it moved. So long as you don’t move too much from frame to frame (like the square-in-the-image concept that Greg mentioned), you can determine the shift of the object in the field of view.
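The "find the pattern and see how far it moved" step can take many forms; a full SURF match is beyond a short example, but phase correlation is a simple stand-in that estimates a whole-frame translation and works well exactly when the frame-to-frame motion is small, as described above. This is a pure-NumPy sketch, not the SURF pipeline itself:

```python
import numpy as np

def estimate_shift(frame_a, frame_b):
    """Return (dy, dx) such that frame_b looks like frame_a shifted by (dy, dx)."""
    Fa = np.fft.fft2(frame_a)
    Fb = np.fft.fft2(frame_b)
    cross = Fb * np.conj(Fa)
    cross /= np.abs(cross) + 1e-12          # keep only the phase difference
    corr = np.fft.ifft2(cross).real         # sharp peak at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wraparound)
    if dy > frame_a.shape[0] // 2:
        dy -= frame_a.shape[0]
    if dx > frame_a.shape[1] // 2:
        dx -= frame_a.shape[1]
    return int(dy), int(dx)

# Synthetic check: shift a random "pool floor" texture by a known amount
rng = np.random.default_rng(0)
base = rng.random((64, 64))
shifted = np.roll(base, shift=(3, 5), axis=(0, 1))
dy, dx = estimate_shift(base, shifted)
```

For a real pool floor you would likely move to a feature-based match (SURF or similar) to handle rotation and lighting changes, but the output you want is the same: a per-frame pixel shift.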
But, you aren’t dealing with a camera watching a moving object, you’re dealing with a camera ON a moving object. So, you get more information than just the motion of the scene. If you put an accelerometer/gyro combination on the sub, you get an indication of the motion of the craft. Jared suggests this as a means to do dead reckoning, but there’s a nuance to it: you have a second sensor. Now, it’s not dead reckoning, it’s sensor fusion as Gdeaver alludes to in the optical flow thread.
To apply this, you need to use a great but quite advanced tool called a Kalman Filter (yes, I’ve got other CD posts on this from a couple years back saying it’s overkill for FRC…but your problem isn’t FRC). It’s an algorithm to estimate values in a noisy environment. It’s used in a lot of things like airplanes, ships, and spacecraft.
Basically, taking your differencing problem, you are going to establish a velocity vector at the end of it. Why? Because you know the time between frames, you know the number of pixels the image features moved, you know the size of the pixels from the sensor’s datasheet, you know the magnification of the camera’s lens, and you know the distance to the floor of the pool. Grind through the math and you’ll turn the i,j motion in pixels into an x,y change in position over a known time and bingo…you’ve got velocity. This is pool-centric velocity, since you are basically measuring the pool.
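Grinding through that math looks something like the following. Every number here is a made-up placeholder (pixel pitch, focal length, altitude, frame rate would come from your actual datasheet and setup), but the chain of conversions is the one described above:

```python
# Hypothetical hardware values -- substitute your own
pixel_pitch_m  = 3.75e-6    # sensor pixel size, from the datasheet (assumed)
focal_length_m = 4.0e-3     # lens focal length (assumed)
altitude_m     = 2.0        # camera-to-pool-floor distance (assumed)
frame_dt_s     = 1.0 / 30   # time between frames at 30 fps (assumed)

# Ground distance one pixel covers, by similar triangles (pinhole model)
meters_per_pixel = pixel_pitch_m * altitude_m / focal_length_m

# Measured feature shift between two frames, in pixels (example values)
di, dj = 12, -5

# Pixel shift -> pool-frame velocity
vx = dj * meters_per_pixel / frame_dt_s
vy = di * meters_per_pixel / frame_dt_s
```

Note the altitude term: if your depth to the floor changes, meters_per_pixel changes with it, so a depth sensor (or known pool depth) feeds directly into this calculation.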
But, you can take it a step further and do the dead reckoning that Jared describes by getting an acceleration vector from the accelerometer/gyro combo. Taking the vector the accelerometer gives you and rotating it using the gyro’s rotation estimates, you translate craft-centric acceleration into pool-centric acceleration. Then, if you integrate twice you get position. The problem again is that this is noisy. Well, you’ve got another measurement: the velocity vector from the pool. So, use that as the “measurement” part of the KF, use the dead reckoning as the “system” part of the KF, and now you’ve got a Kalman Filter tracking estimate of position. It won’t be perfect, but it should be less noisy and more accurate than either the dead reckoning or optical flow alone.
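To make that concrete, here is a minimal one-axis sketch of the fusion: the state is [position, velocity], the accelerometer drives the predict ("system") step, and the optical-flow velocity is the measurement. All noise covariances are made-up placeholders, not tuned values, and a real implementation would be 2-D or 3-D with the gyro rotation applied first:

```python
import numpy as np

dt = 1.0 / 30
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition: pos += vel*dt
B = np.array([[0.5 * dt**2], [dt]])     # how acceleration enters the state
H = np.array([[0.0, 1.0]])              # we only measure velocity (optical flow)
Q = np.eye(2) * 1e-4                    # process noise (placeholder)
R = np.array([[1e-2]])                  # measurement noise (placeholder)

x = np.zeros((2, 1))                    # initial state: at rest at origin
P = np.eye(2)                           # initial uncertainty

def kf_step(x, P, accel, flow_velocity):
    # Predict: dead-reckon with the pool-frame acceleration
    x = F @ x + B * accel
    P = F @ P @ F.T + Q
    # Update: correct with the optical-flow velocity measurement
    z = np.array([[flow_velocity]])
    y = z - H @ x                        # innovation (measurement residual)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Toy run: craft drifting at a steady 0.5 m/s, noise-free sensors
for _ in range(90):                      # 3 seconds of frames
    x, P = kf_step(x, P, accel=0.0, flow_velocity=0.5)
```

Even in this toy run you can see the behavior described below: the velocity estimate snaps to the optical-flow measurement, and the position estimate integrates it, with the filter weighting each source by its assumed noise.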
Think of it this way: in the short term (on the order of your frame rate plus processing time), the dead reckoning estimates position. But, dead reckoning errors increase as you use them for longer and longer. So, your optical flow/image information “resets” the dead reckoning error back to a small number to compensate. That’s not exactly how the KF works, but it’s a good conceptual picture of what’s going on.
Again, I may have just complicated your problem by bringing this up. But, if you have experienced professionals helping you, then you may get to the point where you want to do something sophisticated like this. Yes, it’s pretty advanced knowledge, but it’s also a very robust method.