Need help and suggestions for image comparison method

Things to note:
I am using C++, Qt, and OpenCV.

Background info:
I am trying to create a program (actually a method) in C++ that takes two images (filtered down to two-color black & white) from a moving robot, one taken within X amount of time of the other, and then tries to find the amount of shift between the two frames. It does this by overlaying the second picture on top of the first, in every possible placement, and comparing the overlapping area pixel by pixel.

I’ve gotten all of the bugs out (to the extent of my knowledge), with the exception of one. There is a method that takes two images, one from the overlapping area of each frame, counts each matching pixel, and returns a “score” for how similar the two images that were passed in are. My issue is with the “score” returned: it is only the number of matching pixels. If I were to give it larger images, it would return a larger score than if I gave it smaller ones, even if the smaller ones were more similar. I’ve already tried dividing the score by the area of the images.

The program does that for many different images to figure out which is the best match, which is dictated by the “score”.

I was wondering what the best way to calculate the “score” is, so that the pair of images with the most in common always gets the best score, regardless of size or shape.

Here is the troublesome code:

double ImgCompare(const cv::Mat &Img1, const cv::Mat &Img2, int ImgWidth, int ImgHeight) { // both images have the same dimensions

    double Score = 0;
    double ImgArea = ImgWidth * ImgHeight; // kept around in case the score gets normalized by area

    for (int Y = 0; Y < ImgHeight; Y++) {
        for (int X = 0; X < ImgWidth; X++) {
            // at<>() takes (row, column); the filtered images are 8-bit, single-channel
            if (Img1.at<uchar>(Y, X) == Img2.at<uchar>(Y, X)) {
                Score++;
            }
        }
    }

    //printf("Score = %f\n", Score);
    return Score;
}

Sorry for the lack of indents

Would anyone like the code for the rest?

Are you using black and white or “color” (greyscale) or full color?

You can compute something like the average absolute value of the difference between the pixels. If you’re using b&w or “color” (greyscale) that should be easy to calculate. If you’re doing actual full color, then the computation will be a little more complex, since there are a few different options.

Incidentally, the increments of your for loops bother me: “X=X” really shouldn’t do anything.

And what was the problem with dividing by the area? That gives the score a range from 0 (when no pixels match) to 1.0 (when all pixels match). That normalizes the score.
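For what it’s worth, here is a minimal sketch of that normalized score using OpenCV’s own routines (this assumes 8-bit, single-channel images of identical size; the function name is just for illustration):

#include <opencv2/opencv.hpp>

// Fraction of pixels that match, ranging from 0.0 (none) to 1.0 (all).
double MatchFraction(const cv::Mat &a, const cv::Mat &b) {
    cv::Mat equal;
    cv::compare(a, b, equal, cv::CMP_EQ);                   // 255 where pixels match, 0 elsewhere
    return cv::countNonZero(equal) / (double)equal.total();
}

For greyscale, cv::absdiff followed by cv::mean gives the average absolute difference in the same way.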

I think the bigger problem is that your goal is to determine the shift, and counting matching pixels doesn’t always act as a good predictor of shifted distance.

Let’s look at two examples.
If the image is simple, say a black background with a smaller white square within it: if I shift the square one pixel to the right, two columns of pixels will contain errors. If I shift it two pixels, four columns have errors. It seems to work well. Once the shift amount is greater than the edge size of the square, though, the score plateaus, which is not that helpful in determining the shift amount.

In the second image, let’s have stripes, like a picket fence. The stripes are two pixels wide with two pixels of black in between. Shift one pixel and half of the rows have errors. Shift two and all rows have errors. Shift three and half have errors again, four and no errors (depending on what enters the frame).

Anyway, this is a difficult problem, and a simple score like you are computing works well in some cases, but less well in others.

After you’ve completed this approach, perhaps you want to try another based on feature extraction. It could find the largest particle, or the ten largest and compare positions of those. It could identify corners within the image and track those. You may also find it useful to read about Optical Flow algorithms. They are pretty sophisticated to implement, but they are pretty robust.
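If you do get to that point, OpenCV already ships the building blocks; here is a minimal sketch of tracking corner features from one frame to the next (the parameter values are only placeholders to show the calls):

#include <opencv2/opencv.hpp>
#include <vector>

// Track corner features from frame1 into frame2 (both 8-bit greyscale).
void TrackCorners(const cv::Mat &frame1, const cv::Mat &frame2) {
    std::vector<cv::Point2f> corners1, corners2;
    std::vector<uchar> status;
    std::vector<float> error;

    // Find up to 100 strong corners in the first frame.
    cv::goodFeaturesToTrack(frame1, corners1, 100, 0.01, 5.0);

    // Follow those corners into the second frame with pyramidal Lucas-Kanade.
    cv::calcOpticalFlowPyrLK(frame1, frame2, corners1, corners2, status, error);

    // corners2[i] - corners1[i] is the per-feature shift wherever status[i] is non-zero.
}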

Greg McKaskle

Sorry about those increments. I did it to keep the loop indexed at 0, instead of the loop starting with an X value of 1.

The images are in literal black & white, as in the pixels are either black or they are white.

When you divide the score by the area, I think (as in it’s possible there’s a bug elsewhere and I lied to you) the program then becomes focused on reducing the area to achieve a better score. I also tried reducing the area’s influence on the total score (Score = matches / (Area / 2)), which didn’t help.

This program is actually a dumb program, in that it takes every possible shift and gives it a score, and whichever shift (shiftX, shiftY) has the best score is the shift that it considers to be right. It disregards all scores other than the best one, so it doesn’t matter if the score plateaus.

The images the program is working with have almost no pattern whatsoever (it’s looking at patches of moss on the bottom of a pool), so I’m not worried about things like your fence example.

This program is actually an optical flow program, and this was the simplest system I could think of.

I didn’t tie this to the post from Freddi. While this is good for learning the effectiveness of various algorithms, this brute-force approach will be thousands of times slower than more modern approaches. If you can provide some video sequences of the pool bottom, it would allow different approaches to be compared in order to identify the performance and robustness tradeoffs.

Greg McKaskle

I was looking at other methods of achieving this, but I would still like to get this one working. As for performance, I found this system works just fine, only at a lower resolution, say 64px by 64px. I could also improve performance later on by looking at every Nth frame, and only checking the match hotspots.

I think I’ll try other, already done, forms of optical flow for now however.

Something you can try is to multiply the score of each match by an estimate of how likely that result should be. For example, if you think the robot should be moving slowly, then results that would imply large motions should be penalized.

One way to do this is to use normal distributions. For example, you might estimate that you should have moved 10 px forward since the last frame with a standard deviation of 3. From that, you can give an initial estimate of how likely having moved 8 px is vs. 10 px vs. 1 px. So the initial estimate would say 10 px of movement is likely, 8 px is reasonable, and 1 px is unlikely because it’s 3 standard deviations away from what was expected.

I hope that makes sense.
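As a rough sketch of that idea (the expected shift and sigma here are made-up numbers; in practice they would come from your estimate of the robot’s speed):

#include <cmath>

// Weight a raw match score by how plausible the candidate shift is,
// using a normal distribution centred on the expected shift.
double WeightedScore(double rawScore, double shiftPx,
                     double expectedShiftPx, double sigmaPx) {
    double z = (shiftPx - expectedShiftPx) / sigmaPx;
    double weight = std::exp(-0.5 * z * z);   // 1.0 at the expected shift, near 0 far away
    return rawScore * weight;
}

// e.g. WeightedScore(score, 8.0, 10.0, 3.0) keeps most of the score,
//      while WeightedScore(score, 1.0, 10.0, 3.0) penalizes it heavily.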

One thing you may want to do is to plot your result as an image. You have image A and image B, and you are skipping a few steps but are essentially computing an image C = (B == A) and then summing all pixels in C. It may be helpful to visualize C and see if there are issues around the edges, or see if your algorithm does something unexpected. You may also want to compare the current approach to using C = |B - A| or similar functions that aren’t just binary.
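A minimal way to look at both versions of C with OpenCV (assuming 8-bit, single-channel frames; the window names are arbitrary):

#include <opencv2/opencv.hpp>

// Display the match image (B == A) and the absolute difference |B - A|.
void ShowComparison(const cv::Mat &A, const cv::Mat &B) {
    cv::Mat equal, diff;
    cv::compare(A, B, equal, cv::CMP_EQ);   // 255 where the pixels match
    cv::absdiff(A, B, diff);                // per-pixel |B - A|
    cv::imshow("C = (B == A)", equal);
    cv::imshow("C = |B - A|", diff);
    cv::waitKey(0);
}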

Greg McKaskle

A technique we have used for years in broadcast is to invert one of the images and then add. The result is only that which is different. Counting that should be fairly easy. If you are comparing pixel by pixel, the overall number shouldn’t change with image size/zoom, should it?
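For two-level (black/white) images, counting “what’s different” can be done directly with an exclusive-or; a small sketch, assuming both images are 0/255 and the same size:

#include <opencv2/opencv.hpp>

// Fraction of pixels that differ between two binary (0/255) images of the same size.
double MismatchFraction(const cv::Mat &img1, const cv::Mat &img2) {
    cv::Mat mismatches;
    cv::bitwise_xor(img1, img2, mismatches);   // non-zero only where the images differ
    return cv::countNonZero(mismatches) / (double)mismatches.total();
}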

I’m confused. Why do you have images of different resolutions being scored and compared to each other?

I think what he’s saying is that to compare two images that are offset he’s just comparing the region of overlap. So if the upper left pixel is black in the first image and the lower right pixel is black in the second image, one of the combinations he would try would consist of only those two pixels. Then, since they’re both black there would be a 100% match.

They aren’t different resolutions. This is best explained using visuals.

Here is the first picture:


XXXXX
XXXXX
XXXXX
XXXXX
XXXXX

And the second picture:


YYYYY
YYYYY
YYYYY
YYYYY
YYYYY

Now, he is trying to see how far the robot has moved in a given direction. His unit of measurement is pixels that the camera sees. He makes this measurement by overlapping the two pictures in every possible way. Here are two examples:

(Z is the overlap between the pictures)


Example 1:

XXXXX
XXXXX
XXXXX
XXXXX
ZZZZZ
YYYYY
YYYYY
YYYYY
YYYYY

Example 2:

XXXXX
ZZZZZ
ZZZZZ
ZZZZZ
ZZZZZ
YYYYY

In the case that example 1 is the right match, the robot has moved backwards 4 pixels. If example 2 is the right match, the robot has moved backwards 1 pixel. To figure out which overlap is the right match, he scores the overlap based on how many pixels in the overlapping areas match color. So let’s look at the overlaps from the two examples:


Example 1 overlap:

ZZZZZ

Example 2 overlap:

ZZZZZ
ZZZZZ
ZZZZZ
ZZZZZ

Now, let’s say that all of the pixels in example 1 match up, but only half of the pixels in example 2 match. You would probably think “example 1 is the right match,” but it only has a score of 5, whereas example 2 has a score of 10. That is where the different areas come in.

I may have gone a little overboard with this explanation :rolleyes:.
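For reference, here is a minimal sketch of that brute-force search using OpenCV ROIs, with the score normalized by the overlap area so that example 1 above would win. Only the vertical shift is searched here for brevity; the real thing would add a second loop over horizontal shifts, and the function name is just for illustration:

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cstdlib>

// Brute-force search for the vertical shift between two equally sized binary frames.
// Returns the shift (in pixels) whose overlapping region has the best match fraction.
int BestVerticalShift(const cv::Mat &frame1, const cv::Mat &frame2) {
    int bestShift = 0;
    double bestScore = -1.0;

    for (int shift = -(frame1.rows - 1); shift <= frame1.rows - 1; shift++) {
        int overlapRows = frame1.rows - std::abs(shift);

        // Overlapping rows of each frame for this candidate shift.
        cv::Rect roi1(0, std::max(shift, 0), frame1.cols, overlapRows);
        cv::Rect roi2(0, std::max(-shift, 0), frame1.cols, overlapRows);

        cv::Mat equal;
        cv::compare(frame1(roi1), frame2(roi2), equal, cv::CMP_EQ);
        double score = cv::countNonZero(equal) / (double)equal.total();  // 0.0 .. 1.0

        if (score > bestScore) {
            bestScore = score;
            bestShift = shift;
        }
    }
    return bestShift;
}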

Good Example. I shoulda put that up originally.

I hadn’t tried that yet. Maybe instead of using a solid estimate, I could use the current speed as a base point, and from there it would reduce the score for shifts as they get farther away from the average shift. Actually, wait, no. That would probably screw up the accuracy. I’ll just go with yours.

What I’m about to say here may complicate your problem. But, I know you are dealing with a difficult problem and I’m assuming you have some very good professional help, so I thought I would add a few comments on what you may be missing. You probably don’t want to do this now, but take notes in case you want to do this as a later refinement.

All that is mentioned here regarding the image differencing is quite valid. However, your problem where the image sizes are different is easily solved by padding the images with black pixels so they are the same size. This has some other implications to your math, but it would normalize the images and remove your concerns about area.
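A small sketch of that padding idea, using OpenCV’s copyMakeBorder (the target size is whatever common size you settle on):

#include <opencv2/opencv.hpp>

// Pad an image with black pixels on the bottom and right until it reaches a target size.
cv::Mat PadToSize(const cv::Mat &img, int targetRows, int targetCols) {
    cv::Mat padded;
    cv::copyMakeBorder(img, padded,
                       0, targetRows - img.rows,   // top, bottom
                       0, targetCols - img.cols,   // left, right
                       cv::BORDER_CONSTANT, cv::Scalar(0));
    return padded;
}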

However, I think you are focusing on diff-ing the images while missing your real problem: position tracking. While you can use the motion of the floor as an indicator of motion, there’s still going to be errors and noise, etc. Theoretically, you could image the entire floor of the pool, then use the image you capture and do a correlation over the whole pool image to find your location. But, that’s a huge amount of time, and a huge amount of data to manage, and you probably can’t do it on the hardware that’s been mentioned.

To Greg’s point above and Jared’s in the Optical Flow thread, you basically do feature extraction, establish geometries between the features, and determine the difference. That tells you how much the camera moved. The SURF algorithm that Jared mentions is a very robust generalized feature extractor that usually gives useful information. It’s just a matter of deciding what of the information that SURF outputs is what you want. Once you do, you’ll have the pattern of features. Then, you need to determine how to find the pattern (that’s highly application dependent…but could be as simple as a Hough Transform if you are lucky), and determine how far it moved. So long as you don’t move too much from frame to frame (like the square-in-the-image concept that Greg mentioned), you can determine the shift of the object in the field of view.
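As a concrete starting point, here is a rough sketch of that feature-extract-and-match step. ORB stands in for SURF here because SURF lives in OpenCV’s nonfree/contrib module; this uses the newer ORB::create style of API, and older 2.x releases expose the same pieces under slightly different names:

#include <opencv2/opencv.hpp>
#include <vector>

// Detect and match features between two frames, then look at how far they moved.
void MatchFeatures(const cv::Mat &frame1, const cv::Mat &frame2) {
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;

    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    orb->detectAndCompute(frame1, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(frame2, cv::noArray(), kp2, desc2);

    cv::BFMatcher matcher(cv::NORM_HAMMING, true);   // cross-check for cleaner matches
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    // kp2[m.trainIdx].pt - kp1[m.queryIdx].pt is the shift of each matched feature.
}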

But, you aren’t dealing with a camera watching a moving object, you’re dealing with a camera ON a moving object. So, you get more information than just the motion of the scene. If you put an accelerometer/gyro combination on the sub, you get an indication of the motion of the craft. Jared suggests this as a means to do dead reckoning, but there’s a nuance to it: you have a second sensor. Now, it’s not dead reckoning, it’s sensor fusion as Gdeaver alludes to in the optical flow thread.

To apply this, you need to use a great but quite advanced tool called a Kalman Filter (yes, I’ve got other CD posts on this from a couple years back saying it’s overkill for FRC…but your problem isn’t FRC). It’s an algorithm to estimate values in noisy environment. It’s used in a lot of things like airplanes, ships, and spacecraft.

Basically, taking your differencing problem, you are going to establish a velocity vector at the end of it. Why? Because you know the time between frames, you know the number of pixels the image features moved, you know the size of the pixels from the sensor’s datasheet, you know the magnification of the camera’s lens, and you know the distance to the floor of the pool. Grind through the math and you’ll turn the i,j motion in pixels into an x,y change in position over a known time and bingo…you’ve got velocity. This is pool-centric velocity, since you are basically measuring the pool.
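That math, roughly, under a simple pinhole-camera assumption (all of the parameter names here are placeholders for values you would read off the datasheet or measure yourself):

// Convert an image shift (in pixels) into a pool-frame velocity (metres per second),
// assuming a pinhole camera looking straight down at the pool floor.
double PixelShiftToVelocity(double shiftPixels,
                            double pixelPitchMetres,      // physical size of one sensor pixel
                            double focalLengthMetres,     // lens focal length
                            double distanceToFloorMetres, // camera height above the floor
                            double frameIntervalSeconds) {
    // Ground distance covered by one pixel at this range.
    double metresPerPixel = (pixelPitchMetres / focalLengthMetres) * distanceToFloorMetres;
    return shiftPixels * metresPerPixel / frameIntervalSeconds;
}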

But, you can take it a step further and do the dead reckoning that Jared describes by getting an acceleration vector from the accelerometer/gyro combo. Taking the vector the accelerometer gives you and rotating it using the gyro rotation vectors, you translate craft-centric acceleration to pool-centric acceleration. Then, if you integrate twice, you get position. The problem again is that this is noisy. Well, you’ve got another measurement: the velocity vector from the pool. So, use that as the “measurement” part of the KF, use the dead reckoning as the “system” part of the KF, and now you’ve got a Kalman Filter tracking estimate of position. It won’t be perfect, but it should be less noisy and more accurate than either the dead reckoning or optical flow alone.

Think of it this way: in the short term (on the order of your frame rate plus processing time), the dead reckoning estimates position. But, dead reckoning errors increase as you use them for longer and longer. So, your optical flow/image information “resets” the dead reckoning error back to a small number to compensate. That’s not exactly how the KF works, but it’s a good conceptual picture of what’s going on.
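If you do go this route, OpenCV’s cv::KalmanFilter class handles the bookkeeping. Here is a rough single-axis sketch (state = position and velocity, control = accelerometer reading, measurement = optical-flow velocity; all the noise values are placeholders you would have to tune):

#include <opencv2/opencv.hpp>

// One-axis position/velocity Kalman filter: predicted from the accelerometer,
// corrected by the optical-flow velocity estimate.
cv::KalmanFilter MakeFilter(float dt) {
    cv::KalmanFilter kf(2, 1, 1, CV_32F);   // 2 states, 1 measurement, 1 control input

    kf.transitionMatrix  = (cv::Mat_<float>(2, 2) << 1, dt,
                                                     0, 1);
    kf.controlMatrix     = (cv::Mat_<float>(2, 1) << 0.5f * dt * dt,
                                                     dt);
    kf.measurementMatrix = (cv::Mat_<float>(1, 2) << 0, 1);   // we only measure velocity

    cv::setIdentity(kf.processNoiseCov,     cv::Scalar::all(1e-3));  // placeholder tuning
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));
    cv::setIdentity(kf.errorCovPost,        cv::Scalar::all(1.0));
    return kf;
}

// Each frame:
//   kf.predict((cv::Mat_<float>(1, 1) << accelFromImu));
//   kf.correct((cv::Mat_<float>(1, 1) << velocityFromOpticalFlow));
//   float position = kf.statePost.at<float>(0);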

Again, I may have just complicated your problem by bringing this up. But, if you have experienced professionals helping you, then you may get to the point where you want to do something sophisticated like this. Yes, it’s pretty advanced knowledge, but it’s also a very robust method.

A sub in water is similar to navigating a quadcopter. A lot of work has been accomplished in the quadcopter community. http://diydrones.com/ is a good source of info. Go to the DIY Drones store page, look at sensors, and you will find this: http://store.3drobotics.com/products/px4flow
Follow the link to the open source code on GitHub. There you will find all the code needed for the type of optic flow algorithm that you are looking for. The meat is at https://github.com/PX4/Flow/blob/master/src/flow.c
Notice that in this code they take the gyro and correct for body rotation. In my other post I mentioned the problem of two solutions to a set of x and y’s; look at how they handle this. The algorithm used may not be the best, as it is getting a dense feature matrix from the camera in a quad application. As Jared mentioned, your pool bottom is most likely going to give a sparse feature matrix.

Have you shared some of your pool bottom video? Could you post a link to some? People on this forum could help if they saw what you have to work with. This is all dealing with a captured, digitized frame. To get really good results you may need to clean the image up optically first. A polarizing filter comes to mind, but camera people have a whole arsenal of devices to play with optics. I don’t know where to get hold of this knowledge. Today everyone wants to do the cleanup digitally, but it is not always the best way.

All this is not going to work if your sub platform is not stable. I mean rock solid stable. If the sub is oscillating, pitching, and rolling, the errors will accumulate really fast. How stable is this year’s sub? When it is accelerated, do you get rotation? Physically the sub has to be rock solid, and this comes from sound physical design. The mechanical team has to get this right or all your work will only lead to frustration.

Quaternions: you will have to learn how to work with rotations using them, and, as mentioned, some of the best sensor fusion algorithms use extended Kalman filters. That is a lot of math to master. This is not easy, but if you master it, save it for your senior college project. It’s at that level or higher.