Computer Vision without Image Processing

Say I have a grayscale image, 640x480 pixels. I am thinking about training a neural network that takes every pixel as an input and outputs the distance to a target. For training data, I would load in thousands of images with their corresponding distances. I think this would be a really cool way of doing computer vision to find the distance to a target. The question is how fast it would run and how computationally heavy it would be. Thoughts?

The first thing that I think of if you’re just throwing whole images in is:

What is the topology you hope to use for your network? Once you know that, you can pretty easily calculate the number of operations (mostly just multiplies and adds) and get a rough performance estimate.
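As a rough sketch of that estimate, assuming a plain fully connected network: the layer sizes below are made up for illustration, and `dense_multiply_adds` is just a hypothetical helper name. It counts one multiply-add per weight and ignores biases and activation functions.

```python
def dense_multiply_adds(layer_sizes):
    """Count multiply-adds for one forward pass of a fully connected
    network, ignoring biases and activation functions."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical topology: one input per pixel, two hidden layers, one output.
layer_sizes = [640 * 480, 100, 20, 1]

macs = dense_multiply_adds(layer_sizes)
print(f"{macs:,} multiply-adds per frame")
print(f"{macs * 30:,} multiply-adds per second at 30 frames/s")
```

For this made-up topology, almost all of the roughly 30.7 million multiply-adds per frame come from the first layer, because every pixel connects to every first-layer node.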

You’d probably want to train the network with two outputs: a binary output that tells whether a target is visible, and a real-valued output for the distance. Otherwise you’re always going to get the network’s best try, even when it’s looking at junk.
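Here is a minimal NumPy sketch of what a two-output network could look like, assuming one shared hidden layer feeding a sigmoid "visible" output and a linear "distance" output. The layer layout, the parameter names and the 0.5 threshold are all assumptions for illustration, and the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_params(n_inputs, n_hidden):
    """Random, untrained weights for a shared hidden layer and two output heads."""
    return {
        "W1": rng.normal(0.0, 0.01, (n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w_visible": rng.normal(0.0, 0.01, n_hidden),
        "b_visible": 0.0,
        "w_distance": rng.normal(0.0, 0.01, n_hidden),
        "b_distance": 0.0,
    }


def forward(params, pixels):
    """Return (probability that a target is visible, distance estimate)."""
    hidden = np.tanh(pixels @ params["W1"] + params["b1"])
    logit = hidden @ params["w_visible"] + params["b_visible"]
    p_visible = 1.0 / (1.0 + np.exp(-logit))  # sigmoid for the binary output
    distance = hidden @ params["w_distance"] + params["b_distance"]
    return p_visible, distance


def estimate(params, pixels, threshold=0.5):
    """Report a distance only when the visibility output clears the threshold."""
    p_visible, distance = forward(params, pixels)
    return float(distance) if p_visible >= threshold else None
```

The point of the second output is the gate in `estimate`: when the network thinks it is looking at junk, the caller gets `None` instead of a meaningless distance.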

There’s a bit of a black art in picking the types of nodes to place in each layer of the network. I’d suggest checking out convolutional nets for the initial layers, unless you’re going straight to deep learning.

If you’re looking for a fun project to use to learn about NNs, then I can’t encourage you enough (although you may want to start with some toy problems first). They are definitely a very cool computing paradigm. If you’re considering this as a serious solution for FRC, though, I would offer the following caveats:

Regarding performance, an NN will never outperform a good hand-coded algorithm in terms of speed, and will likely be several orders of magnitude slower. NNs have found wider usage lately because, like other types of machine learning, they’re being used to implement algorithms that programmers have found very difficult to work out on their own, but only in situations where data and compute power are readily available.

Collecting data manually is incredibly tedious. I assume that you were thinking of generating data using your existing hand-coded system. Realize, then, that your NN can’t be any better (statistically) than the hand-coded algorithm that it’s being taught by.

It was just an idea I was playing around with. At our next meeting I’m going to collect lots of data and run a gradient descent fit, using image characteristics of the target as inputs, such as height and width in pixels, size, and center location. Hopefully I’ll be able to write a script that makes it easy to feed in new data, so all you’d have to do from year to year is threshold for the target, feed the data to the script, and then plug values into the equation it pops out.
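A minimal sketch of what such a script might look like, assuming a plain linear model fitted by batch gradient descent. The function name, learning rate and step count are made up for illustration, and the features are standardized first so that one learning rate works for pixel counts and areas alike.

```python
import numpy as np


def fit_linear_gd(features, distances, learning_rate=0.1, n_steps=5000):
    """Fit distance ~ weights . features + bias by batch gradient descent.

    features: (n_samples, n_features) array, e.g. height, width, center row.
    Assumes no feature is constant, since the standardization divides by
    each feature's standard deviation.
    Returns a predict(new_features) function.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(distances, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std  # standardized features

    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(n_steps):
        error = Xs @ weights + bias - y
        # Gradient of half the mean squared error.
        weights -= learning_rate * (Xs.T @ error) / len(y)
        bias -= learning_rate * error.mean()

    def predict(new_features):
        Z = (np.asarray(new_features, dtype=float) - mean) / std
        return Z @ weights + bias

    return predict
```

Feeding in a new season's measurements would then just be a call to `fit_linear_gd` with the new arrays. Note that distance is generally not linear in pixel height or width, so the inputs may need transforming (1 / height, for example) before a linear fit works well.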

It’s a cool idea, but I suspect you won’t have much luck. It looks like you were initially planning on inputting all the pixels, but the issue with that is you need a massive network and your output is sensitive to boundary conditions.

In your last post you’ve made the right first step in simplifying your inputs, but I feel at this point you’ve oversimplified to the point where a neural net won’t really give you what you’re looking for.

Have you thought about direct photogrammetric methods?
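One simple example of a direct method is the pinhole camera model: if you know the real height of the target and the camera's field of view, the distance follows from the target's height in pixels. This is only a sketch under those assumptions. The function names are made up, and it ignores lens distortion and assumes the camera views the target roughly head-on.

```python
import math


def focal_length_px(image_width_px, horizontal_fov_deg):
    """Focal length in pixels, from the image width and horizontal field of view."""
    return (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)


def distance_from_height(target_height, target_height_px, focal_px):
    """Pinhole estimate: distance = real height * focal length / pixel height.

    The result is in the same units as target_height.
    """
    return target_height * focal_px / target_height_px
```

With this model there is nothing to train: the only per-camera number is the focal length in pixels, which can be computed from the field of view or measured once with a target at a known distance.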

I sent that late last night, and I just want to be clear I don’t want to discourage playing around with anything that seems cool.

Neural networks were one of the things that got me excited about computer science a long time ago. If you want a really cool project to try with NNs, I highly recommend this one:

The Jetson TK1 from NVIDIA is probably optimal for this. They’re already giving you the software, you just need to train it.

That looks like an impressive board. What do you mean by they are already giving the software?

I realize how slow it’d be. That would be a fun project to do, but alas, college is starting up again soon and I do not think I’d have the time to do it before then.

So, I’m sticking with regression for calculating distance :)

Speaking of which, I just got a simple regression to work, with an r value above 0.999, relating how high a target appears in an image to how far away it is. The next step is to incorporate camera tilt data from a gyro.
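For anyone who wants to try the same kind of fit, here is a minimal sketch using NumPy. It is not necessarily the regression described above: it assumes a polynomial in the target's pixel row, and the function name and default degree are made up for illustration. The r it reports is the correlation between the fitted and the measured distances.

```python
import numpy as np


def fit_distance_curve(target_rows_px, distances, degree=2):
    """Fit distance as a polynomial in the target's pixel row.

    Returns (coefficients, r): coefficients are highest power first, as
    np.polyfit returns them, and r is the correlation between fitted
    and measured distances.
    """
    rows = np.asarray(target_rows_px, dtype=float)
    dist = np.asarray(distances, dtype=float)
    coeffs = np.polyfit(rows, dist, degree)
    predicted = np.polyval(coeffs, rows)
    r = np.corrcoef(predicted, dist)[0, 1]
    return coeffs, r
```

Once fitted, `np.polyval(coeffs, row)` gives the distance estimate for a new frame. A high r on the training measurements alone does not say how the fit behaves at distances outside the measured range, so it is worth holding back a few measurements to check.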