It's a cool idea, but I suspect you won't have much luck. It looks like you were initially planning on feeding in all the pixels, but the problem with that is you'd need a massive network, and your output would be sensitive to boundary conditions.
In your last post you took the right first step by simplifying your inputs, but I feel you've now oversimplified to the point where a neural net won't really give you what you're looking for.
Have you thought about direct photogrammetric methods?
http://arch.ced.berkeley.edu/kap/dis...a-gopro3be-/p1
http://balloons.space.edu/habp/photogrametry/