UDP would work fine. We use raw network programming like that to pass a complex data structure between the robot and the driver's station. We could use something fancier like Protobufs, CORBA, or other IDL-based middleware, but hardcoding the encode/decode order of 4-byte values also works, and that is what we do.
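Here is a minimal sketch of the "hardcoded order of 4-byte values" idea, written in Python for brevity. The field names (`frame_id`, `target_x`, `target_y`, `distance`), their order, and the choice of big-endian byte order are hypothetical and only for illustration. What matters is that the sender and the receiver agree on them exactly. Each UDP datagram carries one complete message, so no extra framing is needed.

```python
import socket
import struct

# Hypothetical message layout. The field order is hardcoded and must match on
# both ends. "!" = network (big-endian) byte order with no padding,
# "i" = 4-byte signed int, "f" = 4-byte IEEE-754 float.
MESSAGE_FORMAT = "!ifff"  # frame_id, target_x, target_y, distance
MESSAGE_SIZE = struct.calcsize(MESSAGE_FORMAT)  # 4 fields * 4 bytes = 16


def encode(frame_id: int, target_x: float, target_y: float, distance: float) -> bytes:
    """Pack the fields, in the agreed order, into one datagram payload."""
    return struct.pack(MESSAGE_FORMAT, frame_id, target_x, target_y, distance)


def decode(payload: bytes) -> tuple:
    """Unpack a payload in the same order; reject datagrams of the wrong size."""
    if len(payload) != MESSAGE_SIZE:
        raise ValueError(f"expected {MESSAGE_SIZE} bytes, got {len(payload)}")
    return struct.unpack(MESSAGE_FORMAT, payload)


def loopback_demo() -> tuple:
    """Send one message to ourselves over UDP on localhost and decode it."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    receiver.settimeout(1.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(encode(42, 1.5, -2.25, 3.0), receiver.getsockname())
        # Use a buffer larger than one message so an oversized datagram is
        # caught by the length check in decode() instead of being truncated.
        payload, _addr = receiver.recvfrom(1024)
        return decode(payload)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(loopback_demo())  # (42, 1.5, -2.25, 3.0)
```

The trade-off compared with Protobufs or similar tools is that adding, removing, or reordering a field means changing both the robot code and the driver's station code in lockstep. In exchange, there is no schema compiler or extra dependency to manage.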
As for vision itself, I have no comments.
