Any Interest in a Distributed Vision programme?

I’ve just started work on a programme that will distribute the load of vision processing across multiple Raspberry Pis (or any set of computers). This is roughly how it would work (a rough sketch of each stage follows the list):

1st Pi: BGR -> HSV and HSV threshold
2nd Pi: Canny (find edges) & findContours
3rd Pi: Iterate through contours to find largest and report its centre with JSON
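
For anyone curious what those stages might look like in code, here’s a minimal sketch of each one as an OpenCV 3.x C++ function. The HSV bounds, Canny thresholds, and JSON formatting are placeholders, not the actual programme’s values.

```cpp
// Minimal sketch of the three pipeline stages, assuming OpenCV 3.x.
#include <opencv2/opencv.hpp>
#include <sstream>
#include <string>
#include <vector>

// Stage 1 (1st Pi): convert BGR to HSV and apply an HSV threshold.
// The bounds below are placeholders; tune them for your target colour.
cv::Mat thresholdHSV(const cv::Mat& bgr) {
    cv::Mat hsv, mask;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(40, 100, 100), cv::Scalar(80, 255, 255), mask);
    return mask;
}

// Stage 2 (2nd Pi): edge detection and contour extraction on the binary mask.
std::vector<std::vector<cv::Point>> findTargetContours(const cv::Mat& mask) {
    cv::Mat edges;
    cv::Canny(mask, edges, 50, 150);
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(edges, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    return contours;
}

// Stage 3 (3rd Pi): pick the largest contour and report its centre as JSON.
std::string largestContourJSON(const std::vector<std::vector<cv::Point>>& contours) {
    double bestArea = 0.0;
    cv::Point2d centre(-1, -1);
    for (const auto& c : contours) {
        double area = cv::contourArea(c);
        cv::Moments m = cv::moments(c);
        if (area > bestArea && m.m00 != 0) {
            bestArea = area;
            centre = cv::Point2d(m.m10 / m.m00, m.m01 / m.m00);
        }
    }
    std::ostringstream json;
    json << "{\"x\": " << centre.x << ", \"y\": " << centre.y
         << ", \"area\": " << bestArea << "}";
    return json.str();
}
```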

The first Pi takes in the MJPEG stream from the camera and produces an MJPEG stream of its output for the next Pi to use as its input, and so on. The 3rd Pi’s job could most likely be integrated into the 2nd Pi.
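
As a sketch of how one stage could consume the previous stage’s output, assuming the upstream Pi serves its MJPEG stream over HTTP and OpenCV is built with FFmpeg support (the URL below is hypothetical):

```cpp
// Sketch of one stage's input loop, assuming the previous Pi serves MJPEG over HTTP.
#include <opencv2/opencv.hpp>

int main() {
    // Hypothetical URL of the previous Pi's MJPEG stream; adjust for your network.
    cv::VideoCapture upstream("http://192.168.1.11:8080/stream.mjpg");
    if (!upstream.isOpened()) {
        return 1;  // could not reach the previous stage
    }

    cv::Mat frame;
    while (upstream.read(frame)) {
        // ... run this stage's processing on 'frame', then publish the result
        //     as a new MJPEG stream for the next Pi (serving MJPEG over HTTP
        //     needs a small HTTP server, which is not shown here).
    }
    return 0;
}
```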

So far, I’ve only done the first stage. However, the results are already pretty dramatic (at least in my opinion). The Pi is able to smoothly process 15 frames per second, with just about a second of delay at 320 by 240 resolution. Turning the camera frame rate down to 10fps brings the delay down to a bit less than half a second. Now I just have to get my hands on another Raspberry Pi!

I just wanted to gauge interest in such a programme for a possible release in the future.

Much of this was motivated by my hatred of nVidia and CUDA. (Their business practices are horrible.)

I’m using OpenCV 3.0 via C++ on the Raspberry Pis.

We were getting less delay than that just using an Axis Camera running GRIP on the Driver Station last year.

What frame rate were you using on the camera?

Also, I’ve had my fair share of issues with connectivity on the field. I’ve even seen field volunteers go so far as to blame the robot code for disconnecting rather than admit that there may be issues with the FMS. We weren’t doing any vision at all, just driving.

That’s another reason for me personally to use a co-processor rather than the driver station.

P.S. Also, it’s cool to use a co-processor.

This is a cool project. I’ve always been interested in parallel/distributed computing; I just don’t really see the application here. Just to throw in my two cents: today we were getting 40 FPS on a Beaglebone Black running a basic HSV/contour/math pipeline at 320x240 resolution.

I would think that transmitting the data between your processors would introduce more problems than it would solve. RAM is a lot faster than TCP/IP.

I’m all for multiple ARM processors clustered to perform operations (it’s something I actually use, just not as part of FIRST).

A control system I helped propose to FIRST actually used a similar idea, expanding motion and sensor processing using stacked Parallax Propellers.

Still have 400+ 68360 CPU cards in VME chassis in my garage arranged as a mainframe.

However, I’d like to see a comparison of this against an NVidia or Kangaroo, because this seems like a very hardware-intensive way to arrive at parallel processing cores in this situation. It would also seem harder to optimize the operations, because you end up doing interprocess communication over actual sockets on physical NICs rather than by sharing an array in memory. I can easily see how you’ll first end up with collisions, then have to introduce hardware switches to your robot, and finally end up with network tuning issues that will never be faster than the RAM on the ARM boards would have been.

I wonder how some of the posters in this topic feel about OpenCL, considering the issues they have with NVidia’s CUDA. I’ve used OpenCL quite adequately, and there is support for OpenCL (though less optimized) on some of NVidia’s product line. It’s kind of strange to suggest that NVidia’s business practices are horrible when both Intel (via the Intel Management Engine) and AMD (via the AMD PSP) have basically locked out the development of 3rd-party boot firmware for the latest generations of processors, which along those same lines does not lead to market diversity (note that it is under the pretense of a security solution).

Hmm, finding contours in a 320x240 image shouldn’t be taking that long. What Raspberry Pi are you using? There will be a big difference between a 1 and a 3. There are also more capable ARM-based computers out there. We’re using the ODROID C2 this year for real-time video streaming, and it is handling three 320x240@30fps and one 640x480@30fps H.264 streams at about 50% CPU usage.

A few recommendations, based on what you’ve shared (a rough sketch pulling them together follows the list):

  1. Get frames from the camera in YUYV, not MJPEG. MJPEG requires additional CPU overhead while only saving USB bandwidth. Unless you have more than 4 cameras, bandwidth isn’t going to be a concern.
  2. Downsample the image before processing it. Both JPEG and YUYV store color information at half resolution anyway, so if you’re doing hue-based processing you’re not actually losing much here. Half the resolution = 4x faster processing.
  3. Filter your image. You can use OpenCV’s morphological operations to filter out noise in a binary image. Little specks can happen due to noise and can cause a lot of performance overhead.
  4. Tune your OpenCV method parameters. A lot of OpenCV’s methods take parameters that impact runtime. Canny is a good example: wide threshold values will find a lot of potential candidates, but this means a lot of processing overhead.
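
Here’s a rough sketch of what 1–4 could look like together in OpenCV 3.x C++, assuming a V4L2 camera. The HSV bounds, Canny thresholds, and kernel size are placeholders, and whether the FOURCC request is honoured depends on the camera and driver.

```cpp
// Rough sketch of recommendations 1-4, assuming OpenCV 3.x with a V4L2 camera.
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);
    // 1. Ask the camera for raw YUYV frames instead of MJPEG (driver permitting).
    cap.set(cv::CAP_PROP_FOURCC, cv::VideoWriter::fourcc('Y', 'U', 'Y', 'V'));
    cap.set(cv::CAP_PROP_FRAME_WIDTH, 320);
    cap.set(cv::CAP_PROP_FRAME_HEIGHT, 240);

    cv::Mat frame, small, hsv, mask, edges;
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));

    while (cap.read(frame)) {
        // 2. Downsample before processing: half the resolution, roughly 4x less work.
        cv::pyrDown(frame, small);

        cv::cvtColor(small, hsv, cv::COLOR_BGR2HSV);
        cv::inRange(hsv, cv::Scalar(40, 100, 100), cv::Scalar(80, 255, 255), mask);

        // 3. A morphological open removes small specks from the binary image.
        cv::morphologyEx(mask, mask, cv::MORPH_OPEN, kernel);

        // 4. Keep the Canny thresholds reasonably tight to limit candidates.
        cv::Canny(mask, edges, 100, 200);

        // ... contour finding and reporting as in the earlier sketch
    }
    return 0;
}
```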

Distributed approaches to problem solving can be really technically rewarding and have a big payoff, but be aware of how much additional complexity this introduces in the system.