We are using Java and OpenCV for our vision processing. When testing on a Windows PC, we get a good frame rate. However, when we run on a Raspberry Pi 3, there’s about a 5-second lag in the video stream, and we get 1 frame every 2 seconds. The CPU was running at only about 30%, and we had plenty of free RAM remaining (can’t remember the amount).
If you have succeeded in getting this combination working, we would be beyond thrilled to hear from you. We’ve tried everything that we can think of. On Windows, we’ve got what seems to be a solid vision program; however, if we can’t get our FPS issue resolved, we’ll have to give up and go blind.
Have you tried reducing the image capture size? Also, what camera are you using? Lastly, what type of processing are you doing? We noticed that the slowest part of our Pi-based tracking was actually writing images to the SD card for later review… so we disable that in “match” mode.
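In case it helps, here’s roughly what both of those tweaks look like in Java OpenCV. This is only a sketch; the match-mode flag, the 320x240 size, and the file path are placeholders, not our actual code:

    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.imgcodecs.Imgcodecs;
    import org.opencv.videoio.VideoCapture;
    import org.opencv.videoio.Videoio;

    public class CaptureSetup {
        // Placeholder flag: set true at competition to skip SD-card writes.
        static final boolean MATCH_MODE = true;

        public static void main(String[] args) {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

            VideoCapture camera = new VideoCapture(0);
            // Ask the camera for a smaller frame before any processing happens.
            camera.set(Videoio.CAP_PROP_FRAME_WIDTH, 320);
            camera.set(Videoio.CAP_PROP_FRAME_HEIGHT, 240);

            Mat frame = new Mat();
            int n = 0;
            while (camera.read(frame)) {
                // ... run the vision pipeline on frame here ...

                // Writing every frame to the SD card is slow, so only do it
                // on practice runs where we want the images for review.
                if (!MATCH_MODE) {
                    Imgcodecs.imwrite("/home/pi/frames/frame" + (n++) + ".jpg", frame);
                }
            }
        }
    }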
Not sure what is going on exactly without seeing the code.
Wanted to comment on the 30% CPU part. This could mean lots of things depending on how you measure it. But since the Pi 3 has a quad-core CPU, it could be that one core is fully loaded (25%) and another core is carrying some load for the extra 5%.
So “only” 30% CPU usage doesn’t mean that you don’t have a single core loaded at 100%. Maybe. Again, it depends on what tool you’re using to measure it. But don’t rule out being CPU-bound until you investigate further.
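If you want to check from code, here’s a rough sketch that samples /proc/stat twice and prints a busy percentage per core. (On the Pi itself, running top and pressing 1, or mpstat -P ALL, shows the same thing with no code at all.)

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    public class PerCoreLoad {
        // Returns {totalJiffies, idleJiffies} per core (cpu0, cpu1, ...).
        static Map<String, long[]> sample() throws IOException {
            Map<String, long[]> out = new HashMap<>();
            for (String line : Files.readAllLines(Paths.get("/proc/stat"))) {
                // Keep only the per-core lines (cpu0, cpu1, ...), not the "cpu" total.
                if (!line.startsWith("cpu") || !Character.isDigit(line.charAt(3))) continue;
                String[] f = line.trim().split("\\s+");
                long idle = Long.parseLong(f[4]) + Long.parseLong(f[5]); // idle + iowait
                long total = 0;
                for (int i = 1; i < f.length; i++) total += Long.parseLong(f[i]);
                out.put(f[0], new long[] { total, idle });
            }
            return out;
        }

        public static void main(String[] args) throws Exception {
            Map<String, long[]> a = sample();
            Thread.sleep(1000); // measure over a 1-second window
            Map<String, long[]> b = sample();
            for (String cpu : a.keySet()) {
                long dTotal = b.get(cpu)[0] - a.get(cpu)[0];
                long dIdle = b.get(cpu)[1] - a.get(cpu)[1];
                System.out.printf("%s: %.0f%% busy%n", cpu, 100.0 * (dTotal - dIdle) / dTotal);
            }
        }
    }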
Very good point. I should have looked more closely at that CPU stat to see how things looked at the core level. We’ll have to do that. Thanks!
UPDATE: We tried using OpenCV 3.2 (we originally used OpenCV 3.1). There was some improvement but nothing substantial. We dropped the resolution from 640x360 to 320x180 and saw no improvement. We’ve stripped out some functionality that was nice but not necessary; no improvement. Our CPU load looks to be reasonably distributed across all 4 cores.
We’re currently compiling OpenCV 2.4.13 to give that a try. After that, we’re pretty much out of ideas.
Not to be argumentative, but that does not look close to evenly distributed. The cores seem to be taking the load alternately, not concurrently. And even where there is concurrent load, it could easily be network or disk activity.
I see what you’re saying. I got target-fixated on the fact that all 4 cores were active and that none of them was running fully maxed out (although core 4 does seem to have a heavier load on it than the others).
Welp, apparently, RPi3s REALLY do not like Gaussian blurs.
We used the blur as the first step in our pipeline. It worked very well for “re-connecting” the pieces when the reflective tape beside the lift was divided by the peg because the robot was off to the side.
On my laptop, which is rather old and not too speedy, the blur was taking 20 ms. On the Pi, it was taking 200 ms. So, the blur is gone. We’ll compensate for it by tweaking our target-candidate scoring.
While we’re going to continue to look for more ways to improve the efficiency of our code, I think we just might be in business.
Thanks to all of you for taking the time to try to help us out.
Side note: The other operations in our pipeline performed about the same on the Pi as they did on my laptop. Only the blur had radically worse performance.
You’re quite welcome. We’ve learned a lot from others’ experience, and I’m more than happy to share ours.
With this year’s learning curve (since it’s our first year attempting vision and a Pi), we weren’t able to run our code on the Pi until this week. In the future, we definitely need to get on the Pi earlier in the process to head off any of these nasty surprises ASAP.
Another lesson (re)learned is to put in println()s to find our most time-consuming sections of code as our first performance debugging step.
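For anyone curious, the println() profiling we mean is nothing fancier than this; the stages and threshold values below are placeholders, not our actual pipeline:

    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.core.Scalar;
    import org.opencv.core.Size;
    import org.opencv.imgproc.Imgproc;

    public class StageTiming {
        static void process(Mat frame) {
            Mat blurred = new Mat(), mask = new Mat();

            long t0 = System.nanoTime();
            Imgproc.GaussianBlur(frame, blurred, new Size(5, 5), 0);  // stage 1
            long t1 = System.nanoTime();
            Core.inRange(blurred, new Scalar(50, 100, 100),
                    new Scalar(90, 255, 255), mask);                  // stage 2
            long t2 = System.nanoTime();

            // Print per-stage times so the outlier jumps out immediately.
            System.out.println("blur: " + (t1 - t0) / 1_000_000 + " ms,"
                    + " threshold: " + (t2 - t1) / 1_000_000 + " ms");
        }
    }

A couple of printlns like these is all it took to spot that the blur was the outlier.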
After removing the Gaussian blur, we also cut our resolution to 240p and trimmed our HSV threshold values to 2 decimal places. We’re now consistently running at 30 FPS!
I’ve done a vision program for tracking the peg using the RasPi, and it now works well with little delay. I did not use a blur; the pipeline was pretty standard. Multithreading is a viable option that I recommend using, as it drastically improved our frame processing speed, but it requires you to avoid global state. I would be happy to help you get this working. Just let me know.
Also, as much as I hate to say it, if you’re going with the Raspberry Pi (which is fine, unless you can afford an NVIDIA Jetson TK1 or TX1), you’re going to want to fall back to Python. It’s the default language of the Pi, and it runs fastest on it. I wrote my RasPi vision code in Python, and while I do dislike the language, it was the best option available for both its well-documented OpenCV support and its support of NetworkTables for communication. As for program flow, it’s basic: pull, filter, detect, math, and output. Each thread runs this loop and works independently. The code automatically pushes the newest frame’s info.
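Since the rest of this thread is Java, here’s a rough Java sketch of the same idea. The two-worker setup and all of the names are illustrative only, not my actual code:

    import java.util.concurrent.atomic.AtomicReference;
    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.videoio.VideoCapture;

    public class ThreadedPipeline {
        // The only shared state: the newest frame in and the newest result out.
        static final AtomicReference<Mat> latestFrame = new AtomicReference<>();
        static final AtomicReference<double[]> latestResult = new AtomicReference<>();

        public static void main(String[] args) {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
            VideoCapture camera = new VideoCapture(0);

            // Pull: the capture thread always overwrites with the newest frame.
            new Thread(() -> {
                while (true) {
                    Mat frame = new Mat();
                    if (camera.read(frame)) latestFrame.set(frame);
                }
            }).start();

            // Filter/detect/math: each worker grabs the newest frame and runs
            // the whole pipeline on its own local Mats, so workers share no
            // mutable state with each other.
            for (int i = 0; i < 2; i++) {
                new Thread(() -> {
                    while (true) {
                        Mat frame = latestFrame.getAndSet(null);
                        if (frame == null) continue; // nothing new yet (a real version would block)
                        latestResult.set(processFrame(frame)); // output: newest wins
                    }
                }).start();
            }
        }

        // Placeholder for the actual filter/detect/math stages.
        static double[] processFrame(Mat frame) {
            return new double[] { 0.0 }; // e.g. a target angle/offset to publish
        }
    }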
What vision target are you going for anyway? The hopper is easier than the peg because the peg is two targets and that can be a mess. It can be done, but it’s kind of a pain from my experience.
We found we got 0.5 FPS with a Gaussian blur on an RPi3. We switched to another blur, and it’s keeping up with the 15 FPS camera without blinking an eye. We’re using Java, BTW, and NetworkTables work fine.
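For the curious, that kind of swap is a one-liner in Java OpenCV. A plain box blur is one candidate for the “other blur” (the kernel size here is just an example, and whether it reconnects the tape pieces as well as the Gaussian did is something to test):

    import org.opencv.core.Mat;
    import org.opencv.core.Size;
    import org.opencv.imgproc.Imgproc;

    public class BlurSwap {
        static void smooth(Mat src, Mat dst) {
            // Before: Gaussian blur, which crawled on the RPi3.
            // Imgproc.GaussianBlur(src, dst, new Size(5, 5), 0);

            // After: plain box blur, much cheaper per pixel.
            Imgproc.blur(src, dst, new Size(5, 5));
        }
    }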
We reworked our target-candidate scoring and were able to simplify it a good bit. Removing any height-based tests took care of the issue of the peg intersecting the tape.
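Purely as an illustration (our real scoring has more to it, and the numeric limits below are made up), dropping the height-based tests means the candidate filter ends up looking something like this, so a strip the peg has cut into short pieces still passes:

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.Mat;
    import org.opencv.core.MatOfPoint;
    import org.opencv.core.Rect;
    import org.opencv.imgproc.Imgproc;

    public class CandidateFilter {
        static List<Rect> findCandidates(Mat binaryMask) {
            List<MatOfPoint> contours = new ArrayList<>();
            Imgproc.findContours(binaryMask, contours, new Mat(),
                    Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

            List<Rect> candidates = new ArrayList<>();
            for (MatOfPoint contour : contours) {
                Rect box = Imgproc.boundingRect(contour);
                // Width-based tests only; deliberately no minimum-height test,
                // since the peg can split a tape strip into short pieces.
                if (box.width >= 4 && box.width <= 60) {
                    candidates.add(box);
                }
            }
            return candidates;
        }
    }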
Multithreading is something that we started to look at once we discovered how poor our performance was on the Pi, but since we (1) were able to make other tweaks to get acceptable performance and (2) ran out of time (we compete in Pittsburgh this week), we had to put the idea on hold for now.
Thanks for your offer of assistance. I’ll talk to our programmer. We just might take you up on that, if not soon, then perhaps after competition season.
I won’t bore you with the details, but our original plan had us using our Jetson and programming both the Jetson and robot in Java so that we were standardized on a single language for now. Let’s just say that reality decided to be disagreeable with our well-crafted plans.
We might have to take a look at Python. Although I’ll have to overcome my moral objection to a language that defines code blocks simply by indentation.
We are going for the peg in autonomous. It was just too many potential points to pass up. Plus, the initial chatter from our mechanical and strategy members discussing design and game play was to completely ignore the fuel balls. That has since changed, but by then we had a vision program that seemed very solid at finding and tracking the peg.
I guess we’ll find out in a few days how well it works out on the field.
The Gaussian blur definitely was the most expensive operation in our pipeline. I was just shocked by the huge difference relative to the other operations when run on the Pi. All of our other pipeline operations ran at around 1.5x the time they took on my laptop. The Gaussian blur took 10x or more. Had it been even as high as 3x, I would’ve just shrugged it off.