Vision on a Pi

We just finished our rookie season and we can see how important visioning is. Our cameras never worked at competition (they did work at school), so we were wondering how to run visioning on a Pi. Also, how do we get started doing visioning for targets? What camera, etc., would we need?

Pis are typically frowned upon because of the lack of horsepower.

Many teams use the Nvidia Jetson TK1 or similar mini computers. They have CUDA which helps a lot with processing speed.

Maybe the newer Pis are faster, I haven’t used one in a while.

Anyways, many teams use OpenCV for their tracking system. GRIP is a good tool to help you with vision tracking.

A popular camera is the Microsoft LifeCam HD-3000.

I don’t agree with the lack of horsepower thing. Pis are more than fast enough to run your standard FRC OpenCV vision stack, especially if you downscale the image to something like 320x240, which is frankly all you need.

The Jetsons are nice, but they can be difficult to set up and power on the robot. Also, the CUDA capability is generally unnecessary for the vision tasks found in FRC, and it’s pretty difficult to use in code, especially if you’re less familiar with C++.

+1 on GRIP, it’s a very nice tool.

Lifecam 3k is always a good bet, some teams like things like the Logitech C920 or C310. I’m a fan of the ELP USB cameras, myself.

The JeVois camera is worth looking at for sure. It provides an integrated package of a low-power Linux computer and a camera. You have to do serial communication, which isn’t as easy as NetworkTables, but it’s a great and compact piece of hardware.
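To give a feel for the serial side, here is a minimal robot-side parsing sketch. The message layout (`TGT x y w h`) is an invented example for illustration; the real format depends on which serstyle and module you configure on the JeVois.

```python
# Minimal sketch: parsing a JeVois-style serial message on the robot side.
# The field layout ("TGT x y w h") is an assumption for illustration; the
# real format depends on the serstyle/module you configure on the JeVois.

def parse_target_line(line):
    """Parse one serial line into a target dict, or None if malformed."""
    parts = line.strip().split()
    if len(parts) != 5 or parts[0] != "TGT":
        return None
    try:
        x, y, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return None
    return {"x": x, "y": y, "w": w, "h": h}

if __name__ == "__main__":
    # On the robot this line would come from the RIO's serial port
    # (e.g. via WPILib's SerialPort class); here it's hard-coded.
    print(parse_target_line("TGT 160.0 120.0 40.0 30.0"))
```

Returning `None` on malformed input matters in practice, since serial lines can arrive truncated mid-message.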

At least in my (probably controversial) opinion, and as someone who’s done a lot of vision work, vision isn’t too important in FRC. It was near-necessary for the 2017 boiler, and nice to have in 2016, but other than that it’s far from essential. While it’s a great goal to work towards, I wouldn’t sweat it if you need to prioritize something like motion profiling or getting control loops running instead of a vision stack.

However, definitely see if you can figure out your competition camera issue. Driver cameras are very useful. What sort of issues were they? Did you use a USB webcam, an Axis IP cam, or something else?

I agree that vision isn’t priority number 1. Focus first on developing your programming skills in the following:

-Control loops: PID control, feed forward control

-Utilizing WPILib’s command-based robot framework. This is really important; you can’t write high-level code without some sort of high-level framework like command-based.

-Code architecture and framework: separating your code into subsystems, utility code, commands, and so forth.

-Motion profiling and path following

If you don’t know what I’m talking about, ask questions here! People will be glad to help out and provide resources. IMO these are the topics you should concentrate on first before vision.
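As a taste of the first item, here is a bare-bones PID-plus-feedforward loop sketched in Python. The gains are made-up illustration values, and on a real robot you would typically use WPILib’s built-in PID support instead.

```python
# Rough sketch of a PID + feedforward controller, the kind of control
# loop mentioned above. Gains here are made up for illustration.

class PidF:
    def __init__(self, kp, ki, kd, kf):
        self.kp, self.ki, self.kd, self.kf = kp, ki, kd, kf
        self.integral = 0.0
        self.prev_error = 0.0

    def calculate(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        # Feedforward predicts the output from the setpoint alone;
        # the PID terms correct whatever error remains.
        return (self.kf * setpoint + self.kp * error
                + self.ki * self.integral + self.kd * derivative)

if __name__ == "__main__":
    pid = PidF(kp=0.1, ki=0.0, kd=0.01, kf=0.05)
    # One 20 ms iteration: trying to hold 100 units/s, measuring 90.
    print(pid.calculate(setpoint=100.0, measurement=90.0, dt=0.02))
```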

Edit: it’s usually referred to as ‘vision’, not ‘visioning’ :slight_smile:

Many teams have been very successful using an RPi with either a web camera or a Pi camera. Examples of code to do this have been posted several times, so do some searching and you are sure to find what you need to get started.

A couple things to consider:
Vision has two basic outputs, the image stream and targeting information.
Setting up the camera optimally to do both can be a bit of a challenge.

Absolutely correct!

I can vouch for the JeVois being a fantastic alternative! In fact, it is even a bit more economical to set up than a Pi.
Honestly, it is a bit more challenging to program if you are not already familiar with vision coding. That said, JeVois is introducing a GUI (JeVois Inventor) that addresses a lot of the difficulties and makes coding it much easier. In fact, it already has several code examples built in that can be modified for use in FRC.

This point needs a HUGE caveat! The necessity of vision is completely dependent on the game and field that FIRST releases. With careful inspection of the rules, and a bit of creativity, it turns out even 2018 had a really excellent use for vision. (Think ArUco)

We did vision this past year (for finding cubes) on the roboRIO. The best way to utilize vision is typically to take a single frame from the camera, analyze it to find your target, and then use a gyro and encoders to move to the target. If you do this, you won’t need much processing power since you don’t need to process a continuous video stream, just 1 frame. Additionally, you won’t have to worry about latency since you will be using the gyro and/or encoders instead of a delayed video stream.
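To make the single-frame idea concrete, here is a sketch of converting a target’s pixel column into a heading offset that a gyro turn can then chase. The 60-degree horizontal FOV and 320-pixel width are assumed values; substitute your camera’s actual numbers.

```python
import math

# Sketch of the "one frame, then gyro" idea above: convert the target's
# pixel x-coordinate into a heading offset once, then let a gyro-based
# turn close the loop. The 60-degree horizontal FOV is an assumed value.

def pixel_to_angle(px, image_width=320, hfov_deg=60.0):
    """Heading offset (degrees) of a pixel column, pinhole-camera model."""
    center = image_width / 2.0
    focal_px = center / math.tan(math.radians(hfov_deg / 2.0))
    return math.degrees(math.atan((px - center) / focal_px))

if __name__ == "__main__":
    # A target found at pixel 240 in a 320-wide frame:
    offset = pixel_to_angle(240)
    # target_heading = gyro.getAngle() + offset  # then turn to this with PID
    print(round(offset, 2))
```

The atan form is more accurate than a simple linear scale at the edges of the frame, where the pinhole projection is noticeably nonlinear.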

I agree.

I was actually surprised that this year’s game had so few tasks that could be done better with vision. After 2016 and 2017, and with the way the FTC game has placed increasing emphasis on vision and Vuforia, I was expecting this year to take things up another notch in terms of vision related tasks being a part of higher level game play. I saw a few teams use vision well this year, but few gained any real advantage from it.

Having said that, I still feel that vision will be a good tool to have in your toolkit of skills. It may not be the most basic tool and if you have not mastered the more basic stuff, you should not distract your efforts with vision, but it is probably something you want to have as part of your long term to-do list. I expect we will see games that have vision elements in the future.

But, more than that, I think it is a great programming challenge for students. It involves both a highly technical sub-task of processing the image and extracting key information, but also involves integration with the rest of the robot programming and the overall strategy (what information do you need to extract from the image and what is the robot going to do with that information?). If the programmers in your group have the desire to give it a try, especially during the off-season, I say, go for it.

I completely agree. We got a prototype of a Vision program to identify cubes on the JeVois working this year, but we never used it at competition because we found that using splines to get to the two fence cubes was very accurate.

One thing to keep in mind is how you’re going to use Vision information to perform a task on the robot. Integrating this data is often considered the hardest part when talking about Vision. We are considering trying out some experiments over the next few months to somehow integrate Vision data to correct error while path following, so we’ll see how that goes.

For sure! Completely game dependent.

I’m curious about the ArUco thing, though. I was under the impression ArUco was solely for AprilTag-style fiducial markers?

If you’re just talking about SIFT/SURF/ORB/FAST/BRIEF/the flavor-of-the-day image homography algorithm, those can for sure be useful, especially with a Perspective-N-Point algorithm. Issue is though, they’re pretty darn finicky — you need a sharp, vaguely high res image from not too extreme of an angle, or Bad Things Will Happen. Compute horsepower is another issue — Jetson CUDA acceleration is a must if you want a usable framerate, but even then calling it fast would be pretty inaccurate, especially with the larger image sizes you need to have the algorithm work over longer ranges. While it certainly can have its uses, it’s not very fun to integrate onto a robot. As someone who integrated that sort of thing into our 2017 vision stack, you’d have to do some serious convincing (or bribery) to get me to do that again.

Though, if you guys managed to get it effectively integrated into your vision stack this year, I’d love to hear about it — saying that 2017 me was clueless is an understatement, haha.

I am not entirely sure what kind of vision processing you’re trying to do on a Pi, but I don’t know if I would recommend a Jetson - up to $500!! - for a team that has never done vision processing before. It’s simply overkill.

OP – I recommend you get a Pi, look into setting up GRIP. Once you have the image processing part down, you can start to send data over NetworkTables to the Rio from the Pi using PyNetworkTables. For example, this year on 340, we processed cube images using the Pi, calculated where in the image the cube was (on the X axis, 0-360, where 180 is the center), and sent that over the NetworkTables to the Rio. Then we had a simple “rotate until the cube is between 170ish and 190ish”. While we never wound up integrating it into our autos, we had it working in a day or so during build season. You can see our code on GitHub.

This was absolutely nowhere near full utilization of the computing power on a Pi. It’s not the most advanced vision code, but it was relatively fast and it worked. For a camera, we just used a Raspberry Pi Camera Module.

There may be an easier workflow to this by now, but this is particularly easy and it lets your programmers start to learn Python as well as Java, without being too overwhelming.
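A rough sketch of that Pi-to-Rio flow is below. The table name, keys, and server address are illustrative assumptions rather than our exact code, and the deadband helper mirrors the “rotate until between 170ish and 190ish” logic.

```python
# Sketch of the Pi-side flow described above: find the cube's x position
# and publish it over NetworkTables for the Rio. Table/key names and the
# server address are assumptions, not a team's actual code.

def on_target(cube_x, center=180.0, tolerance=10.0):
    """True when the cube is close enough to image center to stop turning."""
    return abs(cube_x - center) <= tolerance

def main():
    # Imported inside main() so the helper above works without the library.
    from networktables import NetworkTables  # pip install pynetworktables
    NetworkTables.initialize(server="10.3.40.2")  # your Rio's address
    table = NetworkTables.getTable("vision")
    cube_x = 240.0  # in reality this comes from your OpenCV pipeline
    table.putNumber("cubeX", cube_x)
    table.putBoolean("onTarget", on_target(cube_x))

if __name__ == "__main__":
    pass  # main() needs a live robot network, so it isn't run here
```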

This is essentially what we do too, except that we have a background thread analyzing frames at a rate too low to cause performance issues (and consequently also too low to be the PID input). However, it produces new targeting information plenty quickly enough for us to link multiple gyro- and PID-controlled turns and course corrections together into a single movement. We didn’t use it in 2018, but we did use it for hanging gears in 2017 during both autonomous and teleop.
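That background-thread pattern might look something like this sketch, with a stub analyze function standing in for the real OpenCV pipeline:

```python
import threading
import time

# Sketch of the background-analysis idea above: a thread grabs and
# analyzes frames at a deliberately low rate, and the robot loop reads
# the latest result whenever it starts a gyro-controlled turn. The
# analyze callable is a stand-in for a real OpenCV pipeline.

class SlowVisionThread(threading.Thread):
    def __init__(self, analyze, period=0.2):  # ~5 Hz, easy on the CPU
        super().__init__(daemon=True)
        self.analyze = analyze
        self.period = period
        self.latest = None
        self.lock = threading.Lock()
        self.stop_event = threading.Event()

    def run(self):
        while not self.stop_event.is_set():
            result = self.analyze()
            with self.lock:
                self.latest = result  # overwrite: only newest data matters
            self.stop_event.wait(self.period)

    def get_latest(self):
        with self.lock:
            return self.latest

if __name__ == "__main__":
    vt = SlowVisionThread(analyze=lambda: {"angle": 5.0}, period=0.05)
    vt.start()
    time.sleep(0.2)
    vt.stop_event.set()
    print(vt.get_latest())
```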


I agree with this. The importance of vision varies a lot from year to year.

In 2016, to score a high goal after crossing one of the terrain based obstacles in autonomous was extremely difficult without using vision. Even teams that had excellent autonomous driving software to get into a position near, or even locked against the face of the tower (330), still used vision once they got there.

In 2018 or 2015 vision was really not a very important ability because scoring didn’t need to be very precise.

It is super important to determine whether using vision is necessary or whether there is an easier way to aim at the target. The 2012 and 2013 games are examples where only certain strategies needed vision.

In 2013, for example, full court shooters filled a niche role that benefited from vision alignment because of the long range (though many full court shooters still were driver aimed). Additionally, the vision system didn’t need to be super fast, because once aligned, a full-court shooter would just continue to shoot without having to move again. Having an amazing vision application like 987’s from 2013 is one of the coolest things in FRC, but in their case it wasn’t a *huge* advantage over a solid cycling robot. I will always think of 610 as one of the ultimate KISS robots to win the championship. Like many others, they would drive straight to the back of the pyramid, spend zero time on line-up or vision lock, unload four discs, and zoom back across the field.

Vision is a great tool, but it is important to evaluate critically whether it is valuable to use.

In regards to using the Pi, my team has developed vision for the Pi running OpenCV and streaming the points to the Rio over USB and would recommend the setup, though we didn’t use it this year. From our experience, one factor that turned out to be important is having a heatsink and/or fan on the Pi. Another important tip is to utilize the separate cores of the Pi to run each chunk of the operation on separate cores. This threading dramatically improves framerate by allowing the camera to capture another image without having to wait for the previous image to be processed.
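The capture/process split can be sketched as a small producer-consumer pipeline; the capture and process functions here are stand-ins for `cv2.VideoCapture.read()` and your actual OpenCV code.

```python
import queue
import threading

# Sketch of the capture/process split described above: one thread grabs
# frames while another processes them, so the camera never waits on
# processing. capture/process are stand-ins for cv2.VideoCapture.read()
# and a real OpenCV pipeline.

def run_pipeline(capture, process, n_frames):
    frames = queue.Queue(maxsize=2)  # small buffer keeps latency low
    results = []

    def capture_loop():
        for _ in range(n_frames):
            frames.put(capture())
        frames.put(None)  # sentinel: no more frames

    def process_loop():
        while True:
            frame = frames.get()
            if frame is None:
                break
            results.append(process(frame))

    t1 = threading.Thread(target=capture_loop)
    t2 = threading.Thread(target=process_loop)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results

if __name__ == "__main__":
    out = run_pipeline(capture=lambda: 1, process=lambda f: f + 1, n_frames=3)
    print(out)
```

On a real robot both loops would run forever instead of for `n_frames`, and a full-queue condition would drop the oldest frame rather than block.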

I know the original post is asking about the Pi. I will again +1 GRIP. In fact, start now using your PC and whatever USB camera you have.

I have used GRIP on a Kangaroo mini PC with an IP camera and a separate LED ring. This is similar to a Raspberry Pi and camera system. The challenge was getting a reliable system. We had a routine that included relaunching the SmartDashboard after the robot connected, hoping the video feed would come up.

No mention yet of the Limelight. We used it this year with great success. Camera, co-processor (Raspberry Pi Compute Module), LED lights, and web-based configuration, all in one, with GRIP support coming soon. It’s a super powerful and fast vision system. Many of the teams here in the Central MN Robotics hub are moving to it. This winter before kickoff I will be hosting some Jumpstart seminars with the Limelight.

What kind of framerates were you getting with the Pi?
I liked using the JeVois over the Pi. We tried Pi-based vision for 2017 and it was very slow, on the order of 10-20fps maximum. By contrast, the $50 JeVois I bought last December was able to effectively process a 320x240 image at 61fps right out of the box. I have a (somewhat verbose) guide to porting GRIP code onto the JeVois in my list of white papers.

I guess my original post about the Pi being slow was a little unfair. I used the first-generation Pi, so of course my code was slower.

I attempted 640x480 and got terrible framerates. I can’t remember the exact numbers, but doing blob detection and outlining was just killing the Pi.

A Gaussian filter is not something wisely done on a Pi, as I recall, as it can kill frame rates (from 15 frames per second down to 0.5 for a 480p image, IIRC). Otherwise, an RPi 3 seemed to handle a reasonable GRIP-built processing chain.

What kind of speeds were you hitting with an rPi 3, and at what resolution?
Anything other than a standard box blur for filtering seems to slow things down, although Gaussian wasn’t bad compared to the rest of my pipeline.

You can easily achieve usable frame rates with the Pi 3 if you are considerate about your computations. Industrial machine vision tasks, such as bolt-location detection for automated tightening, run on boards weaker than a Pi 3. The 2012, 2013, 2014, 2016, and 2017 vision tasks are all solvable with an Otsu threshold and an image moment; no edge detection or blurring required.
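To show just how little compute that recipe needs, here is the whole thing sketched in plain Python with no OpenCV at all; in a real pipeline you would instead call `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)` and `cv2.moments(mask)`.

```python
# Plain-Python sketch of the Otsu-threshold + image-moment recipe above.
# The image is a list of rows of 0-255 grayscale values. In practice
# you'd use cv2.threshold with cv2.THRESH_OTSU and cv2.moments instead.

def otsu_threshold(img):
    """Pick the threshold maximizing between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0  # background weight and intensity sum so far
    for t in range(256):
        w_b += hist[t]
        if w_b == 0 or w_b == total:
            continue
        sum_b += t * hist[t]
        w_f = total - w_b
        mean_b = sum_b / w_b
        mean_f = (total_sum - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def centroid_above(img, thresh):
    """Centroid (cx, cy) of pixels above thresh: moments m10/m00, m01/m00."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if v > thresh:
                m00 += 1
                m10 += x
                m01 += y
    return (m10 / m00, m01 / m00) if m00 else None

if __name__ == "__main__":
    # Synthetic frame: a bright "target" blob on a dark background.
    img = [[0] * 10 for _ in range(10)]
    for y in range(2, 5):
        for x in range(3, 7):
            img[y][x] = 200
    t = otsu_threshold(img)
    print(t, centroid_above(img, t))
```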

There are much faster boards than an RPi that won’t set you back the big bucks of a Jetson. My team used an ODROID XU4 board this past season and was quite satisfied. The XU4 is about 3x faster than the newest RPi 3, and its GPU is supported by OpenCV.

The whole setup for the XU4 (board, storage, power, case) is about $100.

I honestly don’t remember, but I do know it was either 15 or 30 fps (stable) at either 360p or 480p resolution. I don’t think it was 15@360, I’d be more inclined to bet on 30@360.