After looking into a few of the vision solutions (briefly for each), it sounds like OpenCV on a daughter board is the most popular. However, I would like to know the pros and cons of the three “main” solutions: OpenCV, NI Vision, and RoboRealm.
Which has your team used in the past?
Which worked best?
Which is the best for beginners in vision?
Which one is most accurate?
I’m finding that RoboRealm is awesome to use! And quite simple. The only downside (a pretty big one) is that if we want good results, we would need an onboard laptop to run it.
I’ve played around with OpenCV very briefly, and the first thing I noticed is that there are very few tutorials out there for FRC-specific tasks (past challenges, etc.). It seems very “learn-by-yourself”.
We’ve tried NI’s Vision Assistant before, but we didn’t end up using it; we didn’t really explore it that much.
Finally, to leave off, I know that GRIP is coming soon; I’ve played around with the alpha version, and it seems pretty legit! What I don’t know is how it will be used in FRC. Will it be a script that runs on the RoboRIO or a daughter board? Does it run on a laptop with NetworkTables? They say they will release the final version “in time for kickoff”, so maybe we just wait until then.
Anyway, to make a long post short: we want to start vision, and I, as the mentor, have played around with all of them at least for a little bit, and I would like to know what other teams do.
I personally believe that running vision on the driver station laptop is simply the best way to go. Over the four years I’ve done it, I have tried:
a) running vision on the cRIO in Java (2012)
b) LabVIEW on the driver station (2013)
c) OpenCV on an onboard processor (2014)
d) PCL with a Kinect and onboard processor (2015)
While the most capable methods (in terms of what the vision processing could do) have been d, followed by c, the most efficient in terms of programmer time has by far been b.
It’s easy to set up, and NI has an extremely easy-to-use library that will suffice for anything FRC will throw at you.
For personal development, by all means go ahead and spend more time building an advanced system, but generally the most useful method is the one already built in.
Our best success has been with OpenCV running on a raspberry pi, then just sending coordinates of targets to the RIO. This had several distinct advantages over our 2012 implementation of vision processing on the cRIO in java (and trying to do vision processing on the driver station was a disaster for us because of network congestion):
Much less network traffic
Programmer could take a test platform home with no impact on team progress otherwise; this was particularly helpful the first two years when we could only afford one control system.
No lag introduced into drive experience (perhaps a side effect of #1)
Very loose coupling of vision processing with robot control; we reused the 2013 vision processing code (and hardware) in 2014 with no changes.
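To give a feel for the “send coordinates to the RIO” half of this, here is a rough Java sketch of a coprocessor-side sender. It assumes the OpenCV pipeline has already found the target and normalized its position; the address, port, and message format here are placeholder choices, not anything standard:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Runs on the coprocessor (e.g. a Raspberry Pi). Sends one small text
// datagram per processed frame containing the target's x,y position.
public class TargetSender {
    private static final int PORT = 5800; // arbitrary; any open UDP port works
    private final DatagramSocket socket;
    private final InetAddress rio;

    public TargetSender(String rioAddress) throws Exception {
        socket = new DatagramSocket();
        rio = InetAddress.getByName(rioAddress); // e.g. "10.TE.AM.2" with your team number filled in
    }

    // Send one target fix; x and y are the target's position as found by the
    // OpenCV pipeline (here assumed already normalized to -1..1).
    public void send(double x, double y) throws Exception {
        byte[] data = String.format("%.3f,%.3f", x, y).getBytes("US-ASCII");
        socket.send(new DatagramPacket(data, data.length, rio, PORT));
    }
}

The roboRIO end just listens on the same port and parses the two numbers.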
For those of you that run vision off of the driver station:
The laptop needs to have enough “horsepower”, correct? It can’t just be a simple “classmate-like” laptop?
Was there a lot of lag introduced between what the robot saw and reacting to it (since the image had to be transferred over the network, processed, then sent back)?
It totally depends on what you are doing. The more complicated the vision code, the more you stand to benefit from a more powerful laptop (CPU speed is the operative specification here).
There was typically about a 100-300ms lag between the start of image capture and receipt of the processed result on the robot when I last did this in 2013. Some of this is due to transmission time in both directions, some is due to processing time on the laptop, and some of it is because image capture itself is not instantaneous (an issue that affects all processing methods).
That amount of lag can either be disastrous or a non-issue depending on how you are using vision. As a mental exercise, compare the following two approaches for turning your robot to face a vision target:
Approach 1:
while(true) {
    Capture camera frame
    Transmit frame to laptop
    Detect target in image
    Compute a drive turn command to place the target in the center of the image
    Send command back to robot
    Execute command
}
Approach 2:
while(true) {
    Capture camera frame
    Record robot heading from gyro at the moment the frame was captured
    Transmit frame to laptop
    Detect target in image
    Send the heading angle to the target back to the robot
    Add recorded gyro heading at time of image capture to the heading angle sent back by vision code
    Compute drive turn command to turn to new target angle
    Execute command
}
Which approach would you expect to be more robust to variations in latency?
That makes sense. I guess we don’t need to process EVERY frame. We could just have a button that has the robot process the current frame and send the robot the correct commands to line up with the goal.
Even if you do want to process every frame, the idea is that you can deal with latency by saving a snapshot of the relevant robot state at the time the image is captured, do your processing on the image to obtain some result, and then use the saved state, result, and current robot state to obtain a modified result that is synced up with the present.
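To make that concrete, here is a rough Java sketch of the bookkeeping in Approach 2. The capture and result hooks are placeholders for however your code is organized; the point is only that the gyro heading is saved at the moment the frame is grabbed and the vision result is applied against that saved value rather than the current one:

// Latency-tolerant aiming: pair each frame with the gyro heading recorded when
// it was captured, then aim at an absolute heading instead of executing a
// "turn this much right now" command from a stale frame.
public class LatencyTolerantAim {
    private double headingAtCapture; // gyro heading saved the moment the frame was grabbed

    // Call this right when a frame is captured, before it is sent to the laptop.
    public void onFrameCaptured(double currentGyroHeading) {
        headingAtCapture = currentGyroHeading;
    }

    // Call this when the laptop's result arrives; angleToTarget is the target's
    // offset from the center of that frame, in degrees.
    public double absoluteTargetHeading(double angleToTarget) {
        // Built from the heading at capture time, so the setpoint stays correct
        // even if the robot turned while the image was in flight.
        return headingAtCapture + angleToTarget;
    }
}

Your drive code then closes the loop on the gyro toward that absolute heading instead of blindly turning by an offset computed from an old image.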
I’d highly recommend setting up a test for latency. I’ve done it using an LED in a known position: the roboRIO toggles the LED, and you measure how long it takes before the camera and vision system see the LED change and the roboRIO is notified.
A simpler test of this is to make a counter on the computer screen. In LV, just wire the loop counter (i) to an indicator on the panel and delay the loop by 1 ms. Then point the camera at the screen. Place the source image and the display of the captured image side-by-side and take a picture of it – cell-phone camera or screenshot. Subtract the time in the display from the time in the source for an idea of the latency in the capture and transmission portions of the system.
The reason for this test is to learn what affects the latency and how to improve it. Camera settings such as exposure and frame rate directly determine how long the camera takes to capture an image. Compression, image size, and transmission path determine how long to get the image to the machine which will process it. Decompression and your choice of processing algorithms will determine how long it takes to make sense of the image. Communication mechanisms back to the robot determine how long it takes for the robot to learn of the new sensor value.
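If you want to script the LED version of that test on the roboRIO in Java, a sketch might look something like this (WPILib-style DigitalOutput and FPGA timestamp; the DIO channel and how your vision code reports back are up to you):

import edu.wpi.first.wpilibj.DigitalOutput;
import edu.wpi.first.wpilibj.Timer;

// Latency probe: flip an LED on a DIO channel, remember when, and report the
// elapsed time once the vision system says it saw the LED change state.
public class LatencyProbe {
    private final DigitalOutput led = new DigitalOutput(0); // channel 0 is just an example
    private boolean ledState = false;
    private double toggleTime;

    public void toggle() {
        ledState = !ledState;
        led.set(ledState);
        toggleTime = Timer.getFPGATimestamp(); // seconds
    }

    // Call this the first time the incoming vision data reflects the new LED state.
    public void visionSawChange() {
        double latencyMs = (Timer.getFPGATimestamp() - toggleTime) * 1000.0;
        System.out.println("Camera-to-robot latency: " + latencyMs + " ms");
    }
}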
An Atom-based Classmate is really a pretty fast CPU compared to a cRIO or roboRIO. Plus, Intel has historically done quite a lot to make image processing libraries efficient on their architecture.
Any computer you bring to the field can be bogged down by poor selection of camera settings and processing algorithm. Similarly, if you identify what you need to measure and what tolerances you need, you can then configure the camera and select the image processing techniques in order to minimize processor load and latency.
Also, you may find that the bulk of the latency is in the communications and not in the processing. The LV version of network tables has always allowed you to control the update rate of the server and implemented a flush function so that you could shorten the latency for important data updates. Additionally, the LV implementation always turned the Nagle algorithm off for its streams. I believe you will see much of that available for the other language implementations and you may want to experiment with using them to control the latency. Most importantly, think of the camera as a sensor and not a magical brain-eye equivalent for the robot.
You were all very helpful. I think we’re going to wait until kickoff to see exactly what type of vision processing is required (more simple, or more complex).
However, I’m really liking RoboRealm and the LabVIEW vision solutions. As for latency (again, depending on the challenge), we might just have a button on the joystick that will “auto-aim” (process the current frame), rather than constantly processing.
I would still really like to hear a bit about OpenCV, but I’m getting the feeling that it will be a bit more complicated. Has anyone used a Raspberry Pi and the USB interface to transfer the data to the RoboRIO (preferably with LabVIEW)? I’m not sure how you would read from the RoboRIO’s USB port.
I would not use USB to transfer data. You’re going to end up having to emulate another device to get that to work correctly, though it’s an interesting thought. USB is really meant for peripherals, not co-processors. Ethernet or one of the myriad serial interfaces would be better.
Thanks! The last two links don’t show anything. Are they searches of keywords?
I thought I remembered someone talking about using USB to transfer data, but maybe they were talking about the other serial interfaces. With Ethernet, you would use UDP or something similar, I’m guessing?
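For what a UDP link can look like on the roboRIO side in Java, here is a rough receiver to pair with the sender sketched earlier in the thread; the port and the “x,y” text format are arbitrary choices:

import java.net.DatagramPacket;
import java.net.DatagramSocket;

// Listens for the small "x,y" datagrams sent by the coprocessor and keeps
// the most recent target position for the robot code to read.
public class TargetReceiver implements Runnable {
    private volatile double x;
    private volatile double y;

    public void run() {
        try {
            DatagramSocket socket = new DatagramSocket(5800); // same arbitrary port as the sender
            byte[] buf = new byte[64];
            while (!Thread.interrupted()) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet); // blocks until the next fix arrives
                String[] parts = new String(packet.getData(), 0, packet.getLength(), "US-ASCII").split(",");
                x = Double.parseDouble(parts[0]);
                y = Double.parseDouble(parts[1]);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public double getX() { return x; }
    public double getY() { return y; }
}

Start it once with new Thread(new TargetReceiver()).start() and have the drive code poll getX()/getY().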
My team has been using NI Vision Assistant and we now know how to track objects on the screen; however, we were wondering how to connect the tracking to motor movement. We want our robot to find the target and then auto-adjust to score. If anyone has any code, websites, or tips for us, that would be great. Thank you.
I have a general algorithm for you. Mind you, it requires an encoder or gyro, depending on what you’re moving (turret or robot).
Step 1: Use camera to acquire angle relative to shooter.
Step 2: Use Encoder/Gyro to turn that angle (or close to it).
Step 3: Use camera to check new angle relative to shooter.
If angle requires more adjustment, GOTO Step 2. Else, continue to Step 4.
Step 4: FIRE!
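In rough Java terms (getAngleFromCamera(), turnDegrees(), and fire() are placeholders for your own vision lookup, encoder/gyro turn, and shooter code):

// Steps 1-4 above as a loop. The tolerance is how close counts as "close enough".
static final double TOLERANCE_DEGREES = 1.0;

void aimAndFire() {
    double error = getAngleFromCamera();      // Step 1: angle to target, from vision
    while (Math.abs(error) > TOLERANCE_DEGREES) {
        turnDegrees(error);                   // Step 2: turn by that angle using the encoder/gyro
        error = getAngleFromCamera();         // Step 3: re-check with the camera
    }
    fire();                                   // Step 4: FIRE!
}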
The vision example does some mapping on target location to make the target coordinates range from -1 to 1 in X and Y. The reason for this is that it makes the code independent of camera resolution and makes it very similar to a joystick output. You may need to multiply by a scaling number, but this is pretty close to being able to wire the camera result to the Robot Drive.
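That mapping is just recentering and rescaling the pixel coordinates, along these lines (width and height are the camera resolution; the sign convention on Y is a matter of taste):

// targetX/targetY: target center in pixels, with (0,0) at the top-left of the image.
double aimX = (targetX - width / 2.0) / (width / 2.0);   // -1 = far left, +1 = far right
double aimY = (height / 2.0 - targetY) / (height / 2.0); // -1 = bottom, +1 = top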
If you want to close the loop faster than your image processing allows, consider using a gyro or IMU to turn by X degrees.