Every once in a while the motors on our robot begin stuttering (including motors other than the drivetrain). The stuttering lasts for a few seconds and then goes back to normal. When the stuttering occurs, the warning “RobotDrive…Output not updated often enough” shows up multiple times on the Driver Station. This stuttering happens with equal frequency and severity at low and high battery voltages. We reproduced the problem on our robot from last year when we updated its code to the 2017 libraries, so it’s not a physical problem. Outside of this stuttering, the robot behaves completely normally.
Our current theory is that the problem lies in the code. Specifically, we’ve interpreted the given warning to mean that the robot program is running too slowly, which is preventing the code from sending values to the talons often enough. Does this theory hold any merit? If so, what do we do about it? (We haven’t run a profiler on the code yet, though we plan to.) For reference, we’re using Talon SRXs for our motor controllers, we’re programming the robot in Java, and we’re using the Command-Based Robot code base.
It has something to do with WPILib’s RobotDrive/ArcadeDrive (or whatever they are calling it) class. We had this issue all season in 2016, and we solved it by making our own ArcadeDrive. Although, I’m sure there is a better/simpler way to do it (I heard people suggesting playing around with [RobotDrive].isSafetyEnabled()).
Edit: It is clear now the OP understands all of this, I’m just leaving this up for other people who Google this problem to read.
This is really incomplete advice, and it is really dangerous to casually suggest messing with this sort of thing. The safety should not be disabled on your robot except as a last resort. You are right that it is a code issue, but it is not caused JUST by using WPILib’s driving functions. By bypassing WPILib’s drive functions, you may make the error go away, but you are not preventing the underlying problem from actually happening. You are just making it so the robot doesn’t tell you that it’s happening and doesn’t disable itself if it happens for an extended period.
This sort of error message is similar to the old “Watchdog not fed” errors. Basically, your robot code (when writing Iterative code, at least?) loops constantly through the same pieces of code over and over, constantly checking what the drivetrain should be doing and updating the drivetrain with those values. If it does not update often enough, the code will warn you with this error message, and if the failure isn’t just a quick intermittent one, it will disable your drivetrain.
This is usually a good thing. If this safety mode is disabled, and your robot drivetrain stops receiving inputs, it will just keep on running with the last input it received. It won’t stop running until it either gets a new signal or shuts off. So this robot would crash right into a wall, another robot, or (if at a demonstration for example) a person! Thus it is very good to have this safety mode on, and if you turn it off, you should not be turning it off just because “ugh my code keeps giving me this error” - you must fix the problem of your robot drivetrain not getting signal often enough, not just the problem that it gives you an error message!
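To make the mechanism a bit more concrete, here is a minimal standalone sketch of the kind of timeout check described above. It is not WPILib’s actual implementation: the class name SafetyWatchdog, its methods, and the 100 ms timeout in the usage note are all made up for illustration, and the current time is passed in by hand so it can run without a robot.

```java
/** Hypothetical stand-in for a motor safety check; NOT WPILib code. */
class SafetyWatchdog {
    private final long timeoutMs; // longest allowed gap between updates
    private long lastFeedMs;      // time of the most recent set() call
    private double output;        // last commanded motor output

    SafetyWatchdog(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastFeedMs = nowMs;
    }

    /** Called by robot code on every loop; setting the output also "feeds" the watchdog. */
    void set(double value, long nowMs) {
        output = value;
        lastFeedMs = nowMs;
    }

    /** Called periodically by the safety check; zeroes the output if updates have stopped. */
    boolean check(long nowMs) {
        if (nowMs - lastFeedMs > timeoutMs) {
            output = 0.0;  // stop the motor instead of holding the stale value
            return false;  // this is where a "not updated often enough" warning would fire
        }
        return true;
    }

    double get() {
        return output;
    }
}
```

With a 100 ms timeout, a set() at t = 0 followed by a check() at t = 50 leaves the output alone, while a check() at t = 151 with no set() in between zeroes it. Turning the safety off amounts to skipping check(), so the stale output would simply stay in place.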
One very simple bit of pseudocode that would trigger this error:
if (some other button is pressed)
Timer delay for 5 seconds
do something else
Can you see what the problem with the above code is (assuming no threading, etc.)? The code loop that controls the drivetrain also contains a delay that prevents the code from looping often enough to prevent the error from happening. The drivetrain updates only once per pass through the robot code loop (again assuming Iterative code), so during that entire 5 second delay, if the safety was removed, the robot would just keep on driving as it was last commanded without any ability to receive further input. That’s no good.
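One way to keep the 5 second wait from the pseudocode above without stalling the loop is to remember when the wait started and check the clock on each pass. This is only a sketch under assumptions: the class NonBlockingDelay, its fields, and the driveUpdates counter (standing in for the real drive call) are hypothetical, and the time is passed in as a parameter rather than read from a real timer.

```java
/** Hypothetical loop body that waits 5 s without blocking the drive update. */
class NonBlockingDelay {
    static final long DELAY_MS = 5000;

    private long startMs = 0;
    private boolean waiting = false;
    int driveUpdates = 0;        // stand-in for "the drivetrain was updated this loop"
    boolean actionDone = false;  // stand-in for "do something else"

    /** Called once per loop iteration (roughly every 20 ms in Iterative code). */
    void periodic(boolean buttonPressed, long nowMs) {
        // The drive update runs on every pass, no matter what else is pending.
        driveUpdates++;

        // Start the wait by recording the time instead of sleeping.
        if (buttonPressed && !waiting) {
            waiting = true;
            startMs = nowMs;
        }

        // Each pass just asks "has enough time gone by yet?" and returns right away.
        if (waiting && nowMs - startMs >= DELAY_MS) {
            waiting = false;
            actionDone = true;
        }
    }
}
```

The delayed action still happens 5 seconds after the button press, but the drivetrain keeps getting fresh values the whole time, so the safety timeout never has a reason to trip.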
I would recommend posting your code for us to look at in order to examine it for any such unnecessary delays or wait statements or anything like that which may be holding up the code from properly executing. Don’t just make the error go away until you understand what’s happening.
The difference between our problem and the one described in that thread is that for us, the warning only happens occasionally. The problem in that thread is probably that the motors aren’t being updated every loop in the code, so the motors are stopping for safety because they’re no longer being updated. With our code, we believe that the problem is that the TalonSRXs are occasionally updated too slowly, so they stop for safety, but then are quickly updated again, creating the brief and occasional stutters we see.
Also, the problem doesn’t only occur on our drive train (which uses the RobotDrive class to run its motors), but also when other motors are running. We were able to frequently reproduce it while running no motor other than our 2 shooter motors when our robot was out of the bag.
The code that we’re using to reproduce the problem on our Stronghold bot can be found in the 2017 branch of this GitHub repository. We appreciate the help!
Just to add some extra info to this, we see the problem both in teleop and auto, but haven’t really found any one thing that triggers it. We are using the Arcade Drive class in Teleop, and the Tank Drive class in Auto. As Kaiz said, we have seen it in our shooter motors while the robot is completely stationary (only the shooter motors were running). The very intermittent nature of the problem (maybe one occurrence every 10-15 minutes of operation) makes it very hard to track.
Our programming team has been scouring the code for any undue waits or delays, but hasn’t come up with anything yet. If the cause of this error is that our code is too slow to update the Talons as often as they need, we haven’t found what could cause that yet.
Importantly, we were able to reproduce the problem on our 2016 robot simply by updating its code and libraries to the 2017 versions. This leads me to believe that the 2017 libraries require something different, and we are not meeting that requirement.
In particular if anyone has suggestions to help reproduce this intermittent problem, or even suggestions to help us profile our code or locate what situations could be causing this, it would be a huge help.
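Short of a full profiler, one cheap way to catch an intermittent slowdown is to time every pass through the main loop and log the ones that run long. The helper below is a rough sketch, not part of WPILib: the LoopTimer name, its methods, and the 40 ms budget in the usage note are assumptions picked for illustration.

```java
/** Hypothetical helper for spotting slow loop iterations; not part of WPILib. */
class LoopTimer {
    private final long budgetNanos; // how long one loop is allowed to take
    private boolean started = false;
    private long lastNanos = 0;
    private long worstNanos = 0;
    private int overruns = 0;

    LoopTimer(long budgetMs) {
        this.budgetNanos = budgetMs * 1_000_000L;
    }

    /** Call at the top of every loop with System.nanoTime(); returns true if the last loop ran long. */
    boolean mark(long nowNanos) {
        boolean overrun = false;
        if (started) {
            long elapsed = nowNanos - lastNanos;
            worstNanos = Math.max(worstNanos, elapsed);
            if (elapsed > budgetNanos) {
                overruns++;
                overrun = true;
            }
        }
        started = true;
        lastNanos = nowNanos;
        return overrun;
    }

    int overruns() {
        return overruns;
    }

    long worstMs() {
        return worstNanos / 1_000_000L;
    }
}
```

In a periodic method this could look like `if (timer.mark(System.nanoTime())) System.out.println("slow loop, worst so far: " + timer.worstMs() + " ms");` with a timer built as `new LoopTimer(40)`. If the stutters line up with logged overruns, the code is the likely culprit; if the loop times stay flat while the motors stutter, that points somewhere else.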
I also want to note that this has intermittently been happening to my team as well, but very sporadically, and never for long enough to produce a noticeable loss in robot signal or control. We have Victor SPs on the drive, not Talons, leading me to believe the problem isn’t specific to the speed controller. Since it has not impacted our performance I’ve not spent a lot of time fixing it with other code issues at play.
The issue seems to happen when battery voltage dips to a very low level - leading me to think maybe the rio is partially browning out or something? Not sure.
Brian, is that 10-15 minutes of constant operation or 10-15 minutes of operation broken up across several JVM restarts (assuming you are still using Java)?
I ask because it makes me wonder if you are being hit by a Java GC. My experience with using Java with FRC is that GC is rare to non-existent (unless you really work on producing garbage) during the relatively short runtime of a match. However, going for 10-15 minutes straight could do it.
A couple of options to try. First, there should be a log entry (may need to change some JRE setting) when GC occurs and it should have a time stamp. You could go through a long run and see if there is a correlation. Second, you could attach a test command to a button that forces a GC and see if that reproduces your error.
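For what it’s worth, both options can be roughed out in plain Java. On a desktop HotSpot JVM the -verbose:gc flag prints a line per collection; whether and how that can be set on the roboRIO I haven’t checked. The sketch below instead reads the GC counters from java.lang.management and times a System.gc() request. The GcProbe class and its method names are made up, and it assumes the JRE on the robot actually ships the java.lang.management package, which a cut-down embedded JRE may not.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper for correlating stutters with garbage collections. */
class GcProbe {
    /** Total number of collections across all collectors since the JVM started. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if this collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated time spent collecting, in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if this collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Asks for a collection and returns how long the call blocked, in milliseconds. */
    static long forceGcAndTimeMs() {
        long start = System.nanoTime();
        System.gc(); // only a request; the JVM is allowed to ignore it
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

Logging totalCollections() once per loop and printing a timestamp whenever it changes would give the correlation check, and binding forceGcAndTimeMs() to a test button would give the forced-GC experiment.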
I hope this helps and good luck at Pitt this weekend. I will be watching.
We had this issue in our code a few years ago. I agree with what Chris said above: there is a problem, and it should be addressed. And I don’t think it’s a WPILib problem.
You said you were looking for waits or delays. By recollection, the problem for us was a loop, so I’d scour your code for do/while/for loops.
Fundamentally, WPILib functions are constantly being called and entered and exited repeatedly until a task is complete. Functions should either be “done” or not done. What may be happening is that you have a loop in one of your functions, and it’s blocking the OS from ever taking over control. When you do that, the other safety functions can’t be called, and time out.
My broad recommendation is that you probably aren’t structuring your code right if you have loops within the functions you added or modified.
Math, vision, or other subsystems that you import code for may also not be releasing control soon enough. Look for things that take control and don’t give it up.
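As a rough illustration of the “done or not done” idea, here is a sketch of a step that waits for a switch without looping inside the function. Everything in it is hypothetical: the DriveUntilSwitch class, the 0.5 output value, and the BooleanSupplier standing in for a real sensor. It mirrors the execute/isFinished shape of command-based code but does not use any WPILib classes.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical command-style step: one quick check per call, no inner loop. */
class DriveUntilSwitch {
    private final BooleanSupplier limitSwitch; // stand-in for a real sensor read
    private double output = 0.0;               // stand-in for a motor output

    DriveUntilSwitch(BooleanSupplier limitSwitch) {
        this.limitSwitch = limitSwitch;
    }

    // The blocking version to look for and remove would be something like:
    //   while (!limitSwitch.getAsBoolean()) { output = 0.5; }
    // Nothing else in the robot loop gets to run until the switch closes.

    /** Called once per scheduler pass; does a little work and returns immediately. */
    void execute() {
        output = limitSwitch.getAsBoolean() ? 0.0 : 0.5;
    }

    /** The caller asks this every pass instead of the function looping internally. */
    boolean isFinished() {
        return limitSwitch.getAsBoolean();
    }

    double getOutput() {
        return output;
    }
}
```

The repetition moves out of the function and into whatever is already calling it every loop, so the drive and safety code get their turn on every pass.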
I wonder how long it will be before we simply reach the point that we’re trying to do so much on the RoboRio that it simply can’t keep up. It’s a nice piece of hardware, but it’s fixed, and if each Java update or wpilib update is trying to cram that much more functionality in…
Teams these days aren’t just polling for sensors and joystick inputs and sending motor drive commands, we’ve got TCP traffic to Raspberry Pis, vision processing, webcam image processing and encoding for streaming back to the driver station, …
Just thought I would drop back in and update this for anyone in the future. We determined that the problem we were seeing was the one detailed in this thread. After more research we found the spike in dropped packets. After using WLAN Optimizer as described in the thread, the problem disappeared. At competition (when connected over Ethernet) the problem never reoccurred. Still not entirely sure why so few teams see this issue, but boy it was a pain.