#1
CPU spike to 100% - robot unresponsive
Hello all,
We had an issue where our robot became unresponsive during two matches: one practice match and one elimination match. Looking at the Driver Station logs, we saw that the cRIO's CPU usage spiked to 100%. The motors kept running at whatever voltages they had been set to just before that moment, which meant that we drove (usually at full speed) into a wall and then stalled our motors. In the past, when we've seen deadlocks (for example, when two WhileHeld commands both required the same Subsystem), CPU usage would drop to near 0%. This makes us believe the problem is some sort of infinite loop somewhere, likely in our own code. Has anybody seen an issue like this before? If so, how did you debug or diagnose it? Thanks!

#2
Re: CPU spike to 100% - robot unresponsive
Adding a
Code:
Timer.delay(2.0 / 1000); // 2 ms; written as 2/1000 it is integer division in Java and evaluates to 0
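If a loop truly can't be avoided, one common mitigation is to sleep briefly on each pass and to put a time limit on the loop, so it can neither hog the CPU nor spin forever. Below is a minimal sketch in plain desktop Java, assuming `Thread.sleep` as a stand-in for WPILib's `Timer.delay` (which takes seconds) and a hypothetical `Condition` interface in place of a real sensor read. It also shows why `2/1000` is a trap: in Java that is integer division and evaluates to `0`.

```java
// A polling loop that yields the CPU between checks and gives up after a
// timeout. Thread.sleep stands in for WPILib's Timer.delay here.
class BoundedWait {
    /** Hypothetical condition; on a robot this might wrap a sensor read. */
    interface Condition {
        boolean isMet();
    }

    /**
     * Checks the condition every pollMillis until it is met or timeoutMillis
     * has elapsed. Returns true if it was met, false on timeout or interrupt.
     */
    static boolean waitFor(Condition condition, long pollMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!condition.isMet()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // give up instead of spinning forever
            }
            try {
                Thread.sleep(pollMillis); // let other threads run
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Integer division: 2 / 1000 is 0, so Timer.delay(2 / 1000) would not wait at all.
        System.out.println(2 / 1000);   // prints 0
        System.out.println(2.0 / 1000); // prints 0.002

        // A condition that never becomes true returns false after about 50 ms.
        System.out.println(waitFor(() -> false, 2, 50)); // prints false
    }
}
```

The lambda in `main` needs Java 8 or later; on the cRIO's Java ME VM you would pass an anonymous `Condition` class instead.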

#3
Re: CPU spike to 100% - robot unresponsive
Sounds like a code issue. You're probably running a loop somewhere under a certain condition; you usually wouldn't see anything close to 100% CPU from normal usage.
The fact that your drivetrain becomes unresponsive but keeps driving could mean one of two things. One is that something else loops and takes all the processing time without updating the drivetrain voltage (this is less likely, since motor safety would need to be off). The other is that something in your code (PID?) is looping and holding your drivetrain at a certain speed for that whole period, without letting anything else run. Is one of your triggered commands running something on the drivetrain? Let me know if that makes sense; I could also take a look at the code if it's Java.
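The second scenario can be illustrated with a toy model. The sketch below is plain Java with a simulated clock, not WPILib's actual scheduler: each pass of a single-threaded loop runs a shooter task and then a drive task. When the shooter task busy-waits on a hypothetical limit switch that closes at tick 50, the drive task gets no chance to update its outputs until then, so the motors hold whatever value they were last given.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a single-threaded robot loop with a simulated clock. Each pass
// runs a shooter task, then a drive task. This is an illustration of why one
// busy loop starves the code that runs after it, not WPILib's scheduler.
class StarvationDemo {
    static int now;                         // simulated time, in ticks
    static final int SWITCH_CLOSES_AT = 50; // hypothetical: limit switch closes at tick 50

    static boolean limitSwitchPressed() {
        return now >= SWITCH_CLOSES_AT;
    }

    /** Returns the ticks at which the drive task got to update its outputs. */
    static List<Integer> run(boolean blockingShooter, int totalTicks) {
        now = 0;
        List<Integer> driveUpdates = new ArrayList<Integer>();
        while (now < totalTicks) {
            // Shooter task.
            if (blockingShooter) {
                while (!limitSwitchPressed()) {
                    now++; // spin: each check burns a tick and nothing else runs
                }
            }
            // (A non-blocking shooter would check the switch once and return.)

            // Drive task: a real robot would read joysticks and set motors here.
            driveUpdates.add(now);
            now++;
        }
        return driveUpdates;
    }

    public static void main(String[] args) {
        List<Integer> blocking = run(true, 100);
        List<Integer> nonBlocking = run(false, 100);
        System.out.println("blocking: first drive update at tick " + blocking.get(0)
                + ", " + blocking.size() + " updates");
        System.out.println("non-blocking: first drive update at tick " + nonBlocking.get(0)
                + ", " + nonBlocking.size() + " updates");
    }
}
```

With `run(true, 100)` the first drive update happens at tick 50 and only 50 of the 100 ticks see an update; with `run(false, 100)` the drive task updates on every tick starting at tick 0. If the switch never closed, the blocking version would never reach the drive task at all.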

#4
Re: CPU spike to 100% - robot unresponsive
I have helped teams debug this problem at events many times over the years, and the majority of the time the issue is a "for" or "while" loop where there shouldn't be one. With an Iterative or Command-based robot, you seldom need loops in user code.
Other, less common causes are interrupts or TimerTasks that take too long to run and miss their deadlines, very long print statements, and deadlocks. I would put the robot up on blocks and try to reproduce the failure. One quick-and-dirty strategy I use on desktop apps is to run the program in a debugger, then pause execution and look at the current line once CPU usage hits 100%. If your CPU utilization is normally fairly low, this method has good odds of stopping in the problematic section. Consider it CPU profiler roulette.
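The pause-in-the-debugger trick can also be automated. The sketch below is a hypothetical `LoopWatchdog`: the robot loop calls `feed()` once per pass, and a background thread prints the loop thread's current stack if it hasn't been fed within a timeout. It assumes standard Java SE (`Thread.getStackTrace`), which may not be available on the cRIO's Java ME VM, so treat it as something to try on a desktop JVM.

```java
// Automates "pause in the debugger and see where it is": if the robot loop
// stops calling feed(), a background thread prints where the loop thread is.
class LoopWatchdog implements Runnable {
    private final Thread monitored;   // thread that runs the robot loop
    private final long timeoutMillis; // longest acceptable gap between feeds
    private volatile long lastFeedMillis = System.currentTimeMillis();
    private volatile StackTraceElement[] lastCapture; // most recent stuck stack, or null

    LoopWatchdog(Thread monitored, long timeoutMillis) {
        this.monitored = monitored;
        this.timeoutMillis = timeoutMillis;
    }

    /** Call once per pass of the robot loop. */
    void feed() {
        lastFeedMillis = System.currentTimeMillis();
    }

    StackTraceElement[] lastCapture() {
        return lastCapture;
    }

    /** Starts checking on a daemon thread, so it never keeps the JVM alive. */
    void start() {
        Thread checker = new Thread(this, "loop-watchdog");
        checker.setDaemon(true);
        checker.start();
    }

    @Override
    public void run() {
        while (true) {
            try {
                Thread.sleep(Math.max(1, timeoutMillis / 2));
            } catch (InterruptedException e) {
                return; // stop checking if interrupted
            }
            if (System.currentTimeMillis() - lastFeedMillis > timeoutMillis) {
                StackTraceElement[] stack = monitored.getStackTrace();
                lastCapture = stack;
                System.err.println(monitored.getName() + " has not fed the watchdog in over "
                        + timeoutMillis + " ms; it is currently at:");
                for (StackTraceElement frame : stack) {
                    System.err.println("    at " + frame);
                }
                feed(); // reset the timer so a long hang is reported once per timeout
            }
        }
    }
}
```

For example, construct it with `new LoopWatchdog(Thread.currentThread(), 100)` from the loop thread, call `start()` once, and call `feed()` at the end of each periodic method. If the loop hangs inside a `while`, the printed stack should point at the spinning line.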

#5
Re: CPU spike to 100% - robot unresponsive
Are the motor safeties on a separate thread (and thus somewhat unaffected by a CPU usage spike)? I would hope that the watchdog is somehow resilient to some of our code demanding lots of CPU time.
joelg236, I'll see about packaging up our code. Reproducing the failure has been the hardest part: we've only been able to reproduce it in matches. We ran the robot extensively in the pits to no effect. We have a spare cRIO back at home that we can run code on, but it will be missing sensors and actuators.
Last edited by JohnGilb : 15-04-2014 at 23:37. Reason: add detail.

#6
Re: CPU spike to 100% - robot unresponsive
Quote:
What do the ping statuses look like during those matches? That could help debug as well. Usually when I've seen 100% CPU usage, the cRIO actually stops responding to the Driver Station, and the watchdog would usually kill the robot. That would show up in the logs as well.

#7
Re: CPU spike to 100% - robot unresponsive
If anybody wants to look at our code, I've placed a drop here:
https://github.com/Team488/Shared
For a basic explanation of the architecture: http://www.chiefdelphi.com/media/papers/2912

#8
Re: CPU spike to 100% - robot unresponsive
I've also attached a picture of the DS log of the elimination match. Ping and packet loss look consistent before and after the 100% event.
The complete loss of connection you see about 75% of the way through is where our drivers did a remote soft reset of the cRIO; I think it comes back up just seconds before the match ends.
Last edited by JohnGilb : 15-04-2014 at 23:48. Reason: Add more context

#9
Re: CPU spike to 100% - robot unresponsive
Quote:
Last edited by Thad House : 16-04-2014 at 00:02.

#10
Re: CPU spike to 100% - robot unresponsive
Quote:

#11
Re: CPU spike to 100% - robot unresponsive
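We had a similar issue, caused by a loop. Essentially, we had something like this (simplified):
Code:
// e.g. limitSwitch is a WPILib DigitalInput and motor a SpeedController
while (!limitSwitch.get()) {
    motor.set(-1.0); // nothing else on this thread runs until the switch closes
}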
We found the issue by comparing our match logs with what we actually did in the match: every time the CPU spiked, it was while we were shooting the ball. That let us narrow down where to look, and we walked through the code until we found it. Note that the spikes didn't cause problems every time, only a couple of times.
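Narrowing things down this way can be partly automated by timing each task on every pass of the loop. A minimal sketch, assuming you can wrap each subsystem's per-cycle work in a `Runnable`; `TaskTimer` and the task names are hypothetical. It records the worst time seen for a task and prints a warning whenever one pass exceeds a budget, so a spike can be matched to, say, the shooter code. It is written for desktop Java; the cRIO's Java ME VM may lack `System.nanoTime`, in which case `System.currentTimeMillis` is a coarser substitute.

```java
// Times one task on every pass of the robot loop, remembers the worst case,
// and warns when a pass exceeds its budget. Create one per subsystem/command.
class TaskTimer {
    private final String name;      // label used in warnings
    private final long budgetNanos; // allowed time for one pass
    private long worstNanos = -1;   // -1 until the task has run once

    TaskTimer(String name, long budgetMillis) {
        this.name = name;
        this.budgetNanos = budgetMillis * 1000000L;
    }

    /** Runs the task once, records how long it took, and warns if over budget. */
    void time(Runnable task) {
        long start = System.nanoTime();
        task.run();
        long elapsed = System.nanoTime() - start;
        if (elapsed > worstNanos) {
            worstNanos = elapsed;
        }
        if (elapsed > budgetNanos) {
            System.err.println(name + " took " + (elapsed / 1000000L) + " ms (budget "
                    + (budgetNanos / 1000000L) + " ms)");
        }
    }

    /** Worst time observed so far in milliseconds, or -1 if never run. */
    long worstMillis() {
        return worstNanos < 0 ? -1 : worstNanos / 1000000L;
    }
}
```

For example, `shooterTimer.time(shooterUpdate)` inside teleopPeriodic, where `shooterUpdate` is a hypothetical `Runnable` holding the shooter's per-cycle code. Logging `worstMillis()` for each task alongside the match log would show which one coincides with the CPU spikes.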

#12
Re: CPU spike to 100% - robot unresponsive
Yeah, we suspect the same, but there are no obvious candidates. We've been searching through our code and have found only a few candidate loops (of the while/for variety).
This weekend we'll try to run some long-haul stress tests on our backup cRIO and see if we can reproduce the issue.

#13
Re: CPU spike to 100% - robot unresponsive
As a general rule, try not to use while loops. Parts of your robot project are already called repeatedly for you: in a command, you can use execute() as your loop body, and in the main robot class you can use teleopPeriodic().
https://github.com/FRC125/NU14
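As a sketch of that advice, here is the limit-switch loop from post #11 rewritten in the command style: `execute()` does one step and returns, and `isFinished()` tells the scheduler when to stop. The `Command`, `LimitSwitch`, and `Motor` types below are hypothetical stand-ins that only imitate the shape of WPILib's command-based API (`execute`, `isFinished`, `end`) so the example runs on its own; in a real robot project you would extend WPILib's `Command` instead.

```java
// Toy model of the command-based pattern: the scheduler calls execute() once
// per pass and then checks isFinished(), so the "loop" is spread across passes
// instead of blocking inside one. These types are stand-ins, not WPILib's API.
class CommandSketch {
    /** Hypothetical hardware stand-ins. */
    interface LimitSwitch { boolean get(); }
    interface Motor { void set(double speed); }

    static abstract class Command {
        protected abstract void execute();
        protected abstract boolean isFinished();
        protected abstract void end();
    }

    /** Replaces: while (!limitSwitch.get()) { motor.set(-1.0); } */
    static class RetractUntilSwitch extends Command {
        private final LimitSwitch limitSwitch;
        private final Motor motor;

        RetractUntilSwitch(LimitSwitch limitSwitch, Motor motor) {
            this.limitSwitch = limitSwitch;
            this.motor = motor;
        }

        @Override
        protected void execute() { motor.set(-1.0); }  // one step, then return

        @Override
        protected boolean isFinished() { return limitSwitch.get(); }

        @Override
        protected void end() { motor.set(0.0); }       // stop the motor when done
    }

    /**
     * One scheduler pass: runs the command once and ends it if it is finished.
     * Returns true while the command is still running.
     */
    static boolean runOnce(Command command) {
        command.execute();
        if (command.isFinished()) {
            command.end();
            return false;
        }
        return true;
    }
}
```

Each call to `runOnce` corresponds to one scheduler pass (roughly every 20 ms on a typical robot loop), so between passes the rest of the robot code, including the drivetrain, still gets to run.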

#14
Re: CPU spike to 100% - robot unresponsive
We've been trying to reproduce the issue and have been inspecting our code, with no luck.
Let's assume for a minute that we can't find the root cause in time. What other options are there? Is there a way to perform a quick program reset? Instead of rebooting the cRIO, is there any way to just kill the robot process and start a new one?