#1
15-04-2014, 23:05
JohnGilb
Programming Mentor, Drive Mentor
FRC #0488
 
Join Date: Mar 2011
Rookie Year: 2003
Location: Redmond, WA
Posts: 116
CPU spike to 100% - robot unresponsive

Hello all,

We had an issue where our robot became unresponsive during two matches - one practice match, and one elimination match.

Looking at the driver station logs, we saw that the CPU usage of the cRIO spiked to 100%. The motors continued to run at the voltages they were set at before this moment - this meant that we drove (usually at full speed) into a wall and then stalled our motors.

In the past, when we've seen deadlocks (like when two WhileHeld commands both required the same Subsystem) CPU usage would drop to near 0%. This is making us believe the problem is in some sort of infinite loop somewhere... likely in our own code.

Has anybody seen an issue like this before? If so, how did you debug/diagnose it?

Thanks!
#2
15-04-2014, 23:09
cgmv123
FRC RI/FLL Field Manager
AKA: Max Vrany
FRC #1306 (BadgerBOTS)
Team Role: College Student
 
Join Date: Jan 2011
Rookie Year: 2011
Location: Madison, WI
Posts: 2,079
Re: CPU spike to 100% - robot unresponsive

Adding a
Code:
Timer.delay(0.002); // 2 ms; the argument is in seconds, and 2/1000 would be integer division (0) in Java
at the end of a teleoperated loop can work wonders.
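
A caution on writing that delay in Java, since WPILib's Timer.delay takes its argument in seconds as a double: a literal fraction such as 2/1000 is integer division and evaluates to 0, so the loop would not actually pause. The small sketch below only demonstrates the arithmetic; the class and method names are made up for illustration.

```java
// Illustrative only: shows why a delay written as 2/1000 does nothing.
final class DelayArgument {
    // Integer division: 2 / 1000 is 0, which is then widened to 0.0 seconds.
    static double integerFraction() {
        return 2 / 1000;
    }

    // Floating-point division: 0.002 seconds, i.e. 2 ms.
    static double floatingFraction() {
        return 2.0 / 1000;
    }

    public static void main(String[] args) {
        System.out.println("2/1000   = " + integerFraction());
        System.out.println("2.0/1000 = " + floatingFraction());
    }
}
```

Writing the value as 0.002, or dividing by 1000.0, avoids the problem.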
__________________
BadgerBOTS Robotics|@team1306|Facebook: BadgerBOTS
2016 FIRST Championship Tesla Division | 2016 Wisconsin Regional Engineering Inspiration Award

2015 FIRST Championship Carson Division | 2015 Wisconsin Regional Chairman's Award

2013 FIRST Championship Curie Division | 2013 Wisconsin Regional Chairman's Award

2012 FIRST Championship Archimedes Division | 2012 Wisconsin Regional Engineering Inspiration Award, Woodie Flowers Finalist Award (Lead Mentor Ben Senson)

#3
15-04-2014, 23:10
joelg236
4334 Retired Mentor & Alumni
AKA: Joel Gallant
no team
Team Role: Mentor
 
Join Date: Dec 2011
Rookie Year: 2012
Location: Calgary
Posts: 733
Re: CPU spike to 100% - robot unresponsive

Sounds like a code thing. You're probably running a loop somewhere under a certain condition. Usually you wouldn't see anything close to 100% from normal usage.

The fact that your drivetrain becomes unresponsive but continues to drive could mean one of two things. One is that something else loops and takes all the processing time without updating the drivetrain voltage (this is less likely - motor safety would need to be off). The other is that something in your code (PID?) is looping and setting your drivetrain to a certain speed for that period of time, without letting anything else run. Is one of your triggered commands running something on the drivetrain?

Let me know if that makes sense, I could take a look at the code if it's Java too.
__________________
All opinions are my own.
#4
15-04-2014, 23:16
Jared Russell
Taking a year (mostly) off
FRC #0254 (The Cheesy Poofs), FRC #0341 (Miss Daisy)
Team Role: Engineer
 
Join Date: Nov 2002
Rookie Year: 2001
Location: San Francisco, CA
Posts: 3,078
Re: CPU spike to 100% - robot unresponsive

I have helped teams debug this problem at events many times over the years, and the majority of the time the issue is a "for" or "while" loop where there shouldn't be one. With an Iterative or Command-based robot, you seldom need loops in user code.

Other less common causes are interrupts or TimerTasks that take too long to run and miss their deadlines, very long print statements, and deadlocks.

I would put the robot up on blocks and try to reproduce the failure.

One quick-and-dirty strategy that I use on desktop apps is to run the program in a debugger, then pause execution and look at the current line once I hit 100% CPU usage. If your CPU utilization is normally fairly low, this method has good odds of stopping in the problematic section. Consider it CPU profiler roulette.
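
On a desktop JVM, the same "pause and look" idea can be approximated without a debugger by sampling a thread's stack. This is only a sketch: it uses standard Java SE calls (Thread.getStackTrace), which the cRIO's Java runtime should not be assumed to provide, and the names StackSampler and spin are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

// Desktop-JVM sketch: sample a busy thread's stack to see where it is spinning.
final class StackSampler {
    private static volatile boolean stop = false;

    // Stand-in for a runaway loop in user code.
    static void spin(CountDownLatch started) {
        started.countDown();
        while (!stop) {
            // busy-wait: never sleeps, never yields
        }
    }

    // Returns true if any sampled frame of the thread is in the named method.
    static boolean isRunning(Thread t, String methodName) {
        for (StackTraceElement frame : t.getStackTrace()) {
            if (frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    // Starts a spinning thread, samples it once, then shuts it down.
    static boolean demo() {
        stop = false;
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> spin(started), "runaway");
        worker.setDaemon(true);
        worker.start();
        try {
            started.await();                       // wait until spin() has been entered
            boolean found = isRunning(worker, "spin");
            stop = true;                           // let the worker exit
            worker.join();
            return found;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sampling", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("worker caught inside spin(): " + demo());
    }
}
```

In a real program you would sample every thread a few times and look for the frame that keeps showing up.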
#5
15-04-2014, 23:30
JohnGilb
Re: CPU spike to 100% - robot unresponsive

Are the motor safeties on a separate thread (and thus somewhat unaffected by a CPU usage spike)? I would hope that the watchdog is somehow resilient to some of our code demanding lots of CPU time.

joelg236, I'll see about packaging up our code.

Reproducing the failure has been the hardest part - we've only been able to reproduce it in matches. We ran the robot extensively in the pits without triggering it.

We have a spare cRIO back at home that we can run code on, but it will be missing sensors/actuators.

Last edited by JohnGilb : 15-04-2014 at 23:37. Reason: add detail.
#6
15-04-2014, 23:39
Thad House
Volunteer, WPILib Contributor
no team (Waiting for 2021)
Team Role: Mentor
 
Join Date: Feb 2011
Rookie Year: 2010
Location: Thousand Oaks, California
Posts: 1,099
Re: CPU spike to 100% - robot unresponsive

Quote:
Originally Posted by JohnGilb View Post
Are the motor safeties on a separate thread (and thus somewhat unaffected by a cpu usage spike)? I would hope that the watchdog is somehow resilient to some of our code demanding lots of cpu time.

joelg236, I'll see about packaging up our code.
I think the motor safeties run in a separate thread. However, I do know that they are not enabled by default unless you are using the RobotDrive class, which, since you guys are octocanum, I would guess you are not. So if you have not explicitly enabled them and are not using RobotDrive, no motor watchdog would be enabled. The FPGA watchdog will only shut the robot off if it loses connection to the DriverStation.

What do the ping statuses look like during those matches? That could help debug as well.

Usually when I've seen 100% CPU usage, the cRIO actually stops responding to the DriverStation, and the watchdog would usually kill the robot. That would show up in the logs as well.
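
To illustrate the motor-safety idea discussed above, here is a minimal sketch of a feed-or-stop watchdog. It is not WPILib's motor safety implementation; FeedWatchdog, its methods, and the 100 ms timeout are assumptions made for this example. The point is that if the code commanding the motor stops running, a separate check notices the missed deadline and zeroes the output.

```java
// Hypothetical feed-or-stop watchdog; not WPILib's motor safety code.
final class FeedWatchdog {
    private final long timeoutMillis;
    private long lastFeedMillis;
    private double output;

    FeedWatchdog(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastFeedMillis = nowMillis;
    }

    // Called by the control loop each time it commands the motor.
    synchronized void set(double value, long nowMillis) {
        output = value;
        lastFeedMillis = nowMillis;
    }

    // Called periodically from a separate thread or timer.
    synchronized void check(long nowMillis) {
        if (nowMillis - lastFeedMillis > timeoutMillis) {
            output = 0.0; // control loop went quiet: stop the motor
        }
    }

    synchronized double getOutput() {
        return output;
    }

    public static void main(String[] args) {
        FeedWatchdog dog = new FeedWatchdog(100, 0);
        dog.set(1.0, 10);   // loop commands full speed at t = 10 ms
        dog.check(50);      // still fresh, output is left alone
        System.out.println("t=50:  " + dog.getOutput());
        dog.check(500);     // loop has been silent for 490 ms
        System.out.println("t=500: " + dog.getOutput());
    }
}
```

Whether such a check still gets CPU time during a runaway loop depends on thread priorities and scheduling, which is exactly the open question here.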
__________________
All statements made are my own and not the feelings of any of my affiliated teams.
Teams 1510 and 2898 - Student 2010-2012
Team 4488 - Mentor 2013-2016
Co-developer of RobotDotNet, a .NET port of the WPILib.
#7
15-04-2014, 23:43
JohnGilb
Re: CPU spike to 100% - robot unresponsive

If anybody wants to look at our code, I've placed a drop here:
https://github.com/Team488/Shared

For some basic explanation of the architecture:
http://www.chiefdelphi.com/media/papers/2912
#8
15-04-2014, 23:46
JohnGilb
Re: CPU spike to 100% - robot unresponsive

I've also attached a picture of the DS log of the elimination match. Ping / packet loss looks consistent before and after the 100% event.

The complete loss of connection you see about 75% of the way through is where our drivers did a remote soft reset of the cRIO - it comes back up just seconds before the match ends, I think.
Attached image: Cpu100.png

Last edited by JohnGilb : 15-04-2014 at 23:48. Reason: Add more context
#9
15-04-2014, 23:59
Thad House
Re: CPU spike to 100% - robot unresponsive

Quote:
Originally Posted by JohnGilb View Post
I've also attached a picture of the DS log of the elimination match. Ping / packet loss looks consistent before and after the 100% event.

The complete loss of connection you see about 75% of the way through is where our drivers did a remote soft reset of the cRIO - it comes back up just seconds before the match ends, I think.
So that log definitely shows something wrong. The line that is across 16 shows what the DS is commanding to the robot. The lines above that show what mode the robot is actually in. By the looks of it, the robot stopped reporting what mode it was in when the CPU jumped to 100%. I don't know if it's the FPGA that reports those things or the CPU; that would be a question the NI people could answer better. But it could help get a better understanding of where the error is.

Last edited by Thad House : 16-04-2014 at 00:02.
#10
16-04-2014, 00:10
Joe Ross
Registered User
FRC #0330 (Beachbots)
Team Role: Engineer
 
Join Date: Jun 2001
Rookie Year: 1997
Location: Los Angeles, CA
Posts: 8,572
Re: CPU spike to 100% - robot unresponsive

Quote:
Originally Posted by Thad House View Post
So that log definitely shows something wrong. The line that is across 16 shows what the DS is commanding to the robot. The lines above that show what mode the robot is actually in. By the looks of it, the robot stopped reporting what mode it was in when the CPU jumped to 100%. I don't know if it's the FPGA that reports those things or the CPU; that would be a question the NI people could answer better. But it could help get a better understanding of where the error is.
The robot reports the code state from the iterative robot base class. That lends credence to the theory that the robot is getting stuck in a loop.
#11
16-04-2014, 00:16
Jon Stratis
Electrical/Programming Mentor
FRC #2177 (The Robettes)
Team Role: Mentor
 
Join Date: Feb 2007
Rookie Year: 2006
Location: Minnesota
Posts: 3,784
Re: CPU spike to 100% - robot unresponsive

We had a similar issue, caused by a loop. Essentially, we had something like (pseudocode):
Code:
// Busy-wait: nothing in the loop yields, so it spins at full CPU until the switch trips
while (!limitSwitch.get())
{
    motor.set(-1.0);
}
We just added a small Timer.delay in there and it completely solved our problem.

We were able to find the issue by comparing our match logs to what we actually did in the match - we saw that every time the CPU spiked, it was while we were shooting the ball. So, we could narrow down where we were looking and walk through the code until we found it. Note that it didn't cause issues every time it spiked, only a couple of times.
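
A sketch of the fix described above, written as plain Java so it runs off-robot: the wait loop sleeps briefly on each pass and also gives up after a timeout, so a switch that never trips cannot hold the thread forever. BooleanSupplier stands in for the limit switch; the names, the 2 ms poll, and the timeout values are assumptions, not our actual robot code.

```java
import java.util.function.BooleanSupplier;

// Off-robot sketch of a wait loop that yields and cannot spin forever.
final class YieldingWait {
    // Polls the condition every pollMillis; returns true if it became true,
    // false if timeoutMillis elapsed first or the thread was interrupted.
    static boolean waitFor(BooleanSupplier condition, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: caller should stop the motor
            }
            try {
                Thread.sleep(pollMillis); // give other threads a turn
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Simulated switch that trips about 20 ms after start.
        boolean tripped = waitFor(() -> System.currentTimeMillis() - start >= 20, 2, 1000);
        System.out.println("switch tripped: " + tripped);
        // Simulated switch that never trips: the timeout ends the wait.
        System.out.println("stuck switch:   " + waitFor(() -> false, 2, 50));
    }
}
```

The timeout matters as much as the sleep: with only the sleep, a broken switch would still keep the motor running indefinitely, just at lower CPU usage.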
__________________
2007 - Present: Mentor, 2177 The Robettes
LRI: North Star 2012-2016; Lake Superior 2013-2014; MN State Tournament 2013-2014, 2016; Galileo 2016; Iowa 2017
2015: North Star Regional Volunteer of the Year
2016: Lake Superior WFFA
#12
16-04-2014, 13:57
JohnGilb
Re: CPU spike to 100% - robot unresponsive

Yeah, we suspect the same - but there are no obvious candidates. We've been searching through our code and only have a few candidate loops (of the while / for variety).

This weekend we'll try to run some longhaul/stress tests on our backup cRIO and see if we can reproduce the issue.
#13
17-04-2014, 09:08
Camilo86
Registered User
AKA: camilo
FRC #0125 (Nutrons)
Team Role: Programmer
 
Join Date: Jun 2013
Rookie Year: 2012
Location: Boston
Posts: 21
Re: CPU spike to 100% - robot unresponsive

As a general rule, try not to use while loops. Parts of your robot project are already called repeatedly: in a command you can use execute() as your loop, and you can also use teleopPeriodic() in the main robot class.

https://github.com/FRC125/NU14
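
As a sketch of that advice, a blocking "run until the switch trips" loop can be turned into a short step that gets called repeatedly. The class below is a stand-in written only for this example (it does not use WPILib's classes), although the execute() / isFinished() / end() names mirror the Command lifecycle. runToCompletion is a toy substitute for the scheduler, which on a real robot calls these methods once per loop iteration.

```java
import java.util.function.BooleanSupplier;
import java.util.function.DoubleConsumer;

// Stand-in for a command: execute() is called once per loop iteration.
final class RunUntilSwitch {
    private final DoubleConsumer motor;        // hypothetical motor setter
    private final BooleanSupplier limitSwitch; // hypothetical switch reader

    RunUntilSwitch(DoubleConsumer motor, BooleanSupplier limitSwitch) {
        this.motor = motor;
        this.limitSwitch = limitSwitch;
    }

    // One short step; returns immediately instead of looping.
    void execute() {
        motor.accept(-1.0);
    }

    boolean isFinished() {
        return limitSwitch.getAsBoolean();
    }

    void end() {
        motor.accept(0.0);
    }

    // Toy scheduler: bounded, and returns how many times execute() ran.
    static int runToCompletion(RunUntilSwitch command, int maxIterations) {
        int iterations = 0;
        while (iterations < maxIterations && !command.isFinished()) {
            command.execute();
            iterations++;
        }
        command.end();
        return iterations;
    }

    public static void main(String[] args) {
        double[] lastOutput = {0.0};
        int[] reads = {0};
        // Simulated switch that trips on the fourth read.
        RunUntilSwitch command =
            new RunUntilSwitch(v -> lastOutput[0] = v, () -> ++reads[0] > 3);
        int steps = runToCompletion(command, 100);
        System.out.println("steps: " + steps + ", final output: " + lastOutput[0]);
    }
}
```

Because each call returns right away, the rest of the robot code keeps running between steps.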
__________________
Hopper finalist 2015
Dean's List winner 2015
#14
19-04-2014, 15:07
JohnGilb
Re: CPU spike to 100% - robot unresponsive

We've been trying to repro the issue and have been inspecting our code, with no luck.

Let's assume for a minute that we can't find the root issue in time. What other options are there?

Is there a way to perform a quick program reset? Instead of rebooting the cRIO, is there any way to just kill the robot process and start a new one?