#1
04-26-2012, 10:36 PM
Clayton Yocom
Technokat's Kodekat
FRC #0045 (Technokats)
Team Role: Programmer
 
Join Date: Jan 2011
Rookie Year: 2011
Location: Kokomo, IN
Posts: 83
100% CPU usage and double timeout bug

Hello Everyone!
I'm the programming lead for the TechnoKats, and we've been having some interesting issues/results that we'd like to share and see if anyone else has been having similar issues.

Today, after our practice match, we took to the practice field, where we saw large delays and weird results from our PID loops. This included our arm PID oscillating as if we had tuned it wrong. The first thing we did when we got back to the pit was tether the robot and try to replicate the problem. There, we found that not only could we not replicate the problem, but we also could not find any communication issues or dropped packets in our Driver Station logs.

Now we go into debug mode. Where is this mysterious bug, and what could be causing it? We then noticed we had been pushing 100% CPU on the cRIO without even running teleop. (This is odd because all of our vision processing is done off-board, on the Driver Station.) We then checked all of our code and loops for delays. After not finding anything wrong, we asked NI for help. They continued the search, and we ran into dead end after dead end. We started disabling code to find the erroneous part, and we were starting to narrow it down when we ran into the double timeout bug.

The double timeout bug (as it has been explained to me) is where the cRIO thinks it's connected to something but isn't. That causes it to do weird things, like not letting you deploy code. The only fix I've found is to reboot the robot. If the problem continues, turn on the "No App" switch on the cRIO and deploy code again, then restart the cRIO with the No App switch off. This should solve the problem, at least for a while.

So as we were testing for the code that was flooring the CPU, we decided just to upload the original code and run the profiler, and that's where the weirdest thing happened: the cRIO stopped being floored. CPU usage dropped by half without a single change. If anyone else has had a similar issue, please let us know so we can fix this properly.
Thanks!

#2
04-27-2012, 05:32 PM
JohnGilb
Programming Mentor, Drive Mentor
FRC #0488
 
Join Date: Mar 2011
Rookie Year: 2003
Location: Redmond, WA
Posts: 96

I'm not sure about the double timeout as a cause, but I've seen the effect you've described this year. After some optimization, our code uses about 50% of available CPU, but we still need to use the No-App switch to deploy new code, or we time out during deploy:

1) Flip switch to No-App
2) Software reboot robot
3) Deploy built LabVIEW project
4) Flip switch away from No-App
5) Software reboot robot

I've also seen intermittent instances where for some runs the machine will be pegged to 100% cpu, but the issue will not repro once we reboot the machine.

#3
04-27-2012, 10:32 PM
MAldridge
Lead Programmer
AKA: Rube #1
FRC #0418 (LASA Robotics)
Team Role: Programmer
 
Join Date: Jan 2011
Rookie Year: 2010
Location: Austin
Posts: 117

We had a similar issue, which turned out to be caused by a free-wheeling loop. Because it was our main drive loop, we didn't see any difference in how the robot drove, but we couldn't deploy while it was running...
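
As a rough illustration of why a free-wheeling loop pegs the CPU, here is a minimal Python sketch. It is not LabVIEW and not code from any team in this thread; the function `run_loop`, its parameters, and the 20 ms period are assumptions chosen only for the demonstration. A loop with no wait spins as fast as the processor allows, while a paced loop sleeps between iterations and leaves time for other work, such as a deploy.

```python
import time


def run_loop(duration_s, period_s=None):
    """Run a do-nothing control loop for duration_s seconds.

    With period_s=None the loop free-wheels (no wait at all). Otherwise it
    sleeps until the next period boundary. Returns the iteration count.
    """
    iterations = 0
    start = time.monotonic()
    next_deadline = start
    while time.monotonic() - start < duration_s:
        iterations += 1  # stand-in for reading joysticks and setting motors
        if period_s is not None:
            next_deadline += period_s
            delay = next_deadline - time.monotonic()
            if delay > 0:
                time.sleep(delay)  # yields the CPU to other tasks
    return iterations


free = run_loop(0.2)                  # hypothetical free-wheeling loop
paced = run_loop(0.2, period_s=0.02)  # same loop with an assumed 20 ms wait
print(f"free-wheeling: {free} iterations, paced: {paced} iterations")
```

The paced loop runs roughly ten times in the same interval, which is plenty for a drive loop, while the free-wheeling version burns every available cycle doing the same work.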
__________________
'Why are you a programer?' --Team Captain
'Because the robot isn't complicated enough!' --Me

#4
04-28-2012, 01:10 AM
sanddrag
back to school ;-)
FRC #0696 (Circuit Breakers)
Team Role: Teacher
 
Join Date: Jul 2002
Rookie Year: 2002
Location: Glendale, CA
Posts: 7,875

I am not at all a LabVIEW programmer, but we had a similar issue of the CPU being pegged and weird things happening. It turned out that certain "tasks" (events? what do you call them?) were being executed in pseudo-random order in the loop because they were not wired together, as I suppose they needed to be, though it was nowhere documented that they had to be. I believe part of the issue was the way we had implemented the compressor code or library.

Sorry for the vague description. That's the best I have from my student. He knows what he's doing but can't communicate it to a level I can understand and re-communicate. This is why I'm the design and mechanical guy. :/
__________________
Teacher/Engineer/Machinist - Team 696 Circuit Breakers, 2011 - Present
Mentor/Engineer/Machinist, Team 968 RAWC, 2007-2010
Technical Mentor, Team 696 Circuit Breakers, 2005-2007
Student Mechanical Leader and Driver, Team 696 Circuit Breakers, 2002-2004


#5
05-04-2012, 07:32 AM
Greg McKaskle
Registered User
no team (Team NI)
 
Join Date: Apr 2008
Rookie Year: 2008
Location: Austin, TX
Posts: 3,952

I saw the issue with deploying with a few teams this year. I'll probably be disabling the safety config in the disabled code next year. We are also asking the RT team to make the deploy more aggressive. Currently, it is too easy for a busy cRIO to take a long time to do the deploy.

It isn't clear why this is called the double timeout, or how the deploy is related to the excessive CPU usage.

Were there diagnostic errors due to timing or other issues? That seems like a possible reason for the CPU usage to be high.

Greg McKaskle

#6
05-04-2012, 08:17 AM
Clayton Yocom
Technokat's Kodekat
FRC #0045 (Technokats)
Team Role: Programmer
 
Join Date: Jan 2011
Rookie Year: 2011
Location: Kokomo, IN
Posts: 83
Quote:
Originally Posted by Greg McKaskle
It isn't clear why this is called the double timeout, or how the deploy is related to the excessive CPU usage.
I agree; that's why I came here to share my lack of knowledge, so hopefully we can avoid the bug in the future. It was termed the double timeout bug by one of the NI guys, Kevin something. He said he had seen it before and hated it.

Quote:
Originally Posted by Greg McKaskle
Were there diagnostic errors due to timing or other issues? That seems like a possible reason for the CPU usage to be high.
It kept giving us errors as if Begin was finishing after Periodic Tasks and such were opened: refnum errors and the like. They didn't actually cause anything to break once the robot started, but it did put out LOTS of errors. I can post the code later today when I go to the shop if you'd like.

#7
08-10-2012, 06:02 PM
Levansic
Registered User
AKA: Len Evansic
FRC #0585 (Cyber Penguins)
Team Role: Mentor
 
Join Date: Jan 2012
Rookie Year: 2008
Location: Tehachapi, CA
Posts: 146

Quote:
Originally Posted by Greg McKaskle
...I'll probably be disabling the safety config in the disabled code next year.
I was going to post on the "Watchdog Not Fed!!!" thread, but the quoted statement in this thread caught my eye.

Our robot returned from St. Louis, less one of our 10 CAN Jaguars, due to our bridge-tipper being removed for shipment after CMP. As we don't have a bridge in our lab, we felt it OK to leave the manipulator and controlling Jaguar off.

We fired it up several times this summer, and kept getting the watchdog error, accompanied by shuddering, when all motors would momentarily switch off, and then right back on.

We immediately thought that the code was waiting for a reply from the non-existent jaguar, so we drew disable blocks over the related code in begin.vi and our timed_task.vi. The behavior did not change.

Returning to the above quote, our understanding was that drawing the disable blocks removed the underlying code from the compilation step. Is there code (like the safety system) that still gets compiled in, even if it is "disabled"?

Not sure if it makes a difference, but we are using the original 8-port cRIO, and occasionally find it temperamental to deploy code, or re-image.

-- Len

Last edited by Levansic : 08-10-2012 at 06:06 PM.

#8
08-10-2012, 08:15 PM
Greg McKaskle
Registered User
no team (Team NI)
 
Join Date: Apr 2008
Rookie Year: 2008
Location: Austin, TX
Posts: 3,952

The disable structure does what you think. It disables the code as if it were deleted. The wires leaving the structure have the default value or whatever you wire up to the Enabled frame.

The comment about safety being disabled was referring to a simple modification to the default Disabled.vi to ensure that it doesn't cause safety errors.

As for the potential cause of your error: does the CAN topology make sense with that one Jaguar removed? Also make sure that the CAN connections, cables, and terminator are good. Check to see if there are errors in the Diagnostics message box, and consider adding the Elapsed Times VI to the loop that you think may be running too slowly to update the RobotDrive often enough.
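
The idea of timing a loop to catch slow iterations can be sketched in Python. This is only an analogy to the Elapsed Times VI, not its implementation; `LoopTimer`, the simulated timestamps, and the 100 ms threshold are all assumptions made for illustration.

```python
import time


class LoopTimer:
    """Hypothetical helper: records the time between successive mark() calls."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the demo can use fake time
        self._last = None
        self.worst_ms = 0.0

    def mark(self):
        """Return milliseconds since the previous mark() (0.0 on first call)."""
        now = self._clock()
        elapsed_ms = 0.0 if self._last is None else (now - self._last) * 1000.0
        self._last = now
        self.worst_ms = max(self.worst_ms, elapsed_ms)
        return elapsed_ms


# Simulated timestamps in seconds: three 20 ms periods, then one 150 ms stall.
fake_times = iter([0.00, 0.02, 0.04, 0.06, 0.21])
timer = LoopTimer(clock=lambda: next(fake_times))

periods = [round(timer.mark()) for _ in range(5)]
late = [p for p in periods if p > 100]  # 100 ms is an assumed safety timeout

print("periods (ms):", periods)
print("iterations over the assumed timeout:", late)
```

Calling `mark()` once per iteration inside the suspect loop gives the actual period; any value above the safety timeout points at the iteration that starved the drive update.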

Greg McKaskle

#9
08-11-2012, 12:52 AM
Levansic
Registered User
AKA: Len Evansic
FRC #0585 (Cyber Penguins)
Team Role: Mentor
 
Join Date: Jan 2012
Rookie Year: 2008
Location: Tehachapi, CA
Posts: 146

CAN is working great. We are using a star topology, and termination is working. 2CAN reports no errors. We'll be looking at inserting elapsed times next week when we get back into the lab, after school starts.

The weird thing is that everything was "working great" after CMP for a few local demos. At least that is what our students maintain. This summer, there were no on-board code changes. We did change our custom dashboard to disable target seeking for our turret. Now it looks like our robot has epileptic fits. The packet structure out of the dashboard is the same, and we can switch over to the prior seeking code or our original calibration mode. Everything works, just with the watchdog error. This led us to chase possible intermittent connection causes.

We swapped out Ethernet cables and checked every cable connection we could think of. We had some prior problems with cable retention on one port of the D-Link switch, but this was not the problem. Not finding problems with any cables, we're back to searching for potential software issues.

I didn't even think to check for code in the disable.vi. I know that we reference and close that missing jaguar in the disable.vi. Do you think that could be triggering the problem?

-- Len

#10
08-11-2012, 05:30 AM
Greg McKaskle
Registered User
no team (Team NI)
 
Join Date: Apr 2008
Rookie Year: 2008
Location: Austin, TX
Posts: 3,952

I reread the post with less bleary eyes and noticed that you said you received a watchdog error. The disable topic was about Safety Config, so I had them confused.

Watchdog is not on by default, and Safety is enabled only for the RobotDrive. Please determine which is on, try turning it off and verify that the symptoms change. If the jerking is being caused by WD or Safety, then it means you are missing deadlines. It may also mean you have a workaround.
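
To illustrate what "missing deadlines" looks like from the safety side, here is a minimal Python sketch of a timeout-style check. It is not the WPILib implementation; `SafetyTimer`, the 100 ms timeout, and the simulated 50 ms ticks are assumptions made for the example. The output stays enabled only while the control loop keeps feeding the timer, so a stalled loop produces a brief dropout followed by recovery.

```python
class SafetyTimer:
    """Hypothetical timeout check: output is allowed only if fed recently."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self._last_feed_ms = None

    def feed(self, now_ms):
        """Record that the control loop updated the output at now_ms."""
        self._last_feed_ms = now_ms

    def output_enabled(self, now_ms):
        """True if the last update is no older than the timeout."""
        if self._last_feed_ms is None:
            return False
        return (now_ms - self._last_feed_ms) <= self.timeout_ms


safety = SafetyTimer(timeout_ms=100)      # assumed timeout
updates_ms = {0, 50, 100, 350, 400}       # the loop stalls from 100 to 350 ms

dropouts = []
for now_ms in range(0, 450, 50):          # simulated 50 ms supervisor ticks
    if now_ms in updates_ms:
        safety.feed(now_ms)
    if not safety.output_enabled(now_ms):
        dropouts.append(now_ms)

print("output disabled at (ms):", dropouts)
```

In this simulation the output drops at the ticks that fall more than 100 ms after the last update and comes back as soon as the loop resumes, which resembles the momentary off-then-on shudder described earlier in the thread.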

Assuming you are missing deadlines, I'd verify you have no errors in the Diagnostics panel, as the current mechanism for catching errors and shuttling them to the window is quite heavy and can cause you to miss deadlines. If a missing jag is still being referenced, or a disable structure causes a wire to be bad and causes errors, ...

The disable issue is problematic to the original thread because most robots are in disable when they are being reimaged or reprogrammed. If disabled robots are throwing errors they take longer to respond and sometimes require a NoApp switch or similar. Making the disable code less CPU intensive due to errors seems like it will resolve many of these issues. I don't think disable has any impact in your robot's twitch.

Greg McKaskle

#11
08-11-2012, 04:24 PM
Tom Line
Raptors can't turn doorknobs.
FRC #1718 (The Fighting Pi)
Team Role: Mentor
 
Join Date: Jan 2007
Rookie Year: 1999
Location: Armada, Michigan
Posts: 2,139

The closest thing I've seen to what you describe happens when we deploy code from a given computer while tethered and then, while the code is running, disconnect the tether. If you reconnect your tether cable, you will be unable to redeploy code until you reboot the cRIO or power-cycle your robot.

I cannot say that I've ever seen a situation exactly like what you describe. Once the cRIO times out, it won't do anything for us until we reboot it.

Regarding the disabled mode and not deploying, we ran into the same problems with problematic deploys ALL the time last year (2011). What we found was that because we had every sensor, data accumulator, etc. enabled in disabled mode (including vision) so that we could debug, it pushed the CPU too high to do a successful deploy.

We now encase all of our disabled code into a single If/Then case connected to a Button on the front panel. The default position of that button is OFF, so when we deploy to the robot permanently none of the extraneous sensor stuff runs in disabled mode.

When we temporarily deploy from the programming computer, we turn the button on when we need the data to tune things, then turn it back off to deploy so there is no code running in disabled mode.
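
The pattern described above can be sketched in Python as a single flag that gates all of the diagnostic work. This is an analogy to the LabVIEW case structure and front-panel button, not the team's actual code; `disabled_periodic`, the sensor names, and the readings are hypothetical.

```python
def disabled_periodic(debug_enabled, sensors):
    """One pass of disabled-mode code.

    Reads every sensor only when debug_enabled is True. With the flag off
    (the assumed default), no sensor is touched and the pass is nearly free.
    """
    if not debug_enabled:
        return {}
    return {name: read() for name, read in sensors.items()}


read_count = 0


def make_sensor(value):
    """Build a fake sensor that counts how often it is read."""
    def read():
        global read_count
        read_count += 1
        return value
    return read


sensors = {"gyro_deg": make_sensor(12.5), "arm_pot_v": make_sensor(2.31)}

quiet = disabled_periodic(False, sensors)   # permanent deploy: flag off
reads_when_quiet = read_count
tuning = disabled_periodic(True, sensors)   # tuning session: flag on

print(quiet, reads_when_quiet)
print(tuning, read_count)
```

Defaulting the flag to off means a permanently deployed robot never pays for the diagnostics, and the extra load appears only when someone deliberately turns it on.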

#12
08-12-2012, 12:33 PM
Levansic
Registered User
AKA: Len Evansic
FRC #0585 (Cyber Penguins)
Team Role: Mentor
 
Join Date: Jan 2012
Rookie Year: 2008
Location: Tehachapi, CA
Posts: 146

Quote:
Originally Posted by Greg McKaskle
Watchdog is not on by default, and Safety is enabled only for the RobotDrive. Please determine which is on, try turning it off and verify that the symptoms change. If the jerking is being caused by WD or Safety, then it means you are missing deadlines. It may also mean you have a workaround.
That's what has been puzzling us. We don't have anything referring to the watchdog and certainly didn't add any watchdog VIs, and we have safety config disabled on our drive. We didn't add safety config to any of our other motor controls. Our cRIOs are old, dating back to 2008. Even though we re-image them several times a year, is there a possibility that old watchdog code is still lingering in a zombie state?

Quote:
Originally Posted by Greg McKaskle
Assuming you are missing deadlines, I'd verify you have no errors in the Diagnostics panel, as the current mechanism for catching errors and shuttling them to the window is quite heavy and can cause you to miss deadlines. If a missing jag is still being referenced, or a disable structure causes a wire to be bad and causes errors, ...
We'll have to look closer this week. I don't remember what else was coming up as errors. The watchdog one stands out, as we didn't think we had any code that used that deprecated system.

-- Len

#13
08-12-2012, 12:40 PM
Levansic
Registered User
AKA: Len Evansic
FRC #0585 (Cyber Penguins)
Team Role: Mentor
 
Join Date: Jan 2012
Rookie Year: 2008
Location: Tehachapi, CA
Posts: 146

Quote:
Originally Posted by Tom Line
Regarding the disabled mode and not deploying, we ran into the same problems with problematic deploys ALL the time last year (2011). What we found was that because we have every sensor, data accumulator, etc enabled in disabled (including vision) so that we could debug, it pushed the CPU too high to do a successful deploy.
We saw that as well. We pushed our vision processing onto our driver station, to keep CPU utilization down. Interesting solution for debugging and quiet mode. We'll probably implement something similar this year.

-- Len

#14
09-14-2012, 12:13 PM
Levansic
Registered User
AKA: Len Evansic
FRC #0585 (Cyber Penguins)
Team Role: Mentor
 
Join Date: Jan 2012
Rookie Year: 2008
Location: Tehachapi, CA
Posts: 146

I'm a little late in posting about this, but here's a follow-up on our issue.

Our main programmer deleted all of the code related to the no-longer-present CAN Jaguar. I verified that this code had been covered by Disable blocks, but in some cases it still had wires coming or going.

All of the problems and all of the watchdog errors we were having went away!

Now, our diagnostic log on the DS is completely empty, except for a new message that references the watchdog. Again, we are not in any way calling any watchdog functions in our team code.

The new solitary error code is as follows:

Watchdog Expiration: System 1, User 0

Because the robot is now working great, I'm of the opinion that this error doesn't matter. At the same time, I am a little concerned because we shouldn't have any error for a function we aren't calling.

-- Len

#15
09-14-2012, 12:41 PM
Joe Ross
Unsung FIRST Hero
Registered User
FRC #0330 (Beachbots)
Team Role: Engineer
 
Join Date: Jun 2001
Rookie Year: 1997
Location: Los Angeles, CA
Posts: 7,895

Quote:
Originally Posted by Levansic
Now, our diagnostic log on the DS is completely empty, except for a new message that references the watchdog. Again, we are not in any way calling any watchdog functions in our team code.

The new solitary error code is as follows:

Watchdog Expiration: System 1, User 0

Because the robot is now working great, I'm of the opinion that this error doesn't matter. At the same time, I am a little concerned because we shouldn't have any error for a function we aren't calling.
The system watchdog is based on communications and can't be disabled. The user watchdog is what you can control programmatically. I believe it's normal (or at least not uncommon) to see a single System watchdog expiration when starting the program.