Go to Post I'd just give you the URL, but then you wouldn't learn to look for yourself. - DonRotolo [more]
Home
Go Back   Chief Delphi > Technical > Control System
CD-Media   CD-Spy  
portal register members calendar search Today's Posts Mark Forums Read FAQ rules

 
Reply
Thread Tools Rate Thread Display Modes
  #1   Spotlight this post!  
Unread 12-04-2016, 23:32
Thomas DeSilva's Avatar
Thomas DeSilva Thomas DeSilva is offline
Programming Fury
FRC #1983 (Skunkworks)
Team Role: Mentor
 
Join Date: Feb 2009
Rookie Year: 2009
Location: Des Moines, WA
Posts: 15
Thomas DeSilva is on a distinguished road
Robot enters stateless "state" on official FMS

The last two finals that we've been in this season, our robot has had control issues in one of the matches. If you watched the PNW District Championship in Portland last weekend (shoutout to 360 and the event staff for allowing that replay of the second match), you probably saw the symptoms: The robot started the match off fine, but a few seconds into auto, it stopped moving and didn't start again until close to the end of the match. Looking at the driver station logs, I see that the robot was in auto as it should have been, but then it just wasn't. Not disabled, just stateless. Our driver noted that according to the driver station, we still had comms and the code was still running and enabled. After he rebooted the RoboRIO, everything was fine and we played the last few seconds of the match.

We ran into a similar problem in the Philomath finals. The robot made it through autonomous, but a bit into teleop we lost code and the robot drove itself into the low bar (we suspect that the Talons were just running using their last input). Just like in Portland, the robot code had somehow lost its state.

We've been trying to figure out what the problem could be, but we're at a loss. Typically, when the robot code crashes, its handler will restart it fairly quickly; we don't see that happening here, so we don't think it's a run-of-the-mill crash. The logs--from the driver station and our own logging--don't indicate that the program crashed either. We know we didn't get an errant disable signal from the field because we would have seen that state change. Does anyone have any ideas as to what could be happening here? The logs in question are attached.
A few notes:
  • The RoboRIO continues running after the code stops and our round trip times stay fairly low.
  • The errors in the Portland log that occur near the crash are coincidental. Similar errors show up in most of our logs, and in particular they don't show up before the crash in Philomath.
  • Some of the messages in the Portland log got a bit polluted by a stray "cout" that made it into the program.
Attached Files
File Type: zip logs.zip (124.6 KB, 16 views)
__________________
Programmer 2009-2011
Driver 2011
WPI Class of 2015
Reply With Quote
  #2   Spotlight this post!  
Unread 12-04-2016, 23:39
MikLast's Avatar
MikLast MikLast is offline
CAO/Drive Coach
AKA: Mikal Dieatrick
FRC #4513 (Circuit Breakers)
Team Role: Leadership
 
Join Date: Jan 2015
Rookie Year: 2014
Location: Medical Lake, WA
Posts: 585
MikLast is a splendid one to beholdMikLast is a splendid one to beholdMikLast is a splendid one to beholdMikLast is a splendid one to beholdMikLast is a splendid one to beholdMikLast is a splendid one to behold
Re: Robot enters stateless "state" on official FMS

For anyone wanting to watch the match, click here for the stream feed and here for the full field feed (thank you so much FIRSTWA!)

Mind if i ask what programming language you are using? That could be part of the issue.
__________________

Check out the FRC Discord!

2014: programmer, scout
2015: programmer, admin, drive team
Innovation in control award, WVHS district event
Innovation in control award, CWU district event
finalist, PNW district championship
2016: CAO, Drive team.
Excellence In Engineering awad, WVHS District event
Reply With Quote
  #3   Spotlight this post!  
Unread 13-04-2016, 00:38
Thomas DeSilva's Avatar
Thomas DeSilva Thomas DeSilva is offline
Programming Fury
FRC #1983 (Skunkworks)
Team Role: Mentor
 
Join Date: Feb 2009
Rookie Year: 2009
Location: Des Moines, WA
Posts: 15
Thomas DeSilva is on a distinguished road
Re: Robot enters stateless "state" on official FMS

Quote:
Originally Posted by MikLast View Post
Mind if i ask what programming language you are using? That could be part of the issue.
We're using C++ with the WPILib command-based model.
__________________
Programmer 2009-2011
Driver 2011
WPI Class of 2015
Reply With Quote
  #4   Spotlight this post!  
Unread 13-04-2016, 00:54
rsisk's Avatar
rsisk rsisk is offline
The GURU Channel
AKA: Richard Sisk
FRC #2493 (Robokong)
Team Role: Mentor
 
Join Date: Jan 2008
Rookie Year: 2007
Location: Riverside, CA
Posts: 2,746
rsisk has a reputation beyond reputersisk has a reputation beyond reputersisk has a reputation beyond reputersisk has a reputation beyond reputersisk has a reputation beyond reputersisk has a reputation beyond reputersisk has a reputation beyond reputersisk has a reputation beyond reputersisk has a reputation beyond reputersisk has a reputation beyond reputersisk has a reputation beyond repute
Send a message via MSN to rsisk
Re: Robot enters stateless "state" on official FMS

Very similar thing happened to us on our final 4 matches at AZW last weekend. Started match fine, during auto the robot stopped, about 50 seconds later control was reestablished.

NI has our log files and is looking at them, but from what the CSA told us, the robo rio said the radio was lost, then when it came back up, the code restarted.
__________________
Quote:
The views expressed are mine and should not be construed to represent the views of anyone else.
Reply With Quote
  #5   Spotlight this post!  
Unread 13-04-2016, 01:02
Thad House Thad House is online now
Volunteer, WPILib Contributor
no team (Waiting for 2021)
Team Role: Mentor
 
Join Date: Feb 2011
Rookie Year: 2010
Location: Thousand Oaks, California
Posts: 1,075
Thad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond repute
Re: Robot enters stateless "state" on official FMS

I sent you a message on Facebook, but I might have an idea on what caused the crash at Portland. Still looking into the one at Philomath.

The one at Portland looks like the DriverStation Thread on the RoboRIO locked up. If that happened, comms and robot code would still be reported, however the robot would stop sending requests saying its enabled, and the main loop would not continue to run. I have a guess on where in the WPILib that would be, but would like to see your guy's code to double check.
__________________
All statements made are my own and not the feelings of any of my affiliated teams.
Teams 1510 and 2898 - Student 2010-2012
Team 4488 - Mentor 2013-2016
Co-developer of RobotDotNet, a .NET port of the WPILib.
Reply With Quote
  #6   Spotlight this post!  
Unread 13-04-2016, 07:37
RufflesRidge RufflesRidge is offline
Registered User
no team
 
Join Date: Jan 2012
Location: USA
Posts: 987
RufflesRidge has a brilliant futureRufflesRidge has a brilliant futureRufflesRidge has a brilliant futureRufflesRidge has a brilliant futureRufflesRidge has a brilliant futureRufflesRidge has a brilliant futureRufflesRidge has a brilliant futureRufflesRidge has a brilliant futureRufflesRidge has a brilliant futureRufflesRidge has a brilliant futureRufflesRidge has a brilliant future
Re: Robot enters stateless "state" on official FMS

Quote:
Originally Posted by Thad House View Post

The one at Portland looks like the DriverStation Thread on the RoboRIO locked up. If that happened, comms and robot code would still be reported
What would indicate to the communications process that robot code was running if the DS thread was locked and not calling into the communications process anymore?
Reply With Quote
  #7   Spotlight this post!  
Unread 13-04-2016, 13:08
Thad House Thad House is online now
Volunteer, WPILib Contributor
no team (Waiting for 2021)
Team Role: Mentor
 
Join Date: Feb 2011
Rookie Year: 2010
Location: Thousand Oaks, California
Posts: 1,075
Thad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond repute
Re: Robot enters stateless "state" on official FMS

Quote:
Originally Posted by RufflesRidge View Post
What would indicate to the communications process that robot code was running if the DS thread was locked and not calling into the communications process anymore?
Actually I was backwards. If user code freezes but the DS loop stays running, those would be the symptoms. The DS loop would flag the comms saying it is running, however when using IterativeRobot, the flags stating which mode the robot is in is are called in the main user code loop. So if the main user thread froze, or got stuck in a blocking call, the DS would still see comms and code, however the mode report calls don't happen and no movements would be responded to since no main loop is running.
__________________
All statements made are my own and not the feelings of any of my affiliated teams.
Teams 1510 and 2898 - Student 2010-2012
Team 4488 - Mentor 2013-2016
Co-developer of RobotDotNet, a .NET port of the WPILib.
Reply With Quote
  #8   Spotlight this post!  
Unread 14-04-2016, 00:53
Thomas DeSilva's Avatar
Thomas DeSilva Thomas DeSilva is offline
Programming Fury
FRC #1983 (Skunkworks)
Team Role: Mentor
 
Join Date: Feb 2009
Rookie Year: 2009
Location: Des Moines, WA
Posts: 15
Thomas DeSilva is on a distinguished road
Re: Robot enters stateless "state" on official FMS

Thad helped us track down a potential cause of a deadlock in our autonomous code that may have been the issue in Portland, but we're still trying to figure out what led to the initial failure in Philomath. We'd be happy to hear from anyone who comes up with something.
__________________
Programmer 2009-2011
Driver 2011
WPI Class of 2015
Reply With Quote
  #9   Spotlight this post!  
Unread 14-04-2016, 11:30
Greg McKaskle Greg McKaskle is offline
Registered User
FRC #2468 (Team NI & Appreciate)
 
Join Date: Apr 2008
Rookie Year: 2008
Location: Austin, TX
Posts: 4,748
Greg McKaskle has a reputation beyond reputeGreg McKaskle has a reputation beyond reputeGreg McKaskle has a reputation beyond reputeGreg McKaskle has a reputation beyond reputeGreg McKaskle has a reputation beyond reputeGreg McKaskle has a reputation beyond reputeGreg McKaskle has a reputation beyond reputeGreg McKaskle has a reputation beyond reputeGreg McKaskle has a reputation beyond reputeGreg McKaskle has a reputation beyond reputeGreg McKaskle has a reputation beyond repute
Re: Robot enters stateless "state" on official FMS

The trace for auto, teleOp, and Disabled are reported as your code responds to the auto, teleOp, or disabled state of the control packet.

Apparently, as Thad was saying, that thread was no longer making that call. This could be because of logic, sleeps, infinite loops, etc.

In the philomath log, the CPU usage drops to background levels when this happens. Your disabled code CPU usage is 50%. Your auto was 67% then ~%51. TeleOp was 66%. Then it drops to 15% when the traces stop. About 10 seconds later, the driver station starts to request code restarts and finally a reboot. The roboRIO and user process come back up and go back to that state of low CPU usage. More code restart requests, and another roboRIO reboot request. When the code finally came back during disabled, the CPU usage looks like it matches the mode.

During TeleOp, my guess is that the code was crashing or locked waiting for a mutex that was never released.

The Portland logs show the same Disabled - 50%, Auto 61% and then 51%. But when teleOp starts, that doesn't change like the other log file. It appears that the code doesn't call the trace instrumented function, and remains in auto. The CPU stays at 51% until the code restart. This is quite different from he other log since it wasn't tracing, but seemed to still be running auto.

When the code comes back from the code restart request, the CPU usage is at 70%, but I don't see any indication that the mode function was called. The roroRIO was then rebooted, and when it comes back up the roboRIO is at 70% and the getMode is being called normally.

At this point, there are no messages in the logs that definitively log a crash of a C++ program or Java program. I hope we can improve that. Hopefully this additional info helps you identify anything that may be contributing to this.

Greg McKaskle
Reply With Quote
  #10   Spotlight this post!  
Unread 14-04-2016, 12:32
Thad House Thad House is online now
Volunteer, WPILib Contributor
no team (Waiting for 2021)
Team Role: Mentor
 
Join Date: Feb 2011
Rookie Year: 2010
Location: Thousand Oaks, California
Posts: 1,075
Thad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond reputeThad House has a reputation beyond repute
Re: Robot enters stateless "state" on official FMS

Quote:
Originally Posted by Greg McKaskle View Post
The trace for auto, teleOp, and Disabled are reported as your code responds to the auto, teleOp, or disabled state of the control packet.

Apparently, as Thad was saying, that thread was no longer making that call. This could be because of logic, sleeps, infinite loops, etc.

In the philomath log, the CPU usage drops to background levels when this happens. Your disabled code CPU usage is 50%. Your auto was 67% then ~%51. TeleOp was 66%. Then it drops to 15% when the traces stop. About 10 seconds later, the driver station starts to request code restarts and finally a reboot. The roboRIO and user process come back up and go back to that state of low CPU usage. More code restart requests, and another roboRIO reboot request. When the code finally came back during disabled, the CPU usage looks like it matches the mode.

During TeleOp, my guess is that the code was crashing or locked waiting for a mutex that was never released.

The Portland logs show the same Disabled - 50%, Auto 61% and then 51%. But when teleOp starts, that doesn't change like the other log file. It appears that the code doesn't call the trace instrumented function, and remains in auto. The CPU stays at 51% until the code restart. This is quite different from he other log since it wasn't tracing, but seemed to still be running auto.

When the code comes back from the code restart request, the CPU usage is at 70%, but I don't see any indication that the mode function was called. The roroRIO was then rebooted, and when it comes back up the roboRIO is at 70% and the getMode is being called normally.

At this point, there are no messages in the logs that definitively log a crash of a C++ program or Java program. I hope we can improve that. Hopefully this additional info helps you identify anything that may be contributing to this.

Greg McKaskle
The weird thing I noticed in the Philomath log is that the DS stopped sending enabled packets and was sending disabled packets, however clearly the motors were still turning, as thats the only way the battery voltage could be that low. I don't have an explanation for how that happened, as I always thought that if the DS was sending disabled packets that the motor controller outputs would be disabled. They use CAN Talons, so I wonder if something happened there.
__________________
All statements made are my own and not the feelings of any of my affiliated teams.
Teams 1510 and 2898 - Student 2010-2012
Team 4488 - Mentor 2013-2016
Co-developer of RobotDotNet, a .NET port of the WPILib.
Reply With Quote
  #11   Spotlight this post!  
Unread 14-04-2016, 12:54
JesseK's Avatar
JesseK JesseK is offline
Expert Flybot Crasher
FRC #1885 (ILITE)
Team Role: Mentor
 
Join Date: Mar 2007
Rookie Year: 2005
Location: Reston, VA
Posts: 3,624
JesseK has a reputation beyond reputeJesseK has a reputation beyond reputeJesseK has a reputation beyond reputeJesseK has a reputation beyond reputeJesseK has a reputation beyond reputeJesseK has a reputation beyond reputeJesseK has a reputation beyond reputeJesseK has a reputation beyond reputeJesseK has a reputation beyond reputeJesseK has a reputation beyond reputeJesseK has a reputation beyond repute
Re: Robot enters stateless "state" on official FMS

We had these exact symptoms during two consecutive matches at our DC district.

The core issue was that we enabled our custom comms code for the first time in a live match. We attempted a connection to a socket (RIO->D/S), yet didn't change the default timeout on the connection (2 minutes). When the robot attempted to connect during the beginning of auton and didn't connect right away, the connecting thread would block. Depending on where the code that calls the connection lives, this could (and did for us at first) block the entire robot thread. For reference, we use Java.

We never actually solved why the robot wouldn't connect to the D/S through the FMS using our custom comms on the correct ports - we simply moved in a different direction after a hitting a certain level of frustration.
__________________

Drive Coach, 1885 (2007-present)
CAD Library Updated 5/1/16 - 2016 Curie/Carver Industrial Design Winner
GitHub
Reply With Quote
  #12   Spotlight this post!  
Unread 05-08-2016, 00:05
bobbysq bobbysq is offline
Registered User
FRC #4646 (Team ASAP)
Team Role: Leadership
 
Join Date: Apr 2016
Rookie Year: 2015
Location: Des Moines, IA
Posts: 269
bobbysq has much to be proud ofbobbysq has much to be proud ofbobbysq has much to be proud ofbobbysq has much to be proud ofbobbysq has much to be proud ofbobbysq has much to be proud ofbobbysq has much to be proud ofbobbysq has much to be proud ofbobbysq has much to be proud ofbobbysq has much to be proud of
Re: Robot enters stateless "state" on official FMS

Are any of you running vision code off of the RoboRIO? That could be taking resources from the main code.
Reply With Quote
  #13   Spotlight this post!  
Unread 05-08-2016, 16:20
snekiam snekiam is offline
Registered User
FRC #3322 (Eagle Imperium)
Team Role: Programmer
 
Join Date: Dec 2015
Rookie Year: 2010
Location: SE Michigan
Posts: 85
snekiam has a spectacular aura aboutsnekiam has a spectacular aura aboutsnekiam has a spectacular aura about
Re: Robot enters stateless "state" on official FMS

We had a weird problem during a few of our matches where the roborio had 99.99% CPU usage, and we had VERY jerky control of our robot - probably like a second or two of delay. We had no vision code running, but we did have the OpenCV libraries included but NOT in use. We did not experience this problem every match, but it was often enough for it to be a concern. Eventually, the problem just went away and never came back. This was after going through the code line by line (C++ command based BTW) many many times looking for any sort of resource leaks, and found none.
Reply With Quote
Reply


Thread Tools
Display Modes Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump


All times are GMT -5. The time now is 03:43.

The Chief Delphi Forums are sponsored by Innovation First International, Inc.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.
Copyright © Chief Delphi