![]() |
Robot enters stateless "state" on official FMS
1 Attachment(s)
The last two finals that we've been in this season, our robot has had control issues in one of the matches. If you watched the PNW District Championship in Portland last weekend (shoutout to 360 and the event staff for allowing that replay of the second match), you probably saw the symptoms: The robot started the match off fine, but a few seconds into auto, it stopped moving and didn't start again until close to the end of the match. Looking at the driver station logs, I see that the robot was in auto as it should have been, but then it just wasn't. Not disabled, just stateless. Our driver noted that according to the driver station, we still had comms and the code was still running and enabled. After he rebooted the RoboRIO, everything was fine and we played the last few seconds of the match.
We ran into a similar problem in the Philomath finals. The robot made it through autonomous, but a bit into teleop we lost code and the robot drove itself into the low bar (we suspect that the Talons were just running using their last input). Just like in Portland, the robot code had somehow lost its state. We've been trying to figure out what the problem could be, but we're at a loss. Typically, when the robot code crashes, its handler will restart it fairly quickly; we don't see that happening here, so we don't think it's a run-of-the-mill crash. The logs--from the driver station and our own logging--don't indicate that the program crashed either. We know we didn't get an errant disable signal from the field because we would have seen that state change. Does anyone have any ideas as to what could be happening here? The logs in question are attached. A few notes:
|
Re: Robot enters stateless "state" on official FMS
|
Re: Robot enters stateless "state" on official FMS
Quote:
|
Re: Robot enters stateless "state" on official FMS
Very similar thing happened to us on our final 4 matches at AZW last weekend. Started match fine, during auto the robot stopped, about 50 seconds later control was reestablished.
NI has our log files and is looking at them, but from what the CSA told us, the robo rio said the radio was lost, then when it came back up, the code restarted. |
Re: Robot enters stateless "state" on official FMS
I sent you a message on Facebook, but I might have an idea on what caused the crash at Portland. Still looking into the one at Philomath.
The one at Portland looks like the DriverStation Thread on the RoboRIO locked up. If that happened, comms and robot code would still be reported, however the robot would stop sending requests saying its enabled, and the main loop would not continue to run. I have a guess on where in the WPILib that would be, but would like to see your guy's code to double check. |
Re: Robot enters stateless "state" on official FMS
Quote:
|
Re: Robot enters stateless "state" on official FMS
Quote:
|
Re: Robot enters stateless "state" on official FMS
Thad helped us track down a potential cause of a deadlock in our autonomous code that may have been the issue in Portland, but we're still trying to figure out what led to the initial failure in Philomath. We'd be happy to hear from anyone who comes up with something.
|
Re: Robot enters stateless "state" on official FMS
The trace for auto, teleOp, and Disabled are reported as your code responds to the auto, teleOp, or disabled state of the control packet.
Apparently, as Thad was saying, that thread was no longer making that call. This could be because of logic, sleeps, infinite loops, etc. In the philomath log, the CPU usage drops to background levels when this happens. Your disabled code CPU usage is 50%. Your auto was 67% then ~%51. TeleOp was 66%. Then it drops to 15% when the traces stop. About 10 seconds later, the driver station starts to request code restarts and finally a reboot. The roboRIO and user process come back up and go back to that state of low CPU usage. More code restart requests, and another roboRIO reboot request. When the code finally came back during disabled, the CPU usage looks like it matches the mode. During TeleOp, my guess is that the code was crashing or locked waiting for a mutex that was never released. The Portland logs show the same Disabled - 50%, Auto 61% and then 51%. But when teleOp starts, that doesn't change like the other log file. It appears that the code doesn't call the trace instrumented function, and remains in auto. The CPU stays at 51% until the code restart. This is quite different from he other log since it wasn't tracing, but seemed to still be running auto. When the code comes back from the code restart request, the CPU usage is at 70%, but I don't see any indication that the mode function was called. The roroRIO was then rebooted, and when it comes back up the roboRIO is at 70% and the getMode is being called normally. At this point, there are no messages in the logs that definitively log a crash of a C++ program or Java program. I hope we can improve that. Hopefully this additional info helps you identify anything that may be contributing to this. Greg McKaskle |
Re: Robot enters stateless "state" on official FMS
Quote:
|
Re: Robot enters stateless "state" on official FMS
We had these exact symptoms during two consecutive matches at our DC district.
The core issue was that we enabled our custom comms code for the first time in a live match. We attempted a connection to a socket (RIO->D/S), yet didn't change the default timeout on the connection (2 minutes). When the robot attempted to connect during the beginning of auton and didn't connect right away, the connecting thread would block. Depending on where the code that calls the connection lives, this could (and did for us at first) block the entire robot thread. For reference, we use Java. We never actually solved why the robot wouldn't connect to the D/S through the FMS using our custom comms on the correct ports - we simply moved in a different direction after a hitting a certain level of frustration. |
Re: Robot enters stateless "state" on official FMS
Are any of you running vision code off of the RoboRIO? That could be taking resources from the main code.
|
Re: Robot enters stateless "state" on official FMS
We had a weird problem during a few of our matches where the roborio had 99.99% CPU usage, and we had VERY jerky control of our robot - probably like a second or two of delay. We had no vision code running, but we did have the OpenCV libraries included but NOT in use. We did not experience this problem every match, but it was often enough for it to be a concern. Eventually, the problem just went away and never came back. This was after going through the code line by line (C++ command based BTW) many many times looking for any sort of resource leaks, and found none.
|
| All times are GMT -5. The time now is 21:59. |
Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.
Copyright © Chief Delphi