Our roboRIO keeps losing communication with the Driver Station. We’ll be driving the robot when it suddenly drops out, and the Driver Station reports No Robot Communication. Resetting the roboRIO with the RESET button resolves the problem.
To eliminate networking as the culprit, when this occurs we have tried connecting our laptop directly to the roboRIO over both USB and Ethernet, and the DS is still unable to connect. Additionally, when we lose connectivity to the roboRIO we can still communicate with our Limelights over Wi-Fi, so the network and radio appear to be working.
We found that we can trigger this hang pretty consistently if we add System.out.println() statements inside an execute block (which we shouldn’t do anyway), but that isn’t the only culprit. This year we also did more with Shuffleboard, building a dashboard for our subsystems and commands. When we commented out the dashboard code, we weren’t able to reproduce the problem.
Is anyone else having trouble with their roboRIO hanging/losing communication? I’m surprised that writing some info to the NetworkTables for Shuffleboard would be making this problem worse, but it sure seems to be.
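For anyone who still wants prints inside execute() while debugging, one option is to throttle them so they fire at most once per interval instead of every 20 ms loop. This is only a minimal sketch in plain Java: ThrottledLog and its log() method are hypothetical names, not part of WPILib, and the clock is passed in so the class can be tested without waiting.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: prints a message at most once per interval. */
final class ThrottledLog {
    private final long minIntervalMs;
    private final LongSupplier clockMs; // e.g. System::currentTimeMillis
    private long lastPrintMs;
    private boolean printedOnce = false;

    ThrottledLog(long minIntervalMs, LongSupplier clockMs) {
        this.minIntervalMs = minIntervalMs;
        this.clockMs = clockMs;
    }

    /** Prints the message if enough time has passed; returns true if it printed. */
    boolean log(String message) {
        long now = clockMs.getAsLong();
        if (printedOnce && now - lastPrintMs < minIntervalMs) {
            return false; // too soon since the last print, so skip this one
        }
        printedOnce = true;
        lastPrintMs = now;
        System.out.println(message);
        return true;
    }
}
```

Constructed as new ThrottledLog(1000, System::currentTimeMillis) and called from execute(), this would print roughly once per second no matter how often the loop runs. The same idea could be applied to dashboard updates, though we have not confirmed that update frequency alone explains the hang.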
We had this problem at an event when our MXP circuit (which we made ourselves) shorted out, so I wonder if any short could cause it. I would start with sample code that you know works (perhaps confirming it in sim first) and see whether the problem still occurs. If the sample code runs fine, the problem is in your code. If not, it is either the Rio or the wiring. Check the wiring carefully and make sure everything looks good. If you are using CAN, a reversed CAN wire may be able to cause this. If you still have not found the issue, I would eliminate as much wiring as possible (easy if your motors are CAN driven, much more difficult if you are using PWM) and add devices back one at a time.
We just went through this with our shooter yesterday (with only one motor). It was working fine until it needed to literally be jump started to work. Luckily, we found the culprit (a bad Anderson connection that would not mate correctly) before tearing the entire shooter assembly apart (though we came close). It was an excellent learning experience, and I am thankful we figured it out the day we made the cable rather than at an event.
It seems unlikely to be a wiring or roboRIO problem, but of course we cannot rule it out. We started to see this problem with our robot with the electronics on a test board before the final wiring was done. We tried swapping out the roboRIO at that time. Between then and now, we also redid all of the wiring to bring the components over from the temporary board to our final robot.
It’s hard to completely rule things out because we haven’t been able to reproduce the condition on demand. At this point, signs point to the code using Shuffleboard on the roboRIO.
That is annoying and also points to a non-wiring issue. When it happened to us, it was relatively quick to reboot. It also is odd that it sounds like it happened on multiple Rios.
I suspect there is a timing issue / race condition in the code, and that the added print and Shuffleboard activity changes the timing so that a condition that is there anyway becomes more likely to occur.
Are there multiple threads running? If you have just one background thread (perhaps running a vision pipeline or some PID control) could it be deadlocking with the main thread?
If there is just a single thread, perhaps there is an unlikely condition that can get it stuck in an infinite loop.
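If a deadlock between threads is the suspect, the JVM can report one directly. The sketch below uses the standard ThreadMXBean from java.lang.management. DeadlockCheck is a hypothetical helper name, and this assumes the java.management module is present in the JRE on the roboRIO, which I have not verified. It only detects true lock cycles, not an infinite loop in a single thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: asks the JVM whether any threads are deadlocked. */
final class DeadlockCheck {
    /** Returns a description of the deadlocked threads, or "" if none are found. */
    static String describeDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads(); // null when no deadlock is detected
        if (ids == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        // Request locked monitors and synchronizers so the report shows who holds what.
        for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
            if (info != null) {
                sb.append(info);
            }
        }
        return sb.toString();
    }
}
```

Calling describeDeadlocks() every second or so from a small daemon thread and writing any non-empty result to a file would be one way to capture evidence even after the main loop has stopped responding.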
Can you post a link to your code in GitHub or elsewhere?
Try deploying for debug, attaching a debugger with no breakpoints set, and then driving until it fails. I know this worked in the old Eclipse environment, and I think it works in VSCode too. In Eclipse, you could suspend any thread at any time from the debugger. If VSCode lets you do the same, suspend the main thread when the robot hangs and you should see exactly where it is getting stuck.
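If attaching a debugger turns out to be awkward, a rough alternative is a watchdog that records the main thread's stack when the loop stops checking in. This is a sketch in plain Java, not a WPILib feature: LoopWatchdog, feed() and check() are hypothetical names. The idea is that the robot loop calls feed() every iteration and a separate thread calls check() periodically.

```java
/** Hypothetical helper: reports where a thread is stuck if it stops checking in. */
final class LoopWatchdog {
    private final Thread watched;
    private final long timeoutMs;
    private volatile long lastFeedNanos = System.nanoTime();

    LoopWatchdog(Thread watched, long timeoutMs) {
        this.watched = watched;
        this.timeoutMs = timeoutMs;
    }

    /** Call once per loop iteration from the watched thread. */
    void feed() {
        lastFeedNanos = System.nanoTime();
    }

    /** Returns a stack dump of the watched thread if it is overdue, else null. */
    String check() {
        long elapsedMs = (System.nanoTime() - lastFeedNanos) / 1_000_000;
        if (elapsedMs < timeoutMs) {
            return null; // the loop checked in recently
        }
        StringBuilder sb = new StringBuilder();
        sb.append("Thread '").append(watched.getName())
          .append("' stalled for ").append(elapsedMs).append(" ms\n");
        for (StackTraceElement frame : watched.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }
}
```

Since the Driver Station connection is lost when the hang occurs, the dump would probably need to be written to a file on the roboRIO rather than printed to the console.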
Does the code that you posted have the print statements / Shuffleboard statements needed to cause this? How long does it take? I tried running your code for a few minutes and didn’t see an issue, but I don’t have all the same hardware as your robot.
We are not certain, but the likely cause was the REV V3 Color Sensor and our calls to getColor(). We see these calls overrun the 20 ms loop, and our dashboard code was calling it multiple times per iteration to put color information on Shuffleboard.
Beware! If you use the REV V3 color sensor from the KOP, consider calling it on a background thread! See Strange Loop Execution Time Jump for 5172’s experience.
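A minimal sketch of the background-thread idea, in plain Java so it runs anywhere: a poller thread does the slow read and the robot loop only ever reads a cached value. CachedSensor is a hypothetical name, and the Supplier stands in for whatever slow call you have, for example a lambda wrapping the getColor() call mentioned above. I have not tested this against the actual sensor.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical helper: polls a slow sensor off the main loop and caches the result. */
final class CachedSensor<T> implements AutoCloseable {
    private final AtomicReference<T> latest;
    private final ScheduledExecutorService executor;

    CachedSensor(Supplier<T> slowRead, T initial, long periodMs) {
        latest = new AtomicReference<>(initial);
        executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "sensor-poller");
            t.setDaemon(true); // do not keep the JVM alive just for polling
            return t;
        });
        executor.scheduleWithFixedDelay(() -> {
            try {
                latest.set(slowRead.get()); // the slow call happens only on this thread
            } catch (RuntimeException e) {
                // Keep the previous value. Catching here matters because an
                // uncaught exception would cancel all future scheduled runs.
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    /** Returns the most recent cached reading without blocking. */
    T get() {
        return latest.get();
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

With this in place, the dashboard code and the subsystem can both call get() as many times per iteration as they like, and the sensor itself is read only once per polling period.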