A new behavior from our robot program. Every few deploys of the code, on the first access to a CAN device (reset sticky faults on PDP), we see the error: CAN: Not initialized. After that, as one would expect, CAN devices don’t function and we get a lot of CAN errors logged. We re-deploy and it works fine. Some deployments after that, the error happens again.
We got a similar thing where it couldn’t find the Spark Maxes. Rebooting the RIO fixed it.
Deploying again fixes it as well as rebooting…but it comes back. It is concerning as I would not want that to happen during start up for a match, though I have not observed that yet (occurring right after a boot). Always seems to occur after a series of deployments but that is not definitive.
We’re having the same issue here. We’ve retraced our CAN wiring (thought maybe there was a loose wire somewhere) and made sure everything was good. Also made sure everything was up-to-date. Same deal though: CAN fails to initialize every once in a while and we get tons of errors on CAN ID versions. Reboot/redeploy, and suddenly everything is fine. Do several more runs, eventually get the same error, redeploy, repeat.
Interested to see what the solution is. Using Victor SPX’s, if that helps narrow down the issue.
It sounds like you have a loose CAN connection somewhere. Do a pull test at each connection point and see if the status LEDs on your CAN devices change. If that happens, fix the connection and then power cycle your robot and repeat this on all the other connection points until you don’t see this behavior anymore.
Normally, if you have a break in the CAN network, you can use Phoenix Tuner to look at the list of devices it can see and determine where the break is. When this CAN not initialized error happens, Phoenix Tuner connects but can’t load ANY devices. This suggests to me it is not a simple wiring problem. And we have seen this while the robot is motionless, so little vibration and no G forces. And to the extent we can reach all the CAN connections, we have done the pull test. Found nothing so far.
Our team is having the same problem and we either restart robot code in the FRC Driver Station or redeploy code in order to get around the problem.
Everything looks fine in phoenix tuner and the issue only occurs sometimes after deploying code.
I want to really call attention to this problem. We are still having it, every few deploys (maybe every 5 or 6 ish) the CAN Not Initialized error occurs and the robot is essentially dead as we do most things through CAN. A redeploy or reset cures the problem until it happens again. This is a pain right now but during an actual season with events and matches, this would be a REAL problem. While I can’t 100% rule out wiring, we have checked to the best of our ability and that does not appear to be the cause. The robot runs fine. We park, change the code, deploy and bang, CAN fails. Deploy again and off we go.
We need some help on this. It would be useful to know what conditions, beyond wiring, might lead to this. Also, any team that has seen this problem, please post so we can try to get an idea how widespread this issue really is.
CAN also confirm (pun intended) exact same symptoms and frequency, mostly with spark maxes. We also don’t see any of our CTRE controllers in Phoenix Tuner ever, which might be related. Every controller works until the fourth to sixth redeploy.
Still have the issue here, too. Next time it happens, I’ll save the DS log to see if that helps.
To help narrow things down, we have Victor SPX’s and Falcons in our CAN. Verified everything is up-to-date several times and have had a couple of people trace the wiring. Redeploying always fixes it.
We have the same issue. We’ve double checked wiring and have the same symptoms. CAN not initialized/stale CAN frame errors in the DS and no devices show up in Phoenix Tuner with a reboot fixing it. It’s never been an issue for us until this year and the only changes we’ve made this year are updating software & installing 4 Falcon 500s.
TL;DR This issue is caused only by deploys, and usually by a team having a misbehaving thread. This is very unlikely to happen during initial bootup on a field.
Long Version:
On the RoboRIO, only one process can have access to the CAN bus hardware. Prior to 2020, the NetComm daemon was this process, and all CAN accesses went through it. This was the reason CAN accesses were slow. For 2020, this was changed so that the robot process would directly access the CAN hardware. This is fantastic for performance. However, this does create a problem if a robot program does not shut down correctly. If this happens, the next instance of the user program that gets started will not be able to access the CAN bus, resulting in this error.
During deploy, we attempt to kill any old process. However, this doesn’t always work correctly, especially with misbehaving user programs that don’t handle thread interrupts correctly. If this happens, the old program will still be running when we start the new process, which can cause this issue.
During a fresh boot, you’re guaranteed to not have an old process, which makes this issue basically not an issue.
That error message is entirely related to the CAN interface on the roboRIO, and has nothing to do with any external devices connected to the bus. A roboRIO reboot should always fix the issue. This is also why Phoenix Tuner reports no devices. It’s not just that it can’t find any devices, it in fact can’t even access the CAN hardware on the RIO.
If you’re noticing this a lot, make sure you’re handling InterruptedException correctly in your threads. If you’re just swallowing it, things will break in this case.
What’s the correct way?
When this fails, usually InterruptedException is just swallowed. Instead of swallowing it, make sure to actually make your thread exit.
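To make the "exit instead of swallow" advice concrete, here is a minimal sketch in plain Java. It is not WPILib code and not the fix for any particular team; the class name InterruptibleWorker, the iterations counter, and the 20 ms sleep are all made up for illustration. The idea is only that the catch block restores the interrupt flag and returns, so the thread actually ends when it is interrupted.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class InterruptibleWorker {
    // Hypothetical counter standing in for real periodic work (e.g. polling a sensor).
    static final AtomicInteger iterations = new AtomicInteger();

    // Pattern to avoid (the interrupt is swallowed, so the loop never ends):
    //   while (true) { try { Thread.sleep(20); } catch (InterruptedException e) { } }

    // Starts a background thread that stops as soon as it is interrupted.
    static Thread startWorker() {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                iterations.incrementAndGet();
                try {
                    Thread.sleep(20); // illustrative loop period
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and leave the loop so the thread exits.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "example-worker");
        worker.start();
        return worker;
    }

    // Interrupts the worker and waits up to timeoutMs; returns true if it has exited.
    static boolean stopWorker(Thread worker, long timeoutMs) {
        worker.interrupt();
        try {
            worker.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

With something like this, calling stopWorker(startWorker(), 2000) should return true, whereas the swallowing version in the comment would keep running after the interrupt.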
Now that you mention it, it is always deploying onto a RIO running code when this happens. We don’t know any other way to do it, nor do we create our own Threads. What might be causing a misbehaving thread?
Are you using any vendor libraries besides the Spark Max?
Rev, Phoenix, navX and the WPILib new commands.
There could be an errant thread in any of those.
Any remaining thread will hang on to the CAN bus, not just ones that interface directly with CAN controllers, right?
It’s any remaining thread that keeps the robot program alive, whether they interface with CAN or not.
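If you want to see which threads could be keeping the program alive, one rough approach is to list the live non-daemon threads from inside the JVM, since a Java process stays up until all of its non-daemon threads finish. This is a sketch in plain Java; the class name ThreadAudit and both helper names are hypothetical, and whether marking your own background threads as daemon is enough to avoid the deploy problem is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadAudit {
    // Returns the names of live threads that are NOT daemon threads.
    // These are the threads that keep a JVM from exiting on its own.
    static List<String> nonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    // Creates (but does not start) a named daemon thread for background work.
    // A daemon thread does not by itself keep the JVM running.
    static Thread newDaemonThread(Runnable task, String name) {
        Thread t = new Thread(task, name);
        t.setDaemon(true); // must be set before start()
        return t;
    }
}
```

Printing nonDaemonThreadNames() from your robot code might at least show whether an unexpected thread from your own code or a vendor library is hanging around.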