First, some background. When NEOs became legal, we switched some of our motors to NEOs and Spark Max, in order to take advantage of the integrated encoders.
In 2019, it was fine, mostly, but every once in a while, our elevator, which was the only Spark Max on the system, would behave a little jittery, or would not respond. There was no obvious issue, to us anyway, but a CSA looked at our code, and our robot, and the driver station, and frowned, and borrowed a screwdriver and played with our wires and said that our wiring wasn't so good. Tighten down the wiring. It seemed to work.
Fast forward to this year. We've got Spark Max/NEO combinations for all four drive wheels, plus our shooter. In two of our competition matches this year, at some point, we lost the left side drive train. We couldn't move, unless you count spinning in a circle. The drive team checked the joysticks during the match. (Tank drive style, two-joystick control.) Both joysticks were responding normally. After the match, they inspected the robot. Everything mechanical looked normal. The software for this part of the drive is extra simple. Read joystick, apply deadband, set motor speed. Works like a charm all the time. Nothing funky in the software that could produce some occasional error.
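To show how little there is to go wrong, here is a minimal sketch of that logic. It is not our actual robot code: the `Motor` interface is a hypothetical stand-in for the motor controller objects (one per side here, rather than the two per side on the real robot), and the 0.1 deadband is an assumed value.

```java
public class TankDriveSketch {
    /** Hypothetical stand-in for a motor controller; this is not the REV API. */
    interface Motor {
        void set(double speed);
    }

    // Assumed deadband width; the real value on the robot may differ.
    static final double DEADBAND = 0.1;

    /** Returns 0 when the stick is inside the deadband, otherwise the raw value. */
    static double applyDeadband(double value, double deadband) {
        return Math.abs(value) < deadband ? 0.0 : value;
    }

    /** One pass of the drive loop: read both sticks, deadband, set both sides. */
    static void drive(double leftStick, double rightStick, Motor left, Motor right) {
        // Both sides are updated in the same block, so if this method runs
        // at all, the left and right outputs are both commanded.
        left.set(applyDeadband(leftStick, DEADBAND));
        right.set(applyDeadband(rightStick, DEADBAND));
    }

    public static void main(String[] args) {
        double[] commanded = new double[2];
        // Lambdas record the commanded speed instead of driving real hardware.
        drive(0.05, -0.8, s -> commanded[0] = s, s -> commanded[1] = s);
        System.out.println("left=" + commanded[0] + " right=" + commanded[1]);
    }
}
```

With code shaped like this, a logic error that skipped the left side would also skip the right side, which is not what we saw on the field.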
Bring the robot back to the pits. Everything is normal. It works great.
So, we start asking, what happened? Drive team suspects software error. Of course. If you can't see it, it must be a software error. Speaking as the software mentor, I'm confident it wasn't a software error. No problem with chains. No problem with wheels or transmissions. Joystick was checked while still on the field. It was responding in the driver station, but the left side wasn't moving.
CAN bus error, maybe?
I check the driver station. I'm not exactly a driver station expert. I check events. No exceptions or funky warnings. Look at CAN bus utilization in the log. I see a fuzzy gray area centered around 30%. No big spikes. I don't know what I'm looking for, just "something odd". However, I don't see anything odd.
It's not something crazy like bad configuration of CAN IDs. Those would cause problems every time. Software logic errors would show up a lot more frequently in simple code like this. It happened twice, out of 28 competition matches, 5 practice matches, and several hours of operation in our practice area. (Besides which, the code for moving the right side wheels is inside the same set of brackets as the left side wheels. Java code. If it wasn't updating the left side, it wouldn't update the right side.) No broken chains, and no sign of sticking wheels or anything funky mechanically.
My mind, though, goes back a few years to that CSA who saw something in our logs, frowned, and started fixing wires. What was it that she saw? I really have no idea myself. I'm not sure what information there was available. All I can see is utilization, and we're fine. What do you suppose she saw? In general, is it common to have intermittent CAN errors, and if so, what would we see in the logs if it were happening?
And, is there any other non-mechanical way of causing two motors to stop responding? In one of the two matches where the phenomenon happened, we did have an abnormally high level of packet loss, but it seems odd that it would affect only one pair of motors, and no other system. Also, in the other match where it happened, we didn't have high levels of packet loss.
Our pneumatic systems, and the other motors (two right side motors, plus shooter) were working fine.
Any suggestions would be welcome.