First, some background. When NEOs became legal, we switched some of our motors to NEOs and Spark Max, in order to take advantage of the integrated encoders.
In 2019, it was fine, mostly, but every once in a while, our elevator, which had the only Spark Max on the robot, would behave a little jittery, or would not respond. There was no obvious issue, to us anyway, but a CSA looked at our code, and our robot, and the driver station, and frowned, and borrowed a screwdriver and played with our wires and said that our wiring wasn’t so good. Her advice: tighten down the wiring. It seemed to work.
Fast forward to this year. We’ve got Spark Max/NEO combinations for all four drive wheels, plus our shooter. In two of our competition matches this year, at some point, we lost the left side of the drive train. We couldn’t move, unless you count spinning in a circle. The drive team checked the joysticks during the match (tank drive style, two-joystick control). Both joysticks were responding normally. After the match, they inspected the robot. Everything mechanical looked normal. The software for this part of the drive is extra simple: read joystick, apply deadband, set motor speed. It works like a charm all the time. There’s nothing funky in the software that could produce some occasional error.
We brought the robot back to the pits. Everything was normal. It worked great.
So, we start asking, what happened? Drive team suspects software error. Of course. If you can’t see it, it must be a software error. Speaking as the software mentor, I’m confident it wasn’t a software error. No problem with chains. No problem with wheels or transmissions. Joystick was checked while still on the field. It was responding in the driver station, but the left side wasn’t moving.
CAN bus error, maybe?
I check the driver station. I’m not exactly a driver station expert. I check events. No exceptions or funky warnings. Look at CAN bus utilization in the log. I see a fuzzy gray area centered around 30%. No big spikes. I don’t know what I’m looking for, just “something odd”. However, I don’t see anything odd.
It’s not something crazy like bad configuration of CAN IDs. Those would cause problems every time. Software logic errors would show up a lot more frequently in simple code like this. It happened twice, out of 28 competition matches, 5 practice matches, and several hours of operation in our practice area. (Besides which, the code for moving the right side wheels is inside the same set of brackets as the left side wheels. Java code. If it wasn’t updating the left side, it wouldn’t update the right side.) No broken chains, and no sign of sticking wheels or anything funky mechanically.
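To make the "same set of brackets" point concrete, here is a minimal Java sketch of that kind of logic. It is not our actual robot code: the `Motor` interface, the class and method names, and the 0.05 deadband are made up for illustration, and the real code goes through the vendor motor controller classes instead.

```java
// Illustrative sketch only; all names and the deadband value are hypothetical.
final class TankDriveSketch {

    // Stand-in for a motor controller object (assumption, not a real vendor API).
    interface Motor {
        void set(double speed);
    }

    // Assumed deadband width; the real value may differ.
    static final double DEADBAND = 0.05;

    // Simple deadband: small stick values become zero, everything else passes through.
    static double applyDeadband(double value, double deadband) {
        return Math.abs(value) < deadband ? 0.0 : value;
    }

    // Both sides are commanded in the same block, one right after the other.
    // If this method runs at all, the left and right commands are both sent.
    static void update(double leftStick, double rightStick, Motor left, Motor right) {
        left.set(applyDeadband(leftStick, DEADBAND));
        right.set(applyDeadband(rightStick, DEADBAND));
    }
}
```

With code shaped like this, a software fault that skipped the left command would almost certainly skip the right one too, which is why I keep looking past the code toward the wiring or the CAN bus.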
My mind, though, goes back a few years to that CSA who saw something in our logs, frowned, and started fixing wires. What was it that she saw? I really have no idea myself. I’m not sure what information there was available. All I can see is utilization, and we’re fine. What do you suppose she saw? In general, is it common to have intermittent CAN errors, and if so, what would we see in the logs if it were happening?
And, is there any other non-mechanical way of causing two motors to stop responding? In one of the two matches where the phenomenon happened, we did have an abnormally high level of packet loss, but it seems odd that it would affect only one pair of motors, and no other system. Also, in the other match where it happened, we didn’t have high levels of packet loss.
Our pneumatic systems, and the other motors (two right side motors, plus shooter) were working fine.
Any suggestions would be welcome.