TLDR: REVLib 2024.2.2 makes every call to (the zero-argument version of) getEncoder() on a SPARK Flex liable to crash your program!
Workaround:
0. If you don’t use velocity feedback on SPARK Flex, I would advise that you do not update to REVLib 2024.2.2. For everyone else:
1. If you trust yourselves to use the correct motor controller class for your hardware, then replace your usage of CANSparkFlex.getEncoder() with .getEncoder(SparkRelativeEncoder.Type.kQuadrature, 7168).
2. As a safeguard against other unknown issues caused by the same bug, replace CANSparkMax.getEncoder() with .getEncoder(SparkRelativeEncoder.Type.kHallSensor, 42).
3. Even with the above replacements, call them only once and retain the returned encoder object. This is REVLib’s nearly undocumented usage assumption.
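Here is a minimal sketch of the "call once and retain" pattern from the workaround above. It is written against a hypothetical stand-in (`FakeController`) so it runs without REVLib; neither `FakeController` nor `Shooter` is a REVLib class. On a real robot the constructor would make the explicit-type getEncoder(...) call from the workaround and store the result in a field.

```java
import java.util.function.DoubleSupplier;

public class CachedEncoderSketch {
    /** Hypothetical stand-in for a motor controller; counts getEncoder() calls. */
    static class FakeController {
        int getEncoderCalls = 0;
        double simulatedVelocity = 0.0;

        // Stand-in for the vendor getEncoder(); returns a live view of the velocity.
        DoubleSupplier getEncoder() {
            getEncoderCalls++;
            return () -> simulatedVelocity;
        }
    }

    /** Hypothetical subsystem that fetches the encoder once and reuses it. */
    static class Shooter {
        private final DoubleSupplier encoder;

        Shooter(FakeController motor) {
            // Assumption: on real hardware this line would instead be
            // motor.getEncoder(SparkRelativeEncoder.Type.kQuadrature, 7168)
            this.encoder = motor.getEncoder();
        }

        // Called every loop; reads the retained object, never calls getEncoder().
        double periodic() {
            return encoder.getAsDouble();
        }
    }

    public static void main(String[] args) {
        FakeController motor = new FakeController();
        Shooter shooter = new Shooter(motor);
        motor.simulatedVelocity = 1500.0;
        for (int i = 0; i < 50; i++) {
            shooter.periodic();
        }
        System.out.println("getEncoder() calls: " + motor.getEncoderCalls);
    }
}
```

The point of the design is that the periodic path only touches the stored object, so however getEncoder() behaves internally, it runs exactly once per controller.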
The below was what I emailed to REV’s support line last night:
Tonight we updated to REVLib 2024.2.2 for the SPARK Flex velocity filter changes, and identified an issue with the library that causes fatal crashes if the message to check SPARK model fails. In short (as far as I can tell):
CANSparkBase.getEncoder() now calls getSparkModel() to determine whether to set up the main encoder as a hall sensor or the SPARK Flex high-res quadrature encoder. This is a JNI method that sends a CAN message to query the device for its model type on every call.
If getSparkModel() fails due to CAN bus issues or other causes, it seems to default to SPARK MAX while printing an error message. In this failure case the encoder for a Flex would be set up in code as a hall sensor.
If a user then calls getEncoder() again and the model check succeeds, the encoder will be set up as a quadrature encoder, causing a fatal exception.
The opposite situation can also cause a crash, where the model check succeeds at first and later fails, because now the model check is being done on every getEncoder() call. (Running the check on every call is also a separate performance/CAN utilization issue, but a less critical one.)
I understand calling getEncoder() every time it’s needed is not intended usage, but until this update it was “safe” to do so, and I have seen many teams, including my own, doing so. The issue in 2024.2.2 can crash the program of any team using Flexes and calling their getEncoder() multiple times, and a failed model check on any of those calls will trigger it.
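To make the sequence above concrete, here is a small self-contained model of it. This is not REVLib's code: `FakeFlex`, both enums, the use of IllegalStateException, and the simplified message are all assumptions chosen for illustration. The model query is scripted so that the first check fails and the second succeeds.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class EncoderTypeMismatchSketch {
    enum Model { SPARK_MAX, SPARK_FLEX, UNKNOWN }
    enum EncoderType { HALL_SENSOR, QUADRATURE }

    /** Hypothetical stand-in for a SPARK Flex whose model query can fail. */
    static class FakeFlex {
        private final Deque<Model> queryResults;
        private EncoderType configuredType = null;

        FakeFlex(List<Model> scriptedQueryResults) {
            this.queryResults = new ArrayDeque<>(scriptedQueryResults);
        }

        // Models the CAN round trip: returns the next scripted result,
        // or SPARK_FLEX once the script is exhausted.
        private Model querySparkModel() {
            Model next = queryResults.poll();
            return next == null ? Model.SPARK_FLEX : next;
        }

        // Re-checks the model on every call, as described above.
        EncoderType getEncoder() {
            EncoderType requested = (querySparkModel() == Model.SPARK_FLEX)
                    ? EncoderType.QUADRATURE
                    : EncoderType.HALL_SENSOR;
            if (configuredType == null) {
                configuredType = requested; // first call fixes the type
            } else if (configuredType != requested) {
                // Assumed exception type and simplified message.
                throw new IllegalStateException(
                        "This encoder has already been configured as " + configuredType);
            }
            return configuredType;
        }
    }

    public static void main(String[] args) {
        // First model check fails (UNKNOWN), second succeeds (SPARK_FLEX).
        FakeFlex flex = new FakeFlex(List.of(Model.UNKNOWN, Model.SPARK_FLEX));
        System.out.println("first call: " + flex.getEncoder());
        try {
            flex.getEncoder();
        } catch (IllegalStateException e) {
            System.out.println("second call threw: " + e.getMessage());
        }
    }
}
```

Reversing the script (success first, failure second) trips the same check from the other direction, which is the "opposite situation" mentioned above.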
This is baffling. I can’t really understand why the RIO code should have to query over CAN to figure out what sort of device it’s controlling, much less why this is done dynamically at runtime. That’s a pretty serious code smell.
My guess would be that it’s trying to find out what motor is connected, since the SPARK Flex will be able to control an original NEO (when the hardware is released).
Okay, slight amendment: getSparkModel() doesn’t default to SparkModel.SparkMax, it defaults to SparkModel.Unknown, which makes little difference given the relevant code.
getSparkModel() does not appear to have anything to do with the motor being attached, and this code will be even more broken when the Flex can control a NEO.
Do you happen to have a stack trace or other indicators of this crash? …asking as a CSA so if I see this at events this week I can help identify this quicker.
We (8513) have been seeing this issue, but were unable to diagnose it. It always coincided with other CAN issues, and when they went away the code stopped crashing. Before we roll back to 2024.2.1 we will induce a CAN error to force a crash and get a stack trace for you.
The “root” error message will be “This encoder has already been configured as a Quadrature with countsPerRev set to 7168!” or “as a Hall sensor!”. A few layers into the stack trace will tell you which part of robot code is calling getEncoder().
Off topic, but it would be nice to have a list from REV of the API calls that generate CAN traffic. I would have never dreamed a call to .getEncoder() would generate CAN traffic.
I had a periodic loop that made that call several times in a single iteration of the loop.
We are seeing Vortex motors not run when commanded. The API appears to put the motor in a weird state that makes it unable to run. I can do a “complete factory reset” on a Flex/Vortex and then it runs correctly, but starting the code breaks it again. At first glance, it appears not to commutate correctly.
We didn’t change anything in our code; the only change is the REV vendor library.