I’ve deployed the latest version of WPILib, 2023.4.1, and when trying to run SysID, I lose comms with the robot and can’t ever seem to get it back without doing a hard reset and reimage. Here are the logs from the console.
********** Robot program startup complete **********
Robot disabled
Collected: 0 data points.
navX-Sensor Connected.
navX-Sensor Board Type 50 (navX-MXP (Classic))
navX-Sensor firmware version 3.1
navX-Sensor startup initialization and startup calibration complete.
navX-Sensor onboard startup calibration complete.
navX-Sensor Yaw angle auto-reset to 0.0 due to startup calibration.
NT: Got a NT4 connection from 10.55.72.167 port 59497
NT: CONNECTED NT4 client 'shuffleboard@1' (from 10.55.72.167:59497)
NT: Got a NT4 connection from 10.55.72.167 port 59498
NT: CONNECTED NT4 client 'sysid@2' (from 10.55.72.167:59498)
Warning 0 [phoenix] Library initialization is complete.
Warning 0 [phoenix-diagnostics] Server 2023.0.1 (Jan 26 2023, 19:27:56) running on port: 1250
Robot disabled
Collected: 18108 data points.
Warning 16 Warning: Watchdog not fed within 0.005000s Main
Warning at Main: Warning: Watchdog not fed within 0.005000s
at frc::ReportErrorV(int, char const*, int, char const*, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >) + 0x178 [0xb6abd214]
at frc::Watchdog::Impl::Main() + 0x274 [0xb6adee14]
at + 0xae2b8 [0xb61692b8]
ERROR -111 Error: Loop time of 0.005000s overrun PrintLoopOverrunMessage
Error at PrintLoopOverrunMessage: Error: Loop time of 0.005000s overrun
at frc::ReportErrorV(int, char const*, int, char const*, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >) + 0x178 [0xb6abd214]
at frc::IterativeRobotBase::PrintLoopOverrunMessage() + 0x70 [0xb6aa19ec]
at frc::Watchdog::Impl::Main() + 0x94 [0xb6adec34]
at + 0xae2b8 [0xb61692b8]
Warning 16 Warning: LiveWindow::UpdateValues(): 0.000007s
RobotPeriodic(): 0.000009s
SmartDashboard::UpdateValues(): 0.000008s
DisabledPeriodic(): 0.000306s
DisabledInit(): 0.057491s
Shuffleboard::Update(): 0.000008s
PrintEpochs
Warning at PrintEpochs: Warning: LiveWindow::UpdateValues(): 0.000007s
RobotPeriodic(): 0.000009s
SmartDashboard::UpdateValues(): 0.000008s
DisabledPeriodic(): 0.000306s
DisabledInit(): 0.057491s
Shuffleboard::Update(): 0.000008s
at frc::ReportErrorV(int, char const*, int, char const*, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >) + 0x178 [0xb6abd214]
at frc::Tracer::PrintEpochs() + 0x130 [0xb6a768e4]
at frc::IterativeRobotBase::LoopFunc() + 0x1ec [0xb6aa1440]
at frc::TimedRobot::StartCompetition() + 0x154 [0xb6ac1dbc]
at void frc::impl::RunRobot<DriveRobot>(wpi::priority_mutex&, DriveRobot**) + 0x54 [0x30830]
at int frc::StartRobot<DriveRobot>() + 0x1b0 [0x30b40]
at __libc_start_main + 0x10c [0xb5ec3624]
I don’t see anything out of the ordinary in the log you posted. I’m not sure what you mean by doing a hard reset. Can you explain exactly what you do? What happens if you try to deploy your robot project? Can you post the log from that?
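For anyone reading the epoch dump in that log, here is a minimal Python sketch that checks each reported epoch against the 0.005 s loop period named in the watchdog warning. The timings are copied from the log above; the helper names (`parse_epochs`, `slow_epochs`) are made up for illustration and are not part of WPILib.

```python
import re

# Epoch timings copied from the Tracer output in the log above.
EPOCH_LOG = """\
LiveWindow::UpdateValues(): 0.000007s
RobotPeriodic(): 0.000009s
SmartDashboard::UpdateValues(): 0.000008s
DisabledPeriodic(): 0.000306s
DisabledInit(): 0.057491s
Shuffleboard::Update(): 0.000008s
"""

# Loop period taken from "Watchdog not fed within 0.005000s".
LOOP_PERIOD_S = 0.005


def parse_epochs(text):
    """Return {epoch name: seconds} for lines shaped like 'Name(): 0.000007s'."""
    pattern = re.compile(r"^\s*(.+?\(\)): ([0-9.]+)s\s*$")
    epochs = {}
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            epochs[match.group(1)] = float(match.group(2))
    return epochs


def slow_epochs(epochs, period_s):
    """Return only the epochs that individually took longer than one loop period."""
    return {name: t for name, t in epochs.items() if t > period_s}


epochs = parse_epochs(EPOCH_LOG)
total = sum(epochs.values())
offenders = slow_epochs(epochs, LOOP_PERIOD_S)

print(f"total time in reported epochs: {total:.6f}s")
for name, seconds in offenders.items():
    print(f"{name} took {seconds:.6f}s, over the {LOOP_PERIOD_S}s period")
```

The only epoch over the period is `DisabledInit()` at about 57 ms. That looks like a one-time cost on the transition to disabled rather than a recurring overrun, though that reading is an inference from this single log.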
The deployment worked just fine. It was when we tried to run SysID that, after about two attempts, the rio would lose comms and we could never connect again. I would end up having to hold the reset button until it went into recovery mode and reflash it. We ended up figuring out the issue was due to selecting the NavX MXP connected via I2C when we meant to select Serial Port.
I was running into another issue with SysID where it kept complaining that the system we were trying to characterize didn’t match the mechanism that we had selected. I wish I had copied the full message, but we were trying to characterize a dropdown intake (an arm) that had 2 motors using their own encoders, and we didn’t have any luck. We ended up just manually characterizing our systems.
Yesterday we were trying to use SysID to characterize our drive train, and at first it was going in circles on the first test rather than driving straight, so we inverted the right motors and…
Well, it didn’t drive in circles, but that’s because it didn’t drive at all. The compressor kicked on though (through the REV hub on an analog pressure sensor). After playing for a bit, those were the only two results we could get until we pulled the breaker to the pneumatics hub.
That fixed the issue and we were able to run all four tests.
This confused everyone involved, because SysID isn’t supposed to contain the slightest bit of code that would turn on the compressor. And why would the same code run the drive motors as long as the PHub didn’t have power, but not when it did?
This sounds like a cursed vendor library interaction of some sort.
I think we’re seeing that the one-binary-fits-all approach of SysId is becoming untenable. The vendor interactions have been a pain since the frc-characterization days and aren’t going to get better…
I’m also finding SysID to be incredibly unstable this year. To the point that it’s barely usable.
A few weeks ago we tried to use it for a simple mechanism with two Falcon motors. After 30 minutes of it telling us we were characterizing the wrong mechanism type, we switched to a different computer. It worked the first time on the other computer.
Last week we used it on a new iteration of the same mechanism. We found that if we tried to run the same test more than once, SysID would insist we were characterizing the wrong mechanism type and would run the motors. Closing SysID and reopening it worked around this.
On Friday I attempted to characterize an arm using a single Neo, with a through-bore encoder (connected via an Absolute Encoder Adapter) and I never got it working. There are two encoder options for characterizing, and neither worked. There was an option for “data port”, which is where the through-bore encoder is attached. Selecting this option and deploying caused Driver Station to report “No robot code”, and log nothing at all. There was an option for “encoder port”, which is where the Neo’s built-in encoder goes. When I selected this it either said the motor didn’t turn, or it turned 1 rotation. After nearly an hour I gave up.
If it’s happening in sysid, it’s likely only a matter of time before it shows up in normal robot programs that use both vendor libraries. Sysid’s robot code is just a (complex, run-time configured) C++ robot program. We may be seeing the future of mixed vendor robot programs in a few years unless something changes.
The struggle is still very real with 2023.4.1. Tonight I characterized our swerve drive and ran into several problems:
Our first run of tests were successful, but the analysis gave us an error:
Multiple times we ran into the issue where SysID said the robot reported the wrong project type. Closing and re-opening SysID worked around this, but I ran into it >5 times before I was finished.
One time everything was right but the robot just didn’t move. I stopped the test and the popup said the robot reported moving 0 meters.
Loading a project that has a Motor CANivore Name value does not repopulate the value. I had entered “swerve”, but loading the config populates the field with “rio”. This wasn’t a big deal once I figured it out, but I had to re-enter it every time after closing and re-opening SysID to work around the “wrong project type” bug. It took a good 30 minutes to crack the first time. I’ll open a bug for this because it’s reproducible.
After a few runs, we finally got the analysis error to go away by changing the Samples Per Average and Time Measurement Window. We increased the Samples Per Average to 8 (the tooltip recommends a value between 5 and 10, and the only option between 5 and 10 is 8). We set the Time Measurement Window to 5.
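As a rough illustration of why averaging more samples can clean up a noisy velocity signal, here is a generic rolling average in Python. This is only a sketch: it is not SysId's code or any vendor's firmware filter, and the readings are invented for the example.

```python
from collections import deque


def rolling_average(samples, window):
    """Average each sample with up to (window - 1) samples that came before it."""
    buf = deque(maxlen=window)
    out = []
    for sample in samples:
        buf.append(sample)
        out.append(sum(buf) / len(buf))
    return out


# Hypothetical velocity readings: a true value of 10.0 with alternating +/-1.0 noise.
raw = [10.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(32)]
smoothed = rolling_average(raw, 8)

# Compare the worst-case error before and after, once the 8-sample window has filled.
raw_error = max(abs(v - 10.0) for v in raw)
smoothed_error = max(abs(v - 10.0) for v in smoothed[8:])
print(f"raw error: {raw_error}, smoothed error: {smoothed_error}")
```

The tradeoff is lag: a longer average responds more slowly to real changes in velocity, so a larger value is not automatically better.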
I looked into the code that deals with project type and have no idea where the problem would be. We didn’t really change it from last year.
I’ve seen this with SysId 2021 as well. The only thing I can think of is NT transmit issues.
Please do.
I’d have to see the analysis JSONs before and after to diagnose this.
We’ve considered replacing the SysId project generation and C++ robot binary with users logging their own data in Java, but the quality was really bad. We’ve discussed implementing DMA and CAN streams to make the sampling more deterministic.
It’s still concerning though, because the SysId binary is just a C++ binary with dynamic object construction for all the vendors. The connection issues we’ve seen would theoretically affect any C++ team using both CTRE and REV in an RT context.
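For teams that end up logging their own data or characterizing by hand, as mentioned earlier in the thread, here is a minimal Python sketch of a least-squares fit to the simple motor feedforward model V = kS·sgn(v) + kV·v + kA·a. It is not how SysId performs its analysis, it ignores the gravity term an arm would need, and the sample data is synthetic. The function names are made up for the example.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def solve_3x3(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_feedforward(samples):
    """Fit (kS, kV, kA) from (voltage, velocity, acceleration) samples.

    Builds the normal equations (A^T A) k = A^T V, where each row of A
    is [sgn(velocity), velocity, acceleration].
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for volts, vel, accel in samples:
        row = (float(sign(vel)), vel, accel)
        for i in range(3):
            atb[i] += row[i] * volts
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return tuple(solve_3x3(ata, atb))


# Synthetic, noise-free samples generated from known gains (assumed values).
TRUE_KS, TRUE_KV, TRUE_KA = 0.2, 2.5, 0.4
samples = []
for vel in (-2.0, -1.0, 1.0, 2.0):
    for accel in (-1.0, 0.0, 1.0):
        volts = TRUE_KS * sign(vel) + TRUE_KV * vel + TRUE_KA * accel
        samples.append((volts, vel, accel))

ks, kv, ka = fit_feedforward(samples)
print(f"kS={ks:.3f} kV={kv:.3f} kA={ka:.3f}")
```

With real logs the fit is only as good as the sampling, which is the data quality concern raised above: irregular timestamps make the acceleration estimate noisy, and that noise feeds directly into kA.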