SysID borks RoboRIO

I’ve deployed the latest version of WPILib (2023.4.1), and when I try to run SysID I lose comms with the robot and can’t get them back without doing a hard reset and reimage. Here are the logs from the console.

 ********** Robot program startup complete ********** 
 Robot disabled 
 Collected: 0 data points. 
 navX-Sensor Connected. 
 navX-Sensor Board Type 50 (navX-MXP (Classic)) 
 navX-Sensor firmware version 3.1 
 navX-Sensor startup initialization and startup calibration complete. 
 navX-Sensor onboard startup calibration complete. 
 navX-Sensor Yaw angle auto-reset to 0.0 due to startup calibration. 
 NT: Got a NT4 connection from 10.55.72.167 port 59497 
 NT: CONNECTED NT4 client 'shuffleboard@1' (from 10.55.72.167:59497) 
 NT: Got a NT4 connection from 10.55.72.167 port 59498 
 NT: CONNECTED NT4 client 'sysid@2' (from 10.55.72.167:59498) 
Warning  0  [phoenix] Library initialization is complete.   
Warning  0  [phoenix-diagnostics] Server 2023.0.1 (Jan 26 2023, 19:27:56) running on port: 1250   
 Robot disabled 
 Collected: 18108 data points. 
Warning  16  Warning: Watchdog not fed within 0.005000s  Main 
 Warning at Main: Warning: Watchdog not fed within 0.005000s 
 	at frc::ReportErrorV(int, char const*, int, char const*, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >) + 0x178 [0xb6abd214] 
 	at frc::Watchdog::Impl::Main() + 0x274 [0xb6adee14] 
 	at  + 0xae2b8 [0xb61692b8] 
  
ERROR  -111  Error: Loop time of 0.005000s overrun  PrintLoopOverrunMessage 
 Error at PrintLoopOverrunMessage: Error: Loop time of 0.005000s overrun 
 	at frc::ReportErrorV(int, char const*, int, char const*, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >) + 0x178 [0xb6abd214] 
 	at frc::IterativeRobotBase::PrintLoopOverrunMessage() + 0x70 [0xb6aa19ec] 
 	at frc::Watchdog::Impl::Main() + 0x94 [0xb6adec34] 
 	at  + 0xae2b8 [0xb61692b8] 
  
Warning  16  Warning: 	LiveWindow::UpdateValues(): 0.000007s
	RobotPeriodic(): 0.000009s
	SmartDashboard::UpdateValues(): 0.000008s
	DisabledPeriodic(): 0.000306s
	DisabledInit(): 0.057491s
	Shuffleboard::Update(): 0.000008s
  PrintEpochs 
 Warning at PrintEpochs: Warning: 	LiveWindow::UpdateValues(): 0.000007s 
 	RobotPeriodic(): 0.000009s 
 	SmartDashboard::UpdateValues(): 0.000008s 
 	DisabledPeriodic(): 0.000306s 
 	DisabledInit(): 0.057491s 
 	Shuffleboard::Update(): 0.000008s 
  
 	at frc::ReportErrorV(int, char const*, int, char const*, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >) + 0x178 [0xb6abd214] 
 	at frc::Tracer::PrintEpochs() + 0x130 [0xb6a768e4] 
 	at frc::IterativeRobotBase::LoopFunc() + 0x1ec [0xb6aa1440] 
 	at frc::TimedRobot::StartCompetition() + 0x154 [0xb6ac1dbc] 
 	at void frc::impl::RunRobot<DriveRobot>(wpi::priority_mutex&, DriveRobot**) + 0x54 [0x30830] 
 	at int frc::StartRobot<DriveRobot>() + 0x1b0 [0x30b40] 
 	at __libc_start_main + 0x10c [0xb5ec3624] 

I don’t see anything out of the ordinary in the log you posted. I’m not sure what you mean by a hard reset; can you explain exactly what you do? What happens if you try to deploy your robot project? Can you post the log from that?

What is your SysID configuration?

The deployment worked just fine. It was when we tried to run SysID that, after about two attempts, the RIO would lose comms and we could never connect again. I would end up having to hold the reset button until it went into recovery mode and then reflash it. We eventually figured out that the issue was caused by selecting the navX MXP connected via I2C when we meant to select Serial Port.

I was running into another issue with SysID, where it kept complaining that the system we were trying to characterize didn’t match the mechanism we had selected. I wish I had copied the full message. We were trying to characterize a drop-down intake (an arm) that had two motors using their own encoders, but we didn’t have any luck. We ended up just characterizing our systems manually.
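
For anyone else falling back to manual characterization, here is a minimal sketch of the simplest case: fitting kS and kV from a slow voltage ramp with ordinary least squares, assuming the usual feedforward model V = kS + kV * v for forward motion. The function name and the sample numbers are made up for illustration, and an arm would also need a gravity term, which this leaves out.

```python
def fit_ks_kv(velocities, voltages):
    """Least-squares fit of voltage = kS + kV * velocity.

    Assumes every sample was taken while moving forward (velocity > 0)
    during a slow ramp, so acceleration is negligible.
    """
    n = len(velocities)
    if n != len(voltages) or n < 2:
        raise ValueError("need at least two matching samples")
    mean_v = sum(velocities) / n
    mean_u = sum(voltages) / n
    # Sum of squared deviations of velocity, and the velocity/voltage cross term.
    sxx = sum((v - mean_v) ** 2 for v in velocities)
    if sxx == 0:
        raise ValueError("velocities must not all be equal")
    sxy = sum((v - mean_v) * (u - mean_u) for v, u in zip(velocities, voltages))
    kv = sxy / sxx               # slope: volts per unit of velocity
    ks = mean_u - kv * mean_v    # intercept: volts needed to overcome static friction
    return ks, kv


# Hypothetical samples (m/s, volts), generated from kS = 0.5 and kV = 2.0.
velocities = [0.5, 1.0, 1.5, 2.0]
voltages = [1.5, 2.5, 3.5, 4.5]
ks, kv = fit_ks_kv(velocities, voltages)
print(f"kS = {ks:.3f} V, kV = {kv:.3f} V per m/s")
```

Real data is noisy, so the fit only means something if the ramp is slow and the samples span a wide range of velocities.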

Ran into a similar issue - are you running the latest version of SysID? I think we fixed it by just updating our SysID.

We were running 2023.4.1 as of this morning and had the issue practically every time we tried to control two motors for the arm.

Any chance you could post the configuration for the motors/encoders?

I don’t have it offhand; I dumped all the config files once we started doing the characterization manually.

Yesterday we were trying to use SysID to characterize our drive train, and at first it was going in circles on the first test rather than driving straight, so we inverted the right motors and…

Well, it didn’t drive in circles, but that’s because it didn’t drive at all. The compressor kicked on though (through the REV hub on an analog pressure sensor). After playing for a bit, those were the only two results we could get until we pulled the breaker to the pneumatics hub.

That fixed the issue and we were able to run all four tests.

This confused everyone involved. SysID isn’t supposed to generate the slightest bit of code that would turn on the compressor, so why would the same code run the drive motors when the PHub had no power, but not when it did?

Something there is seriously awry.

This sounds like a cursed vendor library interaction of some sort.

I think we’re seeing that the one-binary-fits-all approach of SysId is becoming untenable. The vendor interactions have been a pain since the frc-characterization days and aren’t going to get better…

I’m also finding SysID to be incredibly unstable this year. To the point that it’s barely usable.

  • A few weeks ago we tried to use it for a simple mechanism with two Falcon motors. After 30 minutes of it telling us we were characterizing the wrong mechanism type, we switched to a different computer. It worked the first time on the other computer.
  • Last week we used it on a new iteration of the same mechanism. We found that if we tried to run the same test more than once, SysID would insist we were characterizing the wrong mechanism type and would run the motors. Closing SysID and reopening it worked around this.
  • On Friday I attempted to characterize an arm using a single NEO with a through-bore encoder (connected via an Absolute Encoder Adapter), and I never got it working. There are two encoder options for characterizing, and neither worked. There was an option for “data port”, which is where the through-bore encoder is attached. Selecting this option and deploying caused Driver Station to report “No robot code” and log nothing at all. There was an option for “encoder port”, which is where the NEO’s built-in encoder goes. When I selected this, it said either that the motor didn’t turn or that it turned 1 rotation. After nearly an hour I gave up.

If it’s happening in sysid, it’s likely only a matter of time before it shows up in normal robot programs that use both vendor libraries. Sysid’s robot code is just a (complex, run-time configured) C++ robot program. We may be seeing the future of mixed vendor robot programs in a few years unless something changes.

This is exactly what we were seeing too

If you haven’t yet, could you please try sysid 2023.4.1 (released Saturday) to see if these issues still persist?

That’s what we were using, by the way.

The struggle is still very real with 2023.4.1. Tonight I characterized our swerve drive and ran into several problems:

  • Our first run of tests was successful, but the analysis gave us an error:
    [screenshot of the analysis error]
  • Multiple times we ran into the issue where SysID said the robot reported the wrong project type. Closing and re-opening SysID worked around this, but I ran into it >5 times before I was finished.
  • One time everything was right but the robot just didn’t move. I stopped the test and the popup said the robot reported moving 0 meters.
  • Loading a project that has a Motor CANivore Name value does not repopulate the value. I had entered “swerve”, but loading the config populates the field with “rio”. This wasn’t a big deal once I figured it out, but I had to re-enter it every time after closing and re-opening SysID to work around the “wrong project type” bug. It took a good 30 minutes to crack the first time. I’ll open a bug for this because it’s reproducible.
  • After a few runs, we finally got the analysis error to go away by changing the Samples Per Average and Time Measurement Window. We increased the Samples Per Average to 8 (the tooltip recommends a value between 5 and 10, and the only option in that range is 8 :thinking:). We set the Time Measurement Window to 5.
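
For context on why Samples Per Average might matter, here is a rough sketch of what averaging does to a noisy velocity signal. It assumes the setting behaves like a plain rolling mean over the last N samples, which is an assumption and not something confirmed in this thread. The function name and the readings are hypothetical.

```python
from collections import deque


def rolling_average(samples, window):
    """Average each sample with up to window - 1 samples before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # old samples drop out automatically
    averaged = []
    for sample in samples:
        recent.append(sample)
        averaged.append(sum(recent) / len(recent))
    return averaged


# Hypothetical noisy velocity readings scattered around 2.0.
raw = [2.0, 2.4, 1.6, 2.4, 1.6, 2.4, 1.6, 2.4, 1.6]
smoothed = rolling_average(raw, 8)

# Compare the spread once the 8-sample window is full (index 7 onward).
spread_raw = max(raw) - min(raw)
spread_smoothed = max(smoothed[7:]) - min(smoothed[7:])
print(spread_raw, spread_smoothed)
```

The trade-off is lag: a larger window smooths more but responds more slowly to real changes in velocity.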

I looked into the code that deals with project type and have no idea where the problem would be. We didn’t really change it from last year.

I’ve seen this with SysId 2021 as well. The only thing I can think of is NT transmit issues.

Please do.

I’d have to see the analysis JSONs before and after to diagnose this.

We’ve considered replacing the SysId project generation and C++ robot binary with users logging their own data in Java, but the data quality was really bad. We’ve discussed implementing DMA and CAN streams to make the sampling more deterministic.
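
To put a number on “deterministic”, one option is to look at the spacing between logged timestamps. This is a minimal sketch, assuming timestamps in seconds and the 5 ms loop period shown in the console log earlier in the thread. The function name and the sample timestamps are hypothetical.

```python
from statistics import mean, pstdev


def interval_stats(timestamps):
    """Return (mean, standard deviation, max) of the gaps between samples."""
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps")
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return mean(intervals), pstdev(intervals), max(intervals)


# Hypothetical timestamps (seconds) from a loop that should run every 5 ms.
timestamps = [0.000, 0.005, 0.012, 0.015, 0.020]
avg, jitter, worst = interval_stats(timestamps)
print(f"mean {avg * 1e3:.2f} ms, jitter {jitter * 1e3:.2f} ms, worst {worst * 1e3:.2f} ms")
```

A perfectly periodic sampler would show zero standard deviation; the larger that number is relative to the loop period, the less trustworthy any velocity or acceleration derived from the samples.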

It’s still concerning though, because the SysId binary is just a C++ binary with dynamic object construction for all the vendors. The connection issues we’ve seen would theoretically affect any C++ team using both CTRE and REV in an RT context.
