On today’s FIRST Updates Now stream, there was discussion about desiring an uptime counter on the roboRIO, with the example given of a roboRIO losing communication being identified by field staff as the roboRIO rebooting, without it actually rebooting.
While WPILib doesn’t expose an uptime counter on the roboRIO, the Driver Station log will show you the time since reboot when the FRC_NetComm task loads. This can be seen on the following page on frc-docs:
Using that, and the code starting notification, you can determine if the roboRIO rebooted, the code restarted without a roboRIO reboot, or if communication was lost for another reason (like a bad Ethernet cable).
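As a rough sketch of that check (hypothetical helper name and inputs — it assumes you’ve already pulled the two timestamps out of the DS log yourself):

```python
def classify_restart(uptime_at_code_start_s: float,
                     seconds_since_last_code_start_s: float) -> str:
    """Heuristic: compare the roboRIO uptime reported when FRC_NetComm
    loads against the wall-clock gap since the previous code-start
    event in the DS log."""
    if uptime_at_code_start_s < seconds_since_last_code_start_s:
        # The RIO has been up for less time than the gap between code
        # starts, so it must have power-cycled in between.
        return "roboRIO rebooted"
    # The RIO stayed up the whole time; only the robot code restarted,
    # so a comms loss was more likely cabling/radio than a reboot.
    return "code restarted without a roboRIO reboot"
```

If the code never restarted at all and comms still dropped, that points at the network path (Ethernet cable, radio) instead.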
There was also discussion about the FMS log. The DS log actually logs much more than the FMS log, so as a CSA my first choice is to look at the DS log. The only things (that I remember) that are in the FMS log but not in the DS log are the bandwidth and trip times.
It gives you the system uptime since the most recent boot (cat /proc/uptime also works), just like on any Linux system. However, I don’t think this information is currently collected by FRC_NetComm.
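Since `/proc/uptime` is just a text file (first field is seconds since boot, second is aggregate idle time), a team could log it themselves at code start. A minimal sketch — the logging snippet in the comment is only illustrative:

```python
def parse_proc_uptime(contents: str) -> float:
    """/proc/uptime holds two numbers: seconds since boot and
    cumulative idle time. We only want the first."""
    return float(contents.split()[0])

# On the roboRIO (or any Linux box) you could log this once when the
# robot code starts, e.g.:
#
#   with open("/proc/uptime") as f:
#       print(f"uptime at code start: {parse_proc_uptime(f.read()):.0f}s")
```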
Yeah, this was the crux of the issue we encountered with this problem. We could clearly tell that the system had not rebooted even though we kept being told “you lost power” because of how the FTAs/CSAs were checking things.
I’m glad to know this is now logged in the DS and it isn’t just reliant on teams to log it.
It would still be handy to have this without having to do the mental math on it. Appreciate the insight though @Joe_Ross.
Good to know - in the heat of the moment, no one seemed to recognize this information was available or how to use it. I’m glad it exists - I wish it were a bit more obvious, but knowing it is there is a solid start.
We ended up tracking our issue down to several code problems with libraries and starving a process but the side trips into the land of looking for loose cables took time away from diagnosing that.
One thing I would love to see someone do a video lecture on is how to use the DS Log to diagnose problems for yourself. So many teams I meet at competition either don’t know that the DS Log exists or they only know how to use the most basic features. I try to teach the team members what I’m doing as I go through their logs, but it’s hard in the heat of the moment when everyone wants the problem found and fixed ASAP.
The video can explain things like:
how to differentiate between a roboRIO reboot, radio reboot, and code crash
how to look at bus voltage and current draws on different ports
how to tell if your robot is browning out
what trip time and packet loss are and why they’re important
how to read log events
If it can also explain the first places to look to fix each of those problems, that would make it even more helpful. I would make the video myself but I don’t have access to good logs to use.
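For the brownout item, the core idea is simple enough to sketch: scan the battery-voltage trace you can read out of the DS log for dips below the roboRIO’s brownout threshold (6.8 V on the original roboRIO). The function and sample data here are hypothetical, not part of any WPILib API:

```python
BROWNOUT_VOLTS = 6.8  # roboRIO (v1) brownout protection threshold

def find_brownouts(samples):
    """Scan (timestamp_s, battery_volts) pairs, like the voltage trace
    shown in the DS Log File Viewer, and return the timestamp where
    each dip below the brownout threshold begins."""
    events = []
    below = False
    for t, v in samples:
        if v < BROWNOUT_VOLTS and not below:
            events.append(t)   # entering a brownout dip
            below = True
        elif v >= BROWNOUT_VOLTS:
            below = False
    return events
```

Each returned timestamp is a place worth cross-referencing against the event list (motor current spikes, code restarts) in the same log.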
I think videos would be very helpful. Maybe @Tyler_Olds should invite @Joe_Ross on to a future FUN segment to do something like this live? I would be happy to help host it.
I think the DS logging needs a bit of an overhaul. I wish the source were open so the UI could be changed, or the data exposed via an API. It seems like something this great community could improve. It’s possible it already is and I’m just unaware, but I believe it’s closed source.
As an example, this is one of the metric indicators that I use regularly:
Lots of great information exposed in some simple graphs. I can also flip back to the “logs” tab to get all of the logs. The DS logging interface leaves something to be desired at times, for me at least.
Lessons on the tools would be excellent. Also, (and IMO the bigger problem I run into), teaching folks how to figure out which tool to use, and when… Teach the basics of debugging in general, I guess.
Basic steps like looking at how information flows, which things can cause other things, testing hypotheses, and determining the “next step”. How to avoid the “shotgun” strategy. That kind of thing…
+1 to this - teams frequently are unaware of how to use the DS log to understand what happened to their robot during a match, and look amazed when I show them the kind of info they can find when they ask for help diagnosing an issue.
I’m all in favor of raising the bar of troubleshooting knowledge among the average team to reduce confusion and make it easier to quickly understand what happened on the field.
Adding uptime to DS logs is a good idea – should be a nice linear slope. Occasionally, I see bugs that trip when something has been running for a long time and this could be a good clue there also.
+1 to anything that makes teams more aware of debugging techniques, especially the DS.