Safety issue with Driver Station cRIO mode

Today we were testing a cRIO-based demo robot, using the 2015 Driver Station software in 2014 protocol mode to control it, when we ran into a major issue that I believe is due to a bug in the DS.

We were having communication issues, which were mainly due to interference from the many networks (>15) in the area. There were times when we lost communication for short periods (~250ms) every couple of seconds, as evidenced by the flickering of the robot code (but not communications, oddly) indicator. Every once in a while, the DS would freeze after this happened (I could not switch tabs or disable/enable the robot), but the voltage readout would continue to be updated. The only way I could fix it was by waiting a few minutes or forcibly killing the DS process.

This wasn’t that big a problem until one time when I was driving the robot away from the router and it did not stop when I took my hand off the joystick. The robot did not respond to disable or emergency stop and was eventually stopped by lifting it up and shutting off power.

I was not watching exactly what was happening on the DS, because I was a little preoccupied, but my hypothesis is that this loss of control was not caused by full communication loss, but by the DS freezing and no longer responding to joystick inputs.

While I was driving, the communication probably dropped out a few times (for nearly unnoticeable amounts of time), causing the DS to freeze. Because the voltage still updates when the rest of the DS freezes, the communication seems to be running in a separate thread, so the robot was likely still receiving updates from the DS, but the joystick values were not being updated. I don’t believe the loss of control was caused by full communication loss, because in that case the robot should stop, as it always has in the past when we left the range of our wifi network.
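As a rough illustration of this hypothesis (not the actual DS code, whose internals are unknown here), the Python sketch below shows how a comms loop that runs separately from the input loop could guard against resending stale joystick values. The names `JoystickState` and `build_control_packet`, the packet fields, and the 100 ms `max_age_s` limit are all assumptions made up for the example.

```python
import time


class JoystickState:
    """Latest joystick axes plus the time they were last refreshed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.axes = [0.0, 0.0]
        self.updated_at = clock()

    def update(self, axes):
        # Called by the (hypothetical) input/UI loop each time it reads the stick.
        self.axes = list(axes)
        self.updated_at = self._clock()


def build_control_packet(state, max_age_s=0.1, clock=time.monotonic):
    """Return the values a comms loop would send for this cycle.

    If the input loop has not refreshed the state recently, send neutral
    axes and request disable instead of repeating the stale values.
    """
    age = clock() - state.updated_at
    if age > max_age_s:
        return {"enabled": False, "axes": [0.0] * len(state.axes)}
    return {"enabled": True, "axes": list(state.axes)}
```

Without a check like the age test, a frozen input loop and a healthy comms loop would together keep sending the last joystick position, which would match a robot that keeps driving while the voltage readout still updates.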

Obviously, this is mostly speculation, since I have no idea how the DS works internally, but it seems almost certain that there is some problem with the DS because it freezes, which should not happen.

Can you verify the version of the DS, and can you post the log files, or contact me and I’ll give you my contact info, so that we can determine what may be causing this?

Greg McKaskle

The DS version is 09021500. I attached the logs from that run.

I attached an image of the log file.

Since the laptop CPU usage is quite low, I think you do truly have pretty bad wifi comms in this location. At the beginning and end, you basically can’t stay connected due to the timeouts of comms. I mention the laptop CPU because sometimes, plots like this are actually due to a laptop that is installing updates or otherwise max’ed out and unable to process return packets.

The other thing that is quite odd in your log is that your code trace shows that the DS is requesting that teleOp run, but the code is reporting that it ran auto instead. I suspect that it isn’t that simple, and that you are actually running teleOp but the trace has been deleted somehow. But are you running auto too? Does that shed any light on why the robot is behaving that way?

I certainly see lots of gaps in processing, lots of times when the robot was disabled, and I suspect that there are periods when the control packets do not make it to the robot very often, causing the robot to enable/disable as shown.

Because the CPU usage is low, I wouldn’t expect any issues reading keys or joystick, and a timeout will disable the robot. So I think the next thing to understand is why the code claims to be running auto.
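For readers unfamiliar with the timeout mentioned above, here is a minimal Python sketch of the general idea of a robot-side comms watchdog. It is not the real cRIO or roboRIO implementation; the class name `CommsWatchdog`, the helper `motor_command`, and the 100 ms timeout are hypothetical values chosen for illustration.

```python
import time


class CommsWatchdog:
    """Disables outputs when no control packet has arrived within timeout_s."""

    def __init__(self, timeout_s=0.1, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_packet = None  # no packet seen yet

    def feed(self):
        # Call whenever a valid control packet is received.
        self._last_packet = self._clock()

    def outputs_allowed(self):
        # Outputs stay off until the first packet and after any timeout.
        if self._last_packet is None:
            return False
        return (self._clock() - self._last_packet) <= self._timeout_s


def motor_command(watchdog, requested):
    """Pass the requested output through only while comms are fresh."""
    return requested if watchdog.outputs_allowed() else 0.0
```

Note that a watchdog of this kind only reacts to missing packets. It cannot tell whether the values inside packets that do arrive are current.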

Hope this helps.
Greg McKaskle

It seems weird that the robot is supposedly running autonomous, but that shouldn’t even affect anything, because this robot has no autonomous mode. It is using the command based system, so it should keep running the teleop drive command even while autonomous is running because that is the default command for the drive subsystem.

Here is the code for the robot, but I don’t think it is the cause of the problem:

I still have a hunch that the problem is related to the DS freezing. It definitely was not frozen because the laptop CPU was maxed out, so it seems like there is some bug causing it. It probably doesn’t cause any anomalies in the logs because only the UI is freezing, but this might be affecting something. I’ll try to reproduce this problem later this week, but it might be difficult because there is much less interference at the school.

Another weird thing I noticed in the logs is that the DS seems to be constantly reenumerating the joysticks and printing out the list. This happens even when the connection is working well, but it doesn’t happen when using the 2015 protocol (with a roboRIO).

Perhaps a better explanation is that the 2014 protocol and 2015 protocol rearranged the control bits, and perhaps the log file is confusing because of that. I say perhaps because I don’t have anything with me to test. But additional log files that work correctly may show that the auto and tele are always flipped with the older protocol and newer tools.
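To make the idea concrete, the Python sketch below shows how a mode request written with one bit layout can read back as the other mode when decoded with a different layout. Both layouts are invented for the example; the real 2014 and 2015 control-byte definitions are not reproduced here, and `LAYOUT_A`, `LAYOUT_B`, `encode`, and `decode` are hypothetical names.

```python
# Two made-up control-byte layouts. These masks are assumptions for
# illustration only and are not taken from either real protocol.
LAYOUT_A = {"enabled": 0x20, "auto": 0x10, "teleop": 0x02}
LAYOUT_B = {"enabled": 0x20, "auto": 0x02, "teleop": 0x10}


def encode(mode, layout):
    """Build a control byte that requests `mode` ('auto' or 'teleop'), enabled."""
    return layout["enabled"] | layout[mode]


def decode(control_byte, layout):
    """Interpret a control byte according to the given layout."""
    if not control_byte & layout["enabled"]:
        return "disabled"
    for mode in ("auto", "teleop"):
        if control_byte & layout[mode]:
            return mode
    return "unknown"
```

With these made-up masks, `decode(encode("teleop", LAYOUT_A), LAYOUT_B)` returns `"auto"`, which is the kind of consistent flip a log viewer would show if it assumed the wrong layout.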

The DS now puts joystick and CPU messages into the log even when everything is normal and this doesn’t mean it is reenumerating the joysticks. You should see it with old and new protocol and with cRIO and roboRIO.

I truly would like to understand why the UI is sluggish. Anytime we find this, we try to move the functionality out into its own loop. So the CPU code, where .NET can hang for fifteen seconds, has been moved out. The mDNS code has been moved out, … So I would try turning off firewalls and things like that. If you find the thing that causes the UI to get sluggish, please report it.

Greg McKaskle

Today we demoed the robot again at the same location, and experienced the same types of problems, but we did not lose control. While using wifi I was entirely unable to control the robot with the DS. The robot would connect, and the clickable parts of the UI would immediately freeze (so I couldn’t enable/disable or switch tabs), but the battery voltage would update and the robot code indicator would flicker 1-2 times a second, just like before. Everything worked perfectly while tethered and so did all previous tests done this week (over wifi, but in an environment without much interference).

What was strange was that I was able to control the robot over wifi using this Android app, albeit with occasional latency/connectivity problems. This app does not do much in the way of checking for a functioning connection to the robot, so it seems to be more tolerant of communication problems. Since we were under time pressure at the demo, I was not able to do extensive testing/comparison of the Android app and the official DS.

It seems like there is a subtle bug in the DS that is triggered by a bad connection. I would assume the DS should be able to, if not maintain control of the robot, at least not freeze when communication problems occur. I have no idea if this bug also exists when controlling a roboRIO.

I can provide logs tomorrow if they would be helpful.

The log files may be helpful and so will the version info on the DS.

My suspicion is that the UI controls are “frozen” waiting for ftp of the version files to complete. This is done differently between cRIO and roboRIO. It is performed when we first connect to a robot, and if comms are dropping in and out, this may be happening over and over.
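As a sketch of the general remedy for this kind of stall (not the DS implementation), a slow transfer can run on a worker thread while the UI loop only polls for the result. In the Python example below, `fetch` stands in for whatever retrieves the version files; the function names `start_background_fetch` and `ui_tick` are hypothetical.

```python
import queue
import threading


def start_background_fetch(fetch, results):
    """Run a possibly slow fetch() on a worker thread and queue its result."""

    def worker():
        try:
            results.put(("ok", fetch()))
        except Exception as exc:  # report failures instead of losing them
            results.put(("error", exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def ui_tick(results):
    """One UI loop iteration: poll for a result without ever blocking."""
    try:
        return results.get_nowait()
    except queue.Empty:
        return None
```

Because `ui_tick` never waits on the transfer, buttons and tabs handled in the same loop would stay responsive even if the connection drops mid-fetch and the fetch has to be retried.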

What was your networking topology? Was the robot the AP that the laptop joined? Were they both on the school’s network? Or something else?

Greg McKaskle

I attached the log files from Sunday.

Which version numbers do you want? The only number I can get right now (without the robot) is the DS version, which is 09021500.

As for the network topology, the first time the problem occurred (last week), the robot was connected over wifi to our own router, and the DS laptop was connected over ethernet to that router.

The past Sunday, the robot was connected to a network provided at the event we were demoing at, because it supposedly had a more powerful signal. The laptop was also connected over wifi to that network.

I’m almost positive the UI issues are something network related. I talked to the FTA who ran the practice field, and he saw an unusually high number of driver station freezes when he opened up some additional ports. When he closed the extra ports, the problem became less frequent.

I don’t remember exactly which ports were opened, but I believe ports for FTP, SSH, roboRIO webpage, eclipse deploy/debug, and netconsole/riolog were opened.

We never experienced another freeze after we disabled the themes service in windows, but I’m having a hard time imagining why this would relate at all to the problem.

Team 5096 had two matches during the Wisconsin Regional, Qualification 39 and 70, where we connected to the FMS but were unable to communicate with our robot during some or all of the match. This occurred with two different Classmate computers running DS version 09021500. The error logs are characterized by lost packets, especially in match 70. I think we restarted the Driver Station in match 39 and were able to compete during half of the match. We never experienced problems tethered or using wifi during development.

The LabVIEW rep at the match looked at our code and helped us find a problem that was slowing our motor loop, but did not feel this would have completely disrupted our communications with the field. He would not speculate on the root cause, but suggested we restart the DS if it happened again. On other forums I saw several suggestions - like changing the team number to 1 and then back to 5096, etc., but no solutions.

Naturally, the team was extremely disappointed with being completely disabled on the field during the few matches available to us. I would like some suggestions for how we could duplicate or troubleshoot this problem. I attached the logs.

By the way, we are using a RoboRIO controller. Perhaps I logged this issue on the wrong thread, and apologize for that.

Problem (68.7 KB)

I noticed that in the logs from Sunday, the DS is reporting being disabled the whole time, and the robot keeps switching in and out of disabled. Maybe that is completely unrelated to the problem, but when we were at our Regional, like Bob said, our robot was also having connection problems, and for one match we couldn’t move at all. When I looked at the logs, our DS was also reporting disabled for both matches, except at the very end of match 39, when we suddenly gained control again.
Match 39 was a very strange match, because another robot on our alliance was having problems connecting to the FMS; we not only had to wait a long time for the match to start, but we also disconnected from and reconnected to the FMS a few times. Our match 39 log also looks a lot like both of your logs from Sunday.
We were not the only team at the Regional to have connection problems, and the other teams with connection problems were different from us in many ways. I have been trying to get to the root of this problem so our Regional is not disrupted by this again next year, and I’ve narrowed it down to a few possibilities (though I am new at this, so there’s definitely room for error):

  1. A problem with the FMS (which would not explain your problems, so it seems less likely)
  2. A problem with the Driver Station (which could explain your problems too)
  3. A problem with the roboRIO (which would not explain your problems again, but maybe they’ve stemmed from different things)
  4. A problem with too much interference, bandwidth, or related (regionals are very busy, and you said the place you were demoing at had lots of interference)

Again, maybe our problems are from completely unrelated things, but the logs and the problems themselves seem pretty similar, so maybe we could help each other a bit.

*When I looked at the log again I saw that we got a “code start notification” twice, and after we would get them we could move, but we stopped again after the first one - something your log didn’t show. Maybe our problems aren’t as similar as I originally thought…

I have been out of town a few days, so I just saw the posted logs. Match 39 really looks like the robot lost power to both the roboRIO and the radio. I’d check for a loose connection at the main breaker and check the terminals on your battery. While you are at it, you may as well check the PDP connections and verify that the fuses in the PDP are fully inserted.

I don’t see any info in log 70.

Greg McKaskle