Robot disconnecting and reconnecting

Hello. We have been experiencing an issue where our robot keeps disconnecting and then reconnecting. We aren’t sure what the issue could be. We have replaced the RIO, radio, Ethernet cable, and our POE cable, and we are still having the same issue. I have been paying attention to the radio’s LEDs: it has constant power but just loses connection. I think it could be software, but I’m not sure where in our code the issue could be. Here is our code: GitHub - frc5431/RobotCode2021 at path_weaver. Are there any other ideas on what the issue could be?

Please post some plots showing times when this has happened using the log viewer: Driver Station Log File Viewer — FIRST Robotics Competition documentation.

Here are the plots. It’s a long one, so I thought a screenshot would not be helpful and sent the files instead.
eventsandplots.zip (38.0 KB)

I had a look at the logs you uploaded – thanks! It was from March 2nd (the filename encodes date and time), so I didn’t spend a whole lot of time looking at it. If this is the most current info you have, I can dig into it some more.

If you are seeing a lot of activity in the logs, just pick a match length section of time and screenshot that (there’s a button to scale the display to about two and a half minutes and you can then scroll to a section where there seems to be something going on). May as well get something current, either way. Also, be sure to check the driver station “View Console” (see here) for messages – you can paste a section of that also (you already included this in your upload).

For now, let’s try a different approach…

Does this happen when the robot is just sitting there? If so, does it still happen tethered (using an Ethernet or USB cable to connect the driver station laptop and the robot directly)? Based on your initial post, a software problem seems likely – but this is only a guess with very limited information.

Between the log viewer and console output, it should help to narrow things down quite a bit.

That is the latest log we have. We haven’t been able to work on our robot since March 2nd. I have the laptop because I was working on a side project at home.

This happens whenever. Sometimes while just sitting, sometimes while driving around. It’s a real annoyance when we are trying to run the Robot Characterization tool because it messes up our data.

When we are on Ethernet, it is fine.

OK, sorry for not spending more time looking at what you sent – I didn’t realize you didn’t have access to newer data.

The messages look OK (at some point, you should look at the CAN messages – there’s something going on there that’s at least cluttering the logs and your CAN% utilization is on the high side of where I like to see it, but these are not critical).

The thing that does stand out is you have a high “Packet Loss %” (and also a high “Latency ms”, which is often correlated with the former). Was your robot running a long way from the driver station laptop? Is your radio surrounded by a lot of metal? Have you tried using a different laptop? In the logs you sent, I don’t see anything rebooting, no obvious signs of any kind of software problem, and no brownouts.

What I do see is several occasions where there is no data coming in for intervals of around 250ms or so, followed by a disconnect event. Several of these events directly correlate with really high “Packet Loss %” (and “Latency ms”) spikes, and for the one that happens when these numbers are just high, missing data could easily be hiding a bad spike. But even in cases where you are not seeing an explicit “Disconnect” event, these two metrics are really bad.
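If you want to hunt for the same "no data for about 250 ms, then a disconnect" pattern yourself, here is a minimal Python sketch. It assumes you already have the log record timestamps as a sorted list of milliseconds (how you get them out of the log is up to you). The `find_gaps` name and the 250 ms default are illustrative choices, not part of any FRC tool.

```python
from typing import List, Tuple


def find_gaps(timestamps_ms: List[int], min_gap_ms: int = 250) -> List[Tuple[int, int]]:
    """Return (start_ms, length_ms) for every silence of at least min_gap_ms.

    timestamps_ms is assumed to be sorted, with one entry per log record.
    """
    gaps = []
    for previous, current in zip(timestamps_ms, timestamps_ms[1:]):
        length = current - previous
        if length >= min_gap_ms:
            gaps.append((previous, length))
    return gaps


if __name__ == "__main__":
    # Hypothetical record times: steady records, then a 300 ms silence.
    sample = [0, 20, 40, 340, 360]
    for start, length in find_gaps(sample):
        print(f"no data for {length} ms starting at {start} ms")
```

Lining up the reported gap start times with the "Packet Loss %" and "Latency ms" spikes in the viewer should show whether the two move together.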

So, my theory is that the main issue that is causing you trouble is poor over-the-air WiFi connectivity. Things I’d try to improve the situation:

  1. Make sure there isn’t a huge distance between driver station and robot;
  2. Make sure the robot radio is not surrounded by metal (if so, relocate it);
  3. Try a different laptop (to rule out some WiFi problem on one laptop);
  4. Be sure your radio is operating as an access point (WiFi LED on radio is yellow/orange);
  5. You can find a utility for Android Phones or a PC to check the WiFi environment (signal strength, possible interference from other WiFi users, etc.).

I haven’t used any of these, but here’s an article on some Windows WiFi “scanners”. You should see a strong signal from the WiFi on the robot, and you want to see that the channel it is on (and those nearby) is not super busy.

No worries, you didn’t know, it’s cool lol.

We will definitely look into the CAN messages the next time we have access to the Robot (hopefully, the coming Monday).

The Packet Loss is something that also caught my eye. The robot barely changed between Plano 2020 and when we were able to have access to our bot again (our bot was basically inactive for almost a year).

The distance was completely fine; we have been able to run our robots (and sometimes many robots when other teams come over) for many years. We have a makeshift field (it’s not even a full field).

The radio is in fact around metal but this is the same placement we had for Plano 2020 and during practice before Plano 2020.

We have tried a different laptop. We used a different laptop when we were trying to get data from the Robot Characterization tool but this issue kept giving us bad data.

I wouldn’t say our robot was rebooting. Our robot would stay on; it’s just that when we are connected to the driver station, it would disconnect, then reconnect, and when it reconnects, it enables itself, acting as if nothing happened.

We will definitely try the WiFi “scanners” when we are back at practice. It shouldn’t be busy at all. We have had busier moments before, so these times should be much cleaner, but we will definitely look more into it.

When you get access to the robot, check to be sure the data is telling you the same thing. If nothing has changed, it should be doing so. On the robot, you can check the Ethernet between the radio and the roboRIO – but from your first post, the cable, radio, and roboRIO have all been swapped already. Sometimes, the jack at one end or the other can be the problem.

You can use “ping” to try to see if the problem is packets being lost getting to the radio, or getting lost going from the radio to the roboRIO. On Windows, open a text shell (“cmd”) and type “ping /4 /f /l 1000 /n 1000 10.54.31.1” or “ping /4 /f /l 1000 /n 1000 10.54.31.2” – you can even open two shells and run these in two windows at the same time. Note that I got “54” and “31” from your team number, so be sure that is right (5431).

The “.1” IP address should be the radio, and the “.2” IP address, the roboRIO. In your case, the times when you are losing connectivity are short and somewhat infrequent, so leave these running for a while. The “/n 1000” part causes the actual ping to be done 1000 times, so it will take a while.

When it finishes, you should see something similar to this:

Ping statistics for 10.54.31.1:
Packets: Sent = 1000, Received = 1000, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 2ms, Maximum = 109ms, Average = 4ms

If you see 0% loss to the radio, but something higher for the roboRIO, you can focus on the Ethernet connection between the two. If the loss percentage is similar, the issue is very likely something with the WiFi part of the path. In this case, I’d try moving the radio so there is less metal around it, since you’ve ruled out most other causes.
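To make the comparison less error-prone, the two ping summaries can be parsed and compared with a few lines of Python. This is only a sketch under stated assumptions: it expects the English Windows summary format shown above, and `team_ip`, `loss_percent`, `diagnose`, and the 1% `margin` are hypothetical names and values chosen for illustration.

```python
import re

# Matches the "Packets: Sent = ..., Received = ..., Lost = ..." summary line.
LOSS_RE = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")

RADIO_HOST = 1    # ".1" is the radio
ROBORIO_HOST = 2  # ".2" is the roboRIO


def team_ip(team: int, host: int) -> str:
    """Build the 10.TE.AM.x address from a team number, e.g. 5431 -> 10.54.31.x."""
    if not 1 <= team <= 9999:
        raise ValueError("team number must be between 1 and 9999")
    return f"10.{team // 100}.{team % 100}.{host}"


def loss_percent(ping_output: str) -> float:
    """Extract the packet loss percentage from pasted ping output."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no ping statistics line found")
    sent, _received, lost = (int(group) for group in match.groups())
    if sent == 0:
        raise ValueError("ping reported zero packets sent")
    return 100.0 * lost / sent


def diagnose(radio_output: str, rio_output: str, margin: float = 1.0) -> str:
    """Compare loss to the radio with loss to the roboRIO.

    margin (in percentage points) is an arbitrary illustrative threshold.
    """
    radio = loss_percent(radio_output)
    rio = loss_percent(rio_output)
    if rio - radio > margin:
        return "extra loss past the radio: check the Ethernet link to the roboRIO"
    if radio > margin:
        return "loss already present on the hop to the radio: suspect the WiFi path"
    return "no significant loss seen in this run"


if __name__ == "__main__":
    print(team_ip(5431, RADIO_HOST), team_ip(5431, ROBORIO_HOST))
```

Paste each window's full output into a string (or read it from a file) and pass both to `diagnose`. Because the dropouts are short and infrequent, a single clean run does not rule anything out.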

I am responding to bring this thread back to life because we are still having the same issue. Here is a list (not in order) of what we have done since the last post in this thread.

  1. Replace the Rio, Radio, POE, and Ethernet
  2. Deployed an empty template 2021 code
  3. Tried a different laptop
  4. Configured the Radio to 5 GHz

None of this fixed the issue; we were still having connection problems.
We also decided to test our 2018 robot. We did update it, nothing, we turned on the bot, connected to it, and ran whatever code was on the Rio. From our minimal testing, we got 0 disconnections, which tells us that it has to be something with our 2020/2021 robot itself.
We would appreciate it if someone could help us investigate what the issue could be. If you have any questions whose answers could help solve this, we would be happy to answer them to the best of our ability. We would like to solve this before we head off to San Antonio for The Texas Cup.

Thank you for your help
Team 5431

Fresh data from the logs and results from the ping test (see earlier post) would be helpful.

Here are 11 logs from the DriverStation. I don’t have the results from the ping test but there were moments where it would disconnect but then reconnect.
DS Plots.zip (202.5 KB)

The later logs are showing a huge number of USB camera connect events, on the order of several per second.
At times, the packet latency climbs fairly high, but it is cleaner in later runs when the USB camera is faulting, so there is not necessarily a correspondence.
Perhaps remove the USB camera and run a modified version of your code with any camera code commented out as a test.

They also all show a lot of packet loss due to the roboRIO disconnects that you have described.
The only time you were driving the robot (1:10pm) was the cleanest. There is an interesting correspondence between drawing a lot of battery power and WiFi noise.
How closely is your radio mounted to noise sources: VRM, roboRIO, battery, motors, etc.?

For example, in this plot yellow is the battery voltage dipping while driving and purple is the corresponding WiFi noise (dB) while the robot is active:

Disconnect the camera as a test. If that clears things up, look into dropping frame rate, resolution, color depth, and/or increasing compression. Also, if you have code which is sending a lot of data to the dashboard, test without this (the simplest project/code approach). Strongly prefer UDP to TCP if you are doing anything with networking over-the-air in this context.
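To see why frame rate, resolution, color depth, and compression all matter, a rough back-of-the-envelope estimate helps. The Python sketch below is only an approximation: real compressed stream sizes depend heavily on image content, and `stream_mbps`, `fits_under_cap`, and every number in the example are illustrative assumptions rather than measured values or official limits.

```python
def stream_mbps(width: int, height: int, fps: float,
                bits_per_pixel: int = 24, compression_ratio: float = 1.0) -> float:
    """Rough camera stream bandwidth in megabits per second.

    compression_ratio is how many times smaller the encoded stream is than
    raw video; it is a guess unless you have measured it.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_bits_per_second = width * height * bits_per_pixel * fps
    return raw_bits_per_second / compression_ratio / 1_000_000


def fits_under_cap(mbps: float, cap_mbps: float) -> bool:
    """True if the estimated stream fits under a bandwidth cap you supply."""
    return mbps <= cap_mbps


if __name__ == "__main__":
    # Hypothetical settings, assuming roughly 10:1 compression.
    for width, height, fps in [(640, 480, 30), (320, 240, 15), (160, 120, 10)]:
        estimate = stream_mbps(width, height, fps, compression_ratio=10.0)
        print(f"{width}x{height} @ {fps} fps: about {estimate:.2f} Mbps")
```

Halving the resolution in each dimension cuts the estimate by a factor of four, while halving the frame rate only halves it, so resolution is usually the first thing to try lowering.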

Here is a useful additional diagnostic to collect at this stage.

As you’ve replaced all the H/W (correct?), there must be something else going on. Simplifying the S/W is a good way to eliminate other possibilities. One hypothesis is that you are running over the WiFi network bandwidth cap and this is causing you to lose many packets. Another is that the radio is located close to a strong noise source – as Mark noted. Try moving it away from everything, out in the open (you can do this temporarily, as a test).

Updated:

Is there a reason you need to run with your code on the robot to debug the issue (rather than simply using the template project)?

As a general debugging strategy, I try to create the simplest system which has an identical symptom to the initial problem observed. If I can simplify or remove a component without changing the system’s behavior, it likely wasn’t involved.

There could definitely be multiple issues at play here. But unless you absolutely need to be running your code to reproduce the issue, I’d start with the template project.

Similarly:

Reading just the message history I see here, this really seems like your smoking gun to me. If you can consistently get the problem to happen whenever you change anything except the radio/wifi, but consistently get it to go away whenever you take the radio/wifi out of the system, the logical conclusion is to look into the radio/wifi portion itself, or things impacted by the differences between wifi and an ethernet wire.

Additional debug steps:

What were the results of this test?

If you swap the radios between the two robots, does the problem move with the radio?

These are some things we’ve found that aren’t obvious but can affect connectivity, in no particular order:

  1. CPU usage on the rio. If you’re high on cpu (over 90%) we’ve experienced disconnects.
  2. Having your robot access point set up without a password. We found this one at maker faire - so many phones walking around trying to talk to the access point just kills it.
  3. CPU usage on the laptop.
  4. Having multiple other networks in range on the same or overlapping channels.
  5. Having other programs on the laptop. You never know when the other programs (Autodesk is absolutely notorious) are going to try to ‘phone home’ and just kill the connection. Windows was never meant to be a ‘real time’ operating system, so anything else sharing time on the computer is going to mess with you. Everyone would do themselves a favor if their driver station was truly dedicated to only being a driver station (I know many teams can’t afford that financially, but it’s best practice).
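For item 4, a quick way to judge whether the networks a WiFi scanner reports actually collide with yours is to compare channel center frequencies. The Python sketch below covers the 2.4 GHz band only (channels 1 to 13, spaced 5 MHz apart) and assumes a 20 MHz channel width. The function names are made up for illustration, and the channel list in the example is hypothetical.

```python
from typing import Iterable


def center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz WiFi channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("expected a 2.4 GHz channel between 1 and 13")
    return 2407 + 5 * channel


def channels_overlap(a: int, b: int, width_mhz: int = 20) -> bool:
    """True if two channels are closer together than one channel width."""
    return abs(center_mhz(a) - center_mhz(b)) < width_mhz


def count_overlapping(own_channel: int, nearby_channels: Iterable[int]) -> int:
    """Count nearby networks whose channel overlaps the robot's channel."""
    return sum(1 for ch in nearby_channels if channels_overlap(own_channel, ch))


if __name__ == "__main__":
    # Hypothetical scan result: channels of other networks in range.
    nearby = [1, 3, 6, 6, 11]
    print(count_overlapping(6, nearby), "nearby networks overlap channel 6")
```

A high count on your own channel is a hint to move the radio to a quieter one; 5 GHz channels are laid out differently and are not covered by this sketch.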

Unfortunately, I can’t see your logs because I’m at work and “security”.

We’ve had problems like this before that seemed “random”, and seemed to correspond to other events.

By and large, the cause is always intermittent power. I would suggest you run the “tug test” against all the power terminals to make sure they are absolutely secure and don’t move.

The last time our robot had these exact symptoms we realized that the connection to the PDP from the battery was loose enough the cable would move but not loose enough to ever actually lose power. So the intermittent power connection was causing the electronics to shut down quickly enough to break functionality but not long enough to actually seem like power failures.
