Robot comm in noisy public locations

Our team recently participated in a parade through an area that can best be described as very noisy from a 2.4GHz 802.11 basis. It’s a mix of commercial & residential, and there are many, many WiFi APs in the area. Over the course of ~1.25 miles and ~45 minutes, we probably passed within range of 50-75 different APs.

The intent was to have our 2013 FRC robot follow behind a pickup truck, with the drivers in the back of the truck driving the robot, shooting Frisbees, etc. What we experienced can best be described as a complete breakdown in controllability of the robot. The robot had to be kept w/in ~20 ft of the driver station to have any control at all, and even then it was not reliable. When we did have control, there was often considerable lag (2-3 sec) between control inputs and robot response. It almost seemed like the robot had a mind of its own, taking off in seemingly random directions even when the drivers weren’t giving it any inputs. We ended up disabling it and physically pushing it through the streets – not very impressive from a recruiting standpoint. As soon as we got out of the densely populated area, everything returned to normal.

Without getting into all of the details of our exact configuration, examining the DS logs, etc., can anyone give any suggestions as to configuration/settings that should be used (or avoided) in a situation such as this? Specifically:

  • 2.4GHz vs. 5.8GHz
  • 802.11 b vs. g vs. n
  • DAP-1522 rev A vs. rev B
  • Setting the DAP-1522 to auto-select the channel
  • AP and/or laptop power levels
  • Classmate (circa 2011) vs. higher-end laptop
  • DS WiFi card settings (roaming aggressiveness, CTS, WMM, preamble, etc.)

In noisy environments, we have found the same thing. Robot will not stay connected at 2.4 & works well at 5.8. What we normally use is the older brown bridge (2010 Breakaway) with the computer & a DAP 1522 router. I have not noticed if rev A or B makes a difference. We also run encryption to keep stray people from trying to connect.

I am going to guess that the system is trying to connect to every access point it sees and that takes time, hence the lag.

I’ve seen the same kind of behaviour before.

In some cases the following has helped:

Set up a separate router that acts as an AP.
Make it a 5 GHz only network, hidden SSID, WPA2 Personal encryption with both TKIP/AES, 802.11n only. If this router has multiple external antennae, even better. Don’t skimp on this router, and make sure it supports 5 GHz, because many don’t.

Most importantly, patch the driver station into this separate router with a wired connection, and disable the wireless on the driver station computer.

Make sure the radio on the robot is switched over to bridge mode, and that you are getting good and consistent Ping times.
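
One way to judge whether ping times are "good and consistent" is to look at loss and spread rather than just the average. Here is a minimal Python sketch, assuming you have already collected round-trip times in milliseconds (with None for a ping that got no reply). The function name and the sample values are made up for illustration:

```python
from statistics import mean, pstdev

def summarize_pings(rtts_ms):
    """Summarize ping results; None entries are pings that got no reply."""
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    return {
        "sent": len(rtts_ms),
        "loss_pct": 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0,
        "avg_ms": mean(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
        # Population standard deviation as a rough "consistency" number.
        "jitter_ms": pstdev(replies) if replies else None,
    }

samples = [2.0, 3.0, None, 2.0, 45.0, 3.0, None, 2.0]  # hypothetical values
print(summarize_pings(samples))
```

A low average with a large max or any loss at all is a hint that the link will feel laggy once the robot is enabled.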

When we do important demos, the separate AP is the setup we like to use. When possible, we avoid switching the robot radio over to AP.

Could you post an image of the DS log for the event? Or if you want, PM me and I’ll provide email info. I’d like to see the log, and review it with you.

Greg McKaskle

We’ve had this happen at nearly every crowded event we’ve ever done, even just with lots of cell phones, and few traditional wifi access points.

We had this issue last year with our first pitch. This year we got a directional dual-band Wi-Fi adapter from Amped (I’ll check the model number later and add it). This about doubled our 2.4 range in traffic-free areas, and with 5 GHz we were able to do the pitch without any issues. Keep in mind that this was in a stadium with 40,000 people, each with a cell phone, in the middle of downtown Detroit. With just our laptop adapter, we didn’t even have an FRC field’s length of range.

Here’s a screenshot of the DS log for the first ~15 min of the parade.

This is exactly what we found with the robot radio at the Maker Faire in Detroit. There were numerous other robot radios running, plus a bunch of what looked like cell phone hotspots.

The most telling metric is that in the 3000 second log, your robot was system-watchdog disabled 2750 times. If at all possible, I’d switch to 5GHz, and/or look into using a different router. The charts tab will give you a pretty good indication of when you should just cable it.

For the NIWeek demo, team 2468 just left it tethered. 4000 nerds in the audience at a convention center is not a good time to give 2.4 wifi a try. They accidentally estopped their robot when waking up the screen saver, but recovered, scored, and then shot a water bottle off of the head of a VP. Way cool.

Greg McKaskle

It would have been way cooler if they did it with Oddjob’s hat rather than a Frisbee… :)

Ditto for the game. Lethal hats make robots cooler.

Greg McKaskle

That’s what we need for 2014: A James Bond game!!:]

Many thanks to Greg McKaskle from NI for looking into our log files and helping explain how to decipher what’s in them, and to Tom, Bryce, and Mr. Lim for their suggestions re: WiFi. Their comments raised two branches of thought:

First, re: the log files:
Is there any documentation that explains the errors/warnings/logging system in more detail? The “FRC Driver Station Errors/Warnings” on ScreenStepslive.com is helpful, but incomplete. In particular, it would be helpful to understand how status checking/logging works from a high-level point of view. Does everything come from the DS, or can the cRIO or DLink log events? Which components can log which events? Which components check on which other components? Where (DS, cRIO) does the Watchdog Expiration come from and what does it mean? How is it possible to have a Watchdog Expiration when you haven’t had a 44004 “DS has lost comm with the robot” or any other warning/error?

Second, re: WiFi configuration:
Setting up, configuring, and debugging WiFi deployments is part of my day job, though I am by no means an expert – if I had been on my game, we wouldn’t have had the problems we did during the parade. That said, I’ll take what the guys above have suggested and add my own thoughts as a first stab at a concise list of best practices for “mission critical” demonstrations:

  • Do a site survey ahead of time. While it won’t tell you the impact of 40k spectators or guarantee that the environment will be the same during your event, it will tell you what bands & channels are in use at that time. At the very least, do a survey using something like inSSIDer or a similar tool. If you have the means, scan for non-802.11 traffic using tools such as Wi-Spy, Chanalyzer, etc.
  • Use 5.8GHz, and unless you really know what you’re doing re: FCC channel allocation rules in that band, set the AP to auto-select the channel and/or use DFS. On the other hand, if you know what you’re doing and read the FCC rules very carefully, and depending on where you are located and the results of your site survey, it may be possible to find a channel that you are almost guaranteed to have all to yourself.
  • Have a separate AP that the DS is hard-wired to, and set up the radio on the robot in bridge (aka client) mode to connect to the AP.
  • Use a hidden/suppressed SSID w/ WPA2 Personal & AES.
  • Use MAC filtering on the AP to restrict WiFi access to the client radios that you want to have access (the robot radio, possibly a backup DS). This may cut down on the work the AP has to do to reject attempts to associate.
  • Don’t scrimp on the AP – Does anyone have any specific models they have had success with? I’d lean towards building a custom unit based on Mikrotik hardware & RouterOS, if only because that’s what I’m most familiar with.
  • High-gain antennas and high-powered radios may do more harm than good! Yes, they may allow the AP to reach a mile away, but that’s not useful for our robots. They’re only necessary when the transmitter & receiver are “really far” apart. What constitutes “really far” is subject to interpretation, though I’d put it at 20-30+yds in open space. The problem is that high-gain antennas have a much narrower “beam”, and if the antenna isn’t aligned correctly (usually vertically), you may find that the signal strength 200yds away is higher than it is 30ft away. And even if it is aligned correctly, you don’t want to let the cell phones 200yds away know that you’re there – or to be able to pick up their signals if they attempt to associate.
  • Use a directional/sector antenna on the AP – IF you can be sure that it will always be pointing in the direction of the robot.
  • An AP that supports MIMO (multiple antennas) should help, especially indoors.
  • I’m not sure I agree w/ Mr. Lim re: using 802.11n – Our robots don’t need the bandwidth that N-mode provides, and forcing the radio into N-mode means using a wider swath of the RF spectrum, which means that you’d be more likely to run into interference. I don’t have any hard evidence to back this up, though.
  • In general, set the AP & client to the most restrictive set of “advanced” WiFi settings that they can both agree on. The default settings of most APs allow compatibility with the widest variety of devices possible. For the fixed configurations we’re dealing with, that’s not necessary. For instance, use Long preamble only, disable WMM, disable “burst mode”, etc.
  • Don’t use proprietary extensions that promise to boost throughput or range. They almost always rely on the AP and all clients being of the same brand (sometimes even specific models). Many are only useful for cases where there are a large number of associated client devices and/or extremely high bandwidth requirements – neither of which apply for a robot & DS.
  • Always, always, always have a really long Ethernet patch cable as a fallback in case you can’t get the wireless to work!
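
To make the site-survey item a bit more concrete, here is a rough Python sketch of turning survey results into a channel choice. It assumes you have written down the channel of each AP your survey tool reported. The survey data is hypothetical, and the candidate list (the common non-DFS 5 GHz channels in the US) is an assumption you should check against the rules for your location:

```python
from collections import Counter

def least_used_channel(seen_channels, candidates):
    """Return the candidate channel with the fewest APs seen in the survey.

    Ties go to the channel listed first in `candidates`.
    """
    counts = Counter(seen_channels)
    return min(candidates, key=lambda ch: counts[ch])

# Hypothetical survey: the channel of each AP seen during the scan.
survey = [36, 36, 40, 149, 149, 149, 44, 36]
# Assumed candidate list: common non-DFS 5 GHz channels in the US.
candidates = [36, 40, 44, 48, 149, 153, 157, 161]
print(least_used_channel(survey, candidates))  # 48
```

This only counts APs per channel. It ignores signal strength, neighbors using wide channels, and non-802.11 noise, so treat the answer as a starting point rather than a guarantee.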

Lastly, YMMV!!! Some of the above are my opinions/hunches based on non-FIRST experience, and my team will be testing them out at some point in the future. Don’t take any of it as gospel - try it out on your own and report back the results. If something doesn’t work or makes things worse, narrow it down to the specific thing that’s causing the problem.

The reason why I prefer N-only is to avoid any down-training of speeds if a G or B only device somehow gets on to the network.

This is a known limitation with a lot of wireless networks such as:
https://discussions.apple.com/thread/3701015?start=0&tstart=0

It’s safer just to reject all non-N clients than to try and down-train the entire network to support the slowest client.

On some routers it’s also possible to set the channel width to only 20MHz instead of 40MHz. 20MHz is probably a good call in your best practices outline to prevent it from eating up too much of the airwaves.
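
To illustrate the 20MHz vs. 40MHz point, here is a small Python sketch. It uses the standard 5 GHz channel numbering (center frequency = 5000 + 5 × channel number, in MHz). The function names and the particular channels are just an example I picked:

```python
def channel_span_mhz(channel, width_mhz=20):
    """Return (low, high) edges in MHz for a 5 GHz channel number.

    Uses the standard 5 GHz numbering: center = 5000 + 5 * channel.
    For 40 MHz, pass the bonded pair's center channel (e.g. 38 for 36+40).
    """
    center = 5000 + 5 * channel
    return (center - width_mhz / 2, center + width_mhz / 2)

def overlaps(a, b):
    """True if two (low, high) spans share any spectrum."""
    return a[0] < b[1] and b[0] < a[1]

ours_20 = channel_span_mhz(36, 20)   # (5170.0, 5190.0)
ours_40 = channel_span_mhz(38, 40)   # (5170.0, 5210.0)
neighbor = channel_span_mhz(40, 20)  # (5190.0, 5210.0)
print(overlaps(ours_20, neighbor))   # False
print(overlaps(ours_40, neighbor))   # True
```

At 20MHz on channel 36 you stay clear of a neighbor on channel 40. At 40MHz you sit right on top of them.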

Lastly, we’ve been using the old Linksys WRT610N that used to come in the FRC kits many years ago. These have proven to be pretty excellent routers specifically for this purpose, although I am sure there are much better, more modern alternatives now.

The diagnostics have evolved and I doubt the documentation of all the info is up to date. I’ll address your questions here for now.

The DLink is a relatively closed device. I’m sure it could log things just through configuration, but thus far we haven’t found it useful, and we are running stock FW, so at this point, the DLink logs nothing.

The cRIO cooperates in the protocol and helps the DS to distinguish certain issues which would otherwise look identical. In particular, the cRIO and DS will both start pinging devices when comms is down. When comms comes back up, this info is used to identify the break in the comms chain and tattle on the device that rebooted.

The majority of the diagnostics and all of the logging are done by the DS.

The system watchdog is a low level FPGA timed service that turns off the I/O when the RT system doesn’t feed it. This is to provide safety and isn’t specific to comms.

Comms is required in order for the RT task to feed the FPGA. Loss of comms, cable, router, DS, etc. will interrupt the feedings. Because the FPGA enables the I/O instantly when it is fed, comms that is just a bit late can cause stutters, so it contains a counter. The system watchdog count indicates transitions, but not duration. The FPGA watchdog timeout is 100ms, which is quite short.

The DS or other components don’t implement the watchdog, but if they are late or negligent in feeding, they will cause a watchdog. The DS reports the 44004 error when the UDP read from the robot times out after 500ms. Thus in a situation where packet loss is not absolute, it is more likely to have 100ms windows with comms loss than 500ms windows of loss.
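
As a rough illustration of why you can see watchdog expirations without a 44004, here is a Python sketch that counts gaps in a packet stream against the two timeouts mentioned above (100ms and 500ms). The arrival times are made up, and it simplifies things by applying both timeouts to the same one-way stream, which is not how the real system is wired:

```python
def count_gaps(arrival_times_s, timeout_s):
    """Count gaps between consecutive packet arrivals longer than timeout_s."""
    return sum(
        1
        for prev, cur in zip(arrival_times_s, arrival_times_s[1:])
        if cur - prev > timeout_s
    )

# Hypothetical arrival times (seconds) of packets on a lossy link.
arrivals = [0.00, 0.02, 0.04, 0.20, 0.22, 0.50, 0.52, 0.54, 0.90, 0.92]

watchdog_trips = count_gaps(arrivals, 0.100)  # 100ms FPGA watchdog timeout
ds_timeouts = count_gaps(arrivals, 0.500)     # 500ms DS read timeout (44004)
print(watchdog_trips, ds_timeouts)  # 3 0
```

With partial packet loss, short gaps are common and long ones are rare, so the watchdog count climbs while the 44004 never fires.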

Additionally, the error messages reporting on the comms are being sent over potentially faulty comms. So lost packets may drop error messages. This is why there are summary messages and sometimes confusing inconsistencies where you may expect to see messages come in pairs, but find that sometimes they are solo.

To try a summary, the FPGA and its deadline are concerned with safety. The DS uses the timeouts to help diagnose the issue and report problems. The DS does its best to filter and present a summary of the comms in the log file viewer, but this is complicated because of missing messages.

You didn’t ask about it, but the charts tab shows two primary measurements of comms health in addition to errors/warnings. The latency or trip time measurement is of successful packets. In FRC, this number generally indicates retries at the wifi layer. The wifi layer will retry the UDP traffic at least four times in a clean environment and more in a noisy environment before moving on and letting the upper layers deal with it. UDP deals with it by doing nothing. TCP would deal with it through NACKs, timeouts, and retransmission.

The blue bars indicate the packets that never arrived. They didn’t make the full trip, but some portion of them may have made it from DS to robot – keeping it enabled and allowing it to drive but the status from robot to DS was lost.

If you had a clean room and a robot connected to the DS over wifi, you’d expect no retransmissions or failures. Add RF noise and you’d start to see latency rise as retransmission is used to overcome noise. Add more noise and latency will peak and loss will start to rise.
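
A toy model of that progression, as a Python sketch. It assumes each wifi-layer attempt fails independently with the same probability and that the radio gives up after a fixed number of attempts. Both are simplifications (real interference is bursty, and the attempt limit of 4 here is only an assumed value):

```python
def retry_model(p_fail, max_attempts):
    """Simple model: each attempt fails independently with probability p_fail.

    Returns (probability the packet is lost after all attempts,
             expected number of attempts used per packet).
    """
    p_lost = p_fail ** max_attempts
    # Attempt k+1 happens only if the previous k attempts all failed.
    expected_attempts = sum(p_fail ** k for k in range(max_attempts))
    return p_lost, expected_attempts

for p in (0.0, 0.3, 0.6, 0.9):
    lost, attempts = retry_model(p, max_attempts=4)
    print(f"p_fail={p:.1f}  lost={lost:.3f}  avg_attempts={attempts:.2f}")
```

The average number of attempts (a stand-in for latency) rises first and levels off at the attempt limit, while the loss fraction stays small until the noise gets bad and then climbs quickly.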

To add to your list of configurations to attempt.
You may want to consider cutting back or turning off the video stream in a noisy demo venue. The TCP traffic will degrade less gracefully and will interfere with the UDP traffic.

Greg McKaskle