Chief Delphi

Team 548 Einstein Statement (http://www.chiefdelphi.com/forums/showthread.php?t=107906)

DampRobot 22-08-2012 10:54

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Al Skierkiewicz (Post 1182669)
There was no evidence of a second attack. The original attacker suspected that other failures (for known and documented reasons) were being caused by the attack method that had been discovered. As to the three second attack, please read the report again! Once a device had attempted to communicate with a robot, the disruption could last the entire match. The attacker could easily move on to another robot(s) after the first disruption.


If others knew or suspected an issue at other events, they did not come forward with that info. The Einstein Investigation had a clear set of goals and that was to determine what caused so many failures on the Einstein Field. We were not tasked with investigation outside of Einstein and the twelve robots involved in that part of the competition.

Al Skierkiewicz, thank you for pointing out that what might seem obvious to me might be completely contrary to others' points of view. To address your comments using my interpretation of the report:

First, the official FRC report states that a Galaxy Nexus running Android 4.0.4 was probably used for at least one attack ("Failed Client Authentication on Einstein"), which we recently learned was committed by the 548 mentor. Another section of the report ("Alternative Source Testing") describes in detail the attempts to bring down communications with the failed client authentication attack, and notes that communications downtime could be as short as three seconds with that device when using a specific strategy. Especially if the mentor had tried this before (which I'm certainly not trying to imply!), he could have limited the outages to only three seconds.

The second attacker was, to me, implied by the fact that the mentor left the field before Finals 1 and 2, yet the attacks continued. Also, witnesses saw an individual selecting teams to take down from a cell phone, who may or may not have been the same mentor. Although they believe they are one and the same, the mentor repeatedly denies carrying out the attack more than once (and if he had, why wouldn't he have used the strategy that would have resulted in only 3-second downtimes? Malicious intent?). He may well have been lying, but the fact remains that the attacks lasted considerably longer than three seconds and continued even after this person left the field.

I think whether there was knowledge within FIRST about this type of hole is a fair question. The Einstein report states that they only discovered this error accidentally after Championships. Shouldn't the actions of this individual, as well as his attempt to contact field personnel, have given them at least a hint that something was up? Did someone know about this and go unheard? I certainly don't know, and I don't really expect that anyone on CD can answer all of my questions conclusively.

As always, no offense meant. Hopefully my comments are seen as constructive.

Alan Anderson 22-08-2012 11:44

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1182681)
Also there are probably more devices than one might realize at any one event that can use 5GHz because they are not line of sight to the field. Consider all the driver's station laptops in the pits...

The number of driver station laptops in the pits capable of 5 GHz WiFi was vanishingly small. As a robot inspector, checking for wireless networking of teams' laptops was part of my job. I saw exactly zero with 5 GHz radios in three regional competitions and a championship division.

techhelpbb 22-08-2012 11:53

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Alan Anderson (Post 1182690)
The number of driver station laptops in the pits capable of 5 GHz WiFi was vanishingly small. As a robot inspector, checking for wireless networking of teams' laptops was part of my job. I saw exactly zero with 5 GHz radios in three regional competitions and a championship division.

Fair enough, but 5 GHz can be added in a second with a USB adapter or card if someone chooses. Also, what about the other laptops often seen in the pits:

Apple laptops, almost all of them since 2006, have dual-band WiFi.

That includes the MacBook, the MacBook Pro, and the MacBook Air.

I know I saw a few of those in my trips into the pits at various events even if they weren't driver's stations.

Al Skierkiewicz 22-08-2012 11:55

Re: Team 548 Einstein Statement
 
Damp,
The three seconds referred to in the report was the result of a specific set of steps taken and observed by the First engineering team while testing the Samsung Galaxy Nexus phone at HQ. The report does not suggest that this is what was taking place on Einstein; it was merely an additional failure observed using that phone during testing. The alternative source testing was performed after it was noted that a 5 GHz-enabled wireless device had caused some issues on Einstein. First engineering noted that devices tend to 'phone home' once they see a wireless network they recognize. That is the "repeat interval" listed in that part of the report.
In addition, from the report: "Each of these authentication attempts has the potential to cause working communication to drop and a dropped connection to be reestablished between the driver station and the robot. Repeated attempts to connect to multiple SSID’s can result in robots that are drivable and robots that are not over the course of the match."

Siri 22-08-2012 12:21

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Astrokid248 (Post 1182672)
You wouldn't necessarily have to know the cause of the issue to happen upon the exploit. With the growing number of applications that can control any number of robots with a smartphone, it's really not surprising that between week 4 and Einstein someone whipped out a phone and thought, "What if I connect in during a match?"

It's the "1000 monkeys with 1000 typewriters" postulate at work, and I think it would be wise of FIRST to challenge all teams to try and find these exploits and notify FIRST as they appear. Crowd-source the troubleshooting of these systems, and allow teams to have active feedback throughout the season. It would solve a lot of problems. And I agree with the idea that FIRST should have some kind of pre-written response to let teams know that emails are at least going through.

I agree with you: among the "1000 people" [likely more] who were around fields during or after Week 4, it seems somewhat plausible to me that someone else who happened to have 5 GHz WiFi happened to try to connect to a robot that happened to have Revision A, happened to try entering a password and cause an FCA, and happened to be one of the people who would keep it to themselves. Not likely, but plausible.

What I find significantly less plausible is that FIRST officials happened to do so. Not only is the sample size many, many times smaller, but they are naturally quite busy during matches and additionally have every reason to trust in FIRST's testing. (I acknowledge the potential for complacency.) I cannot picture an FTA or FTAA (etc), much less Dean or Woodie, whipping out their phone in the middle of a match. They have every reason to be among the most busy people in the stadium and no reason to distrust their own selections. This is my argument against DampRobot's question of institutional knowledge.

Al Skierkiewicz 22-08-2012 12:26

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Siri (Post 1182696)
What I find significantly less plausible is that FIRST officials happened to do so. Not only is the sample size many, many times smaller, but they are naturally quite busy during matches and additionally have every reason to trust in FIRST's testing. (I acknowledge the potential for complacency.) I cannot picture an FTA or FTAA (etc), much less Dean or Woodie, whipping out their phone in the middle of a match. They have every reason to be among the most busy people in the stadium and no reason to distrust their own selections. This is my argument against DampRobot's question of institutional knowledge.

HUH?????

Astrokid248 22-08-2012 12:38

Quote:

Originally Posted by Siri (Post 1182696)
What I find significantly less plausible is that FIRST officials happened to do so. Not only is the sample size many, many times smaller, but they are naturally quite busy during matches and additionally have every reason to trust in FIRST's testing. (I acknowledge the potential for complacency.) I cannot picture an FTA or FTAA (etc), much less Dean or Woodie, whipping out their phone in the middle of a match. They have every reason to be among the most busy people in the stadium and no reason to distrust their own selections. This is my argument against DampRobot's question of institutional knowledge.

Should've clarified a bit. I'm not at all surprised FIRST didn't find it. There is no scenario in which they could've found the issue before Einstein. I'm saying that a) if they implement some sort of feedback system, maybe the troubleshooting will be more comprehensive, and b) the mystery mentor probably isn't the only guy who was aware of the problem before he tried to alert the FTAs. Just my 2¢.

Siri 22-08-2012 13:09

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Al Skierkiewicz (Post 1182697)
HUH?????

uh oh. What'd I do, Al? :eek:


Quote:

Originally Posted by Astrokid248 (Post 1182698)
Should've clarified a bit. I'm not at all surprised FIRST didn't find it. There is no scenario in which they could've found the issue before Einstein. I'm saying that a) if they implement some sort of feedback system, maybe the troubleshooting will be more comprehensive, and b) the mystery mentor probably isn't the only guy who was aware of the problem before he tried to alert the FTAs. Just my 2¢.

Oh. Yeah, then we totally agree with each other. (Does that make it 4¢?)

Al Skierkiewicz 22-08-2012 13:15

Re: Team 548 Einstein Statement
 
I am trying to figure what you are saying in that post.

Jon Stratis 22-08-2012 13:27

Re: Team 548 Einstein Statement
 
Al, I think his point was that the likelihood of FIRST officials stumbling onto the FCA issue prior to Einstein was extremely small. The FIRST officials who would be most likely to recognize it for what it is (like the FTAs) are too busy during competition and matches to be flipping through their phones, so it would be hard for them to stumble on it accidentally.

I would add one more item... those individuals are probably the last ones who would actually "try" to connect to the field if the option pops up on their phone. They would just cancel out of the option and go about their business.

JesseK 22-08-2012 14:00

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1182681)
Also there are probably more devices than one might realize at any one event that can use 5GHz because they are not line of sight to the field. Consider all the driver's station laptops in the pits. I'll assume that no one on the field with a 5GHz laptop has time to be doing anything but what is expected of them.

The assumption is a bit naive.

While I agree that 5 GHz wireless cards on battery-powered mission-critical laptops are few and far between (energy mongers...), anyone who tries to interfere from a driver station laptop probably won't rely on a driver to do it. It's conceivable that the drive team wouldn't know it's happening. Most likely it'd go in a batch file or background script (rundll32.exe anyone?) that doesn't show up. Additionally, it could happen from the queue rather than on the field.

Now that an exploit is public knowledge, it's only a matter of creativity how people will attempt to abuse it. FIRST needs to find a solution for the root cause (sounds like they are). Turning wireless off for the laptops is a start.

techhelpbb 22-08-2012 14:19

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by JesseK (Post 1182709)
The assumption is a bit naive.

While I agree that 5 GHz wireless cards on battery-powered mission-critical laptops are few and far between (energy mongers...), anyone who tries to interfere from a driver station laptop probably won't rely on a driver to do it. It's conceivable that the drive team wouldn't know it's happening. Most likely it'd go in a batch file or background script (rundll32.exe anyone?) that doesn't show up. Additionally, it could happen from the queue rather than on the field.

Now that an exploit is public knowledge, it's only a matter of creativity how people will attempt to abuse it. FIRST needs to find a solution for the root cause (sounds like they are). Turning wireless off for the laptops is a start.

It's hard to really enforce the zone around a field by just policing devices that are off.

You can't jam, because if you do you will probably jam yourself unless you use a very well-designed jamming system. Plus, FIRST is a publicly visible corporation, and you're taking your legal chances jamming like that. You can't count on devices staying off after you inspect them (if we assume no trust, it's no problem to just turn one back on, or for an attacker to use resource kit tools to do so). You can't even count on a spectrum analyzer and a near-field antenna to find the devices, because a device could be disabled when you look. You can't rely on denial-of-service detection, because wireless by its very nature is prone to short service disruptions, which makes any channel disruption short of a complete denial of service harder to detect. You can't even sort out the process with a Bayesian filter, because there are layers of complication and that approach requires some amount of repetition.

So in reality your choices to prevent future issues get quickly more difficult.

One could track communications losses per match and replay those that don't seem to be due to power issues at the radio (assuming we consider power issues at the radio to be a build quality issue). However, that does not fit with the process that currently seems to be in place. Under the current process, if an interloper can interfere and not get caught, the match outcomes stand. So all it takes is someone with the knowledge and the willingness to absorb the risk.
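A minimal sketch of the per-match tracking described above, in Python. It assumes a hypothetical `CommLoss` record per communication loss (match, team, duration, and whether a radio power dip was logged at the same time) and a hypothetical 3-second cutoff below which a loss is treated as an ordinary WiFi blip; neither reflects an actual FMS log format.

```python
from dataclasses import dataclass


@dataclass
class CommLoss:
    """One loss of driver station to robot communication (hypothetical record)."""
    match: str
    team: int
    seconds: float        # how long communication was lost
    radio_brownout: bool  # True if a power dip at the radio was logged at the same time


def matches_to_review(losses, min_seconds=3.0):
    """Return {match: [teams]} for losses that radio power problems don't explain.

    Losses shorter than min_seconds are treated as ordinary WiFi blips, and losses
    that coincide with a radio brownout are treated as build-quality issues.
    """
    flagged = {}
    for loss in losses:
        if loss.radio_brownout or loss.seconds < min_seconds:
            continue
        flagged.setdefault(loss.match, []).append(loss.team)
    return {match: sorted(teams) for match, teams in flagged.items()}


losses = [
    CommLoss("Q12", 1111, 12.0, False),  # long and unexplained: review
    CommLoss("Q12", 2222, 1.0, False),   # short blip: ignored
    CommLoss("Q13", 3333, 40.0, True),   # coincides with a brownout: ignored
]
print(matches_to_review(losses))  # {'Q12': [1111]}
```

The hard part, as the post says, is the input: this only works if radio power is actually being logged, so the brownout flag can separate build-quality losses from unexplained ones.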

Stick your head in on a DEFCON or Black Hat convention discussion some time. They'll pull stunts that obviously are pushing or breaking the law right in front of the authorities they know are watching them in the very same room. They aren't shy about it. It's going to be really hard to deny what they were doing if they get busted with a video of them doing it with an audience. At least they aren't concealing their efforts with what they know.

Jon Stratis 22-08-2012 14:37

Re: Team 548 Einstein Statement
 
There are simply too many ways for a robot to fail (as we saw in the Einstein report) for the refs or FTAs to make a snap call to replay a match unless there is conclusive evidence that the cause was outside the team's control. When there is such evidence, matches are replayed. I've seen it happen when the field has had issues, which does occasionally happen. The issue is identifying the root cause of the failure.

So they're really already doing what you suggest in the last paragraph... it's just rare to be able to make the decision as to root cause.

techhelpbb 22-08-2012 14:54

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Jon Stratis (Post 1182719)
There are simply too many ways for a robot to fail (as we saw in the Einstein report) for the refs or FTAs to make a snap call to replay a match unless there is conclusive evidence that the cause was outside the team's control. When there is such evidence, matches are replayed. I've seen it happen when the field has had issues, which does occasionally happen. The issue is identifying the root cause of the failure.

So they're really already doing what you suggest in the last paragraph... it's just rare to be able to make the decision as to root cause.

I agree. With logging on the field communications devices turned off, and most robots not logging power to their radios (because we were forbidden to do so, per the official answer to my 2012 Q&A question), there was no good, quick way for anyone on the FIRST field crew to even narrow down that issue.

Even if they monitor radio power, there are known and lesser-known programming pitfalls that can swamp the radios, so I admit even that wouldn't be a complete solution.

Quality of service monitoring on the field side isn't a perfect solution either, because the field channels can be swamped from the robot side or by the wireless issues already mentioned. One could track packet communications on the WiFi bridge as well, but that would probably require some custom firmware and somewhere to store the data.
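As a rough illustration of what tracking packets at the bridge could look like, the Python sketch below splits gaps in one robot's packet arrival times into short blips and sustained outages. The 20 ms packet spacing, the 2x jitter allowance, and the 0.5-second blip limit are all assumptions for illustration, not measured FRC values, and `find_outages` is a hypothetical helper.

```python
def find_outages(timestamps, expected_period=0.020, blip_limit=0.5):
    """Split gaps in packet arrival times into short blips and sustained outages.

    timestamps: sorted arrival times (seconds) of one robot's control packets.
    expected_period: assumed nominal spacing between packets.
    blip_limit: gaps up to this many seconds count as ordinary WiFi blips.
    Returns (blips, outages), each a list of (last_packet_before, first_packet_after).
    """
    blips, outages = [], []
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap <= 2 * expected_period:
            continue  # within the assumed jitter allowance: nothing missing
        if gap <= blip_limit:
            blips.append((earlier, later))
        else:
            outages.append((earlier, later))
    return blips, outages


ts = [0.00, 0.02, 0.04, 0.30, 0.32, 3.50, 3.52]
print(find_outages(ts))  # ([(0.04, 0.3)], [(0.32, 3.5)])
```

Here the 0.26-second gap is reported as a blip and the 3.18-second gap as an outage, which is the distinction the thread notes is hard to draw when short disruptions are normal for WiFi.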

Alan Anderson 22-08-2012 15:00

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by JesseK (Post 1182709)
Now that an exploit is public knowledge, it's only a matter of creativity how people will attempt to abuse it.

The specific exploit in question is no longer possible. The access point firmware bug that permitted it will not be present in the future.

Other exploits do still exist. Some are essentially impossible to prevent because they are inherent in the nature of 802.11 wireless networking and established security protocols, but they are detectable.
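One example of a detectable 802.11 attack class, not necessarily the one Alan has in mind, is a flood of deauthentication frames. A wireless intrusion sensor can flag it by counting those frames per access point over a sliding window. The Python sketch below assumes some capture source already hands it `(bssid, timestamp)` pairs for each deauth frame; the class name, the 1-second window, and the 10-frame threshold are hypothetical.

```python
from collections import defaultdict, deque


class DeauthMonitor:
    """Flag a BSSID when too many deauthentication frames arrive in a short window.

    window: length of the sliding window in seconds (assumed value).
    threshold: frame count within the window that triggers an alert (assumed value).
    """

    def __init__(self, window=1.0, threshold=10):
        self.window = window
        self.threshold = threshold
        self.seen = defaultdict(deque)  # bssid -> timestamps of recent deauth frames

    def observe(self, bssid, timestamp):
        """Record one deauth frame; return True if this BSSID is now over threshold."""
        frames = self.seen[bssid]
        frames.append(timestamp)
        # Drop frames that have fallen out of the sliding window.
        while frames and timestamp - frames[0] > self.window:
            frames.popleft()
        return len(frames) >= self.threshold


monitor = DeauthMonitor(window=1.0, threshold=10)
# Twelve deauth frames 50 ms apart: the tenth one trips the alert.
alerts = [monitor.observe("field-ap", i * 0.05) for i in range(12)]
print(alerts.index(True))  # 9
```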

techhelpbb 22-08-2012 15:29

Re: Team 548 Einstein Statement
 
There is no way I can publicly make my case, in this FIRST-related forum or any other FIRST forum, that the remedies presented in the Einstein report will not be sufficient to prevent exploits. If I make my case, eventually escalating to a successful public proof of concept, all I'll be doing is enabling people with bad intentions. Proving my point is not worth the harm it would probably cause to hundreds of thousands of kids.

There is clearly no time remaining to do anything about the issues anyway.

Come what may. I'm glad that having the highest score is not my highest priority.

Siri 22-08-2012 15:30

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Al Skierkiewicz (Post 1182702)
I am trying to figure what you are saying in that post.

I seem to have done a pretty poor job with this one. This is what I thought I was saying:

@Post 93 DampRobot questions how no one on the official FRC team could have known about this FCA hole.

@Post 95 I point out that it only occurs under limited circumstances, and say I'd therefore be surprised if someone from FIRST tripped on it themselves (while sharing the understanding that they would/should have acted had someone told them).

@Post 97 Astrokid points out that you don't need to know the cause of the issue to happen upon it, and says it's not surprising that someone just thought "What if I connect in during a match?"

@Post 105 (the "HUH?" one), I agree but draw a distinction (one that, unbeknownst to me, Astrokid agrees with) between "someone" accidentally discovering it and a FIRST official happening to do so. I draw this distinction because there are far more random "someones" than FIRST officials, FIRST officials tend to be rather preoccupied during matches, and, given the otherwise extensive testing of the new Cisco firmware, FIRST officials have every reason to trust their selection and the system as a whole.

@Post 106, Al asks me what on Earth I'm talking about.



On a totally different note, does anyone know if the field saves the data records from the spectrum analyzer, or is it solely live feed?

BigJ 22-08-2012 15:44

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1182726)
There is no way I can publicly make my case, in this FIRST-related forum or any other FIRST forum, that the remedies presented in the Einstein report will not be sufficient to prevent exploits. If I make my case, eventually escalating to a successful public proof of concept, all I'll be doing is enabling people with bad intentions. Proving my point is not worth the harm it would probably cause to hundreds of thousands of kids.

There is clearly no time remaining to do anything about the issues anyway.

Come what may. I'm glad that having the highest score is not my highest priority.

Those with bad enough intentions will probably discover it sooner or later (or have already figured it out. Many exploits in software end up working this way). Disclosure is not always a problem. If you believe there is a reasonable mitigation (such as a firmware update, or more stringent procedures in pits+field) that could be made I'm sure many would appreciate it being public knowledge, especially if you have tried reaching out to FIRST already.

However, if you believe it is an issue with no easy mitigation that shakes the current technology foundation of the field and robot control systems to its core, disclosure might not be the best idea unless you are reasonably sure someone is using it.

Just my two cents.

techhelpbb 22-08-2012 15:53

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by BigJ (Post 1182729)
Those with bad enough intentions will probably discover it sooner or later (or have already figured it out. Many exploits in software end up working this way). Disclosure is not always a problem. If you believe there is a reasonable mitigation (such as a firmware update, or more stringent procedures in pits+field) that could be made I'm sure many would appreciate it being public knowledge, especially if you have tried reaching out to FIRST already.

However, if you believe it is an issue with no easy mitigation that shakes the current technology foundation of the field and robot control systems to its core, disclosure might not be the best idea unless you are reasonably sure someone is using it.

Just my two cents.

I know of both sorts of exploits, and I already disclosed this to FIRST 30 days ago, so let's start with this:

For one, the problem is the way the fields are laid out geometrically and the way areas of common play are positioned. I won't say why this is a problem, but I will say that a single WIPS sensor per field is not sufficient because of it.

There should be a minimum of 2 of those sensors per field, diagonal from each other across the long dimension of the field. Take a good look at where the current AirTight sensor generally ends up and its proximity to the Cisco hardware.

By the way, this was the very first thought to run through my head, given that one alliance or the other seemed to be disproportionately likely to have issues.

Al Skierkiewicz 22-08-2012 16:26

Re: Team 548 Einstein Statement
 
Siri,
I read your post and thought that you were indicating that First engineering had already made the attempt to connect to robots by the time Einstein occurred. Then I read further and became more and more confused as to what point you were trying to make. So let me make a few statements. No one at First, to my knowledge, had attempted to connect to a robot during competition. Of course they performed all kinds of testing in the off season and during other events. They constantly take in info from team members, even though they may not acknowledge that they received it. They perform tests at HQ and ask FTAs to try things in the field. First also reaches out to trusted technical volunteers for their input and testing when needed, to ensure that a good cross section of robot designs and programming platforms is tested.
Teams attempting to control their own robots at home on their practice fields using 2.4 GHz wifi bands are common. I have done it with my phone and my robot. There are several apps available for Android and iPhone that identify available networks, and some will actually show a spectrum display with network ID and signal strength. I checked my phone (I can turn wifi access on and off) between matches on Einstein and found a total of three networks at 2.4 GHz in addition to any robot radios that were on, and two of them were the house fan network access points used during games. Not what you would expect with all those phones and tablets out there in the stands. There was a spectrum analyzer in place to check for networks coming online and searching for available connections. To my knowledge it does not keep a log, but I can tell you several people were checking it during matches, myself included.
I would like to point people to the list of experts who were present during the Einstein weekend. In that list you will find people from Qualcomm who were part of the design team for 802.11 communications and set the specifications, people from Cisco, RF experts from Deka, the wireless consultant who designs systems in the Boston area and worked on the FRC wifi design, and a variety of First engineering staff, computer experts and RF engineers. All of them brought or ordered up whatever tools they felt would be needed to analyze the field and robot communications. Their intention was to find what caused the failures on Einstein and to make an attempt to break the control system in use. I have never seen a group of people more anxious to break something and show off than those assembled. Yes, they found some things that can be done to improve the wifi configuration, improve data transfers and prevent outages. However, and I can't stress this enough, during Einstein the only repeatable failure of wifi control that is supported by robot logs, observation, robot action, etc. was that of the admitted intrusion by a mentor on the field.

ratdude747 22-08-2012 16:30

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Alan Anderson (Post 1182690)
The number of driver station laptops in the pits capable of 5 GHz WiFi was vanishingly small. As a robot inspector, checking for wireless networking of teams' laptops was part of my job. I saw exactly zero with 5 GHz radios in three regional competitions and a championship division.

I find that hard to believe... In my house there are 3 Dell Latitudes with 5 GHz capability:

D400- My old laptop, has a Broadcom BCM4306 chip that can do WPA2 and B/G/A.
D800- My dad's laptop, has an older version of ^ that has the same capabilities.
D630- My current laptop. Used to have an Intel 3945 B/G/A, I later upgraded it to an Intel 4965 B/G/N/A.

I've seen those models in pits before... I've seen a couple D400s used as driver stations as well. Not every D400 has a dualband chip but the BCM 4306 was very common in the D_00 units (Dell offered it as a free upgrade from the base Intel B chip).

IIRC they make USB/PCMCIA/ExpressCard adapters that are dual band that one could hide and later plug in when nobody was looking.

EricVanWyk 22-08-2012 16:38

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1182731)
I know of both sorts of exploits, and I already disclosed this to FIRST 30 days ago, so let's start with this:

For one, the problem is the way the fields are laid out geometrically and the way areas of common play are positioned. I won't say why this is a problem, but I will say that a single WIPS sensor per field is not sufficient because of it.

There should be a minimum of 2 of those sensors per field, diagonal from each other across the long dimension of the field. Take a good look at where the current AirTight sensor generally ends up and its proximity to the Cisco hardware.

By the way, this was the very first thought to run through my head, given that one alliance or the other seemed to be disproportionately likely to have issues.

Brian, please stop spreading FUD. I can already see the direction you are aiming, and quite simply physics does not work that way. You are simultaneously crying that the sky is falling and threatening to make the sky fall.

I ask you to consider why you feel that FRCHQ is unresponsive, and why others do not feel that way. Is it HQ? Is it the others? Or is it you?

Al Skierkiewicz 22-08-2012 16:38

Re: Team 548 Einstein Statement
 
Larry,
Not all devices that claim full 802.11 wifi can actually do 5 GHz. For most devices, phones especially, it is very difficult to determine which frequencies they can operate on.

DMetalKong 22-08-2012 16:40

Re: Team 548 Einstein Statement
 
As far as I understand the extent of the problems, and as far as I understand the OSI model, the attacks that people are talking about are mostly happening on the network layer, which means that they would have to be resolved on the network layer or above. Since I doubt we will be moving away from 802.11 as the physical layer, and since I doubt we will be messing with MAC addressing and whatnot on the data link layer, this means that issues would have to be resolved at the network layer*.

So, possible solution time: what if FIRST developed custom firmware for the routers that would require a handshake using PKI in addition to the normal procedures for connecting to the field AP? Give every team an SD card or flash drive that contains a signed public-private keypair belonging to the team, as well as the certificate for the field APs. As long as every team's private key remains private, this would ensure that any request to connect to the field by a team would be irrevocably linked to that specific team (so no posing as team XXX to disrupt field communications), and any request to connect to the field that is not signed could safely be ignored. MITM should be mitigated in this scenario as well. Denial-of-service or other types of jamming would still be possible, but I am assuming they would be more easily detected, because blocking out a user's communication entirely should require more bandwidth than simply impersonating them (I think? Even the FCA attack described did not stop communications on the physical layer; it only made the router ignore a valid connection attempt)*.

* I am by no means an expert, I am just spouting off from my understanding of a couple of networking courses in school.
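A minimal challenge-response sketch of the handshake idea above, in Python. To stay within the standard library it swaps the public-key signatures for an HMAC over a per-team shared secret, so unlike the proposed PKI scheme the field would have to hold every team's key. The fresh random nonce per attempt is what stops a captured response from being replayed. All function names and the team number are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Field side: issue a fresh random nonce for each connection attempt."""
    return secrets.token_bytes(16)


def sign_challenge(team_key, team_number, challenge):
    """Team side: prove possession of the team key by tagging the challenge."""
    message = str(team_number).encode() + b"|" + challenge
    return hmac.new(team_key, message, hashlib.sha256).digest()


def verify_response(team_keys, team_number, challenge, response):
    """Field side: accept only a response made with that team's key for this nonce."""
    key = team_keys.get(team_number)
    if key is None:
        return False  # unknown team: ignore the request
    expected = sign_challenge(key, team_number, challenge)
    return hmac.compare_digest(expected, response)  # constant-time comparison


team_keys = {1111: secrets.token_bytes(32)}  # hypothetical team and key store
challenge = make_challenge()
response = sign_challenge(team_keys[1111], 1111, challenge)
print(verify_response(team_keys, 1111, challenge, response))         # True
print(verify_response(team_keys, 1111, make_challenge(), response))  # False: replayed
```

In the PKI version the post proposes, `sign_challenge` would use the team's private key and `verify_response` the team's public certificate, so the field would never need to store anything that lets it impersonate a team.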

techhelpbb 22-08-2012 16:45

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by EricVanWyk (Post 1182738)
Brian, please stop spreading FUD. I can already see the direction you are aiming, and quite simply physics does not work that way. You are simultaneously crying that the sky is falling and threatening to make the sky fall.

I ask you to consider why you feel that FRCHQ is unresponsive, and why others do not feel that way. Is it HQ? Is it the others? Or is it you?

Eric you did not address the point. You could have addressed the point but instead you went directly for me as the problem.

Yep, there's the response I already predicted in this very topic (look back a page or two, or ask me to quote it).

You are saying you want help and information while at the same time being highly selective about who offers that help, without a second thought to the point they make or any proof they may offer.

I asked weeks ago for merely a description of the process for these additional concerns. None has been provided.
I asked again in this topic and none has been provided.

I asked why people that send e-mails to the designated address aren't even granted the courtesy of an auto-responder and got no response.

I asked people at FIRST, and the only response I got was that they were 'looking into it', which is often what you hear when you're not getting a call back.

The argument you think counters my point isn't as strong as you'd like to believe.

Now what am I supposed to do to refute your commentary Eric? Show you this works publicly?
Then what? What's going to be the process then, demand I resign as a mentor, or go after the team I helped start?


Here's what I'm going to do for this forum. I'm not posting again in here today.
Come what may, I don't play this contest to score the most points, so in the end the threat to my priorities is trivial.

I do this to help kids and to honor what I do for a living...whether or not we can score the most points has little to do with that. Even in the years with the worst robots, the kids still come out the winners, and that's fine in my score book.

Akash Rastogi 22-08-2012 17:32

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1182742)
Eric, you did not address the point. You could have addressed the point, but instead you went directly for me as the problem.

Yep, there's the response I already predicted in this very topic (look back a page or two, or ask me to quote it).

You are simultaneously saying you want help and information then simultaneously being highly selective of who offers that help without a second thought to the point they make or any proof they may offer.

I asked weeks ago for merely a description of the process for these additional concerns. None has been provided.
I asked again in this topic and none has been provided.

I asked why people that send e-mails to the designated address aren't even granted the courtesy of an auto-responder and got no response.

I asked people at FIRST and the only response I got was that they were 'looking into it', which is often the response you get when you're not getting a call back.

The argument you think counters my point isn't as strong as you'd like to believe.

Now what am I supposed to do to refute your commentary Eric? Show you this works publicly?
Then what? What's going to be the process then, demand I resign as a mentor, or go after the team I helped start?


Here's what I'm going to do for this forum. I'm not posting again in here today.
Come what may, I don't play this contest to score the most points, so in the end the threat to my priorities is trivial.

I do this to help kids and to honor what I do for a living...whether or not we can score the most points has little to do with that. Even in the years with the worst robots, the kids still come out the winners, and that's fine in my score book.

Brian,

Please take a step back from your own commentary as well. I am not sure how you came to some of these conclusions from Eric's post. If you two want to argue, carry it to a PM. Sometimes "we're looking into it" has to be taken as good enough. Please avoid drawing random conclusions from what others say on here. But yes, please do take a few days off from this thread.

Thank you,
Akash

Al Skierkiewicz 22-08-2012 17:32

Re: Team 548 Einstein Statement
 
David,
The specific phone attack only occurred when a 5 GHz enabled device attempted to connect to a robot. No data transfers took place, no handshaking, no virus-like attacks, no special apps or software, no involvement with the FMS. Just the simple operation of attempting to connect to the robot access point.

DampRobot 22-08-2012 18:06

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1182742)
Now what am I supposed to do to refute your commentary Eric? Show you this works publicly?
Then what? What's going to be the process then, demand I resign as a mentor, or go after the team I helped start?

Someone needed to say this (although perhaps a bit less vehemently). There needs to be an official route for reporting security holes, and one simply does not exist now. I understand that the good folks at FRC have a ton on their plate already, but there is no incentive structure in place to make sure these types of problems get reported and solved before they cause havoc at the world championships.

This is what I was getting at with my question about institutional knowledge. Either someone at FIRST knew about this hole and there was an error in communications, or no one found out about it because there was no reason for anyone outside the small FRC staff to go through an official route.

I think there needs to be an official way to report bugs and to encourage people to report this type of exploit. An official FRC award for work in security, where as part of the submission process there would be a demonstration of the exploit discovered, would help these problems come out officially rather than being used maliciously. Instead of trying to fight "hackers" by ignorance and fear of persecution, give them a reason to strengthen the system, not destroy it.

linuxboy 22-08-2012 18:35

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Siri (Post 1182656)
I certainly don't take T14 to be the only allowable interaction (having talked to enough FTAs in my day), but it is the only guaranteed interaction. While I've never done it on Einstein, I find head refs--even busy ones--listen to polite students in the box. I think you'd be hard-pressed to find a ref that wouldn't listen twice to "I know what's wrong; please let me show you how anyone in the stadium can shut down any robot on this field". As I understand it, the demonstration is rather quick (pull up the network list and show you can send a client authorization). If so, the student could show this directly to the ref for added clout.

Thanks, this is pretty much what I meant to say. While it is totally valid to talk to the other volunteers, the "official" route for raising an issue is in the question box (and after a match with connection issues, FTAs tend to get to the person in the question box just as soon as the head ref in my experience).

EricH, while it seems that going to the head ref could have yielded the same result, I think it's just as likely that the ref (along with the FTA) may have chosen to hear the student out and see a demonstration. That's completely my opinion; there's no way of knowing what would have happened.

EricH 22-08-2012 19:59

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by linuxboy (Post 1182759)
EricH, while it seems that going to the head ref could have yielded the same result, I think it's just as likely that the ref (along with the FTA) may have chosen to hear the student out and see a demonstration.

It's just as likely, yes. But what you missed is this:

By the time the student has told the ref, who has told the FTA, you have the following chain:

1) Mentor thinks there may have been a DoS attack. (or other issue)
2) Mentor tells student to tell the ref that there may have been a DoS attack.
3) Student tells ref that there may have been a DoS attack, and the FTA may want to know about it.
4) Ref tells FTA (if the FTA isn't already there listening).

That's a minimum of twice removed, on a suspicion. The FTA is going crazy trying to figure out what's going on--and remember, all eyes are on the FTA and his crew (normally they blend into the background, or are supposed to). And, remember, there's an alert that is supposed to catch DoS attacks and it hasn't gone off.

If I'm the FTA, I'm likely to go, "Tell your mentor that there wasn't one detected and we're trying to get to the bottom of this" and get back to trying to get to the bottom of the problem. It won't be until the second match at least that I look at it and go "Hey, there might be something to what that kid was saying his mentor thought. Now what team was he on again?"


Now, if the student was there and said, "We think someone tampered with a robot during a match by this process, which you might not be able to detect", the FTA would be a whole lot more likely to take action, because a) they now have an idea that their detectors aren't working and b) they have something concrete that they can look for if the logs haven't disappeared yet. But that whole thing involves a mentor explaining the process to a student, which takes time.

ratdude747 22-08-2012 21:13

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Al Skierkiewicz (Post 1182739)
Larry,
Not all devices that claim full 802.11 wifi can actually do 5 GHz. For most devices, phones especially, it is very difficult to determine what frequencies they can operate at.

I know... I'm just saying there were popular laptops out there that COULD.

How do I know? My router is a dual-band N (two APs), and all three of my laptops can see and connect to my 5 GHz network (set to 5 GHz only) just fine.

DMetalKong 22-08-2012 22:22

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Al Skierkiewicz (Post 1182749)
David,
The specific phone attack only occurred when a 5 GHz enabled device attempted to connect to a robot. No data transfers took place, no handshaking, no virus-like attacks, no special apps or software, no involvement with the FMS. Just the simple operation of attempting to connect to the robot access point.

Al,

Correct me if I misunderstand though, but for 802.11 there is a standard protocol for the router (or other device) to attempt to make the connection. What I was suggesting was modifying this protocol through the router/AP firmware so that the routers/APs that are part of the field network could ignore unauthorized connection attempts.

I see so much discussion of problems with the field without much discussion of solutions. That is not to say that people do not have solutions; I think it is easier to focus on what went wrong than on plans for the future (especially when I get the impression that people feel like they do not have a means of influencing change in the organization as a whole). As much as this discussion is veering from the original intent of the thread (the apology), I would rather see it derailed in a constructive fashion focusing on possible solutions, even if those solutions won't necessarily work.

Siri 22-08-2012 22:37

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Al Skierkiewicz (Post 1182735)
Siri,
I read your post and thought that you were indicating that First engineering had already made the attempt to connect to robots by the time Einstein occurred. Then I read further and became more and more confused as to what point you were trying to make. So let me make a few statements...

Ok, that was the exact opposite of what I meant/said, so I'm glad we cleared that up. Thank you and thanks for the statements, too. I know I can't understand what it's like working inside something so complex and critically-viewed, much less when it's a volunteer organization. At the same time, your point about FIRST constantly collecting information from teams even if they don't say so worries me somewhat. As may have been noticed on this thread and others, the lack of two-way communication before and at events is difficult to handle in some cases. Community members are left to feel they have little recourse, whether or not we actually do. Nothing good seems to happen when officials are overwhelmed with advice (or complaints) and members feel overwhelmed with things to advise about. (I've also been on both sides of this in FIRST and neither is easy or pleasant.)

I do argue with others on this thread that we need a more consistent/accepted/responsive/official/useful/publicized/whathaveyou reporting channel for these sorts of things. So I ask as nicely and respectfully as physically possible towards both parties: how do we do this?

Alan Anderson 22-08-2012 23:12

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by DMetalKong (Post 1182802)
Correct me if I misunderstand though, but for 802.11 there is a standard protocol for the router (or other device) to attempt to make the connection. What I was suggesting was modifying this protocol through the router/AP firmware so that the routers/APs that are part of the field network could ignore unauthorized connection attempts.

There's probably no need to modify the protocol. It already dismisses failed client authentication attempts. The disruption to the field network seen on Einstein was due to a bug in the access point firmware, which combined with one version of robot router hardware to cause an unexpected loss of the network connection. That bug is no longer an issue.

An 802.11 protocol change that encrypts "management packets" could probably prevent deauthentication flood attacks from succeeding. It would also break a lot of things in the process.
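The idea behind protecting management frames can be sketched as a toy model: the client honors a deauthentication frame only if it carries a valid message integrity code under a key shared with its AP and a counter newer than any it has already accepted, so a spoofed frame from someone who knows the AP's address but not the key is simply dropped. This is a sketch under stated assumptions, not the standard: the frame fields, the `MgmtFrame`, `Station` and `protect` names, and HMAC-SHA256 are illustrative stand-ins. The actual IEEE 802.11w amendment defines its own frame formats, uses AES-based protection (CCMP for unicast frames, BIP for group frames), and derives its keys during association. The model also hints at the breakage mentioned above: once protection is mandatory, any device that cannot compute or check the code is left out.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class MgmtFrame:
    kind: str                    # e.g. "deauth"
    sender: str                  # claimed transmitter address (trivially spoofed)
    counter: int                 # replay counter, must increase per frame
    mic: Optional[bytes] = None  # integrity code; None means unprotected


def frame_bytes(frame: MgmtFrame) -> bytes:
    """Fields covered by the integrity code (toy encoding)."""
    return f"{frame.kind}|{frame.sender}|{frame.counter}".encode()


def protect(frame: MgmtFrame, key: bytes) -> MgmtFrame:
    """Sender side: attach an HMAC-SHA256 tag under the session key."""
    frame.mic = hmac.new(key, frame_bytes(frame), hashlib.sha256).digest()
    return frame


class Station:
    """Toy client (e.g. a robot radio) that requires protected management frames."""

    def __init__(self, ap_address: str, key: bytes):
        self.ap_address = ap_address
        self.key = key           # stands in for a key agreed during association
        self.last_counter = -1
        self.connected = True

    def receive(self, frame: MgmtFrame) -> bool:
        """Process a frame; return False if it was dropped."""
        if frame.mic is None:
            return False  # unprotected management frame: drop it
        expected = hmac.new(self.key, frame_bytes(frame), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, frame.mic):
            return False  # forged or corrupted: drop it
        if frame.counter <= self.last_counter:
            return False  # replay of an old frame: drop it
        self.last_counter = frame.counter
        if frame.kind == "deauth" and frame.sender == self.ap_address:
            self.connected = False
        return True


if __name__ == "__main__":
    key = secrets.token_bytes(16)
    robot = Station("field-ap", key)
    # An attacker can spoof the AP's address but cannot compute a valid MIC.
    robot.receive(MgmtFrame("deauth", "field-ap", 1))
    robot.receive(MgmtFrame("deauth", "field-ap", 2, mic=bytes(32)))
    print(robot.connected)  # True
    robot.receive(protect(MgmtFrame("deauth", "field-ap", 3), key))
    print(robot.connected)  # False
```

This only addresses forged management frames such as a deauthentication flood. It does nothing about RF jamming, and per the explanation above, the Einstein disruption came from an access point firmware bug rather than a deauth attack.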

Quote:

I see so much discussion of problems with the field without much discussion of solutions. That is not to say that people do not have solutions; I think it is easier to focus on what went wrong than on plans for the future (especially when I get the impression that people feel like they do not have a means of influencing change in the organization as a whole). As much as this discussion is veering from the original intent of the thread (the apology), I would rather see it derailed in a constructive fashion focusing on possible solutions, even if those solutions won't necessarily work.

Did you read the Einstein investigation report through to the end? The last two pages are all about planned possible changes, with a half dozen of them as specific solutions to observed problems.

EricVanWyk 22-08-2012 23:20

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Siri (Post 1182806)
I do argue with others on this thread that we need a more consistent/accepted/responsive/official/useful/publicized/whathaveyou reporting channel for these sorts of things. So I ask as nicely and respectfully as physically possible towards both parties: how do we do this?

At an event, the "question box" is the best way to begin communication, you just need to be patient as your question gets routed to the best person to answer it. Outside an event, email is your best bet. Specific to these types of situations, you can use 2012frcfeedback@usfirst.org (as stated in the Einstein report). Please note that many people are currently on vacation, and the ones that aren't are buried in work.

The important thing to remember is that the hardest part of engineering is communication. The value of your ideas is limited by the people you can influence with them. As a volunteer I've been cursed out several times by people trying to influence me with their ideas, and it turns out that screaming in someone's face isn't very effective persuasion. By the time they've finished commenting on my heritage and IQ, they could have instead told me their idea and provided supporting information.

So, when you "attempt to notify FIRST personnel of [your] belief", please be clear, concise, and civil.

DMetalKong 22-08-2012 23:22

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Alan Anderson (Post 1182808)
An 802.11 protocol change that encrypts "management packets" could probably prevent deauthentication flood attacks from succeeding. It would also break a lot of things in the process.

I think that breaking things could be acceptable for use in FRC if the need is strong enough. As long as only firmware is changing, and not hardware, the cost of deployment would not be as great as an entirely custom solution.

Quote:

Originally Posted by Alan Anderson (Post 1182808)

Did you read the Einstein investigation report through to the end? The last two pages are all about planned possible changes, with a half dozen of them as specific solutions to observed problems.

I did read the report :) I was referring more to the various threads on CD that have been started about the topic.

stjonl 22-08-2012 23:27

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Gray Adams (Post 1182439)
I want to echo this point. By the mentor's own admission, he used the attack, but why should we believe his admission of guilt isn't the full story from his perspective? Every single one of us has been looking for someone or something to blame for what happened on Einstein. The full report has brought up a multitude of points of failure during the finals, and it's really not hard to believe the answer to all of this is not as simple as blaming this all on one mentor. As soon as news broke that there was an attack during play, all of the failures on the field were attributed to that. But things just aren't that simple, and we discovered how many root causes for all the different problems there really were. But I firmly believe we still know far too little to place all of the blame on this one attacker. With thousands of incredibly smart people in the dome, it's entirely possible that someone else used this attack, whether or not their team was on Einstein, and whether or not they were fully aware of their actions.

We've heard 2 sides of the story so far, and unless someone would like to point out something I missed that puts them in direct conflict, I think it's only fair to evaluate this based on what we know.

Everyone was feeling a lot of emotions at the moment, and the attack in response could have been from a moment of desperation. I'm not condoning what happened, but I am trying to understand it.


I think part of the story happened before St. Louis. At MSC, the finals were 469, 67 and 830 against 2054, 548 and 245. The red alliance won the first match, but the second match ended on a most unusual note. 2054, 548 and 245 were attempting a triple balance. A little before that, 67 was in the blue alley and died about two feet in front of the blue bridge. The blue alliance Charlie Browned the bridge, and contact was made with 67 a few times. At the end of the match, the blue bridge was level, but one robot on the blue alliance side was half on the bridge and half on the floor. The referees looked it over and huddled up. I believe they called a few 3-point penalties during the match for contact in the alley. The final score was close enough that a blue bridge balance would have given them the match win and forced a third final match. I can only assume the referees were discussing whether there was bridge balancing interference.

As they discussed the issue, the winning teams were called to the floor to cut down the nets. The referees were still discussing. Some of the nets had been cut down when one referee left the huddle, removed his striped shirt within a couple of steps, and was clearly not happy. The rest of the nets were cut down. I cannot remember exactly when the referee huddle ended, but at some point the MC explained that the balanced blue bridge did not count because of the one robot half on the bridge and half on the floor. There was no mention of the possible bridge balancing interference. I and many others believe that the bridge balance was interfered with; OK, not deliberately, but just the same. That's not how the referees called it. 2054, 548 and 245 could have been denied a chance to win the tournament because of that call. Referees are human and they do the best they can; do not blame them unless you want to fill their shoes.
I was really surprised that the referee huddle was still in progress when they began the ending ceremonies. That really dampened the mood for everyone there.

Siri 23-08-2012 07:58

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by EricVanWyk (Post 1182810)
At an event, the "question box" is the best way to begin communication, you just need to be patient as your question gets routed to the best person to answer it. Outside an event, email is your best bet. Specific to these types of situations, you can use 2012frcfeedback@usfirst.org (as stated in the Einstein report). Please note that many people are currently on vacation, and the ones that aren't are buried in work…

While I agree with you, most significantly on your discussion of communication, the outside-event lack-of-communication issues described lately on CD are far from the first time I've heard intelligent, articulate and patient people report silence from FIRST. Without saying they're right (or wrong), I will contend that it's widespread enough to attract the attention of myself and others.

As for at-event, I've already put in my defense of the question box, though my experience falls somewhere between yours and EricH's. I've witnessed and experienced mentors communicating issues to students for the question box from both sides, and I'm not sure it's routinely as slow or unwieldy as some are concerned (though it certainly has the potential to be...as a coach I actually review question box procedure with my drivers). In this situation, I do believe officials would listen, especially after the second failure.

At the same time, not a year has gone by (as driver, coach or volunteer) that I haven't seen at least two or three patient, articulate, clear-but-not-obnoxious students, some with documentation or other teams to back them up, denied the chance to talk and sent away from the box instead. Nor is it altogether uncommon for these students to be independently proven correct, but only later than necessary and in some cases too late. (The obvious best-known example this year was at championship quals, though thankfully then it wasn't too late.) How many more have had a valid point, and how much difficulty on both sides did the breakdown in communication cost? I understand and feel very dearly for FTAs and my fellow refs, but I can't help but try for alternatives. As this situation demonstrates, there's something to be gained for everyone.

Am I making a mountain out of a molehill? Maybe, but if so, it's far from an uncommon hallucination.

Adam Freeman 23-08-2012 08:04

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by stjonl (Post 1182813)
I think part of the story happened before St. Louis. At MSC, the finals were 469, 67 and 830 against 2054, 548 and 245. The red alliance won the first match, but the second match ended on a most unusual note. 2054, 548 and 245 were attempting a triple balance. A little before that, 67 was in the blue alley and died about two feet in front of the blue bridge. The blue alliance Charlie Browned the bridge, and contact was made with 67 a few times. At the end of the match, the blue bridge was level, but one robot on the blue alliance side was half on the bridge and half on the floor. The referees looked it over and huddled up. I believe they called a few 3-point penalties during the match for contact in the alley. The final score was close enough that a blue bridge balance would have given them the match win and forced a third final match. I can only assume the referees were discussing whether there was bridge balancing interference.

As they discussed the issue, the winning teams were called to the floor to cut down the nets. The referees were still discussing. Some of the nets had been cut down when one referee left the huddle, removed his striped shirt within a couple of steps, and was clearly not happy. The rest of the nets were cut down. I cannot remember exactly when the referee huddle ended, but at some point the MC explained that the balanced blue bridge did not count because of the one robot half on the bridge and half on the floor. There was no mention of the possible bridge balancing interference. I and many others believe that the bridge balance was interfered with; OK, not deliberately, but just the same. That's not how the referees called it. 2054, 548 and 245 could have been denied a chance to win the tournament because of that call. Referees are human and they do the best they can; do not blame them unless you want to fill their shoes.
I was really surprised that the referee huddle was still in progress when they began the ending ceremonies. That really dampened the mood for everyone there.

I am not quite sure what your post has to do with this topic. It's not even correct. The field was given the "all clear" signal and the scores/winners were announced before the nets were cut. Not sure what was going on with the refs (maybe Gary Voshol can clarify). I know there were some upset mentors from the blue alliance. Heck, even the drive coaches for the winning alliance were less than enthusiastic with the way things ended.

Al Skierkiewicz 23-08-2012 08:54

Re: Team 548 Einstein Statement
 
In case everyone doesn't know, I will state this again. Key volunteers are asked to report on their events following each of those events. We have weekly phone conferences to answer questions and pass along the latest data. Specifically, that is the FTA, LRI, and Head Ref as well as others. Those reports will contain information that is beneficial to each of the groups, to improve the next week's events and FRC in general. In some cases, these reports will prompt a Team Update and/or a rule change. As an LRI, I can assure you that one of the people who hears my report is a member of the GDC, and I have access via email and phone to First Engineering. If one of us comes across something that we can verify at the event, then we run the tests and ensure that a real problem exists. Then we all document it and report it through our individual lines of communication with HQ, along with any fixes that may have been found. Reports that come to HQ through other means are also evaluated and checked. Some people on this thread believe that they are not listened to or that they are ignored; that is simply and categorically untrue. Those are opinions, not fact. While I can't really speak for the other volunteers, I can tell you that robot inspectors, FTAs, Refs and First staff are dedicated to improving this competition environment. Those outside of the robot key volunteer organization, i.e. event staff, coordinators, pit admin and safety advisors, are also dedicated to improving things. If you want to know who those people are, you only have to look here at CD and see who is answering certain questions in a helpful, open manner without trying to be condescending, boastful or argumentative. I can tell you from personal experience that trying to evaluate a problem with few staff and come up with a solution in a week or less is very difficult. That pushes confirmations, phone or email, way down on the list of priorities.
Most of the First staff are at events throughout the season so that leaves even less time to come up with solutions. First staff and GDC are always looking at CD for input and they are reading exactly what you write here even if they don't respond.
As Alan pointed out earlier, the Einstein report hints that the 802.11 protocol allows for various types of security, and those suggestions from the experts are being employed for next season. Other suggestions related to antenna designs, placement of components and other issues with the wifi infrastructure are also being implemented to ensure secure communications with your robot.
David and Siri, let me know if this didn't answer your questions. Alan and Eric thanks for your input.

qnetjoe 23-08-2012 15:31

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by DMetalKong (Post 1182812)
I think that breaking things could be acceptable for use in FRC if the need is strong enough. As long as only firmware is changing, and not hardware, the cost of deployment would not be as great as an entirely custom solution.

There is really no sense in reinventing the wheel. IEEE has been working on protected management frames for a long time. There is a standard called 802.11w-2009 that does this; it was ratified in 2009 and was recently superseded by 802.11-2012, which simply merged ten amendments together to help prevent forking.

The next step in the process would be to find hardware that fits FIRST's budget and meets this standard.

just my two cents

If you want to track the 802.11 WG progress/projects, here is a good link to start off with:

http://grouper.ieee.org/groups/802/1..._Timelines.htm

Racer26 23-08-2012 17:08

Re: Team 548 Einstein Statement
 
I'm admittedly a bit late to the party on this one, but here goes anyway.

IMO, 548's statement is what was needed to get the community to put away the pitchforks. BUT, I don't believe the story told to them by the mentor in question. I personally feel that both the story told to 548's committee by the mentor AND the story in the Einstein Report overlook the giant elephant in the room: the mysterious comms losses through MSC and Newton Elims, exhibiting symptoms consistent with the FCA attack (based on my viewing of what match video I can find), and only to teams that would pose a threat to 548's success in future matches (or current matches, in a desperation effort to avoid elimination).

I'm content to take 548's committee at face value, that this mentor acted alone, and without the knowledge of the rest of the team. However, I feel that the individual who interfered with Einstein isn't telling the whole truth. It is my opinion, and one I'm sure I'm not alone in, that the individual had been using the FCA vulnerability since at least MSC.

Since I believe this individual was acting alone, I hold no grudge against 548 as a team.

Another thing I haven't seen mentioned is that there was a Week 4 event held at 548's school. It's entirely likely that the mentor discovered the vulnerability there, when they would have had decidedly greater access to a real field than anywhere else.

While I agree with many of the posters in the first couple of pages of this thread that interfering at the highest stage to demonstrate your vulnerability isn't cool, if this person hadn't done that, we probably never would have had the Einstein Investigation, and many of the issues uncovered by it may have continued to go unnoticed.

Ekcrbe 23-08-2012 18:13

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by 1075guy (Post 1182895)
I personally feel that both the story told to 548's committee by the mentor AND the story in the Einstein Report overlook the giant elephant in the room: the mysterious comms losses through MSC and Newton Elims, exhibiting symptoms consistent with the FCA attack (based on my viewing of what match video I can find), and only to teams that would pose a threat to 548's success in future matches (or current matches, in a desperation effort to avoid elimination).

It's not really an oversight, because the Einstein Report reported on Einstein and was not obligated to cover anything else. I believe (can anybody confirm?) that FiM is investigating other potential instances of FCA attacks outside of Einstein.

Edit: It seems there is some investigation into matches at MSC by FiM members.

techhelpbb 23-08-2012 18:28

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by qnetjoe (Post 1182883)
There is really no sense in reinventing the wheel. IEEE has been working on protected management frames for a long time. There is a standard called 802.11w-2009 that does this; it was ratified in 2009 and was recently superseded by 802.11-2012, which simply merged ten amendments together to help prevent forking.

The next step in the process would be to find hardware that fits FIRST's budget and meets this standard.

just my two cents

If you want to track the 802.11 WG progress/projects, here is a good link to start off with:

http://grouper.ieee.org/groups/802/1..._Timelines.htm

The Cisco 1250 and 1260 series APs (with 32 MB) already support Cisco Management Frame Protection (MFP). It's a similar idea, but it predates the availability of the full standard. It requires a specific configuration and a Cisco Compatible Extensions (CCX) version 5.0 compliant device. There is CCX v5 support from Ralink, Atheros and Broadcom, though as far as I could find, mostly in peripherals rather than in routers, APs or bridges. So it covers the field side but not the robot side.

Rosewill sells a USB device claiming CCX v5 support, but I can't vouch for that personally. The model is:
RNX-N600UBE

I'm not making any recommendations here, if anyone is really interested please discuss the matter with AirTight regarding any caveats.

Liz Smith 23-08-2012 19:57

Re: Team 548 Einstein Statement
 
There is a big issue I see that has been echoed many times in these Einstein discussions. There are problems with assumptions that are not necessarily correct, speculation without any data, and research without the proper tools. I will use myself as an example, but I feel this applies to everyone.

It would be presumptuous for me to consider myself, for example, an FRC wifi interference researcher just by sitting at home watching YouTube videos of matches. Any information I gain is anecdotal at best. I do not have a full set of data, and I do not have a full FRC field in my house (yet??) to conduct my own research, and that's not necessarily a bad thing.

But... let's say I'm really concerned about the second-to-last match in the “Regional State District Division Championship” where my team stopped moving during the match. It is way too easy for me to watch a video replay of that match 100 times and convince myself without a doubt that I know why that robot failed.

This is a problem. I can theorize all I want, but it is counterproductive for me to arrive at absolute conclusions, because I have neither a full set of data nor the proper tools to investigate. It would be much more productive for me to report my information and concerns, maybe with a link to the video, and then let go of it and let someone with the tools to fully investigate the issue look into it. If I pursue it further, I just end up making more assumptions and run the risk of assuming that every robot that stops moving in every match I ever watch does so for this one reason… which in reality can't possibly be true. Going further, if I then present myself as an expert on these matters, all I am doing is spreading rumors without any real empirical data to back them up.

I can definitely voice my concerns, but I have to accept that the people whose full-time job it is to solve these problems are better equipped than I am to draw these conclusions. I know that FRC staff, volunteers, mentors and students all want every single robot on the field to function properly 100% of the time, but when theories are presented as factual evidence and speculation is stated as fact, it just causes unnecessary confusion.

Basel A 23-08-2012 23:40

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by 1075guy (Post 1182895)
I'm admittedly a bit late to the party on this one, but here goes anyway.

IMO, 548's statement is what was needed to get the community to put away the pitchforks. BUT, I don't believe the story told to them by the mentor in question. I personally feel that both the story told to 548's committee by the mentor AND the story in the Einstein Report overlook the giant elephant in the room: the mysterious comms losses through MSC and Newton Elims, exhibiting symptoms consistent with the FCA attack (based on my viewing of the match video I can find), and only to teams that would pose a threat to 548's success in future matches (or current matches, in a desperate effort to avoid elimination).

Based on my understanding, it is impossible to reasonably diagnose a failure as FCA based on match video. Indeed, the symptoms consistent with FCA are consistent with roughly a million other problems. Even after FIRST's thorough investigation, all but one of the suspected FCA cases were only considered "likely."

I find it disturbing that you're prepared not only to diagnose a robot failure as a complex problem based on minimal evidence, but also ready to indict an individual, about whom you know exactly one thing, of match-fixing at the highest level.

Ekcrbe 23-08-2012 23:53

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Basel A (Post 1182968)
Even after FIRST's thorough investigation, all but one of the suspected FCA cases were only considered "likely."

And that case (SF 2-1) was only confirmed because the individual admitted to it, so no testing confirmed FCA as an absolute conclusion.

Gregor 24-08-2012 01:49

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Basel A (Post 1182968)
I find it disturbing that you're prepared not only to diagnose a robot failure as a complex problem based on minimal evidence, but also ready to indict an individual, about whom you know exactly one thing, of match-fixing at the highest level.

I don't believe that he is diagnosing them as FCA attacks, only pointing out the possibility of more FCA attacks that might have happened before Einstein.

Greg McKaskle 24-08-2012 07:51

Re: Team 548 Einstein Statement
 
Diagnosis based solely on video is highly speculative. But knowing more about the behavior of any dashboard feed, the diagnostics on the DS, and any after-match troubleshooting results can move the needle as far as "likely." It should require hard evidence or an admission to move it to "confirmed."

Greg McKaskle

techhelpbb 24-08-2012 12:00

Re: Team 548 Einstein Statement
 
I'm trying to make this my last post in this topic...

This is a summary of the advice I've given my team:

Wireless networks like this are designed on the assumption that they will occasionally be unreliable, and to handle the added complexity they implement mechanisms that may or may not be sufficient to make them as reliable as possible.

The specifications for these networks make wonderful selling points for the technology, and you can build wonderful demonstrations that test those specifications in one circumstance or another.

However, at its core this technology creates a link that is subject to some unreliability even when nobody is trying to make it unreliable intentionally.

It's wonderful that FIRST is trying to make these links as reliable as possible, but we as robot builders can help by making our robots less dependent on the wireless network being perfectly reliable at every instant we use it.

If the parts of the robot we as robot builders control are less dependent on the reliability of the wireless network, it will be much harder for an unforeseen situation lasting a short period of time to hurt our competitive performance, regardless of whether that short interruption comes from someone trying to cause trouble or from an unforeseen circumstance.

Our team is student-led. I'll let them decide how to deal with that. I'm confident there are many things they can do with that advice to improve the competition performance of a robot.

This advice also leaves FIRST some room to have undetected problems in their network for short periods of time, for a large number of possible reasons. So it is intended as constructive, positive guidance. I think it's fair to point this out, because I don't think it's entirely level-headed to charge FIRST with doing something quite hard, with a great number of variables, and expect there to be no issues along the way.

Alan Anderson 24-08-2012 17:10

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1183017)
It's wonderful that FIRST is trying to make these links as reliable as possible but we as the robot builders can help by making our robots less dependent on the wireless network being entirely reliable for every instant we use it.

Unfortunately, "we" can't do much at all about the robot's dependence on the network. When the cRIO isn't getting continuous "enabled" signals from the Driver Station, it shuts down all the motors and other actuators. That's something completely beyond the control of robot builders.

What kind of help were you thinking of?

GaryVoshol 24-08-2012 17:32

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Adam Freeman (Post 1182840)
I am not quite sure what your post has to do with this topic. It's not even correct. The field was given the "all clear" signal and the scores/winners were announced before the nets were cut. Not sure what was going on with the refs (maybe Gary Voshol can clarify). I know there were some upset mentors from the blue alliance. Heck, even the drive coaches for the winning alliance were less than enthusiastic about the way things ended.

Without disclosing any confidences, all I can say is that the refs did not have an extended discussion about F-2 and the results. Any observers may have seen us talking generally among ourselves as we ended the event.

techhelpbb 24-08-2012 21:55

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Alan Anderson (Post 1183062)
Unfortunately, "we" can't do much at all about the robot's dependence on the network. When the cRIO isn't getting continuous "enabled" signals from the Driver Station, it shuts down all the motors and other actuators. That's something completely beyond the control of robot builders.

What kind of help were you thinking of?

The next line of the post you quoted:
"If the parts of the robot we as the robot builders control are less dependent on the reliability of the wireless network it will be much harder for an unforeseen situation over a short period of time to decrease our competitive performance."

I am giving FIRST credit that no error on the scale of the FCA will escape the FRC development process in the future. If one does, I assume FIRST will find a way to replay the matches or settle the resolution as quickly as possible. To some extent, the existing robot signal/status lights (RSL) and driver's station diagnostics are already a start toward letting the people on the field find problems.

Last I checked, the disabled state does shut down all the motors and actuators driven by the digital sidecar hardware, but not all the digital I/O (GPIO), the I2C, or the User1 light on the cRIO. The rules seem to account for this in how things that can cause movement are connected to the digital sidecar. However, I don't think the rules prohibit properly designed indicator lights, visible from off the field, connected to the GPIO pins or I2C.

Certainly code can execute in the cRIO regardless of enable. If you lose communications, you lose your ability to send status back to your driver's station over that link, and of course you lose your ability to move, for the sake of safety. However, you retain the ability to manipulate those I/O and can use them to deliver status information that might be valuable even if you can't communicate with the driver's station or get near the robot. You could also use the flash memory while disabled and retain that data even after the robot is powered off (of course it is flash memory, so wear leveling could be a concern).

That doesn't improve your movement situation if you end up losing communications long enough to miss the enable. But using something like that along with the information communicated by the RSL and the driver's station would tell you a lot about whether your code is doing what you think it is, when you think it is, and when the field thinks it is, even on a competition field where you can't get near the robot or communicate with it. In fact, if the robot can't communicate with you, it could signal that (some might think that redundant, but you never know; it might have helped Team 118 on Einstein).
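A minimal sketch of such a "no comms" indicator, assuming a periodic task that keeps running while disabled and records the time of the last driver's station packet. The 100 ms staleness threshold, the 250 ms blink rate, and the function name are assumptions for illustration; actually driving a GPIO pin is left to whatever I/O library the team uses and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: treat the link as lost if no driver's station
 * packet has arrived within this many milliseconds. */
#define COMMS_STALE_MS 100u
/* Assumed blink half-period used to signal "no comms". */
#define BLINK_HALF_PERIOD_MS 250u

/* Hypothetical helper: returns true if the indicator LED should be lit.
 * - Link healthy: LED solid on.
 * - Link stale:   LED blinks (on 250 ms, off 250 ms).
 * Both times come from the same free-running millisecond clock; unsigned
 * subtraction keeps the age correct across a counter wraparound. */
bool comms_indicator_on(uint32_t now_ms, uint32_t last_packet_ms)
{
    uint32_t age_ms = now_ms - last_packet_ms;
    if (age_ms <= COMMS_STALE_MS) {
        return true;                                      /* packets arriving */
    }
    return ((now_ms / BLINK_HALF_PERIOD_MS) % 2u) == 0u;  /* blink: link lost */
}
```

A solid light versus a blink is distinguishable from a distance, which is the whole point when you can't get near the robot.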

Separate from the issue of not getting enabled:

Sending large amounts of data back and forth consistently using TCP, and making that critical to the control of the robot, increases the chance that a momentary interruption or delay will have adverse consequences. TCP will keep trying to deliver that data, but who knows how long it'll take.

UDP isn't a reliable protocol, but it can still carry useful communications. Someone can build a transaction system on UDP that tolerates lost messages and deliberately ignores stale ones, unlike TCP, which tries to help by pretending the link is reliable (when it might be busy or experiencing some wireless issue). The FMS seems to use UDP a lot itself.
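A minimal sketch of the receive side of such a UDP scheme, assuming each datagram carries a 16-bit sequence number that the sender increments by one. This is a hypothetical format, not the FMS/DS protocol: the receiver acts only on datagrams newer than the last one it accepted, counts gaps as lost, and drops late or duplicate datagrams instead of waiting for them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Receiver-side state for a hypothetical sequenced UDP control stream. */
typedef struct {
    bool     have_last;   /* false until the first datagram is accepted */
    uint16_t last_seq;    /* sequence number of the newest accepted datagram */
    uint32_t lost;        /* datagrams skipped over (never seen) */
    uint32_t ignored;     /* late or duplicate datagrams dropped */
} seq_tracker;

/* True if a comes after b, using 16-bit wraparound (serial-number) arithmetic. */
static bool seq_newer(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(a - b);
    return diff != 0u && diff < 0x8000u;
}

/* Returns true if the datagram should be acted on. Late or duplicate
 * datagrams are ignored rather than waited for: newer data supersedes them. */
bool seq_accept(seq_tracker *t, uint16_t seq)
{
    if (!t->have_last) {
        t->have_last = true;
        t->last_seq = seq;
        return true;
    }
    if (!seq_newer(seq, t->last_seq)) {
        t->ignored++;
        return false;
    }
    t->lost += (uint16_t)(seq - t->last_seq) - 1u;  /* gap size = lost count */
    t->last_seq = seq;
    return true;
}
```

Comparing with wraparound arithmetic lets the counter roll over from 65535 to 0 without the receiver mistaking new data for old.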

My concerns are along the same lines as what happens when a critical sensor becomes disconnected and you don't detect it: the code expects input from that now-disconnected sensor in a loop it cannot escape, so everything is stuck (it blocks).
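One way to avoid that trap, sketched under assumptions: rather than spinning until the sensor answers, check the sensor once per iteration of the robot's periodic loop and give up after a deadline. The names and the millisecond clock are hypothetical; the point is that the check always returns, so the rest of the robot code, including communications, keeps running.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { WAIT_PENDING, WAIT_READY, WAIT_TIMEOUT } wait_state;

/* A deadline-bounded wait, checked once per periodic loop instead of
 * spinning in a tight loop. Times are milliseconds from a free-running clock. */
typedef struct {
    uint32_t start_ms;
    uint32_t timeout_ms;
} sensor_wait;

void sensor_wait_begin(sensor_wait *w, uint32_t now_ms, uint32_t timeout_ms)
{
    w->start_ms = now_ms;
    w->timeout_ms = timeout_ms;
}

/* Call once per loop iteration with the latest sensor status. Never blocks;
 * returns WAIT_TIMEOUT if the sensor never answers (e.g. it is unplugged). */
wait_state sensor_wait_poll(const sensor_wait *w, uint32_t now_ms, bool sensor_ready)
{
    if (sensor_ready) {
        return WAIT_READY;
    }
    if ((uint32_t)(now_ms - w->start_ms) >= w->timeout_ms) {
        return WAIT_TIMEOUT;
    }
    return WAIT_PENDING;
}
```

On WAIT_TIMEOUT the caller can skip the step (a gyro reset, say) and flag the problem on an indicator or the dashboard instead of hanging.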

What will happen if you lose your camera feed to the driver's station, or it suddenly becomes really dysfunctional, and it's the only option you have for some critical function? What will happen if your driver's station is running code to process that video and the camera feed is disrupted? What will happen if your robot is enabled and keeps waiting for information from the driver's station and that information is delayed? What will happen if you add a lot of debugging information to send back to the driver's station and it takes longer than you expected based on tests back home? What would happen if you send a lot of packets to the cRIO, your code doesn't read them fast enough, and you start to overflow the input buffer (the buffer overflow 'exploit' from the Einstein report, starting on page 13)?

Obviously, if someone can actually defeat your ability to see the enable, your movement-driving outputs from the digital sidecar will disable for safety (momentum aside). Then you have to consider the physical state of the actuators that stopped when you return to the enabled state from that unexpected disabled state.

However, the system can obviously lose packets, so the idea of continuous enable transmission seems to give the wrong impression (it is continuous, but you can lose some packets and not get disabled). There's even a counter for missed packets in the Field Monitor of the Field Management System (FMS), and the manual says: "Typically there are some lost packets. In a very tame wireless environment, this number will be less than 100." (Page 49, Rebound Rumble FMS manual, Rev. 0). The Field Monitor also shows the average time it takes for traffic to go from the driver's station to the robot and back (an average, so not necessarily the instantaneous round-trip time). From what I've researched, that information comes from the driver's stations to the FMS about every 100ms. Unfortunately, every interruption to the link is going to delay delivery of TCP packets and might lose your UDP packets entirely. The existence of that counter, with that note in the documentation for field operators, indicates that losses happen up to roughly 100 times even in a very tame environment; what about a not-so-tame environment? Also, that counter is per team.

I can provide the links to back this up, but I'm not sure I want to link the FMS manuals on this site. The links might not stand the test of time, and I'm not sure whether there are rules about it.

Greg McKaskle 25-08-2012 00:48

Re: Team 548 Einstein Statement
 
Quote:

..You'd be able to use the flash memory as well while disabled and then you can retain that data even after the robot is powered off (of course it is flash memory so wear leveling could be a concern)..
The flash drivers already implement wear leveling. The cRIO was designed as a monitoring/control device with a highly reliable file system and is used by industry to log data in remote and harsh conditions. Log files that detail how your robot operates are a good technique independent of any communications issues. Knowing whether the robot leaves auto, extends the arm too far, or dies entirely is helpful to everyone. Please keep in mind that logging isn't free, and it is possible to log so much data that the cRIO will not have the CPU needed to drive the robot.
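One common way to keep logging cheap inside the control loop, sketched under stated assumptions: append fixed-size records to a preallocated in-memory ring buffer (constant time, no allocation, no file I/O in the loop) and let a separate low-priority task write it to flash occasionally (not shown). The record fields, the 256-entry capacity, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 256u  /* assumed size; fixed so memory use never grows */

typedef struct {
    uint32_t t_ms;        /* timestamp */
    uint8_t  mode;        /* hypothetical: 0 disabled, 1 auto, 2 teleop */
    uint16_t battery_mv;  /* hypothetical battery reading */
} log_record;

typedef struct {
    log_record rec[LOG_CAPACITY];
    size_t head;   /* index of the next slot to write */
    size_t count;  /* number of valid records, at most LOG_CAPACITY */
} log_ring;

/* O(1) append that never blocks and never allocates; the oldest record
 * is overwritten once the buffer is full. */
void log_append(log_ring *r, log_record rec)
{
    r->rec[r->head] = rec;
    r->head = (r->head + 1u) % LOG_CAPACITY;
    if (r->count < LOG_CAPACITY) {
        r->count++;
    }
}

/* Copy out the i-th oldest record (0 = oldest). Returns 0 on success,
 * -1 if i is out of range. A flush task would walk i = 0 .. count-1. */
int log_get(const log_ring *r, size_t i, log_record *out)
{
    if (i >= r->count) {
        return -1;
    }
    size_t oldest = (r->head + LOG_CAPACITY - r->count) % LOG_CAPACITY;
    *out = r->rec[(oldest + i) % LOG_CAPACITY];
    return 0;
}
```

The cost per control-loop iteration is one small struct copy, so the CPU budget for logging stays predictable no matter how long the match runs.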

Quote:

In fact, if the robot can't communicate with you, it could signal that (some might think that redundant, but you never know; it might have helped Team 118 on Einstein).
118's DS logs clearly showed what was happening on Einstein regarding communications. They showed that the robot was told to enter auto, the CPU spiked to 100%, and the robot stayed in communication for several seconds longer, responding with its voltage and other fields, but never indicated that it had completed processing the auto command. There were plenty of LEDs on 118, and if the code had been executing as expected during a comms issue, they could have been used to show extra info; logging could have helped as well. The difficulty with 118 was identifying how and why the CPU went to 100%.

Quote:

Someone can create a transaction system with UDP that can actually lose messages and ignore messages, unlike TCP trying to help by pretending the link is reliable (when it might be busy or experiencing some wireless issue). The FMS seems to use UDP a lot itself.
All traffic from FMS to DS to robot and back is implemented using UDP, with redundant info and some tracking data to calculate trip times and lost packets. TCP is used by SmartDashboard and by dashboard cameras.

Quote:

... However the code expects input from that now disconnected sensor in a loop from which you cannot escape and so everything is stuck (it blocks).
The blocking code on 118 was specific to their gyro reset, done as auto began. I don't think anyone would recommend putting a tight loop in the code that waits for a sensor condition. 118's software mentor didn't know the code had been added. If the CPU hadn't pegged in the blocking loop, the dashboard and robot behavior would have helped identify that the gyro was disconnected.

The buffer issue mentioned was a secondary issue that explained why 118 couldn't be rebooted from the DS. It didn't directly contribute to the failure. It is an artifact of the version of VxWorks that runs on the cRIO, which allows improperly written code in one task to impact the communication of other tasks. The buffer was full, not overflowing, and there was no exploit.

The robot disable occurs when no DS commands have been received for 100ms. The packets are sent every 20ms, so it takes 5 sequential packet losses to trigger a disable. The robot will be re-enabled as soon as another packet arrives, perhaps in as little as 20ms. The Einstein communications, as measured and logged by the DS, were very quiet, almost equal to an Ethernet cable, except for a field-wide burst in the final match. This may have been external noise such as a lightning strike. Logs of the Einstein robots during qualifications showed far more interference but no disabling caused by it.
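A small simulation of the timing described above, as a sketch rather than FRC code. It assumes packets at exactly 20 ms intervals, a 100 ms timeout, a 1 ms clock, and that a packet arriving in the same millisecond as the timeout check is processed first; all of these are modeling assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DS_PERIOD_MS   20u   /* control packet interval */
#define DS_TIMEOUT_MS 100u   /* disable after this long with no packet */

/* Packets are sent at t = 0, 20, 40, ...; packets whose index falls in
 * [first_lost, first_lost + n_lost) never arrive. Steps a 1 ms clock over
 * [0, horizon_ms) and returns how many ms the robot spends disabled. */
uint32_t simulate_disabled_ms(uint32_t first_lost, uint32_t n_lost,
                              uint32_t horizon_ms)
{
    uint32_t last_rx = 0;    /* packet 0 at t = 0 is assumed delivered */
    uint32_t disabled = 0;
    for (uint32_t t = 0; t < horizon_ms; t++) {
        if (t % DS_PERIOD_MS == 0u) {
            uint32_t idx = t / DS_PERIOD_MS;
            bool lost = idx >= first_lost && idx < first_lost + n_lost;
            if (!lost) {
                last_rx = t;             /* arrival processed before check */
            }
        }
        if (t - last_rx >= DS_TIMEOUT_MS) {
            disabled++;                  /* watchdog has expired */
        }
    }
    return disabled;
}
```

Under these assumptions, four consecutive losses never disable the robot, five produce a 20 ms disable, and six produce a 40 ms disable.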

Please ask if there are other questions about the Einstein Report.
Greg McKaskle

techhelpbb 25-08-2012 03:41

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Greg McKaskle (Post 1183110)
The robot disable occurs when no DS commands have been received for 100ms. The packets are sent every 20ms, so it takes 5 sequential packet losses to trigger a disable. The robot will be re-enabled as soon as another packet arrives, perhaps in as little as 20ms. The Einstein communications, as measured and logged by the DS, were very quiet, almost equal to an Ethernet cable, except for a field-wide burst in the final match. This may have been external noise such as a lightning strike. Logs of the Einstein robots during qualifications showed far more interference but no disabling caused by it.

I'm unclear on this:

The shortest time delay in the 5 possible RSL light status patterns is 100ms for the off time of the teleop enabled mode.

So if you miss 100ms of communications and become disabled, and then 20ms, or even 60ms or 80ms, passes before a DS packet re-enables you, you might not notice any change in the RSL pattern even though you were briefly disabled.

The charts tab in the DS shows when the robot is enabled or disabled even for short periods of time.

The DS sends data to the FMS every 100ms and the FMS logs every 500ms in the match review.

So is it possible for the DS to see the robot go from enabled to disabled and back to enabled between these 100ms reports to the FMS, and not report the state transition because it happened between reporting intervals?
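A small sketch of the effect being asked about, under assumptions. The robot's enabled flag is known every millisecond, reports go out at a fixed interval (100 ms here, as in the question), and a report either samples the state at the report instant or latches whether any disable occurred during the window. The names and the latching policy are hypothetical; this is not how the DS or FMS is known to work.

```c
#include <stdbool.h>
#include <stddef.h>

/* state[t] is the robot's enabled flag at millisecond t. Reports go out at
 * the end of every report_ms window. Counts reports showing "disabled":
 *  - sampled: only the state at the report instant is sent;
 *  - latched: "disabled" if the robot was disabled at any millisecond
 *             since the previous report. */
void count_disabled_reports(const bool *state, size_t n, size_t report_ms,
                            size_t *sampled, size_t *latched)
{
    bool seen_disable = false;
    *sampled = 0;
    *latched = 0;
    for (size_t t = 0; t < n; t++) {
        if (!state[t]) {
            seen_disable = true;
        }
        if ((t + 1) % report_ms == 0) {    /* end of a reporting window */
            if (!state[t]) {
                (*sampled)++;
            }
            if (seen_disable) {
                (*latched)++;
            }
            seen_disable = false;          /* start the next window */
        }
    }
}
```

With a 60 ms disable falling entirely between two report instants, the sampled policy reports nothing while the latched policy reports one disabled window.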

Greg McKaskle 25-08-2012 08:43

Re: Team 548 Einstein Statement
 
Quote:

.. you might not notice ..
Correct. The RSL is a pretty crude indicator of the robot state. Keep in mind that a human blink is at least 100ms. I've also reviewed the logs with the drive coach and shown them brief disables that neither they nor the drivers noticed during a match. I've also seen logs from very successful robots that only process teleop every 60ms, and they seem fine with that rate. In other words, they choose to ignore two out of three control packets even though their CPU usage was quite low.

Actually, the FMS<-->DS comms are at 20ms as well. The FMS logs are somewhat slow from what I've seen -- between 2 and 4 points in a second. The DS reports everything it knows to the field. But at this point, the DS log data is the best indication of what took place on the robot and with the comms.

Greg McKaskle

techhelpbb 27-08-2012 10:52

Re: Team 548 Einstein Statement
 
Am I correct that the missing packet indicators on the FMS and the lost packet counters in the charts tab of the driver's station are counting only the UDP packets that FIRST is using for DS<->Robot communications? It's clear that the average round trip calculation depends on those packets.

Is there any additional monitoring in place on the current fields to track bottlenecks, lost packets, and other TCP/IP behavior while the field operates besides those counters? I mean besides one of the driver's station operators peeking at that with System Monitor?

Is there any kind of prioritization for the UDP traffic imposed by the field and D-Link AP?

What process is in place to prevent the improper configuration of the Windows TCP/IP stack in the driver's station? Specifically with respect to TCP sliding windows and window scaling?

I ask these questions because of situations where UDP packet traffic sees the unintended side effects of TCP bottlenecks. The effect that concerns me is discussed at length in this link:
Characteristics of UDP Packet Loss: Effect of TCP Traffic

If we can see 100 UDP packets disappear during a match in a very tame wireless environment, how much TCP bottlenecking (and packet loss) is really going on impacting the Smart Dashboards and TCP based web cameras?

You can write software to get to all the raw counters you can see in the System Monitor on Windows like this:
Raw performance data class

It's not clear to me whether the driver's station uses the Windows API to collect the lost packet information.

Though even if you did use that source of information, you could only monitor the TCP/IP stack of the driver's station. I suppose using the UDP packets to track performance like this was easier than modifying the D-Link AP to run DD-WRT or OpenWRT and passing its TCP/IP statistics back to the driver's stations and on to the field. Keep in mind that the cRIO can't see traffic from the other devices plugged into the D-Link AP's switch that isn't addressed to the cRIO (the D-Link AP doesn't seem to support any kind of port tap to bypass the switch, and I doubt it would be wise to ARP poison it).

Also which IP stack for VxWorks is in the cRIO: the BSD stack or the Interpeak stack? The older BSD stack source code supports the features that concern me as can be read here:
Wind River VxWorks TCP/IP stack

From the description of the buffer configuration above, it sounds like it might have the Interpeak stack in it?
I'm curious to see if there's more than RFC2581 in that TCP/IP stack for congestion control.

Brian

Racer26 27-08-2012 17:59

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Basel A (Post 1182968)
Even after FIRST's thorough investigation, all but one of the suspected FCA cases were only considered "likely."

I find it disturbing that you're prepared not only to diagnose a robot failure as a complex problem based on minimal evidence, but also ready to indict an individual, about whom you know exactly one thing, of match-fixing at the highest level.

As the earlier poster mentioned: The only "confirmed" case was the one admitted to.

The "likely" cases are ones that the Einstein committee (18 industry experts, plus the 12 teams, 548 included) agreed were reasonably likely to ALSO have been caused by the FCA exploit. That judgment was based on the available match video and DS logs, plus the circumstantial evidence of multiple eyewitness accounts stating that they had seen the individual punching away on the Galaxy Nexus phone at a screen containing the numbers of the teams on the field at various points throughout Einstein, with one reporter distinctly remembering 1114 being targeted. Many people seem to be overlooking (or at least glossing over, for the sake of peacekeeping) this part of the report, when in actuality it's relatively damning.

As for the OTHER cases, outside of Einstein? Nobody has investigated them properly (at least not yet, to my knowledge; I read in this thread that FiM is conducting something related to MSC), so we may never know for sure.

The match video I've found exhibits the same symptoms as those seen on Einstein, which the Einstein report documents as what would be visible to an astute observer watching a video from a distance (a flashing RSL indicating robot power is still present, and a flashing alliance station wall light indicating a lack of communications with the robot). I fully agree that these alone do not a complete diagnosis of FCA make.

However, there is circumstantial evidence: a 548 mentor tampered with the system in one admitted case, and several other cases were judged likely by 42 experts (18 industry + 2x12 team reps). This individual was presumably intending to influence the outcome of the matches, which makes it reasonably believable to me that the other matches I can find with FCA-like symptoms, where disabling the affected robot(s) would give 548 a distinct advantage, are probably also attributable to the FCA exploit.

Am I ready to indict this individual for match fixing at the highest level? YES! They ADMITTED to that, and that's why they're no longer welcome at FIRST events! However, yes, I further believe that they fixed many more matches than they've admitted to, and I know I'm not alone in that belief.

As I stated in my earlier post, though, I hold no grudge against 548, because I'm willing to take the TEAM at their word that this INDIVIDUAL was acting ALONE and without the team's knowledge. It's not fair for the present and future students of 548 to be punished for something a mentor of theirs did in the past. They're a 3-time District Chairman's Award team. They are doing good things, and the kids they're trying to have an impact on don't deserve to be chastised by the community at large for the actions of someone they trusted. I'm sure they're probably MORE devastated than the rest of the FIRST world, since this mentor violated their trust and damaged their team's hard-earned image as a leader in possibly irreparable ways.

Greg McKaskle 27-08-2012 19:39

Re: Team 548 Einstein Statement
 
Edited to condense the questions. Answers marked with ***'s.
---------------------
Am I correct that the missing packet indicators on the FMS and the lost packet counters in the charts tab of the driver's station are counting only the UDP packets that FIRST is using for DS<->Robot communications?
*** Yes. The trip time and lost packets refer to the control/status loop between DS and robot.

Is there any additional monitoring in place on the current fields to track bottlenecks, lost packets, and other TCP/IP behavior while the field operates besides those counters? I mean besides one of the driver's station operators peeking at that with System Monitor?
*** If those other aspects impact the control/status loop, then the CSA, inspector, or FTA will use other system tools to determine what is causing the problem. The DS monitors a few cRIO factors such as CPU.

Is there any kind of prioritization for the UDP traffic imposed by the field and D-Link AP?
*** In 2012, default settings were used. The report indicates that QoS may be configured in coming seasons.

What process is in place to prevent the improper configuration of the Windows TCP/IP stack in the driver's station? Specifically with respect to TCP sliding windows and window scaling?
*** Nothing except for overall monitoring of the control/status loop. If that is working poorly, the CSA, inspector, or FTA may decide to look at TCP configuration, but honestly, that is getting pretty obscure.

If we can see 100 UDP packets disappear during a match in a very tame wireless environment, how much TCP bottlenecking (and packet loss) is really going on impacting the Smart Dashboards and TCP based web cameras?
*** That is less than one packet per second. If TCP is having issues and retransmitting, that will likely impact the UDP and the FTA or others would look into it.
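The rough arithmetic behind "less than one packet per second", assuming a 2012 match of 135 s (15 s hybrid plus 120 s teleop) and the 20 ms control packet interval, i.e. 50 packets per second:

$$
\frac{100\ \text{lost}}{135\ \text{s}} \approx 0.74\ \text{lost/s},\qquad
\frac{135\ \text{s}}{0.02\ \text{s/packet}} = 6750\ \text{packets},\qquad
\frac{100}{6750} \approx 1.5\,\%
$$

Since a disable needs five consecutive losses, about 1.5% loss spread across a match would not by itself disable a robot, as long as those losses are scattered rather than bunched together.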

It's not clear to me whether the driver's station uses the Windows API to collect the lost packet information.
*** It is not. If you believe that information would be helpful instead or in addition, I'm sure it can be added.

Also which IP stack for VxWorks is in the cRIO: the BSD stack or the Interpeak stack?
*** I don't have a cRIO with me, so I can't answer your question.

Greg McKaskle

techhelpbb 28-08-2012 12:14

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Greg McKaskle (Post 1183324)
If those other aspects impact the control/status loop, then the CSA, inspector, or FTA will use other system tools to determine what is causing the problem. The DS monitors a few cRIO factors such as CPU.

What tools besides the Windows Performance/System Monitor and ping are available to everyone to diagnose such a situation?

Traceroute won't do much good considering the robot is bridged.

Ping works to some extent because it's ICMP echo, which sits at layer 3: while it's wrapped in IP, it's not TCP or UDP. So if you ping the cRIO, you'll see the congestion that is impacting TCP and UDP at layer 4. Unfortunately, if you ping the cRIO from the driver's station, you'll see the congestion but not necessarily where in the communications path it exists. In fact, ICMP has not just the ability to detect congestion but also the ability to throttle inbound traffic that congests the local receive buffer, via the source quench message, which should cause the sender to back off if it honors it (and it should).

To my knowledge, current Microsoft Windows TCP/IP stacks honor source quench requests when they are sending but do not generate them when receiving. Instead, it's common for devices that have filled their input buffers to simply drop packets. This behavior appears to be the same in the older VxWorks stack(s).

VxWorks BSD TCP stack
Code:

/*
 * When a source quench is received, close congestion window
 * to one segment.  We will gradually open it again as we proceed.
 */

If someone were to craft ICMP source quench packets, they could throttle remote senders back when their receive buffer is almost full, completely full, or just because they need to alter the state of the network communications, for example to force a sender with a large window to reduce it ASAP so it doesn't impact other traffic. (There's nothing stopping a DiffServ QoS, as described directly below, from using ICMP source quench in its own way. I have no idea which specific hardware FIRST might use for the QoS function, so I don't know whether ICMP source quench will be present or how it will be used.)
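A simplified model of what the BSD comment above describes, as a sketch: a source quench collapses the congestion window to one segment, and the window then reopens with slow start (doubling per round trip) up to ssthresh, then by about one segment per round trip. It ignores losses, the receive window, and per-ACK growth, and is not taken from any particular stack; the names are hypothetical.

```c
#include <stdint.h>

/* Simplified sender state (a sketch, not any particular stack's code). */
typedef struct {
    uint32_t cwnd;      /* congestion window, bytes */
    uint32_t ssthresh;  /* slow-start threshold, bytes */
    uint32_t mss;       /* maximum segment size, bytes */
} tcp_sender;

/* "Close congestion window to one segment." */
void on_source_quench(tcp_sender *s)
{
    s->cwnd = s->mss;
}

/* One loss-free round trip: double below ssthresh (slow start, capped at
 * ssthresh here for simplicity), otherwise add one segment per RTT. */
void on_round_trip(tcp_sender *s)
{
    if (s->cwnd < s->ssthresh) {
        uint32_t doubled = s->cwnd * 2u;
        s->cwnd = doubled < s->ssthresh ? doubled : s->ssthresh;
    } else {
        s->cwnd += s->mss;
    }
}

/* Round trips needed after a quench to reopen the window to target bytes.
 * Works on a copy, so the caller's state is unchanged. */
unsigned rtts_to_recover(tcp_sender s, uint32_t target)
{
    unsigned n = 0;
    on_source_quench(&s);
    while (s.cwnd < target) {
        on_round_trip(&s);
        n++;
    }
    return n;
}
```

With a 1000-byte MSS and a 32000-byte ssthresh, reopening to 32000 bytes takes 5 round trips and to 34000 bytes takes 7, so a single quench costs several round trips of reduced throughput.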

Quote:

Originally Posted by Greg McKaskle (Post 1183324)
In 2012, default settings were used. The report indicates that QOS may be configured in coming seasons.

I presume FIRST will implement DiffServ, which is stateless? I suppose one could use IntServ, but that requires reservations (RFC 2210) to operate, and while VxWorks in the cRIO could possibly pull that off, the Axis cameras do not support it. Axis themselves confirmed that the cameras support differentiated services code point (DSCP, RFC 2474) marking per function (audio, video, alarm...).
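As a sketch of what DSCP marking looks like from an application, assuming POSIX sockets on a Unix-like host (not the cRIO, the Axis cameras, or the Windows driver's station specifically): the DSCP value occupies the upper six bits of the old IPv4 TOS byte, so it can be set through the IP_TOS socket option. EF (Expedited Forwarding, 46) is used here only as an example class; which classes FIRST would actually assign is not known.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* DSCP occupies the upper six bits of the former IPv4 TOS byte (RFC 2474);
 * the low two bits are left for ECN. */
static int dscp_to_tos(int dscp)
{
    return (dscp & 0x3F) << 2;
}

/* Mark all datagrams subsequently sent on fd with the given DSCP.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
int set_socket_dscp(int fd, int dscp)
{
    int tos = dscp_to_tos(dscp);
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos);
}
```

Marking only labels packets; it does nothing unless the AP or QoS unit along the path is configured to act on those marks.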

DiffServ has some end-to-end issues worth noting:

1. DiffServ doesn't track statistics for all open flows (on the plus side it needs less space to operate; on the downside it isn't as aware of long-term quality issues).

2. DiffServ in high packet loss situations tends to give you the choice to scalp one class to get additional bandwidth for another class (but can't be sure that in the long term that the scalped class is assured bandwidth either).

3. DiffServ has a compensation class but that only helps you if you have some idea of the limits of the uncertainty in the network and if the devices can handle that behavior. In short, if you give the compensation class a large amount of bandwidth to pull from it'll allow more flow to another class to make up for a shortage, but if the packet loss is high you need to make the compensation class larger and that still won't assure that high packet loss over a short period won't reduce the ability of traffic to flow at all.
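Point 2 is easiest to see with numbers. The sketch below is a deliberately simplified strict-priority allocator, an assumption for illustration and not how any particular DiffServ device schedules: a QoS box can only hand out what the radio actually delivers, so when loss shrinks the usable capacity, the high-priority class keeps its share by taking it from the lower class.

```c
#include <assert.h>
#include <stddef.h>

/* Strict-priority split of a link's usable capacity among traffic classes.
 * demand[0] is the highest-priority class; units only need to be consistent
 * (e.g. kbit/s). Each class gets min(demand, what is left); the function
 * returns the capacity nobody claimed. */
double allocate_strict_priority(double capacity, const double *demand,
                                double *grant, size_t nclasses)
{
    assert(capacity >= 0.0);
    for (size_t i = 0; i < nclasses; i++) {
        double g = demand[i] < capacity ? demand[i] : capacity;
        grant[i] = g;
        capacity -= g;
    }
    return capacity;
}
```

With a control class wanting 500 units and a video class wanting 4000, a 6000-unit link serves both; if radio loss cuts the usable capacity to 2500, control still gets its 500 and video absorbs the entire shortfall, which is the robbing described in point 2.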

This is only made worse because the TCP sliding windows and window scaling I noted above will very likely not be smart enough to distinguish congestion from packet loss. This problem with TCP has existed for a very long time. It's the equivalent of having a hole the size of a dime and needing to pass a half dollar through it. Sure, if you grind the half dollar down long enough it'll fit through the hole, but it's going to be unpleasant. The solutions to this problem are called TCP congestion avoidance algorithms, and the choice of algorithm can have a dramatic impact on your network performance. TCP-Vegas, as implemented in DD-WRT, OpenWRT, Linux and BSD, can respond more effectively to packet loss from congestion and packet loss from the radio layer on largely unidirectional links (i.e. the TCP video is a large amount of bandwidth headed one way). It is unclear to me at this time whether the Axis cameras, Microsoft Windows, or VxWorks support TCP-Vegas in their TCP/IP stacks. Keep in mind that you sometimes need to tune the queues with TCP-Vegas.
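What lets Vegas react before loss is that it watches the round trip time rather than waiting for drops. This is a sketch of the textbook Vegas rule from Brakmo and Peterson's design, with alpha and beta left as parameters (the original paper used roughly 1 and 3 segments; implementations choose their own defaults). Real stacks add slow-start handling and RTT filtering this sketch omits. Everything hinges on base_rtt, which is also why the route-change drawback mentioned further down matters.

```c
#include <assert.h>

/* One TCP-Vegas congestion-avoidance decision, made once per round trip.
 *   cwnd       : congestion window in segments
 *   base_rtt   : smallest RTT seen on this path (same time unit as rtt)
 *   rtt        : RTT measured over the last round trip
 *   alpha/beta : target range for the number of segments queued in the path
 * Returns the new window. */
unsigned vegas_update(unsigned cwnd, double base_rtt, double rtt,
                      double alpha, double beta)
{
    assert(cwnd > 0 && base_rtt > 0.0 && rtt > 0.0);
    /* diff = (expected - actual) * base_rtt
     *      = (cwnd / base_rtt - cwnd / rtt) * base_rtt
     *      = cwnd * (rtt - base_rtt) / rtt   (segments sitting in queues) */
    double diff = cwnd * (rtt - base_rtt) / rtt;
    if (diff < alpha)
        return cwnd + 1;   /* path under-used: grow linearly */
    if (diff > beta && cwnd > 2)
        return cwnd - 1;   /* queues building: back off before anything drops */
    return cwnd;           /* inside the target range: hold */
}
```

A loss-based algorithm only learns about the queue when it overflows; here a 25% rise in RTT on a 20-segment window already trims the window by one segment.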

On the one hand, a DiffServ QoS unit sitting on the field side will throttle back the senders it can actually communicate with over a short duration, which should limit their maximum flow rate over the longer duration. On the other hand, when packet loss from WiFi issues breaks through (or someone causes packets to be dropped at the radio level), the senders can't see the QoS unit on the field side, so they'll fall back on their TCP congestion avoidance algorithm of choice. With TCP-Reno/New Reno (we are certainly using one of these now), depending on the circumstances, that could still cause flooding moments after a packet loss. A handy example:

Performance Evaluation of TCP Variants In WiFi Network Using Cross Layer Design Protocol and Explicit Congestion Notification
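To see why Reno/New Reno can't help here, look at what it does once it detects a loss. This is a simplified sketch of the RFC 5681 rules using a hypothetical state struct, with flight size approximated by the congestion window; real stacks track considerably more.

```c
#include <assert.h>
#include <stdint.h>

enum loss_signal { LOSS_DUP_ACKS, LOSS_TIMEOUT };

/* Hypothetical, simplified Reno/New Reno state. */
struct reno_state {
    uint32_t cwnd;     /* bytes */
    uint32_t ssthresh; /* bytes */
    uint32_t mss;      /* bytes */
};

/* Reaction to a detected loss (RFC 5681 rules, simplified: flight size is
 * approximated by cwnd). Note what is missing: nothing here knows WHY the
 * segment was lost, so a frame dropped by the radio is treated exactly like
 * a queue overflow. */
void reno_on_loss(struct reno_state *s, enum loss_signal sig)
{
    assert(s->mss > 0);
    uint32_t half = s->cwnd / 2;
    s->ssthresh = half > 2 * s->mss ? half : 2 * s->mss;
    if (sig == LOSS_TIMEOUT)
        s->cwnd = s->mss;       /* retransmission timeout: restart from one segment */
    else
        s->cwnd = s->ssthresh;  /* after fast recovery: continue at half the window */
}
```

Each radio-level drop therefore halves the sender's window (or resets it on a timeout), and it then grows back until the next drop, producing the start, build, drop cycle described below.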

Quote:

Originally Posted by Greg McKaskle (Post 1183324)
Nothing except for overall monitoring of the control/status loop. If that is working poorly, the CSA, inspector, or FTA may decide to look at TCP configuration, but honestly, that is getting pretty obscure.

Why do an analysis at all? Why not just set up the field and optimize the TCP-IP parameters as a baseline for the acceptable Windows OS for the driver's stations and the cRIO? Then distribute those settings in a simple 'registry file' export from RegEdit or RegEdt32 for installation or comparison.

I should probably also note the following:

1. It is very unlikely the FIRST driver's station will need to become the local master browser. One could turn that feature off in the LanMan parameters in the Windows registry. Even if the students take that laptop off the field and use it elsewhere, it would only really matter on a network where they are the only Windows computer and no Samba is running. This NetBIOS feature serves no purpose in the current field and robot systems, but it will trigger a browser election and quite likely use NetBIOS over TCP/IP to do it. Alternatively, one could turn off NetBIOS over TCP/IP, which does have a GUI option. If anyone is interested in more details, look at the Samba project.

2. IPV6 ought to be turned off, especially in Windows 7. Windows 7 has a perverse tendency to use IPV6 first and IPV4 second, and not only does IPV6 raise so many security concerns that I could fill a book, but I don't think any device we have supports it, unless FIRST is using the InterPeak stack in the cRIO configured for it. I'd be interested to know if there is any IPV6 usage in the FIRST ecosystem.

One could easily take this a step further. A simple program or even a script could back up the relevant Windows registry entries, make these changes in preparation for the driver's station function, and then restore the original settings when the driver's station is done. In fact one could even set a System Restore point, but that might get a bit out of hand with regard to storage, since you only need to alter a trivial number of keys. On the plus side, a program could easily locate the keys to disable the IPV6 protocol on the wired adapter you'll be using.

Quote:

Originally Posted by Greg McKaskle (Post 1183324)
That is less than one packet per second. If TCP is having issues and retransmitting, that will likely impact the UDP and the FTA or others would look into it.

In a 135-second match (2 minutes, 15 seconds) you're absolutely correct that if the UDP packet loss of 100 packets per match were distributed evenly, that would be less than one packet per second. However, as we agree, your enable/disable timer in the cRIO will time out in 100ms. So you can lose 3-4 driver's station UDP packets carrying the enable/disable state in a row before the robot runs out the 100ms timer and disables. In 1 second you have fifty 20ms intervals. In theory, if the timing is perfect (and let's face it, the Windows TCP/IP stack will not reliably send those UDP packets precisely every 20ms, and the latency of the link to the cRIO will affect the timing), you can lose 40 of those UDP packets per second (four of every five) and still not be disabled, and you have 135 seconds in which you could do that.

The reason for the math is that, if you look at the link below again with my concerns in mind, the TCP sliding window and window scaling functions have their effect over a duration that can easily be seconds (see Figure 2-5 in the link below). So it's possible for the trouble to start, build, drop a UDP packet or a bunch, cycle back, start, build, and drop another UDP packet or a bunch. Meanwhile, the entire time packets are dropping, devices are making their choices of congestion avoidance process (each algorithm has a set of processes at work) during and after each drop, not just because of radio-level issues but also because of congestion, and with TCP-Reno/New Reno there really is no way to tell the difference unless some mitigation such as ICMP source quench is inserted.

Characteristics of UDP Packet Loss: Effect of TCP Traffic
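The enable-timer arithmetic above can be checked with a small model. This sketch assumes perfect 20ms send timing, zero latency, and that the robot disables only when the gap between received packets exceeds 100ms; whether the real cRIO code compares with > or >= is an assumption, and that choice decides whether losing four in a row is survivable.

```c
#include <assert.h>
#include <stddef.h>

/* received[i] != 0 means the DS packet sent at i * period_ms arrived.
 * Idealized model (assumptions): perfect send timing, zero latency,
 * packet 0 arrives, and the robot disables only when the gap between
 * arrivals EXCEEDS timeout_ms.
 * Returns 1 if the watchdog would trip anywhere in the trace, else 0. */
int watchdog_trips(const int *received, size_t n, int period_ms, int timeout_ms)
{
    assert(period_ms > 0);
    long last = 0;
    for (size_t i = 0; i < n; i++) {
        if (!received[i])
            continue;
        long t = (long)i * period_ms;
        if (t - last > timeout_ms)
            return 1;
        last = t;
    }
    /* a run of losses at the very end of the trace counts too */
    if (n > 0 && (long)(n - 1) * period_ms - last > timeout_ms)
        return 1;
    return 0;
}
```

With four of every five packets lost (200 of 251 over five seconds, i.e. 40 per second) this model stays enabled; losing five of every six trips it.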


Quote:

Originally Posted by Greg McKaskle (Post 1183324)
It is not. If you believe that information would be helpful instead or in addition, I'm sure it can be added.

I think the best visual representation of that data is the difference from data point to data point, from the starting values of the TCP statistics to the final values, over time. Windows System Monitor pretty much already provides this facility. At least it's something to help people diagnose their own issues, and perhaps highlighting its value will help some people.
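As an example of the point-to-point differencing described above, here is a sketch that turns two snapshots of cumulative TCP counters into a retransmission percentage. The struct is hypothetical; the counters it mirrors are the standard TCP-MIB objects tcpOutSegs and tcpRetransSegs, which Windows also reports as "Segments Sent" and "Segments Retransmitted" in `netstat -s`. Sampled over short intervals during a match, a rising value is a rough sign of loss somewhere on the path, though it still can't say where.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of two cumulative TCP counters, e.g. TCP-MIB
 * tcpOutSegs and tcpRetransSegs (RFC 4022). */
struct tcp_snapshot {
    uint32_t out_segs;
    uint32_t retrans_segs;
};

/* Percentage of segments sent between two snapshots that were retransmissions.
 * Unsigned subtraction keeps each delta correct across one 32-bit wrap. */
double retrans_percent(struct tcp_snapshot before, struct tcp_snapshot after)
{
    uint32_t sent = after.out_segs - before.out_segs;
    uint32_t retx = after.retrans_segs - before.retrans_segs;
    return sent ? 100.0 * retx / sent : 0.0;
}
```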

Unfortunately neither the driver's station Charts tab nor the Windows local TCP/IP stack statistics show end users precisely where along the communications path congestion or momentary packet loss occurs, in the same way that the average round trip doesn't represent the instantaneous round trip time, or separate the time to reach the robot from the time to reach the driver's station. The devices currently best able to determine whether packet loss comes from the radio layer or from congestion on the wired sides of the radio links are the APs. There is no facility in TCP-Reno/New Reno itself to use round trip time (RTT) to reveal that data is disappearing at any given moment; it measures RTT only to set its retransmission timer (usually 200ms-500ms), hence the escalation over several seconds. Even if there were an Intserv QoS unit on the field side of the communications path, it couldn't determine the cause of packet loss from that vantage point (though Intserv QoS does accumulate information over time about the flows passing through it). In a way, the DS<->robot UDP traffic with its round trip timer adds something over TCP-Reno/New Reno that could better arbitrate issues, but that driver's station UDP traffic is both too slow to really force down the TCP sliding windows and window scaling for its own benefit (I write this in relation to the UDP/TCP link I posted above, so if that isn't clear please reread that link) and, as far as I can see, doesn't implement ICMP source quench. If the driver's station implemented ICMP echo (aka 'ping') and ICMP source quench, you could interleave the UDP packets and ICMP echo requests and monitor congestion on layer 3 and layer 4 to every IP device on the robot (each of which you could then send individual ICMP source quench messages).
Of course, if DD-WRT or OpenWRT were on the robot AP, we could not only send the statistics back to the driver's stations (and through them to the field), we could also use TCP-Vegas as the TCP congestion avoidance algorithm under the right circumstances, which would need to include support on the Cisco end. The big drawback of TCP-Vegas that I know of shouldn't matter to a FIRST system: TCP-Vegas doesn't play well in live routed environments if you change the routes, because that invalidates the round trip data.
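Interleaving ICMP echo requests with the DS UDP packets, as suggested above, mostly means building a small, well-formed packet. This sketch only builds an RFC 792 echo request with the RFC 1071 checksum; actually sending it needs a raw socket (usually elevated privileges), or on Windows the IP Helper function IcmpSendEcho, which builds and sends the packet for you. The payload is arbitrary. Note that source quench itself was formally deprecated by RFC 6633 in 2012, so newer stacks may ignore it, while echo remains standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 Internet checksum: one's-complement sum of 16-bit big-endian
 * words, an odd trailing byte padded with zero. Fine for packet-sized buffers. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build an ICMP echo request (type 8, code 0) into buf and return its length.
 * buf must hold 8 + plen bytes. */
size_t build_icmp_echo(uint8_t *buf, uint16_t id, uint16_t seq,
                       const uint8_t *payload, size_t plen)
{
    assert(buf != NULL && (plen == 0 || payload != NULL));
    buf[0] = 8;                    /* type: echo request */
    buf[1] = 0;                    /* code */
    buf[2] = buf[3] = 0;           /* checksum is computed over a zeroed field */
    buf[4] = (uint8_t)(id >> 8);   /* identifier, network byte order */
    buf[5] = (uint8_t)(id & 0xFF);
    buf[6] = (uint8_t)(seq >> 8);  /* sequence number, network byte order */
    buf[7] = (uint8_t)(seq & 0xFF);
    if (plen > 0)
        memcpy(buf + 8, payload, plen);
    uint16_t c = inet_checksum(buf, 8 + plen);
    buf[2] = (uint8_t)(c >> 8);
    buf[3] = (uint8_t)(c & 0xFF);
    return 8 + plen;
}
```

A receiver validates the packet by running the same checksum over all of it, checksum field included, and expecting zero.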

Windows has had Compound TCP (CTCP) as a congestion avoidance algorithm since Windows Vista, and it's been partially backported; Linux has had support for it since 2.6.17, but I'm not sure it works in Linux at this time. As far as I know it's disabled by default in Windows Vista and up. It can be enabled like this:
How to enable CTCP

CTCP improves on the TCP-Reno congestion avoidance algorithm by maintaining two windows (a loss-based one and a delay-based one). Again, something else to consider on the driver's stations. Also please note that you might have to enable more than CTCP, and the timestamps might be an issue for the devices on the robot. The older VxWorks stack seems to support them, so the newer InterPeak stack should as well. The bad part of CTCP is that while it would more appropriately mitigate the Windows driver's station's effect on the network, it wouldn't necessarily affect all the other devices, the congestion avoidance algorithms they use, or the issues those cause. Unlike changes on the field and robot APs, CTCP can only affect communications between the Windows driver's station and whatever it talks to, usually the cRIO and FMS. So in effect it takes one source of trouble out of the picture but leaves the rest, which will use whatever TCP congestion avoidance algorithm they like, with whatever consequences follow.

I would have suggested TCP-Cubic, which is implemented in Linux kernel 2.6 and backported to 2.4, but I'm concerned about its remaining ability to burst after a packet is lost over the radio. Without testing it's hard to say, but the nature of all the video data going back to the driver's station might favor TCP-Cubic. I'm just concerned that it won't behave as well as it could if TCP-Reno/New Reno remains in operation on the same network, and that may not be entirely avoidable. Mind you, TCP-Reno/New Reno on the same network as TCP-Vegas will still be unfair to TCP-Vegas, but from what I've seen not quite as bad (though that's on wire).


*SUPER SIMPLE VERSION:*

You have a hole big enough for a dime but you want to put a half dollar through it. So you grind up the half dollar and put it through that hole little by little. You have a choice of processes to make sure as much of the half dollar gets through that hole as quickly as possible. I'm merely suggesting a different way to react to losing little pieces of the half dollar, which you eventually discover. I think the way it's handled now is slower than it needs to be to get just as much of the half dollar through that hole.

Just to make that even more interesting, more than one person is trying to send their own half dollar through that same hole at the same time.

The half dollar is your data.
The hole is your network.
The multiple people are the multiple network devices.
There are lots of reasons you all lose some of the ground-up half dollars (which are the packets).
There are different solutions (TCP congestion avoidance algorithms).

Now I'm hiding all my half dollars before someone wants a demo.

Brian

techhelpbb 29-08-2012 12:23

Re: Team 548 Einstein Statement
 
I'm no longer able to edit the post above. So I'll append additional information like this:

If someone reads the section above where I described DiffServ, they'll probably wonder why I keep suggesting field-side QoS. The D-Link AP supports some QoS in the form of WMM. The Cisco 1252 supports some QoS in the form of 802.11e. As far as I can tell, it was not turned on this year or in any previous year. The reason I've discounted it as an option is that a full DiffServ implementation already ignores a lot of information to reduce the resources it needs to perform QoS, and 802.11e and WMM ignore even more. In a wired DiffServ implementation there are technologies that look at packet flows the sending devices did not tag with DSCP, or that guess the proper QoS class for traffic from the source or destination information in the packets (like Cisco NBAR). 802.11e and WMM don't act on that information, because doing so would require serious stateful packet inspection and the resources that usually demands. So even if someone turned on 802.11e and WMM, the devices sending across the AP link would need to tag their packets with DSCP so the APs on either end would know which class the traffic belongs to. The Axis camera supports such tagging, as noted above, and I see no reason VxWorks can't do DSCP tagging as well. However, any other network device that might send traffic over the wireless link would be questionable. For example, the COTS rules allow a laptop on the robot, and that laptop might be sending video, images, or even a VideoLAN stream back to the driver's station. Such a laptop generally wouldn't have an easy way to tag its traffic with DSCP. The 802.11e and WMM devices would have to treat that untagged traffic as classless, and the contention mitigation they use for QoS might therefore be unfair to the people who implement such solutions. Worse, it's hard for FIRST to know whether or not you'll use certain types of traffic.
For example, you might not use a video feed back to the driver's station. So it would be harder for FIRST to explain why they might blindly impact the performance of your robot devices to enforce an 802.11e/WMM QoS policy that accounts for devices you don't actually have. FIRST could alter the QoS parameters on the robot AP and the field AP if your robot doesn't use certain classes, but that adds further complexity to the configuration process for both (and remember a lot of people cycle through the fields with their robots).

To extend my concerns about DiffServ to 802.11e and WMM: in the most abstract sense, 802.11e and WMM sort the classes defined by DSCP tagging into a smaller number of categories. Those categories exist before the radio layer, so the radio layer doesn't really change. The data presented to the radio layer is simply prioritized differently than it would be with a single queue feeding the radio, so the radio can spend more time trying to communicate certain classes of traffic ahead of others. Again, if the radio layer is interrupted or has a hard time finding time to send certain classes of traffic, the QoS function of the APs has to decide what to do with all the incoming data it has no way to get rid of. Most devices, as noted above, will simply start ignoring, and therefore dropping, inbound packets. That dropping triggers one of the TCP congestion avoidance algorithms mentioned above. That in turn means that if you have several devices in different DiffServ classes lumped into a smaller number of categories, the devices within each category will affect each other's access to that category of service over the radio link. Essentially, the QoS process only really helps some devices rise above others when the link can actually pass traffic. Since DiffServ isn't aware of all flow statistics, it may not notice that it's robbing one class to give priority to another once you account for packets lost in the radio layer (someone who knows exactly what I'm talking about will consider MAC parameter tuning to mitigate that, but that's tuning, and that's a lot of extra considerations for a lot of different robot designs). These caveats are why the TCP variant (TCP-Reno/New Reno, TCP-Vegas, TCP-Cubic, Compound TCP) still matters even with 802.11e and WMM.

So basically I figure implementing 802.11e and WMM represents far more work than FIRST is willing to put up with during field operations (and worse, tuning it properly would be a field-by-field, robot-by-robot process, so it's even more painful than it sounds). More of that work could be absorbed by a field-side wired implementation of DiffServ for QoS. Of course, if something at the application layer (the driver's station software) were just smart enough to look back down the links to all the network devices (which it can, because the teams can configure it that way), one might be able to craft a flow control process that works even without any QoS device. Such a process could use ICMP source quenches as already outlined above, and could be designed to know the priority of communications for specific network device functions on the robot as dictated by their application-layer uses (so a driver's station would carry all the tailor-made TCP/IP traffic priorities for a robot, and teams could start working out those priorities without a field to work with).

Using ICMP source quench is a primitive form of Explicit Congestion Notification (ECN). For about 20 years it's been known that messages such as ICMP source quench can be abused to cause problems for a device that honors them. From an Internet security perspective, allowing anonymous ICMP source quenches to slow down your ability to send data is obviously not the best idea. Most Internet firewalls and routers let you block ICMP entirely, including both echo (aka 'ping') and source quench (there's no reason it can't be used in a private network or by FIRST, as long as it works). By comparison, there's a newer standard, less accessible to the application layer, defined in RFC 3168 and specifically called Explicit Congestion Notification. I did not suggest that anyone use the newer ECN, even though it's been around for more than 10 years (though my instructions for turning on CTCP above mention how to enable the newer ECN in Windows Vista and up). There are a bunch of issues with the newer ECN, and some of them affect performance and connectivity with D-Link devices and certain websites. Opening a hole in a driver's station firewall for ICMP source quench on the local network is one thing (odds are good your Internet routers will block attempts to pass it to a driver's station while it's connected to the Internet). Leaving strange behavior in the driver's station that might pseudo-randomly appear in other use is quite another matter, and that might happen with the newer ECN. Another upside of implementing ICMP in the driver's station software is that doing so gives you both echo and source quench, so it's likely to be more compatible and quicker to develop.
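To make the collapsing concrete: 64 DSCP values land in just four WMM access categories. One common default (used, for example, by the Linux wireless stack) takes the 802.1D user priority from the top three DSCP bits and then applies the 802.11e user-priority-to-access-category table. Whether the DAP-1522 or Cisco 1252 use exactly this mapping is an assumption; vendors and later guidance differ (under this default, for instance, EF traffic lands in the video category rather than voice).

```c
#include <assert.h>

enum wmm_ac { AC_BK, AC_BE, AC_VI, AC_VO };

/* One common default (an assumption for any specific AP):
 * 802.1D user priority = top three bits of the DSCP. */
int dscp_to_user_priority(int dscp)
{
    assert(dscp >= 0 && dscp < 64);
    return dscp >> 3;
}

/* 802.11e user priority to WMM access category. */
enum wmm_ac user_priority_to_ac(int up)
{
    assert(up >= 0 && up < 8);
    switch (up) {
    case 1: case 2: return AC_BK;   /* background */
    case 0: case 3: return AC_BE;   /* best effort */
    case 4: case 5: return AC_VI;   /* video */
    default:        return AC_VO;   /* voice: 6 and 7 */
    }
}
```

Run all 64 code points through this and each category receives exactly 16 of them, so every device whose traffic lands in the same category competes inside it, as described above.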

On a separate topic from what I just wrote about but still related to my post directly above:

It might benefit FIRST to log the information from the Windows local TCP/IP stack in the driver's station software, as I suggested and provided information for above. This would give FIRST an authoritative set of entries in the match log, synchronized with the driver's station software's communications with the field and robot, and retained by the same process as the current DS charts and logs. In that previous post I only considered visualizing that data, when what really matters is that you can retrieve that data reliably long after the matches end.

Again on a separate topic from what I just wrote about but still related to my post directly above:

Also, as a bit of a footnote: I suggested disabling IPV6 on the driver's station above. Please be aware that if you disable IPV6 and try to create a new Windows 7 HomeGroup, it will fail. Windows 7 HomeGroups use Peer Name Resolution Protocol (PNRP), which requires IPV6. You can disable and re-enable IPV6; it will just break HomeGroup's ability to resolve names while it's off (there's still the cache). I doubt any device in the FIRST ecosystem depends on a Windows 7 HomeGroup, and there are plenty of better ways to move your files without resorting to that.

Brian

Greg McKaskle 30-08-2012 07:56

Re: Team 548 Einstein Statement
 
Sorry it took so long to reply. Had to read all the links ...

Backing up a few steps, I think the first thing is to monitor and log what goes on. At an event, things get a bit simpler since FIRST owns the AP and accompanying management SW and also has an Airtight. Channel level monitoring was already common, and I suspect the displays will be enhanced to include bandwidth monitoring per team.

In a build situation, it is more difficult, but if the team AP on the robot is instrumented, that info can be used when not on the field.

Is there a TCP bottleneck problem? It doesn't seem common. Perhaps with better measures we will know for sure. Until then, if it isn't broke, let's not fix it.

For QOS, FIRST is still looking at options, and allowing experts to help with the selection. I'll try to get the experts to include your input.

Greg McKaskle

techhelpbb 30-08-2012 12:02

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Greg McKaskle (Post 1183600)
Sorry it took so long to reply. Had to read all the links ...

Backing up a few steps, I think the first thing is to monitor and log what goes on. At an event, things get a bit simpler since FIRST owns the AP and accompanying management SW and also has an Airtight. Channel level monitoring was already common, and I suspect the displays will be enhanced to include bandwidth monitoring per team.

In a build situation, it is more difficult, but if the team AP on the robot is instrumented, that info can be used when not on the field.

I presume you mean with SNMP support on the robot AP and a suitable MIB to identify the OIDs (I know I've been asked about that MIB a few times as people try to work this out). For readers who don't know: the Object Identifiers (OIDs) are fields in a table of collected status data about the device, and the Management Information Bases (MIBs) define what those fields are. I don't think the DAP-1522 supports RMON (custom traps), even if it does support SNMP and telnet. On the plus side, SNMP would at least give you some statistics about what the robot AP can see, unless you lose access to that interface during the polling. With matches 135 seconds long and some devices limiting how fast they can be polled, one would want to poll often enough that the lack of custom internal polling isn't an issue and that short packet losses don't leave you with no data for the entire match, but not so often that the polling slows the network with its own resource demands. If you did have an SNMP (or RMON, if you have it) trap on the OIDs that effectively indicate radio link losses, you'd only learn there was trouble with the radio after the radio link recovers. On the plus side, such a delayed trap notification might be better than waiting for a long external SNMP polling timer to expire; on the downside, if external polling is frequent enough, it's probably not worth spending the robot AP's time on traps. One could also expand the existing DS<->Robot UDP communications to send only the data they want, when they want it, how they want it. Looking back at my previous 2 posts, I had suggested collecting much of the same data in a custom fashion, with a script on routers that support that.
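If the robot AP does expose standard interface counters over SNMP (whether the DAP-1522 implements IF-MIB is an assumption), turning two polls into a throughput figure needs two small precautions, sketched below with a hypothetical sample struct: divide by the real time between successful responses, since polls can be lost during a dropout, and let unsigned arithmetic absorb a Counter32 wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of one SNMP poll of an interface octet counter such as
 * IF-MIB ifInOctets (a Counter32, RFC 2863) plus the manager's local clock. */
struct octet_sample {
    uint32_t octets;
    uint32_t t_ms;   /* manager clock when the response arrived */
};

/* Average bit rate between two successful polls. Using the real elapsed time
 * (not the nominal poll interval) keeps the figure honest when polls were lost
 * during a radio dropout; unsigned subtraction handles one counter wrap. */
double interface_bps(struct octet_sample a, struct octet_sample b)
{
    uint32_t dt_ms = b.t_ms - a.t_ms;
    uint32_t bytes = b.octets - a.octets;
    assert(dt_ms > 0);
    return bytes * 8.0 * 1000.0 / dt_ms;
}
```

A 32-bit octet counter can wrap within minutes at 802.11n rates, so if the device offers ifHCInOctets (Counter64) that is the safer choice, and in any case polls must come often enough that the counter can't wrap twice between them.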

Without even modifying the DS software one could already enable the SNMP service in Windows and use one of the many SNMP managers/monitors like Dart.

SNMP is an application protocol carried over UDP (layer 4). Turning it on has the benefit/downside of generating UDP traffic that contains far more data than the DS<->robot UDP packets. That SNMP polling alone will help push down the TCP congestion caused by the sliding windows and window scaling going toward the robot AP from the field side. Just don't poll too much, or you'll flood the network and the congestion on the robot side going into the robot AP will get worse. Of course, if you had SNMP to the devices on the robot, the effect of pushing back TCP would extend into the robot, because this traffic is bidirectional and sends back to the field more data than is required to start the process.

Though again, the upside of sending more UDP traffic is that congestion has less effect on the UDP packets you send (per the link I keep pointing back to). If you send more traffic for the critical UDP functions in the DS<->Robot communications, more of it will likely get through a congested link. You can send more traffic by making the UDP packets larger (say, filling them with statistical data, minding the observations about performance and larger UDP packets) or by sending them more often (reduce the 20ms interval between UDP packets to 5-10ms; you could still use autonomous/teleop modes the same way and still time out enable in 100ms). So while SNMP offers the same capability, this is a distinguishing point to make.

The only additional concern I would add is to make sure someone changes the SNMP community string (effectively its password) on the robot AP. On the field it's not a big issue, but off the field it could be used to craft a DoS by over-polling.

Additionally on this subject, even if the robot AP doesn't support RMON, the Cisco 1252 does (there may be caveats to this support; for instance, it might not work in the older VxWorks firmwares).
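How much a shorter send interval helps can be estimated, but only under a strong assumption: that each packet is lost independently. The linked UDP/TCP paper argues that losses cluster around TCP bursts, so treat this sketch as a best case. With a 100ms timeout and a robot that disables when the gap between arrivals exceeds it, a 20ms period trips on 5 consecutive losses and a 10ms period on 10; the function computes the chance of such a run anywhere in a match.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RUN 64

/* Probability that n packets, each lost independently with probability p,
 * contain at least one run of k or more consecutive losses.
 * Watchdog mapping (assumed): send period period_ms, disable when the gap
 * between arrivals exceeds timeout_ms, timeout a multiple of the period,
 * giving k = timeout_ms / period_ms. */
double prob_loss_run(size_t n, int k, double p)
{
    double mass[MAX_RUN] = { 0.0 }; /* mass[r]: no run of k yet, current run is r */
    assert(k >= 1 && k <= MAX_RUN && p >= 0.0 && p <= 1.0);
    mass[0] = 1.0;
    for (size_t i = 0; i < n; i++) {
        double next[MAX_RUN] = { 0.0 };
        for (int r = 0; r < k; r++) {
            next[0] += mass[r] * (1.0 - p);   /* packet arrives: run resets */
            if (r + 1 < k)
                next[r + 1] += mass[r] * p;   /* packet lost: run grows */
            /* r + 1 == k: the run reached k, so this mass has tripped */
        }
        for (int r = 0; r < k; r++)
            mass[r] = next[r];
    }
    double survived = 0.0;
    for (int r = 0; r < k; r++)
        survived += mass[r];
    return 1.0 - survived;
}
```

For a 135-second match that is 6750 packets at 20ms or 13500 at 10ms. Under the independence assumption, a 10% loss rate gives roughly a 6% chance of a trip per match at 20ms versus on the order of one in a million at 10ms, which is the intuition behind sending more often.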

Quote:

Originally Posted by Greg McKaskle (Post 1183600)
Is there a TCP bottleneck problem? It doesn't seem common. Perhaps with better measures we will know for sure. Until then, if it isn't broke, let's not fix it.

That's fair enough. I do see a number of people reporting issues with SmartDashboard and webcam performance, both of which use TCP, but there's no absolute proof that TCP congestion caused by issues on the field makes this happen more on the field than in private environments. Proper data collection would go a very long way toward figuring this out.

I should also note that reducing the number of packets lost in the radio layer will seriously improve the situation. If the radio link quality improves, there will be more immunity to the sort of short jamming interruptions I know are possible and that AirTight won't alert about. From AirTight's perspective, a short interruption is not what most people perceive as a denial of service (DoS) attack. Normally if you lose a few packets, your web page loads a little slower or your video quality drops. Most people don't expect to use the full bandwidth of their radio link in 135-second intervals. Even expertly configured radio links lose information; it's not a desired trait, and as all this shows, it can be very complicated to deal with the consequences of losing that information. So in this regard, FIRST's efforts to tighten the detection net for the radio link and to improve its quality (antennas, robot AP placement, etc.) go a long way toward reducing the unusual and obscure network efforts needed to compensate. It's not like we can ask the robots not to move while we make adjustments during a match. (Attach antenna to arm and write a program to find the best signal. Robotic rabbit-ear adjuster.)

Obviously, with so many TCP/IP implementers following the IETF RFC standards and thereby implementing TCP-Reno/New Reno (Microsoft: the XP default is TCP-SAck, which is very close, and Vista and up offer TCP-TSAck as well; Mac OS X: it's the default; Axis; VxWorks; BSD: it's the default; Linux is the notable exception, as it goes beyond that with TCP-Cubic), they are probably doing it for a reason. The reason is that across a wide range of bidirectional traffic carried on wire, in a wide variety of circumstances, TCP-Reno/New Reno behavior is a good compromise when congestion occurs (TCP-Westwood/Westwood+ is just a tweak to TCP-New Reno, TCP-SAck adds selective acknowledgment to reduce retransmission, and TSAck uses timestamps, not to be confused with CTCP). The only reason I'm suggesting otherwise is that this is a specific set of circumstances. Personally, I dislike seeing people turn this stuff on without a good idea of what it might help and what it might hurt in their specific circumstances. (Once in a while there's a fuss about how DD-WRT, OpenWRT and Tomato advertised TCP-Vegas and people used it without a clue about its specifics, just because it was a hot topic.)

In support of my advocating TCP-Vegas for this mobile robot application, consider this:
VEGAS: Better Performance Than Other TCP Congestion Control Algorithms on MANETs

Quote:

Originally Posted by Greg McKaskle (Post 1183600)
For QOS, FIRST is still looking at options, and allowing experts to help with the selection. I'll try to get the experts to include your input.

I appreciate it.
Sorry about the length of the posts; there's a whole lot of detail in a small space.

BTW you might want to show someone this:
Cisco End of Sale / End of Life Announcement - 1250 Series

If FIRST is interested in replacing that Cisco 1252 here's a quick suggestion to consider:

6 individual APs of the same make and model as the robots use (nice and modular).
All the APs (field and robot) running DD-WRT.
Enable SSH to configure them all.
Configure TCP-Vegas across the wireless link between them (pay attention to where the TCP endpoints are).
Tune the queues (they often default to 1000 packets) using floods of UDP and TCP individually and together.
Keep in mind that small or huge queues are not a great idea, especially considering 802.11n packet aggregation.
Try disabling channel bonding (there are 6 robots, 4 channels in 5GHz, reduce the contention).
Consider that disabling channel bonding might impact the queue changes.
Instrument them all with custom code.
Make up for any missing features in the managed switch on the field side.
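The "1000 packets" default in the list above is easier to judge in time units. A packet at the tail of a full queue waits for the entire queue to drain at the link's real rate. The 20 Mbit/s effective rate used below is an illustrative assumption, not a measurement; real 802.11n throughput varies with range, channel bonding, and contention.

```c
#include <assert.h>

/* Worst-case wait for a packet at the tail of a full queue: the whole queue
 * must drain at the link's actual rate first. */
double queue_drain_ms(unsigned packets, unsigned bytes_per_packet, double link_bps)
{
    assert(link_bps > 0.0);
    return (double)packets * bytes_per_packet * 8.0 * 1000.0 / link_bps;
}

/* Largest queue (in packets) whose drain time stays within target_ms. */
unsigned queue_len_for_delay(double target_ms, unsigned bytes_per_packet,
                             double link_bps)
{
    assert(bytes_per_packet > 0 && target_ms >= 0.0);
    return (unsigned)(target_ms * link_bps / (8000.0 * bytes_per_packet));
}
```

At an assumed 20 Mbit/s, a full 1000-packet queue of 1500-byte frames takes 600ms to drain, six times the 100ms enable timeout, while about 33 packets keeps the wait within one 20ms DS period. As the list notes, going too small has its own cost, because 802.11n aggregation needs enough queued frames to build large aggregates.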

If someone would like me to demonstrate what I outlined above: it's easy enough so just ask.

Brian

qnetjoe 31-08-2012 13:56

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1183633)
BTW you might want to show someone this:
Cisco End of Sale / End of Life Announcement - 1250 Series

If FIRST is interested in replacing that Cisco 1252 here's a quick suggestion to consider:

6 individual APs of the same make and model as the robots use (nice and modular).
All the APs (field and robot) running DD-WRT.
Enable SSH to configure them all.
Configure TCP-Vegas across the wireless link between them (pay attention to where the TCP endpoints are).
Tune the queues (they often default to 1000 packets) using floods of UDP and TCP individually and together.
Keep in mind that small or huge queues are not a great idea, especially considering 802.11n packet aggregation.
Try disabling channel bonding (there are 6 robots, 4 channels in 5GHz, reduce the contention).
Consider that disabling channel bonding might impact the queue changes.
Instrument them all with custom code.
Make up for any missing features in the managed switch on the field side.

If someone would like me to demonstrate what I outlined above: it's easy enough so just ask.

Brian


I just want to caution everyone about going down this road with regard to the wireless system; there are a million tangents you can take in wireless system design, but you will need to be very methodical and listen to the larger-scale requirements. The 2015 Control System RFP is a good read just to understand the larger issues that the field has to face.

Section WRC1 states "Capable of controlling 4 co-located active fields with up to 6 robots on each field." This means that we cannot have 6 independent access points per field, because even with 20MHz channel widths there are only 20 non-overlapping channels, fewer than the 24 that 4 fields of 6 robots would need. In addition, channels in the middle of the 5 GHz band are required to use dynamic frequency selection (DFS) and transmit power control (TPC), because that is the same band used by weather radar and military applications. It would be unfair for any team to be using these channels when other teams can use non-DFS/TPC channels. There are only 9 non-DFS non-overlapping channels (in the US).
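The channel budget above is simple arithmetic, and the sketch below spells it out. It takes the 20-channel figure from the post, lists the nine US non-DFS 20 MHz channels (UNII-1 and UNII-3), and greedily pairs neighbors into 40 MHz bonded channels. Greedy pairing is a simplification, since real 40 MHz pairs come from regulatory tables, but for these channels it produces the standard pairs.

```python
# Channel budget for RFP section WRC1 (US 5 GHz band).
FIELDS = 4
ROBOTS_PER_FIELD = 6
NON_OVERLAPPING_20MHZ = 20  # figure quoted in the post

# The nine US non-DFS 20 MHz channels (UNII-1 and UNII-3).
NON_DFS_20MHZ = [36, 40, 44, 48, 149, 153, 157, 161, 165]


def bond_40mhz(channels):
    """Greedily pair neighbouring 20 MHz channels (channel numbers 4 apart)
    into 40 MHz bonded channels; return (pairs, unpaired channels)."""
    chans = sorted(channels)
    pairs, leftover, i = [], [], 0
    while i < len(chans):
        if i + 1 < len(chans) and chans[i + 1] - chans[i] == 4:
            pairs.append((chans[i], chans[i + 1]))
            i += 2
        else:
            leftover.append(chans[i])
            i += 1
    return pairs, leftover


needed = FIELDS * ROBOTS_PER_FIELD
pairs, leftover = bond_40mhz(NON_DFS_20MHZ)
print(needed, needed > NON_OVERLAPPING_20MHZ)  # 24 True
print(pairs, leftover)  # [(36, 40), (44, 48), (149, 153), (157, 161)] [165]
```

With bonding, the nine non-DFS channels yield only four 40 MHz channels, with channel 165 left over.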

We all have to remember that there is a big difference between a concept/prototype and production. FIRST needs to have a production-grade wireless system. 10% of my job is prototyping and the other 90% is taking a prototype and turning it into something production grade. I recommend that if you are going to go down this road, you need to have a good model, preferably something based on the OSI model. At the end of the day FIRST can and should only do two things:

* Provide a rock solid production grade media layer (OSI Layers 1-3)
* Provide a method for detecting issues in the host layer (OSI Layers 4-7)

I really think that this thread has moved away from the original purpose of 548's Einstein Statement into a topic about wireless design. If this is something you would like to talk about further, can you create a new thread?

techhelpbb 31-08-2012 14:10

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by qnetjoe (Post 1183805)
I just want to caution everyone about going down this road with regard to the wireless system; there are a million tangents you can take in wireless system design, but you will need to be very methodical and listen to the larger-scale requirements. The 2015 Control System RFP is a good read just to understand the larger issues that the field has to face.

Section WRC1 states "Capable of controlling 4 co-located active fields with up to 6 robots on each field." This means that we cannot have 6 independent access points per field, because even with 20MHz channel widths there are only 20 non-overlapping channels, fewer than the 24 that 4 fields of 6 robots would need. In addition, channels in the middle of the 5 GHz band are required to use dynamic frequency selection (DFS) and transmit power control (TPC), because that is the same band used by weather radar and military applications. It would be unfair for any team to be using these channels when other teams can use non-DFS/TPC channels. There are only 9 non-DFS non-overlapping channels (in the US).

We all have to remember that there is a big difference between a concept/prototype and production. FIRST needs to have a production-grade wireless system. 10% of my job is prototyping and the other 90% is taking a prototype and turning it into something production grade. I recommend that if you are going to go down this road, you need to have a good model, preferably something based on the OSI model. At the end of the day FIRST can and should only do two things:

* Provide a rock solid production grade media layer (OSI Layers 1-3)
* Provide a method for detecting issues in the host layer (OSI Layers 4-7)

Fair enough but:

802.11n implements Clear Channel Assessment (CCA) to mitigate busy channels. The robots move, so their proximity to the other end of the radio link changes. That could be ignored, which would be a bad thing, or it could be exploited to improve things. One could also use more directional antennas to adjust some of this. Not to mention that in DD-WRT you can sometimes change the radio output power, sometimes without rebooting the AP (depending on the manufacturer). 802.11n exploits multipath, so if the antenna placements were better, perhaps FIRST wouldn't need so much transmit power, because the ability to receive would improve. Moreover, DD-WRT allows you to adjust the CCA threshold (assuming the device supports this adjustment). This would also be handy if you know you've got channel overlap, and in this case we are lucky enough to know it might be there.

In the current system, 802.11n's maximum bandwidth of 300Mbps at 5GHz (actual throughput will be 60-70% of that) is achieved with radio channel bonding (which I advised turning off). I should have been clearer above about why there are only 4 300Mbps communications channels available.
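To see why the 300Mbps figure depends on channel bonding, you can compute 802.11n PHY rates from the MCS index. The sketch below assumes equal-modulation MCS 0-31 and standard HT parameters: 52 data subcarriers at 20 MHz, 108 at 40 MHz, and a 4.0 µs symbol (3.6 µs with short guard interval). It gives the raw PHY rate, not the 60-70% real throughput mentioned above.

```python
from fractions import Fraction

# (bits per subcarrier, coding rate) for 802.11n MCS 0-7; MCS 8-15, 16-23
# and 24-31 repeat these modulations on 2, 3 and 4 spatial streams.
MODULATION = [
    (1, Fraction(1, 2)),  # BPSK 1/2
    (2, Fraction(1, 2)),  # QPSK 1/2
    (2, Fraction(3, 4)),  # QPSK 3/4
    (4, Fraction(1, 2)),  # 16-QAM 1/2
    (4, Fraction(3, 4)),  # 16-QAM 3/4
    (6, Fraction(2, 3)),  # 64-QAM 2/3
    (6, Fraction(3, 4)),  # 64-QAM 3/4
    (6, Fraction(5, 6)),  # 64-QAM 5/6
]
DATA_SUBCARRIERS = {20: 52, 40: 108}  # channel width in MHz -> data subcarriers


def phy_rate_mbps(mcs: int, width_mhz: int, short_gi: bool) -> Fraction:
    """PHY data rate in Mbit/s for an equal-modulation 802.11n MCS (0-31)."""
    streams = mcs // 8 + 1
    bits, coding = MODULATION[mcs % 8]
    symbol_us = Fraction(36, 10) if short_gi else Fraction(4)
    # data bits per OFDM symbol divided by symbol time in microseconds = Mbit/s
    return DATA_SUBCARRIERS[width_mhz] * bits * coding * streams / symbol_us


print(float(phy_rate_mbps(15, 40, True)))   # 300.0 (2 streams, bonded, short GI)
print(float(phy_rate_mbps(15, 20, False)))  # 130.0 (2 streams, no bonding)
print(float(phy_rate_mbps(7, 20, False)))   # 65.0  (1 stream, no bonding)
```

Without bonding, a two-stream link tops out at about 144 Mbit/s, even with short GI.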

With channel bonding on, the only way to avoid overlap is to use at most 2 radio channels (one bonded pair) per field across the 4 fields, with multiple SSIDs. Then you have contention at the network level, because the radio layer will be time-shared among 6 robots. FIRST has already made this tradeoff, but every radio configuration was available to them, since they control both ends of the link during a match. Layer 3 and 4 network traffic beyond the UDP DS<->Robot link is much further outside FIRST's control. Field-side QoS will help to a point. Robot-side QoS will just restrict what can be sent and when, but in a standalone environment things might be very different, so how is anyone to test? I would think someone could design their robot to be much less fair to the others by exploiting that contention for radio resources, with just multiple SSIDs on a dual-channel radio layer, especially as it is now (even with VLAN bandwidth limits). Even with multipath, in the current environment with the robot APs as they are (badly placed), there's a risk of hidden nodes: one robot checks whether another is transmitting, doesn't get a clear reception, and so transmits at the same time, causing a collision.
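The hidden-node condition can be stated precisely with a toy model. In the sketch below, two robots are hidden from each other if both can reach the field AP but cannot hear each other, so CCA at one misses the other's transmissions. It uses a unit-disc radio model with an assumed 10 m range and made-up positions. Real propagation around robots and metal is far messier, so this only illustrates the definition.

```python
import math
from itertools import combinations

# Unit-disc model: two radios hear each other if within RANGE metres.
# RANGE and all positions below are illustrative assumptions.
RANGE = 10.0


def hears(a, b, rng=RANGE):
    return math.dist(a, b) <= rng


def hidden_pairs(ap, stations, rng=RANGE):
    """Pairs of stations that both reach the AP but cannot hear each other."""
    return [
        (s, t)
        for (s, ps), (t, pt) in combinations(stations.items(), 2)
        if hears(ap, ps, rng) and hears(ap, pt, rng) and not hears(ps, pt, rng)
    ]


field_ap = (0.0, 0.0)
robots = {"A": (-8.0, 0.0), "B": (8.0, 0.0), "C": (0.0, 5.0)}
print(hidden_pairs(field_ap, robots))  # [('A', 'B')]
```

Robots A and B are 16 m apart on opposite sides of the AP, so each one's carrier sense cannot protect the other.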

Also, if even one additional network using channel bonding is created, you have overlap. Never mind the adjacent channel interference you will have if you use 8 of the 9 radio channels. 802.11ac with quad radio channel bonding isn't going to improve this situation either. Something running 802.11ac as a hidden node, and ignoring 'good neighbor' behavior because it can't receive the transmissions from a moving robot, would be a real pain.

In that regard:

What happens when you use 802.11n radio channels next to each other in the radio spectrum and physically too close together:
Reinvestigating Channel Orthogonality - Adjacent Channel Interference in IEEE 802.11n Networks

How close together is close, what is the effect on UDP (important for DS<->Robot), and what happens if you turn down the radio output power (the results of this can be used to mitigate that link at the top of this list):
Understanding the Effects of Output Power Settings When Evaluating 802.11n Reference Designs

Why channel bonding is not always the best idea:
The Impact of Channel Bonding on 802.11n Network Management

One can mitigate the issue of proximity to a wireless radio, and of overlapping channels (not to mention adjacent channels) with nearby similar networks (well within the distances FIRST is subject to), by reducing the output power of the radios (manually, by script, or frequently from code, all of which are options with DD-WRT). This works with 802.11n at 5GHz, and if you start reading this attached thesis from Rutgers you'll save me a lot of typing, because all the justification for my statement is basically there (start on page 45 to save yourself time). If you look at the graphs, you'll see that in the tests the writer saved a fair amount of power (always handy for a battery-powered robot) and still maintained radio throughput with UDP traffic (the traffic that FIRST's DS<->Robot communications depend upon), but the effective range of the communications was reduced (handy if you have fields near each other). So, extending this to an environment that is not adapting, my point is: with proper antenna placement, one should be able to reduce the radio power and address the concerns you've presented (in fact, with such controls it's highly probable you could have even more than 4 fields). After all, even with these channel limits, the reason these devices sell as they do is that the signal doesn't carry so far that you couldn't put these units in neighboring homes without them even noticing each other from one home to the next.
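Here is a rough idea of what a "by script" power controller could look like. The sketch below is hypothetical: every interval (for example, once a second) it receives the measured frame-loss ratio and returns the transmit power to use next. It steps down while the link is clean and steps up when loss rises, with a hysteresis band in between. The thesis linked above also uses signal strength; this sketch uses frame loss only. All thresholds and limits are assumptions, not measured values. Applying the result to a radio is left out, because the command for that varies by device and firmware.

```python
from dataclasses import dataclass


@dataclass
class TxPowerController:
    """Hypothetical loss-driven transmit power controller (illustrative values)."""
    power_dbm: int = 17
    min_dbm: int = 5
    max_dbm: int = 20
    step_db: int = 1
    raise_above: float = 0.05  # loss ratio that triggers more power
    lower_below: float = 0.01  # loss ratio low enough to try less power

    def update(self, loss_ratio: float) -> int:
        if loss_ratio > self.raise_above:
            self.power_dbm = min(self.max_dbm, self.power_dbm + self.step_db)
        elif loss_ratio < self.lower_below:
            self.power_dbm = max(self.min_dbm, self.power_dbm - self.step_db)
        # between the two thresholds: leave power alone (hysteresis band)
        return self.power_dbm


ctl = TxPowerController()
print([ctl.update(loss) for loss in [0.0, 0.0, 0.1, 0.02]])  # [16, 15, 16, 16]
```

The hysteresis band keeps the controller from oscillating when loss sits near a single threshold.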

Adaptive Transmit Power Control Based on Signal Strength and Frame Loss Measurements For WLANs

Which is worse? What I suggested isn't all that hard to test.

Additionally, there's nothing stopping anyone from using multiple SSIDs on a single device with DD-WRT either. One could work up a fingerprint of each robot's bandwidth and pair teams or alliances onto fewer than 6 field APs based on that metric. In fact, such an automatically field-tested bandwidth fingerprint might be handy for a bunch of reasons beyond its value for that (instead of leaving everyone looking at each other, just push the report to their Driver Station for later review).
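Pairing robots onto fewer APs by measured bandwidth is a bin-packing problem. The sketch below uses first-fit decreasing, a common heuristic: it places the most demanding robots first, on the first AP with room, and opens a new AP when none fits. The team labels, demand figures, and per-AP capacity are made-up example values, not measurements.

```python
def assign_robots(demand_mbps, ap_capacity_mbps):
    """First-fit decreasing: place robots (largest measured demand first)
    on the first AP with spare capacity, opening a new AP when none fits."""
    aps = []  # each entry: [remaining_capacity, [team, ...]]
    for team, need in sorted(demand_mbps.items(), key=lambda kv: kv[1], reverse=True):
        if need > ap_capacity_mbps:
            raise ValueError(f"team {team} needs more than one AP can carry")
        for ap in aps:
            if ap[0] >= need:
                ap[0] -= need
                ap[1].append(team)
                break
        else:  # no existing AP had room
            aps.append([ap_capacity_mbps - need, [team]])
    return [teams for _, teams in aps]


# Illustrative fingerprints in Mbit/s (hypothetical numbers).
demand = {"A": 7.0, "B": 2.0, "C": 1.5, "D": 5.0, "E": 3.0, "F": 1.0}
print(assign_robots(demand, ap_capacity_mbps=10.0))  # [['A', 'E'], ['D', 'B', 'C', 'F']]
```

First-fit decreasing is not guaranteed to be optimal, but it is simple and predictable enough to run between matches.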

Quote:

I really think that this thread has moved away from the original purpose of 548's Einstein Statement into a topic about wireless design. If this is something you would like to talk about further, can you create a new thread?
I'm fine with that as well. I'm even open to this conversation in private.

I just want to add: as of this post, we are now discussing exactly the sort of balance I already wrote to FIRST about in private. So, while this has ventured into great detail and engineering matters, it did not venture away from the issue of what happened on Einstein or, for that matter, what *may* have happened elsewhere. I hope that if someone has issues with my points or my point of view, they'll discuss it with me. It's better to be respectfully challenged than to be above all challenges. Also, my apologies for the crazy way I've had to edit all this. I did not originally intend to present a formal thesis of my own, and it took some time for me to adjust the presentation of my ideas. I am still quite happy to demonstrate that this works, and I'm going to leave this topic at this point. Thanks for your time.

Al Skierkiewicz 04-09-2012 14:01

Re: Team 548 Einstein Statement
 
Everyone should check out the Cisco site for actual tech specifications and in situ performance for the field AP. This unit is designed to cover the entire floor in large buildings with typical coverage of up to 375 ft and with well over 100 users connected. FIRST engineering, DEKA engineers and the wireless consultants all have extensive experience with the units used.
If anything, there should be a caveat to teams to mount the radio away from large metal objects, near the outside of the robot, with a secured power connector and without having robot appendages move against the case while operating. Just this year alone, I have found teams with the radio mounted on the bottom of the robot, or behind the bumper supports, or underneath or behind 2"x4" box tube that is part of an appendage, or with the radio sandwiched between two pieces of metal plate. I even found one team that had constructed an aluminum box out of perf stock to "protect" the radio.

Jon Stratis 04-09-2012 14:31

Re: Team 548 Einstein Statement
 
Now there's an idea Al... can we just surround the field + field AP with a giant Faraday cage? That should resolve any and all concerns stemming from interference from outside sources!


Note: I'm not really being serious here :)

techhelpbb 04-09-2012 14:59

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Al Skierkiewicz (Post 1184086)
Everyone should check out the Cisco site for actual tech specifications and in situ performance for the field AP. This unit is designed to cover the entire floor in large buildings with typical coverage of up to 375 ft and with well over 100 users connected. FIRST engineering, DEKA engineers and the wireless consultants all have extensive experience with the units used.
If anything, there should be a caveat to teams to mount the radio away from large metal objects, near the outside of the robot, with a secured power connector and without having robot appendages move against the case while operating. Just this year alone, I have found teams with the radio mounted on the bottom of the robot, or behind the bumper supports, or underneath or behind 2"x4" box tube that is part of an appendage, or with the radio sandwiched between two pieces of metal plate. I even found one team that had constructed an aluminum box out of perf stock to "protect" the radio.

I have no doubt that the Cisco 1252 has excellent range; I've used them outside of FIRST. I have no doubt that the experts here would respond based on the data they've collected. In the end the question really becomes: by what evidence does one determine that the radio power settings used are appropriate at any moment in the system's operation (the field and the robots being the system)? There's a high density of APs in a relatively small area, so no matter what, they'll impact each other's radio performance. It's not clear to me that anything currently measures even RSSI (the thesis linked in my previous post detailed why RSSI alone isn't really the most thorough indicator for radio power control). So I'm really not sure how anyone can know how the power of that radio signal is behaving moment to moment with the various robots involved during a match. I've never seen the experts go around to the robots adjusting them to mitigate their concerns, as they could do with the fields during setup. One would have to actually do that, because the robot APs are all on the same channel and they can be very close together or 50 feet apart, near moving parts like rotating metal shooter hoods.

In an infrastructure environment (by far the most common use of this technology), setting the transmitter power as high as possible without causing other issues is usually a good thing. Even in an ad hoc network, the number of devices actually moving is usually small. In this environment that may not be the case, since the devices are designed to move around. I can easily see a balancing act where the desired power level is just enough to do the job while keeping the side effects minimized, and that power level will change moment to moment (it may be more power than is used now, but I suspect it's actually less power than is used now a great deal of the time). Logging the relevant factors seems critical; otherwise I can't see how anyone can have the data to judge how well the solution fits.

Al Skierkiewicz 04-09-2012 15:33

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Jon Stratis (Post 1184088)
Now there's an idea Al... can we just surround the field + field AP with a giant Faraday cage? That should resolve any and all concerns stemming from interference from outside sources!


Note: I'm not really being serious here :)

Isn't that Battlebots?



Brian,
You are implying that FIRST and/or its vendors have not made these measurements or do not know the RF levels on the field. I have to ask, no, demand, that you cease making statements based solely on your own experience without any knowledge of what is taking place on FIRST fields. All you are doing is sowing doubt in the minds of those who have no experience in the field. No matter how long your posts, in my mind you are simply throwing rocks at FIRST. The engineering staff has made these measurements, they know what the coverage contour is on fields, and they know the fade margins caused by objects, robots and people on or near the field.

The RF output level of the Cisco router is adjustable, as you know, but setting devices to maximum is rarely the best solution, depending on the environment. There is no doubt that the RF level is sufficient to reach 50 ft. However, with outside interference, it is not the transmit power but the receiver sensitivity that needs to be considered. In normal environments, high RF levels are likely to saturate the receivers, causing front-end overload and intermod products in the demod process. The Cisco device is capable of producing more than one watt ERP with the antennas currently used. I have worked the world on less than that, often achieving distances greater than 800 miles on about 0.5 watt ERP, not accounting for losses in antenna lobes, ground, transmission cable or atmospherics.
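A quick link-budget calculation gives a sense of scale for these power levels. The sketch below makes several simplifying assumptions. It treats the 1 W figure as 30 dBm radiated and uses free-space path loss only, with no robots, people, or reflections. It assumes an isotropic receive antenna and takes 5.5 GHz as a representative 5 GHz frequency. Under those assumptions, the signal 50 ft away is on the order of -41 dBm.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def watts_to_dbm(watts: float) -> float:
    return 10 * math.log10(watts * 1000)


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss: 20log10(d) + 20log10(f) + 20log10(4*pi/c)."""
    return (20 * math.log10(distance_m)
            + 20 * math.log10(freq_hz)
            + 20 * math.log10(4 * math.pi / SPEED_OF_LIGHT))


tx_dbm = watts_to_dbm(1.0)             # about 30 dBm
loss_db = fspl_db(50 * 0.3048, 5.5e9)  # 50 ft at 5.5 GHz, about 70.9 dB
print(round(tx_dbm, 1), round(loss_db, 1), round(tx_dbm - loss_db, 1))  # 30.0 70.9 -40.9
```

Real fields add obstruction and multipath losses on top of this, so the figure is only an order-of-magnitude check.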

techhelpbb 04-09-2012 16:03

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Al Skierkiewicz (Post 1184095)
Brian,
You are implying that FIRST and/or its vendors have not made these measurements or do not know the RF levels on the field. I have to ask, no, demand, that you cease making statements based solely on your own experience without any knowledge of what is taking place on FIRST fields. All you are doing is sowing doubt in the minds of those who have no experience in the field. No matter how long your posts, in my mind you are simply throwing rocks at FIRST. The engineering staff has made these measurements, they know what the coverage contour is on fields, and they know the fade margins caused by objects, robots and people on or near the field.

With the deepest respect that is/was literally impossible on a continuous quality maintaining basis. The Einstein reports clearly indicate that not even the logs from the Cisco were available for review during testing. That was also a long report but still it didn't address this point either.

Also, you failed to consider that the fade margins are entirely dependent on robot AP placement; otherwise it would be impossible for a team to find an AP placement that interferes with communications. There is no way for FIRST to predict all robot designs well enough to test at that level.

Quote:

The RF output level of the Cisco router is adjustable, as you know, but setting devices to maximum is rarely the best solution, depending on the environment. There is no doubt that the RF level is sufficient to reach 50 ft. However, with outside interference, it is not the transmit power but the receiver sensitivity that needs to be considered. In normal environments, high RF levels are likely to saturate the receivers, causing front-end overload and intermod products in the demod process. The Cisco device is capable of producing more than one watt ERP with the antennas currently used. I have worked the world on less than that, often achieving distances greater than 800 miles on about 0.5 watt ERP, not accounting for losses in antenna lobes, ground, transmission cable or atmospherics.
1. The robot APs also transmit.

2. The robot APs have been positioned in such a way that they don't communicate clearly despite the Cisco 1252's radio output power. It's not a question of could be...there are sufficient examples.

3. All of the APs both field and robot are capable of interfering with each other. Not just when they are on the same radio channel, but when they are on radio channels adjacent to one another. So the radio interference concerns are not just from outside sources.

4. Logging for the relevant issues should be simple enough to do. If one doubts the validity of the other side's position, or wants to establish the correctness of one's own, logging is a great way to settle both concerns.

5. I'm not saying that the field AP can't reach a greater distance; I'm saying that it does not need to. In fact, it should not, unless that is absolutely necessary based on measurements. This goes for the robot APs as well. Right now all evidence suggests that the radio power levels are fixed.

6. You are absolutely correct about the radio receiver sensitivity being important. The entire clear channel assessment (CCA) process that allows the robots to share the same radio channel depends on that sensitivity; in fact, it's adjustable for that reason. It's also affected by the robot AP placement, regardless of diversity, MIMO, or RTS/CTS. If the receiver can't receive the transmissions on its radio channels, a robot may as well be a rolling jamming device. A robot that can't receive the transmissions it might interfere with, and that has data waiting to send, will just transmit on radio channels that are probably busy with other robots, causing a collision.
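Point 4 claims that logging is simple, and the sketch below shows roughly how simple. It defines a per-sample record for one radio during a match and writes samples as CSV. The field names, the match label, and the team number are hypothetical. Which values a given AP can actually report depends on its firmware.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RadioSample:
    """One hypothetical per-second sample for one radio during a match."""
    match: str
    team: int
    t_ms: int
    rssi_dbm: int
    tx_power_dbm: int
    rate_mbps: float
    retries: int
    lost_frames: int


def write_log(samples, out):
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(RadioSample)])
    writer.writeheader()
    for sample in samples:
        writer.writerow(asdict(sample))


buf = io.StringIO()
write_log([RadioSample("Q12", 1234, 1000, -52, 17, 130.0, 3, 0)], buf)
print(buf.getvalue())
```

Even this much per robot, per second, would show who was in a 'quiet spot' and when.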

craigcd 04-09-2012 16:15

Re: Team 548 Einstein Statement
 
First of all I would like to say that the previous FMS discussions are extremely interesting. I kind’a like the “SUPER SIMPLE VERSION” the best. This is a very impressive analysis and the more I read the more confused I get. That is probably because my experience is not with electronics and code and “stuff”. Secondly they seem to have moved from the original purpose of the apology from team 548. This is all important information and maybe a new thread needs to be started and let the Team 548 Einstein Statement pass into history.

EricVanWyk 04-09-2012 16:43

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by craigcd (Post 1184103)
First of all I would like to say that the previous FMS discussions are extremely interesting. I kind’a like the “super simple explanations” the best. This is a very impressive analysis and the more I read the more confused I get. That is probably because my experience is not with electronics and code and “stuff”. Secondly they seem to have moved from the original purpose of the apology from team 548. This is all important information and maybe a new thread needs to be started and let the Team 548 Einstein Statement pass into history.

There is nothing in the entirety of the field of engineering that cannot be understood by every single member of this board. Unfortunately, 'engineer-speak' is a fractured language with countless dialects that all use different words for the same meanings. If something seems confusing, ask the person you are talking to to rephrase it into the right dialect for you. Two of my patents came from rephrasing what a mechanical engineer said into electrical engineer speak.

Alternately, you may find that they are trying to hide their own confusion in a cloud of vocabulary and acronyms. Don't be impressed, ask questions.

techhelpbb 04-09-2012 18:20

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by craigcd (Post 1184103)
First of all I would like to say that the previous FMS discussions are extremely interesting. I kind’a like the “SUPER SIMPLE VERSION” the best. This is a very impressive analysis and the more I read the more confused I get. That is probably because my experience is not with electronics and code and “stuff”. Secondly they seem to have moved from the original purpose of the apology from team 548. This is all important information and maybe a new thread needs to be started and let the Team 548 Einstein Statement pass into history.

In addition to that SUPER SIMPLE VERSION here's another:

The situation (The Town Hall With Musical Chairs):
1. You have 7 blind people in a room and there are possibly 4 rooms.
2. They tend to speak at the same volume and they hear just as well as each other.
3. One of those people is the person that everyone is talking to (the key person).
4. That person stands with their back against the wall of the room looking into the space of the room.
5. The other 6 people move around the room blindly.
6. Everyone is trying to be polite and only speak to this key person one at a time.
7. There are invisible portions of the room that make it harder to hear each other (from those spots, some voices sound too quiet, and some people's voices are too hard for others to hear).
8. Any moving person not heard from in a short period of time must stop moving till they hear someone.
9. When they talk it's basically one sentence of some random length at a time then they stop.
10. If someone can't talk for a while they experience a pile up of sentences they must communicate later.

Knowing this:
1. If they knew they were in one of the quiet invisible portions they could just not talk and hurry out but they can't see that so they might talk when they should not.
2. If they all scream, confusion will set in, because just before they start screaming they might not hear someone else, or they might be hard to understand at that extreme volume, or they might disturb nearby rooms.
3. If they all whisper sometimes someone won't be heard when they are talking but at least someone will be heard if they are close enough to the key person.
4. If they all could just find the right volume they could all talk and hear each other but that volume changes as they move and they all move blindly.

My solution:
Let everyone talk at different volumes and adjust their volumes as they move. Doing so requires communicating about the perceived volume as each person talks. Sometimes someone will talk over someone else, but if they all start off quietly and slowly increase their volume between movements, they will talk over each other less often, and at least someone will get a clear word in edgewise. As long as the balance between volume changes, movement and time is set properly, no one should be stuck anywhere for long or continuously talk over anyone else. Let's call that balance being fair to one another.

A. The key person is the Cisco 1252 field AP.
B. The 6 blind people are the robots and robot APs.
C. The voice volumes are the radio transmit powers.
D. The rooms are the fields.
E. The invisible portions are things that occasionally keep a robot AP from receiving, or keep other APs from receiving its transmissions.
F. The short period of silence after which the people must stop is the robot enable, which times out after 100ms.
G. The sentences they speak of different lengths are the data communicated over the radios.
H. The pile up of data they must send when they can't is a network congestion problem.
I. When more than one person talks by accident at the same time it's a collision.
J. A person in a quiet portion of the room talking because they can't hear someone else talking is a hidden node.
K. The restriction on each person to listen for another talking before they talk is clear channel assessment.
L. If someone screams into that room, that is a jammer, but these people talking over each other serve the same function a jammer would.
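Item F is the one with a hard deadline, so it helps to see it as code. The sketch below models that behavior as a hypothetical robot-side watchdog: the robot stays enabled only while control packets arrive less than 100 ms apart. The class and method names are assumptions for illustration, not the actual FRC implementation. Timestamps are passed in explicitly so the logic is easy to test. A real version would use a monotonic clock.

```python
class EnableWatchdog:
    """Hypothetical watchdog: enabled only while packets keep arriving
    within TIMEOUT_S of the last one."""

    TIMEOUT_S = 0.100

    def __init__(self):
        self.last_packet_s = None

    def packet_received(self, now_s: float) -> None:
        self.last_packet_s = now_s

    def enabled(self, now_s: float) -> bool:
        return (self.last_packet_s is not None
                and now_s - self.last_packet_s < self.TIMEOUT_S)


wd = EnableWatchdog()
wd.packet_received(1.00)
print(wd.enabled(1.05), wd.enabled(1.20))  # True False
```

In terms of the analogy, any collision or hidden-node stretch longer than 100 ms stops the robot, even if the link recovers right afterward.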

I'm suggesting they are talking over each other too often right now.

Further to link this super simple example and the other:
The communications in this example dictates the size of that hole from the other example with the half dollars.
If the communications was better perhaps that hole would be quarter sized instead of dime sized.
Then it would be easier in the first place to send the half dollars.
If the hole was a little bigger and the people stuffing the bits of ground up half dollar were more clearly and quickly communicating the whole problem gets easier.


The alternate situation (The Moving Study Hall):
I also suggested before that we have 12 people in the room.
6 key people and 6 moving people.
Each key person talks with one moving person.
In fact, we could strategically place some of the key people against the wall to keep them closer to a certain moving person.
The concern that people have with 12 people is that the volume in the room would disturb the rooms next door.
They can all be in that room and talk at the same time if they control their voice volumes properly.
Sure sometimes they might disturb each other but they already disturb each other in the other example anyway.


The basic conclusion:
I figure some of this is just a question of controlling that 'volume' of the communications either way.
If we can't manage to at least control the 'volume' of those 'voices', I think we should at least record the whole mess so we can find better solutions later.
If we can't record everything, at least record the 'volumes' of the 'individual people' and how it is perceived by 'everyone' else.
At least then we'll know there were 'quiet spots' in the 'room(s)' and 'who' was in those spots at what times.


Sorry this is long, but I hope that despite its length it is very easy to relate to.

Greg McKaskle 04-09-2012 21:19

Re: Team 548 Einstein Statement
 
I'm by no means an 802.11 or antennae expert, and I have seen engineers go for hours on trying to critique a single aspect of a lower-level layer of the OSI model for network communications.

Rather than debate network power, what if I instead just measure the efficiency of communication -- how long does it take for a given amount of data to be communicated. The radio tap header contains the data rate and encoding scheme. If it is low, well it could be for any number of reasons, but if it is high, approaching the theoretical limit, then that must mean that things are clicking along just fine. It isn't that hard to measure or even to log. Unless you have strong evidence that shows signal strength to be a root cause of many robot failures, I don't think this discussion will bear fruit. It can easily eat up many forum pages, but no fruit.

Greg McKaskle
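Greg's suggestion is easy to prototype. The minimal sketch below pulls the legacy Rate field out of a captured radiotap header. It follows the radiotap layout: a little-endian 8-byte header, optional extra presence words flagged by bit 31, and fields aligned relative to the header start. It walks only the two fields that can precede Rate (TSFT, Flags). 802.11n (HT) frames may report an MCS field (bit 19) instead of Rate, and handling that would mean walking the remaining fields.

```python
import struct

TSFT_BIT, FLAGS_BIT, RATE_BIT, EXT_BIT = 0, 1, 2, 31


def radiotap_rate_mbps(frame: bytes):
    """Return the legacy Rate field of a radiotap header in Mbit/s, or None."""
    version, _pad, length, present = struct.unpack_from("<BBHI", frame, 0)
    if version != 0:
        raise ValueError("unsupported radiotap version")
    offset, word = 8, present
    while word & (1 << EXT_BIT):  # bit 31: another presence word follows
        (word,) = struct.unpack_from("<I", frame, offset)
        offset += 4
    if not present & (1 << RATE_BIT):
        return None  # HT frames may carry an MCS field instead
    if present & (1 << TSFT_BIT):
        offset = (offset + 7) // 8 * 8 + 8  # 8-byte aligned u64 timestamp
    if present & (1 << FLAGS_BIT):
        offset += 1
    if offset >= length:
        raise ValueError("truncated radiotap header")
    return frame[offset] / 2  # Rate is in units of 500 kbit/s


hdr = struct.pack("<BBHI", 0, 0, 10, (1 << FLAGS_BIT) | (1 << RATE_BIT)) + bytes([0x00, 108])
print(radiotap_rate_mbps(hdr))  # 54.0
```

Logging this value per frame, per robot, gives the 'how efficiently is data moving' measurement without arguing about power levels first.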

techhelpbb 04-09-2012 22:05

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Greg McKaskle (Post 1184165)
I'm by no means an 802.11 or antennae expert, and I have seen engineers go for hours on trying to critique a single aspect of a lower-level layer of the OSI model for network communications.

Rather than debate network power, what if I instead just measure the efficiency of communication -- how long does it take for a given amount of data to be communicated. The radio tap header contains the data rate and encoding scheme. If it is low, well it could be for any number of reasons, but if it is high, approaching the theoretical limit, then that must mean that things are clicking along just fine. It isn't that hard to measure or even to log. Unless you have strong evidence that shows signal strength to be a root cause of many robot failures, I don't think this discussion will bear fruit. It can easily eat up many forum pages, but no fruit.

Greg McKaskle

Just a couple of points:

1. The signal strength needed to balance distance, throughput and interference will always be changing. A test for this was already presented in the thesis on the last page. Anything short of adapting (even if the adaptation is a shell script making the adjustment once a second) will surely fall short somewhere. The movement of the robot APs only adds to this.

2. It's not just the signal strength from the radio output but the antennas, the antenna placements and the competition for the channels (so which way one divides the 9 available radio channels matters as well as the distances between the users of each channel).

So the only way I can envision finding the optimum or at least the 'good enough' for FIRST is active data collection and response.
I have tried this with a bunch of APs just as a test and it worked fine. However, I'm not sure I consider my experiment to be a great proof of anything other than possibilities.
I didn't design it to be comparative against FIRST just as a demonstration.

I think Cisco now offers per-packet information headers (PPI headers) for 802.11n instead of radiotap (see also this).

Radiotap offers IEEE80211_RADIOTAP_RATE, but I'm not sure about the active encoding. PPI headers also offer a rate, but aren't these in units of 500 kbps? I know they are making additions for VHT to accommodate 802.11ac.
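Reading the legacy Rate field out of a radiotap header is mostly bookkeeping. The fixed 8-byte header gives the version, total length and a presence bitmap, and the Rate field (presence bit 2) follows the optional TSFT (bit 0) and Flags (bit 1) fields. This minimal sketch assumes the little-endian layout defined by the radiotap project, handles only those first three fields, and converts the Rate byte from its 500 kbps units to Mbps. 802.11n frames typically report their rate through the separate MCS field instead, which this sketch does not decode.

```python
import struct
from typing import Optional

_TSFT_BIT = 0
_FLAGS_BIT = 1
_RATE_BIT = 2
_EXT_BIT = 31  # set when another 32-bit presence word follows


def radiotap_rate_mbps(frame: bytes) -> Optional[float]:
    """Return the radiotap Rate field in Mbps, or None if the field is absent."""
    version, _pad, it_len, first_present = struct.unpack_from("<BBHI", frame, 0)
    if version != 0:
        raise ValueError("unsupported radiotap version")

    # Skip any extended presence words; fields start after the last one.
    offset = 8
    present = first_present
    while present & (1 << _EXT_BIT):
        (present,) = struct.unpack_from("<I", frame, offset)
        offset += 4

    if not first_present & (1 << _RATE_BIT):
        return None
    if first_present & (1 << _TSFT_BIT):
        offset = (offset + 7) & ~7  # TSFT is a u64 aligned to 8 bytes
        offset += 8
    if first_present & (1 << _FLAGS_BIT):
        offset += 1  # Flags is a single byte
    if offset >= it_len:
        raise ValueError("truncated radiotap header")
    return frame[offset] * 0.5  # Rate is in units of 500 kbps
```

For example, a header with the Flags and Rate bits set and a Rate byte of 12 decodes to 6.0 Mbps.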

Also, are the D-Link 1522s capable of tagging packets with this information using the stock firmware?
OpenWRT has some development work for radiotap, but I'm not so sure about PPI headers.

I'm not saying I'm against doing this, just pointing out the pros and cons.

Greg McKaskle 04-09-2012 22:20

Re: Team 548 Einstein Statement
 
Quote:

... So the only way I can envision finding the optimum or at least the 'good enough' for FIRST is active data collection and response. ...
Exactly, and that is what 802.11 participants do. The algorithms for adapting to changes in orientation and interference are part of the standard. They don't just broadcast, but listen, measure, adapt, and communicate status to the AP.

I was using radio tap, but I'm sure there are other standards, and it will continue to evolve and improve as it plays a larger role in our everyday lives.

Greg McKaskle

Alan Anderson 04-09-2012 22:25

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1184171)
...active data collection and response...

...is built in to 802.11n.

techhelpbb 04-09-2012 22:36

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Greg McKaskle (Post 1184177)
Exactly, and that is what 802.11 participants do. The algorithms for adapting to changes in orientation and interference are part of the standard. They don't just broadcast, but listen, measure, adapt, and communicate status to the AP.

I was using radio tap, but I'm sure there are other standards, and it will continue to evolve and improve as it plays a larger role in our everyday lives.

Greg McKaskle

Unfortunately, not all 802.11n participants have the beamforming features found in newer devices.

This is not to say they don't have layers of responsive adaptation, but as the linked thesis demonstrated with Atheros-chipset 802.11n development boards, responsive adjustment of transmit power to strike the best balance is not an existing feature. You can control the transmit power, and it is affected by various existing settings, but not in the manner I'm describing.

Relinked as it's now a page back:
Adaptive Transmit Power Control Based on Signal Strength and Frame Loss Measurements For WLANs

Perhaps someone could find a device that has those features, but that's a whole separate issue. Generally, what I am describing is closest to Aruba Adaptive Resource Management (ARM), Dynamic Radio Management (DRM), Radio Resource Management (RRM), and the equivalents from anyone else with their own WiFi architecture (my apologies, Greg, if that was your larger point). Of course, if all the devices were from Aruba, Cisco or Extreme we could exploit their infrastructure as they describe, but I figure FIRST is not interested in spending that sort of money considering the robot APs. Also, some of these adaptive infrastructures are probably a bit too slow, adjusting at intervals of minutes, given the duration of a FIRST match. Again, they usually make the reasonable assumption that your AP isn't bolted to a robot and dancing around.

802.11F does provide a channel for similar communications via the Inter-Access Point Protocol, though it's not clear to me whether that is currently extended from the Cisco 1252 in any way to mitigate the specific power concerns I've highlighted with any other vendor's equipment. I'm sure Cisco's RRM works just great with other Cisco devices (I'm using it right now). However, as far as I know, Cisco currently uses LWAPP, not really 802.11F, for its RRM feature set.

Given that I have this feature working right now as a shell script pulling down the maximum radio power on some DD-WRT access points, I know that FIRST doesn't need anything very fancy to achieve this basic balance. However, it's certainly not a feature you'll just get with random 802.11n hardware.

EricVanWyk 04-09-2012 23:58

Re: Team 548 Einstein Statement
 
This sounds like a great conversation to have on the 802.11 board. This thread is the wrong place to have it. This thread is about Team 548's Einstein Statement. This thread is not about your crackpot theories on beamforming or adaptive power control. Please stop hijacking threads.

techhelpbb 05-09-2012 01:00

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by EricVanWyk (Post 1184204)
This sounds like a great conversation to have on the 802.11 board. This thread is the wrong place to have it. This thread is about Team 548's Einstein Statement. This thread is not about your crackpot theories on beamforming or adaptive power control. Please stop hijacking threads.

This is all about going after me personally again instead of presenting evidence, which I already did you the courtesy of doing, along with repeatedly offering to exit this topic.

My "crackpot theories," as you put it, are backed by PhD-level work; yours are backed by... vapor.

I'll write this again: the Einstein reports CLEARLY indicated that insufficient logs were kept. I did not make that mistake; I merely pointed out where additional logging should be implemented. If you doubt my points, log the data and prove it.

Shifting blame, like with 548's statement, instead of being open and accountable is how this all got started, and clearly some of you learned nothing. I've been more than tolerant of some of your blatant and often obvious discrimination. To the rest of you who treated me with some respect, thanks for the consideration.

Akash Rastogi 05-09-2012 01:32

Re: Team 548 Einstein Statement
 
Brian,

Stop instigating more conflicts. It does not help your cause or your image which reflects back on your reliability as a source of information. Instigation and calling someone out does not help you earn respect from readers.

Eric is right, this thread is for one thing and one thing only. Leave it be.

Sincerely,
Akash

techhelpbb 05-09-2012 01:44

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Akash Rastogi (Post 1184214)
Brian,

Stop instigating more conflicts. It does not help your cause or your image. Eric is right, this thread is for one thing and one thing only.

Sincerely,
Akash

I've already offered repeatedly to exit, as I just did, and I shall. My image is irrelevant to any point. None of you has the ability to tarnish it with these tactics in any capacity. Just like I hope Team 548 understands that they should never let anyone else tell them who they are or dictate their abilities. I hope they move forward into a bright future.

As to the extra stuff you added, the facts are the facts. I did not argue from my authority; I argued based on links and data I provided.
Anyone can dislike me all they like, but the facts are what they are. One can deal with the evidence or accept the risk.

Al Skierkiewicz 05-09-2012 08:54

Re: Team 548 Einstein Statement
 
Brian,
As I pointed out, your claim that FIRST and the Einstein weekend experts don't know what the RF levels on the field are is simply false. While it is true that logs on the 1252 were not retained from the actual Einstein event, a lot of data has been collected in this area, both prior to their initial use and after, most recently during the Einstein weekend. Experiments were performed in situ, with all robots and in various configurations and orientations. While RF levels vary on the field and while robots are moving, there was no specific and repeatable indication that RF levels on the field were or are a contributing factor to data throughput or loss of communications. While it is easy for you to state that there are many APs providing signal and interference on a FIRST field, the fact is that there are very few at 5GHz. In the event that there is an issue, FIRST has a solution to swap out the Dlinks with another device.
You stated that directional, high-gain antennas should be used. These devices were discussed and are being evaluated. However, they carry significant issues when used for short-distance communications. Side lobes, hot spots and excessive signal may not be able to be controlled through the adaptive processing used in 802.11, especially considering the amount of moving and stationary reflective surfaces on the playing field.
You state that RF levels should be made as high as possible. The 1252 is capable of signal levels in excess of +30 dBm per output. While the 1522 output is less, over the distances covered by these devices even with teams locating the radios inside a robot, the fade margins are greater than 30 dB and typically 50 dB.
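Fade-margin figures like these come from a simple link budget: received power is transmit power plus antenna gains minus path loss, and fade margin is how far that received power sits above the receiver's sensitivity. The sketch below uses the free-space path-loss formula (distance in metres, frequency in MHz), which ignores the reflections and obstructions on a real field. The function names are illustrative, and any transmit power, antenna gain or sensitivity you plug in is a placeholder, not a measured value for the 1252 or 1522.

```python
import math

FEET_TO_METRES = 0.3048


def fspl_db(distance_m: float, freq_mhz: float) -> float:
    """Free-space path loss in dB for a distance in metres and a frequency in MHz."""
    return 20 * math.log10(distance_m) + 20 * math.log10(freq_mhz) - 27.55


def fade_margin_db(tx_dbm: float, tx_gain_dbi: float, rx_gain_dbi: float,
                   distance_m: float, freq_mhz: float, rx_sensitivity_dbm: float) -> float:
    """Received power above the receiver's sensitivity, in dB (free space only)."""
    rx_dbm = tx_dbm + tx_gain_dbi + rx_gain_dbi - fspl_db(distance_m, freq_mhz)
    return rx_dbm - rx_sensitivity_dbm


# Path loss over a 50 ft signal path, assuming a channel near 5.8 GHz.
print(f"{fspl_db(50 * FEET_TO_METRES, 5800):.1f} dB")  # prints 71.4 dB
```

Because every term is in dB, a 10 dB orientation loss simply subtracts 10 from whatever margin the budget gives.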
You state that we need to take action on RF problems on FIRST fields when in fact RF is not the demon you state it is. You are presenting the worst-case scenarios used in system design for the harshest environments as the norm for a field that is less than 50 feet in actual signal path distance. The majority of the readers of this forum may take away from these statements that any problems they experience are caused by RF level issues when in fact they are not. If anything, the general reader should take away from this discussion that RF levels, maximum throughput, connectivity, reflections and multipath, changing RF environments and interference have all been thought through by the designers of the 802.11 specifications and associated hardware to make this a robust communications link.
Those who oppose your statements are not attacking you, and I hope you realize I am not attacking you. We are merely opposing those statements that mislead, misinform and confuse the readers of this forum. I have seen the tests performed and the equipment used. I sat in on discussions with the engineers at DEKA, Cisco and with other consultants where all of these things were discussed, and some of those people were on the IEEE 802.11 committee. Participants communicated electronically both prior to and following the Einstein weekend while we analyzed data collected during Einstein matches and at the test weekend. During the Einstein weekend we actually went as far as to open a 1522 and measure antenna parameters while attempting to detune the antennas as a team might if it opened a Dlink to repair a connector.
While I am not an employee of FIRST, I am defending the organization in this area simply because I am aware of the work they have done and dedication they have shown in making sure the wireless communications work. It is my intent that teams should not immediately jump to the conclusion that the field is at fault and thereby fail to continue to check for other issues they have actually caused on their own robot. That is to say, teams should be checking that power, mounting, software and sensors are all working properly.

I have neglected to add the contributions of the Qualcomm team's input during Einstein testing and discussion. That is a major flaw on my part, sorry guys. The Qualcomm input during field testing was invaluable, and those people were also fluent in 802.11.

techhelpbb 05-09-2012 13:30

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Al Skierkiewicz (Post 1184249)
...

Al,

I don't really think that we are coming from all that different a place. I am not making excuses for teams that place their robot AP in a bad location. I am also not suggesting that a bad location for the robot AP should be accommodated.

I am merely suggesting a fully consistent monitored system for the field and robot APs that can detect both the optimal transmitter powers and any unusual cases system wide that cause collisions or congestion (the 2 being related).

I do not deny that FIRST supplemented the logged data for Einstein with the evidence collected for that match or during the report nor do I question the integrity or skill of those people or yourself. I am saying that such data was not in the logs generated and as the reports suggest more logging should be done.

I recognize that having the talent and tools available to perform the Einstein analysis measurements was a special case. It was a unique experience for the teams at Einstein, to be sure (the upside is that one learns some things; the downside is that one has to deal with this). I feel that experience in troubleshooting could be distilled somewhat and offer value to other people.

The Aruba (ARM), Extreme Networks (DRM) and Cisco (RRM) technologies would perform a similar quantitative system-level analysis and radio transmit power management/control (TPM/TPC) continuously, regardless of where the fields were set up or which robot is using them. This is not to say it has to be one of those technologies, nor that any of them is a perfect fit for FIRST. I additionally suggested a cheaper alternative in DD-WRT. All of these technologies provide insight into the APs and the kind of data I would like everyone to be able to analyze.

I am perfectly willing to accept that, if there's doubt about my concerns, other cheaper logging solutions like the one Greg seems to be offering are a fine compromise. I also don't want FIRST to waste resources fixing a problem they have no real-life data on yet. My point was not to demand solutions immediately but to frame concerns, leading to exploration of the critical aspects. One may not know what to look for without considering what problems might exist. Even if the consensus is that the Einstein report indicates these problems did not exist on that field at that time with those robots, that does not make a very large test sample considering the growing size of FIRST.

In my opinion, the goal is not to just keep turning up the maximum field AP transmit power. The goal is to use no more and no less power than is required to achieve the field system communications at any time. Personally, I suspect that all of the APs already have too high a maximum transmit power setting. I previously linked a paper on the risks of raising the radio transmit power, so I think it would be a bad idea to use the Cisco 1252 turned up to its highest settings. That's a sledgehammer when you need a frequently calibrated and maintained instrument.

I also would like to clarify that I am not writing that there needed to be a bunch of extra 5GHz networks to consider the existence of these problems. There is sufficient network hardware with just the fields that one can generate interference (adjacent radio channel and hidden node). I did point out before that there is additional opportunity for someone to compound that at any event with a 5GHz capable laptop nearby.

I understand that the selection of directional antennas for the Cisco 1252 at 5GHz may be problematic at these distances. I merely offer that perhaps if MIMO antennas were used on the field AP the side lobes could be better minimized. I'm not sure what selection of MIMO antennas FIRST has for the Cisco 1252 under the circumstances. Perhaps other APs would broaden the available antenna options.


I would really like FIRST to be in a strong position to hand quantifiable measures of radio power and throughput issues to all teams. Having data like this would be a wonderful critical thinking exercise for the teams and would reduce the feeling of trial and error. There's not a lot of time for trial and error during real matches; I'd rather people relocate APs or fix software for a reason. I'm fine with approaching these matters methodically, and I don't really think I've asked for significantly more than some slight input into things FIRST may soon monitor and log, which is one of their published mitigations from the Einstein report. There's really nothing extreme about what I'm asking, considering it would illuminate the real risks of what I've described if they actually exist in this system. If all that monitoring from all those fields and robots shows that these problems are not common, then that's a great, definitive sample set.

I don't want people running around looking for ghosts or laying blame or suspicion when the Einstein reports clearly show there are plenty of issues to go around. On topic, I would also rather not have extreme perspectives of people's participation clouding what might otherwise be easy process adjustments, team reputations, or FIRST's reputation. None of it is necessary or scientific.

Taylor 05-09-2012 13:40

Re: Team 548 Einstein Statement
 
Moderators: A request. I think I've seen it done in the past; if it's too labor-intensive, then disregard.

Can we please split this discussion into two separate threads, one discussing 548's involvement and remarks regarding Einstein 2012, and another discussing the non-human interactions, interferences, and possibilities? I'm sure the Robostangs (as well as many other members of the community) wish to put this incident behind them, but it is hard to do so when this "Team 548"-titled thread is hovering near the top of the portal.

EricH 05-09-2012 13:52

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Taylor (Post 1184283)
Moderators: A request. I think I've seen it done in the past; if it's too labor-intensive, then disregard.

Can we please split this discussion into two separate threads, one discussing 548's involvement and remarks regarding Einstein 2012, and another discussing the non-human interactions, interferences, and possibilities? I'm sure the Robostangs (as well as many other members of the community) wish to put this incident behind them, but it is hard to do so when this "Team 548"-titled thread is hovering near the top of the portal.

Split, or lock. Either way, I think it's time this thread disappeared into technical-land.

Brian, Eric, and to some extent, Al. You guys are using rather high-level terms; if you want to do that, that's great, but put it in, say, the control system forum. I don't think I could understand much of what you're saying unless I took extra time to sit down with a copy of "802.11 for Dummies", if such a publication exists, and decipher. Add to that your attacks on each other (and not each other's methods or theories), and if I were a mod I'd have locked or split this thread several pages ago because of the redirect and hostility shown on multiple occasions, as well as this forum rule:
Quote:

ChiefDelphi.com reserves the right to remove a post which does not relate to the topic being discussed in the forum. In addition, ChiefDelphi reserves the right to reorganize discussion forums in order to best serve the majority of our members. (ie: topics may, at a moderator's discretion, be relocated to a more appropriate discussion forum, or deleted entirely).

Al Skierkiewicz 05-09-2012 16:21

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1184282)
I would really like FIRST to be in a strong position to hand quantifiable measures of radio power and throughput issues to all teams.

To what end? Radio output power is not an issue and teams cannot modify the radio to make changes if it were an issue.

Quote:

Originally Posted by techhelpbb (Post 1184282)
I merely offer that perhaps if MIMO antennas were used on the field AP the side lobes could be better minimized.

The antennas in use are MIMO antennas per the 802.11 standards and produce the least side lobing of any antenna available from Cisco for this radio. They are vertical dipoles and essentially have a uniform horizontal dispersion. Higher gain antennas do have side lobes and could prove problematic if for no other reason than the deep nulls between lobes. The antennas mount on standard TNC connectors and the 1252 is designed such that the case of the box becomes the ground plane for the antennas.

techhelpbb 05-09-2012 16:36

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Al Skierkiewicz (Post 1184304)
To what end? Radio output power is not an issue and teams cannot modify the radio to make changes if it were an issue.

For the simple reason of finding the optimal placement of the robot AP.
Not to adjust the radio output power levels themselves.

Cisco refers to the transmit power control technology I'm describing as dynamic transmit power control (DTPC). It's one of the features of Cisco's CCX in their product line. Link to save space. This is the same CCX technology that offers protection from the WiFi management frame hacks that I mentioned previously in this topic.

Thanks for the information on the antennas, but these are omnidirectional antennas you are describing, correct?
I thought the discussion was about a more directional antenna with a reduced side lobe.
I was thinking more along the lines of a MIMO panel antenna or are you describing the inside of a panel antenna?

Jon Stratis 05-09-2012 16:52

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1184305)
For the simple reason of finding the optimal placement of the robot AP.
Not to adjust the radio output power levels themselves.

Is there any doubt that the optimal location for the robot AP is going to be high up, away from metal frame components, and away from motors?

Speaking for a team entering its 7th year... we've followed that general guideline as much as possible, and have never had problems with field connection. Is there anything more that is really needed for teams? We don't need to make this more complicated than it absolutely needs to be...

techhelpbb 05-09-2012 17:07

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Jon Stratis (Post 1184308)
Is there any doubt that the optimal location for the robot AP is going to be high up, away from metal frame components, and away from motors?

Speaking for a team entering its 7th year... we've followed that general guideline as much as possible, and have never had problems with field connection. Is there anything more that is really needed for teams? We don't need to make this more complicated than it absolutely needs to be...

The only remedial option a team might have is to improve their AP placement and connect it properly.
The reason for my concerns is more than just merely to serve that purpose.
It's one benefit.

Also, 2012 is a prime example of why putting the D-Link AP near a rotating assembly with metal in it, say for a shooter, might not be a great idea, even though that location might be at the top of the robot and far away from motors.

The other benefit is to have a log of more information about the field should someone have any concerns. Such information about a wide variety of robots and fields could be used to find strange AP behavior or determine if there was any other source of interference (just examples).
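As an example of the kind of field-wide log analysis described above, the sketch below groups logged frames per robot into fixed time windows and flags windows where the fraction of retransmitted frames is unusually high. The record format, window length and thresholds are assumptions for illustration only. A real field log would need to capture something like the Retry bit from each 802.11 frame's Frame Control field for this to work.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical log record: (timestamp in seconds, robot id, frame was a retry)
Record = Tuple[float, str, bool]


def flag_retry_spikes(records: Iterable[Record], window_s: float = 1.0,
                      max_retry_ratio: float = 0.2,
                      min_frames: int = 20) -> List[Tuple[str, int, float]]:
    """Return (robot, window index, retry ratio) for windows above the threshold."""
    counts: Dict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])
    for ts, robot, retried in records:
        key = (robot, int(ts // window_s))
        counts[key][0] += 1  # frames seen in this window
        counts[key][1] += int(retried)  # frames marked as retries
    flagged = []
    for (robot, window), (frames, retries) in sorted(counts.items()):
        ratio = retries / frames
        # Ignore sparse windows so a handful of frames can't trigger a flag.
        if frames >= min_frames and ratio > max_retry_ratio:
            flagged.append((robot, window, ratio))
    return flagged
```

Requiring a minimum number of frames per window is a deliberate trade-off: it suppresses false alarms from quiet periods at the cost of missing problems on robots that send very little.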

Jon Stratis 05-09-2012 17:37

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1184305)
For the simple reason of finding the optimal placement of the robot AP.
Not to adjust the radio output power levels themselves.

Quote:

Originally Posted by techhelpbb (Post 1184310)
The only remedial option a team might have is to improve their AP placement and connect it properly.
The reason for my concerns is more than just merely to serve that purpose.
It's one benefit.

Now I'm confused... those seem like two contradictory statements to me. The only thing teams have the ability to influence is their radio placement and wiring on the robot, and it seems to me that they already have enough information to optimize that as best they can. As for your other concerns... I think you've explained them, and I think others with more insight into the inner workings of FIRST have pretty clearly indicated that those concerns are things that FIRST is or has looked at.

Playing chicken little with the wireless setup as you've been doing here on CD doesn't really help. All it can do is get teams worried and concerned, with no useful way to alleviate those concerns. It's clear that you're a knowledgeable and passionate person from your posts here, and I would encourage you to direct that passion in a constructive way with the appropriate audience.

techhelpbb 05-09-2012 19:01

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Jon Stratis (Post 1184318)
Now I'm confused... those seem like two contradictory statements to me. The only thing teams have the ability to influence is their radio placement and wiring on the robot, and it seems to me that they already have enough information to optimize that as best they can. As for your other concerns... I think you've explained them, and I think others with more insight into the inner workings of FIRST have pretty clearly indicated that those concerns are things that FIRST is or has looked at.

Playing chicken little with the wireless setup as you've been doing here on CD doesn't really help. All it can do is get teams worried and concerned, with no useful way to alleviate those concerns. It's clear that you're a knowledgeable and passionate person from your posts here, and I would encourage you to direct that passion in a constructive way with the appropriate audience.

It's true there's nothing more a team can do to improve its robot AP's radio signal quality than move the AP, besides perhaps limiting the robot's range of movement on the field (that would be a real worst case).

Pages ago (page 11, first post from me at the top) I suggested teams use the GPIO and I2C to flash status information with LEDs while on the field. That would help teams find problems even if they can't communicate with the robot. The Einstein reports make it clear that a fair number of the problems this could help diagnose remain.
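A blink-code scheme like this can be kept independent of the control system library by first turning a status into an on/off schedule. In this sketch the status codes, timings and the one-LED-per-code idea are hypothetical. The robot code would replay the schedule by toggling whatever digital output is wired to the LED, and that GPIO call is deliberately not shown.

```python
from enum import IntEnum
from typing import List, Tuple


class Status(IntEnum):
    # Hypothetical status codes; a team would choose its own.
    OK = 1
    NO_DS_PACKETS = 2
    CAMERA_LOST = 3
    LOW_BATTERY = 4


def blink_schedule(status: Status, on_ms: int = 200, off_ms: int = 200,
                   gap_ms: int = 1000) -> List[Tuple[bool, int]]:
    """Return (led_on, duration_ms) steps: N short blinks, then a long pause."""
    steps: List[Tuple[bool, int]] = []
    for _ in range(int(status)):
        steps.append((True, on_ms))
        steps.append((False, off_ms))
    # Stretch the final off period so repeated codes are easy to count.
    steps[-1] = (False, gap_ms)
    return steps
```

Counting blinks needs no tools, so a student at the field edge can read the code while the robot is still on the carpet.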

Pages ago I suggested getting better control over TCP/UDP communications and being more careful about how it is used.
With QoS next year, bandwidth limits will be better controlled by the fields. However, that might impact how robots perform if teams do not consider those limits. This also implies being careful about network links to IP devices like COTS devices and about network congestion. Further, the additional bandwidth issues I've been writing about may spawn logged entries that will help teams find these issues.
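One way for robot-side code to stay under a field-imposed bandwidth limit is to pass its own outgoing traffic (camera frames, dashboard data) through a token bucket before sending. The sketch below is a generic token bucket, not anything FIRST has specified. Any rate and burst values are hypothetical, and the clock is passed in explicitly so the logic can be tested without waiting; real code would pass `time.monotonic()`.

```python
class TokenBucket:
    """Allow at most rate_bps on average, with bursts of up to burst_bytes."""

    def __init__(self, rate_bps: float, burst_bytes: int, now: float = 0.0):
        self.rate_bytes_per_s = rate_bps / 8
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = now

    def try_send(self, nbytes: int, now: float) -> bool:
        """Return True (and spend tokens) if nbytes may be sent at time `now`."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should drop or delay this message
```

When `try_send` returns False, dropping a stale camera frame is usually better than queuing it, since a queued frame only arrives later and older.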

Pages ago I also suggested testing the robots for a variety of things that are well within the scope of what a team should do, such as losing the camera feed to the driver's station or losing enable for short periods of time.
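Dropout testing like this can be rehearsed on the bench with a small simulation: send control packets at a fixed period, drop them for a window, and check that the robot's own command gate disables outputs during the gap and recovers afterward. `CommandWatchdog` and `simulate_dropout` are hypothetical names, and the 20 ms packet period and 100 ms timeout are placeholder values, not FRC specifications.

```python
from typing import List, Optional, Tuple


class CommandWatchdog:
    """Hypothetical safety gate: outputs are allowed only while packets stay fresh."""

    def __init__(self, timeout_s: float = 0.1):
        self.timeout_s = timeout_s
        self.last_packet: Optional[float] = None
        self.enabled = False

    def on_packet(self, enabled: bool, now: float) -> None:
        self.last_packet = now
        self.enabled = enabled

    def outputs_allowed(self, now: float) -> bool:
        if self.last_packet is None:
            return False  # never heard from the driver station
        return self.enabled and (now - self.last_packet) <= self.timeout_s


def simulate_dropout(period_s: float, duration_s: float, drop_start: float,
                     drop_end: float, timeout_s: float = 0.1) -> List[Tuple[float, bool]]:
    """Feed enabled packets every period_s, except during [drop_start, drop_end)."""
    wd = CommandWatchdog(timeout_s)
    samples = []
    for i in range(int(round(duration_s / period_s))):
        t = i * period_s
        if not (drop_start <= t < drop_end):
            wd.on_packet(enabled=True, now=t)
        samples.append((t, wd.outputs_allowed(t)))
    return samples


# 20 ms packets for 2 s, with packets lost between 1.0 s and 1.5 s.
results = simulate_dropout(0.02, 2.0, 1.0, 1.5)
```

The same idea extends to the camera case: stop feeding frames for a window and confirm the dashboard or vision code degrades gracefully instead of hanging.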

All other partial solutions require cooperation and communication with people involved in the fields and FIRST. I did send two e-mails to FIRST, which still haven't been acknowledged. Those two e-mails are still relevant (as are those sent by other students and mentors I know). Since then, Al has clearly stated that, while I might not get an acknowledgement, they should be listening to this topic and elsewhere.

Clearly Greg's posts in this topic mark some interest in possible improvements in the logging in relation to my concerns. I appreciate his patience. Just going over all the possible points of interest to bring some of these concerns to that sort of attention was the reason for doing it. This way the detail is there to reference and perhaps at a later date reconsider.

I have no control over how powerless teams may feel in the face of this technical information, and I'm trying to help them feel less powerless when things go wrong and they want to dig (I twice framed the arguments in plain, simple examples minus all the acronyms and abbreviations). I offered repeatedly to take this private or take it elsewhere. If that was the concern, I would have done so without hesitation, as long as someone could be bothered to facilitate it. I have no ability to open new topics on this forum. I have no interest in being saddled with another forum just to discuss this right now.

This is all relevant to this topic because it shows how one gets a response when they have concerns.
One of my key points remains that this is a confusing and inefficient way to facilitate this.

Al Skierkiewicz 05-09-2012 19:10

Re: Team 548 Einstein Statement
 
Brian,
Also checked during the Einstein weekend: while placing the radio deep inside the robot, behind metallic parts, behind the bumper, on the floor of the chassis pan or behind an arm are all very bad locations, the orientation of the radio with respect to the field, the Cisco router or other robots varied the received signal level by less than 10 dB. Even if the radio was located low on the robot and on the outside, turning the robot so that the radio was facing away from the Cisco router only made about a 10 dB difference, far less than I would have thought knowing the interior construction of the radio and placement of the PIFA antennas. In fact, nearby objects only started to affect signal strength when within 2" of the top or sides of the radio.
So for teams, in general, mount the radio where it is protected from contact with other robots and so that the LEDs are visible. The radio should not be mounted near high-noise devices like the leads to CIM motors or FP motors, or near the 5 volt regulator. The bottom of the radio already has shielding, so mounting it on metal should not be a problem, but if it is mounted vertically, perf stock or Lexan would be the preferred backing material. There didn't seem to be a vast difference in signal between horizontal and vertical mounting, although horizontal will likely give the best overall coverage. When looking at the face of the radio, there is an antenna on both the left and right sides, so don't mount the sides against the robot frame. Both antennas are used all of the time for both receive and transmit to achieve the highest throughput per 802.11 specifications. Secure the power lead in some fashion so that it won't wiggle. A simple stick-on tie point mounted on the top of the radio, with a wire tie securing a loop of the power cord, is the best option. I do not recommend hot glue, as it makes repairs almost impossible when applied correctly. When not done correctly, the glue will give you false hope, likely misalign the connector or damage the jack on the radio, and the connector will fall out when you need it the most. If you choose to use a Radio Shack connector, ensure it has the right dimensions. Often teams will use a connector meant for a larger-diameter center pin. The result is noise on the power line during robot movement. Over time, as the connection becomes dirty, radio resets will be the result. If you are placing the radio near moving parts, check clearances for all positions of the moving part. The metal of the moving part should not cover the radio when at rest or in a fixed position, and it should not pass within two inches of the top or sides of the radio.
Do not make severe bends in the Ethernet cables; as I remember, the spec is a 1" minimum bend radius for full bandwidth. Secure the Ethernet cables near the radio so that they do not put strain on the jacks on the radio or pull out with robot movement. Putting a small loop in the cable will prevent any strain on cable or radio. Above all, make sure the 5 volt regulator is connected to the radio output on the PD and all wires are secure and insulated. Mount the regulator where it will be protected and does not move within the robot.

techhelpbb 05-09-2012 19:52

Re: Team 548 Einstein Statement
 
I have additional questions about some of this.
This is not to say that I doubt the skill of the testers.
I still foresee other interactions but they are not issues teams can solve.

I think the last 2 posts (Al's and mine) summarize the basics of what a team might get from this topic. I suspect the other points are of only minor value compared to the posts for the last 2 pages before this which are more general. So I think I'll leave this here and see if I can find another place to discuss those details.

My hope is that what is already here will impact the available logged data decisions at the least. I am perfectly fine with that as a compromise. If problems such as those I touched on do appear at least this will serve as something to review.


All times are GMT -5.

Copyright © Chief Delphi