Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   General Forum (http://www.chiefdelphi.com/forums/forumdisplay.php?f=16)
-   -   Team 548 Einstein Statement (http://www.chiefdelphi.com/forums/showthread.php?t=107906)

Alan Anderson 24-08-2012 17:10

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1183017)
It's wonderful that FIRST is trying to make these links as reliable as possible but we as the robot builders can help by making our robots less dependent on the wireless network being entirely reliable for every instant we use it.

Unfortunately, "we" can't do much at all about the robot's dependence on the network. When the cRIO isn't getting continuous "enabled" signals from the Driver Station, it shuts down all the motors and other actuators. That's something completely beyond the control of robot builders.

What kind of help were you thinking of?

GaryVoshol 24-08-2012 17:32

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Adam Freeman (Post 1182840)
I am not quite sure what your post has to do with this topic. It's not even correct. The field was given the "all clear" signal and the scores/winners were announced before the nets were cut. Not sure what was going on with the refs (maybe Gary Voshol can clarify). I know there were some upset mentors from the blue alliance. Heck, even the drive coaches for the winning alliance were less than enthusiastic with the way things ended.

Without disclosing any confidences, all I can say is that the refs did not have an extended discussion about F-2 and the results. Any observers may have seen us talking generally among ourselves as we ended the event.

techhelpbb 24-08-2012 21:55

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Alan Anderson (Post 1183062)
Unfortunately, "we" can't do much at all about the robot's dependence on the network. When the cRIO isn't getting continuous "enabled" signals from the Driver Station, it shuts down all the motors and other actuators. That's something completely beyond the control of robot builders.

What kind of help were you thinking of?

The next line of the post you quoted:
"If the parts of the robot we as the robot builders control are less dependent on the reliability of the wireless network it will be much harder for an unforeseen situation over a short period of time to decrease our competitive performance."

I am giving FIRST credit by assuming that no error on the scale of the FCA will escape the FRC development process in the future. If one does, I assume FIRST will find a way to replay the matches or figure out the resolution as quickly as possible. Obviously the existing robot signal light (RSL) and driver's station diagnostics are a start toward letting the people on the field find problems.

Last I checked, the disabled state shuts down all the motor and actuator outputs in the digital sidecar hardware, but not all the digital I/O (GPIO), the I2C, or the User1 light on the cRIO. The rules seem to account for this in how things that can cause movement must be connected to the digital sidecar. However, I don't think the rules prohibit correctly designed indicator lights, visible from off the field, connected to the GPIO pins or I2C.

Certainly code can execute in the cRIO regardless of enable. If you lose communications you lose your ability to send status back to your driver's station over that link. Of course you lose your ability to move, for the sake of safety. However, you retain the ability to manipulate those I/O and can use them to deliver status information that might be valuable even if you can't communicate with the driver's station or get near the robot. You'd be able to use the flash memory as well while disabled, and then you can retain that data even after the robot is powered off (of course it is flash memory, so wear leveling could be a concern).
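
To make that concrete, here's a rough sketch of the disabled-state logging idea (illustrative Python, not actual cRIO code; the two helper functions are hypothetical stand-ins for whatever your framework provides, and the log path is wherever your flash-backed storage lives):

Code:

import json
import time

def robot_is_disabled():
    """Hypothetical stand-in for the framework's enable/disable flag."""
    return True

def read_status():
    """Hypothetical stand-in for your own sensor/status collection."""
    return {"gyro_ok": True, "loop_count": 0}

# Append one timestamped status record per second to flash-backed storage
# so the data survives power-off, even when nothing reaches the driver's station.
with open("status.log", "a") as log:
    while robot_is_disabled():
        record = {"t": time.time()}
        record.update(read_status())
        log.write(json.dumps(record) + "\n")
        log.flush()      # don't let records sit in a RAM buffer if power drops
        time.sleep(1.0)  # keep it cheap; logging isn't free (CPU, flash wear)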

Admittedly that doesn't improve your movement situation if you end up losing communications long enough to miss the enable. But using something like that, along with the information communicated by the RSL and the driver's station, would give you a lot of evidence that your code is doing what you think it is, when you think it is, and when the field thinks it is, even on a competition field where you can't get near the robot or even communicate with it. In fact, if the robot can't communicate with you it could signal that too (some might think that redundant, but you never know; it might have helped Team 118 on Einstein).

Separate from the issue of not getting enabled:

Sending large amounts of data back and forth consistently using TCP and making that critical to the control of the robot is going to increase the chance that a momentary interruption or delay will cause adverse consequences. TCP is going to try to deliver that data but who knows how long it'll take.

UDP, although it isn't a reliable protocol, will still get useful communications through. Someone can create a transaction system on top of UDP that can tolerate lost messages and ignore stale ones, unlike TCP, which tries to help by pretending the link is reliable (when it might be busy or experiencing some wireless issue). The FMS itself seems to use UDP a lot.
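
A rough sketch of the kind of UDP transaction system I mean (Python; the port number and 20-byte message format are invented for the example). Every datagram carries a sequence number, and the receiver simply drops anything stale instead of stalling the way TCP would:

Code:

import socket
import struct

PORT = 5800  # placeholder port, not the real DS/FMS port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", PORT))
sock.settimeout(0.1)  # a missed interval is tolerated, never retransmitted

last_seq = -1
while True:
    try:
        data, addr = sock.recvfrom(1024)
    except socket.timeout:
        continue        # lost or late datagram: just carry on
    if len(data) < 20:
        continue        # runt packet: ignore it
    seq, command = struct.unpack("!I16s", data[:20])
    if seq <= last_seq:
        continue        # stale or duplicate datagram: ignore it
    last_seq = seq
    # ...act on the freshest command only; never block waiting for a resend...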

My concerns fall along the same lines as what happens when a critical sensor has become disconnected and you don't detect that it's been disconnected. If the code expects input from that now-disconnected sensor in a loop from which you cannot escape, everything is stuck (it blocks).

What will happen if you lose your camera feed to the driver's station, or it suddenly starts getting really dysfunctional, and that's the only choice you have for some critical function? What will happen if your driver's station is running code to process that video and the camera feed is disrupted? What will happen if your robot is enabled and keeps waiting for information from the driver's station and that information is delayed? What will happen if you put in a lot of debugging information to send back to the driver's station and it takes longer than you expected based on tests back home? What would happen if you send a lot of packets to the cRIO, your code doesn't read them fast enough, and you start to overflow the input buffer (buffer overflow 'exploit' right from the Einstein report, starting on page 13)?

Obviously, if someone can actually defeat your ability to see the enable signal, your movement outputs from the digital sidecar will disable for safety (momentum excepted). Then you have to consider the physical status of the actuators that stopped, if you return to the enabled state from that unexpected disabled state.

However, the system can obviously lose packets, so the idea of continuous enable transmission seems to give the wrong impression (it is continuous, but you can lose some packets and not get disabled). There's even a counter for missed packets in the Field Management System (FMS), and the manual says: "Typically there are some lost packets. In a very tame wireless environment, this number will be less than 100." (Page 49, Rebound Rumble FMS manual, Rev. 0). That sits alongside the average time it takes for traffic to go from the driver's station to the robot and back (average, meaning not necessarily the instantaneous round trip time). That information comes from the driver's stations to the FMS about every 100ms, from what I've researched. Unfortunately every interruption to the link is going to delay delivery of TCP packets and might lose your UDP packets entirely. Obviously the counter existing with that note in the documentation for the field operators indicates that up to 100 packets can go missing even in a very tame environment; what about a not so tame environment? Also, that counter is per team.

I can provide the links to back this up but I'm not sure I want to be linking the FMS manuals to this site. It might not stand the test of time and I'm not sure if there are rules about it.

Greg McKaskle 25-08-2012 00:48

Re: Team 548 Einstein Statement
 
Quote:

..You'd be able to use the flash memory as well while disabled and then you can retain that data even after the robot is powered off (of course it is flash memory, so wear leveling could be a concern)..
The flash drivers already implement wear leveling. The cRIO was designed as a monitoring/control device with a highly reliable file system and is used by industry to log data in remote and harsh conditions. Log files that detail how your robot operates are a good technique independent of any communications issues. Knowing whether the robot leaves auto, extends the arm too far, or dies entirely is helpful to everyone. Please keep in mind that the logging isn't free and it is possible to log so much data that the cRIO will not have the CPU needed to drive the robot.

Quote:

In fact if the robot can't communicate with you it could signal that (some might think that redundant but you never know it might have helped Team 118 on Einstein).
118 DS logs clearly showed what was happening on Einstein regarding communications. It showed that the robot was being told to enter auto, the CPU spiked to 100%, and the robot stayed in communication for several seconds longer, responding with its voltage and other fields, but never indicated that it completed processing the auto command. There were plenty of LEDs on 118, and if the code had been executing as expected and there had been a comms issue, they could have been used to show extra info, and logging could have helped as well. The difficulty with 118 was identifying how and why the CPU went to 100%.

Quote:

Someone can create a transaction system on top of UDP that can tolerate lost messages and ignore stale ones, unlike TCP, which tries to help by pretending the link is reliable (when it might be busy or experiencing some wireless issue). The FMS itself seems to use UDP a lot.
All traffic from FMS to DS to Robot and back is implemented using UDP with redundant info and some tracking data to calculate trip times and lost packets. TCP is used for the smart dashboard and by dashboard cameras.

Quote:

... If the code expects input from that now-disconnected sensor in a loop from which you cannot escape, everything is stuck (it blocks).
The code on 118 was unique to their gyro reset done as auto began. I don't think anyone would recommend putting a tight loop into the code waiting for a sensor condition. The 118 SW mentor didn't know the code had been added. If the CPU hadn't pegged in the blocking loop, the dashboard and robot behavior would have helped identify that the gyro was disconnected.

The buffer issue mentioned was a secondary issue that explained why 118 couldn't be rebooted from the DS. It didn't directly contribute to the failure. It is an artifact of the version of VXWorks that runs on the cRIO. It allows for improperly written code in one task to impact the communication of other tasks. The buffer was full, not overflowing, and there was no exploit.

The robot disable occurs when no DS commands have been received for 100ms. The packets are sent every 20ms, so it will take 5 sequential packet losses to trigger a disable. The robot will be re-enabled as soon as another packet arrives, perhaps as quickly as 20ms later. The Einstein communications, as measured and logged by the DS, were very quiet, almost equal to an ethernet cable, except for a field-wide burst in the final match. This may have been external noise such as a lightning strike. Logs of the Einstein robots during qualifications showed far more interference but no disabling caused by it.

Please ask if there are other questions about the Einstein Report.
Greg McKaskle

techhelpbb 25-08-2012 03:41

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Greg McKaskle (Post 1183110)
The robot disable occurs when no DS commands have been received for 100ms. The packets are sent every 20ms, so it will take 5 sequential packet losses to trigger a disable. The robot will be re-enabled as soon as another packet arrives, perhaps as quickly as 20ms later. The Einstein communications, as measured and logged by the DS, were very quiet, almost equal to an ethernet cable, except for a field-wide burst in the final match. This may have been external noise such as a lightning strike. Logs of the Einstein robots during qualifications showed far more interference but no disabling caused by it.

I'm unclear on this:

The shortest time delay in the 5 possible RSL light status patterns is 100ms for the off time of the teleop enabled mode.

So if you miss 100ms of communications, become disabled, and then 20ms or even 60ms or 80ms passes before you re-enable from a DS packet, you might not notice the change in the RSL pattern even though you were briefly disabled.

The charts tab in the DS shows when the robot is enabled or disabled even for short periods of time.

The DS sends data to the FMS every 100ms and the FMS logs every 500ms in the match review.

So is it possible for the DS to notice that the robot transitioned from enabled to disabled back to enabled between these 100ms bursts back to the FMS and not report the robot state transition because it happened between reporting intervals to the FMS?

Greg McKaskle 25-08-2012 08:43

Re: Team 548 Einstein Statement
 
Quote:

.. you might not notice ..
Correct. The RSL is a pretty crude indicator of the robot state. Keep in mind that a human blink is at least 100ms. I've also reviewed the logs with the drive coach and shown them brief disables that neither they nor the drivers noticed during a match. I've also seen robot logs from very successful robots that only process teleop every 60ms, and they seem fine with the rate. In other words, they choose to ignore two out of three control packets even though the CPU usage was quite low.

Actually, the FMS<-->DS comms are at 20ms as well. The FMS logs are somewhat slow from what I've seen -- between 2 and 4 points in a second. The DS reports everything it knows to the field. But at this point, the DS log data is the best indication of what took place on the robot and with the comms.

Greg McKaskle

techhelpbb 27-08-2012 10:52

Re: Team 548 Einstein Statement
 
Am I correct that the missing packet indicators on the FMS and the lost packet counters in the charts tab of the driver's station are counting only the UDP packets that FIRST is using for DS<->Robot communications? It's clear that the average round trip calculation depends on those packets.

Is there any additional monitoring in place on the current fields to track bottlenecks, lost packets, and other TCP/IP behavior while the field operates, besides those counters? I mean besides one of the driver's station operators peeking at that with System Monitor?

Is there any kind of prioritization for the UDP traffic imposed by the field and D-Link AP?

What process is in place to prevent the improper configuration of the Windows TCP/IP stack in the driver's station? Specifically with respect to TCP sliding windows and window scaling?

I ask these questions because of situations where UDP packet traffic sees the unintended side effects of TCP bottlenecks. The effect that concerns me is discussed at length in this link:
Characteristics of UDP Packet Loss: Effect of TCP Traffic

If we can see 100 UDP packets disappear during a match in a very tame wireless environment, how much TCP bottlenecking (and packet loss) is really going on impacting the Smart Dashboards and TCP based web cameras?

You can write software to get to all the raw counters you can see in the System Monitor on Windows like this:
Raw performance data class
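
As an example, the machine-wide TCP counters behind some of those System Monitor graphs can be read directly with the Win32 GetTcpStatistics call (a minimal Python/ctypes sketch; error handling omitted):

Code:

import ctypes
from ctypes import wintypes

# The fields of MIB_TCPSTATS, in the order the Win32 API defines them.
class MIB_TCPSTATS(ctypes.Structure):
    _fields_ = [(name, wintypes.DWORD) for name in (
        "dwRtoAlgorithm", "dwRtoMin", "dwRtoMax", "dwMaxConn",
        "dwActiveOpens", "dwPassiveOpens", "dwAttemptFails",
        "dwEstabResets", "dwCurrEstab", "dwInSegs", "dwOutSegs",
        "dwRetransSegs", "dwInErrs", "dwOutRsts", "dwNumConns")]

stats = MIB_TCPSTATS()
ctypes.windll.iphlpapi.GetTcpStatistics(ctypes.byref(stats))

# Retransmitted segments are the congestion signal of interest here.
print("segments out:", stats.dwOutSegs, "retransmitted:", stats.dwRetransSegs)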

It's not clear to me that the driver's station is using the Windows API to collect the lost packet information.

Though even if you did use that source of information, you could only monitor with respect to the TCP/IP stack of the driver's station. I suppose using the UDP packets to track performance like this was easier than modifying the D-Link AP to run DD-WRT or OpenWRT and passing its TCP/IP statistics back to the driver's stations and on to the field. Keep in mind that the cRIO can't see traffic plugged into the D-Link AP switch that is generated by other devices and not addressed to the cRIO (the D-Link AP doesn't seem to support any kind of port tap to bypass the switch, and I doubt it would be wise to ARP poison it).

Also which IP stack for VxWorks is in the cRIO: the BSD stack or the Interpeak stack? The older BSD stack source code supports the features that concern me as can be read here:
Wind River VxWorks TCP/IP stack

From the description of the buffer configuration above, it sounds like it might have the Interpeak stack in it?
I'm curious to see if there's more than RFC2581 in that TCP/IP stack for congestion control.

Brian

Racer26 27-08-2012 17:59

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Basel A (Post 1182968)
Even after FIRST's thorough investigation, all but one of the likely FCA cases were only considered "likely."

I find it disturbing that you're prepared not only to diagnose a robot failure as a complex problem based on minimal evidence, but also ready to indict an individual, about whom you know exactly one thing, of match-fixing at the highest level.

As the earlier poster mentioned: The only "confirmed" case was the one admitted to.

The "likely" cases are ones that the Einstein committee (18 industry experts, plus the 12 teams, 548 included) agreed were reasonably likely to ALSO be caused by the FCA exploit, based on the evidence available in match video and DS logs, and on the circumstantial evidence of multiple eyewitness accounts stating that they had viewed the individual punching away on the Galaxy Nexus phone at a screen containing the numbers of the teams on the field at various points throughout Einstein, with one reporter distinctly remembering 1114 being targeted. Many people seem to be overlooking (or at least glazing over for the purposes of peacekeeping) this part of the report, when in actuality it's relatively damning.

As for the OTHER cases, outside of Einstein? Nobody has investigated them properly (at least not yet, to my knowledge; I read in this thread that FiM is conducting something related to MSC), so we may never know for sure.

The match video I've found exhibits the same symptoms as those seen on Einstein and documented in the Einstein report as what would be visible to an astute observer watching from a distance (a flashing RSL indicating robot power is still present, and a flashing alliance station wall light indicating a lack of communications with the robot). I fully agree that these alone do not a complete diagnosis of FCA make.

However, there is the circumstantial evidence that a 548 mentor was tampering with the system in one admitted case, plus several other likely cases, according to 42 experts (18 industry + 2x12 team reps). This individual was presumably intending to influence the outcome of the matches, and that makes it reasonably believable to me that these other matches I can find with FCA-like symptoms, in cases where the disabled robot(s) would pose a distinct advantage to 548, would probably also be attributable to the FCA exploit.

Am I ready to indict this individual for match fixing at the highest level? YES! They ADMITTED to that, and that's why they're no longer welcome at FIRST events! However, yes, I further believe that they fixed many more matches than they've admitted to, and I know I'm not alone in that belief.

As I stated in my earlier post though, I hold no grudge against 548, because I'm willing to take the TEAM at their word that this INDIVIDUAL was acting ALONE and without the team's knowledge. It's not fair to the present and future students of 548 to be punished for something a mentor of theirs did sometime in the past. They're a 3-time district Chairman's Award team. They are doing good things, and the kids they're trying to have an impact on don't deserve to be chastised by the community at large for the actions of someone they trusted. I'm sure they're probably MORE devastated than the rest of the FIRST world, since this mentor violated their trust and damaged their team's hard-earned image as a leader in possibly irreparable ways.

Greg McKaskle 27-08-2012 19:39

Re: Team 548 Einstein Statement
 
Edited to condense the questions. Answers marked with ***'s.
---------------------
Am I correct that the missing packet indicators on the FMS and the lost packet counters in the charts tab of the driver's station are counting only the UDP packets that FIRST is using for DS<->Robot communications?
*** Yes. The trip time and lost packets refer to the control/status loop between DS and robot.

Is there any additional monitoring in place on the current fields to track bottlenecks, lost packets, and other TCP/IP behavior while the field operates, besides those counters? I mean besides one of the driver's station operators peeking at that with System Monitor?
*** If those other aspects impact the control/status loop, then the CSA, inspector, or FTA will use other system tools to determine what is causing the problem. The DS monitors a few cRIO factors such as CPU.

Is there any kind of prioritization for the UDP traffic imposed by the field and D-Link AP?
*** In 2012, default settings were used. The report indicates that QOS may be configured in coming seasons.

What process is in place to prevent the improper configuration of the Windows TCP/IP stack in the driver's station? Specifically with respect to TCP sliding windows and window scaling?
*** Nothing except for overall monitoring of the control/status loop. If that is working poorly, the CSA, inspector, or FTA may decide to look at the TCP configuration, but honestly, that is getting pretty obscure.

If we can see 100 UDP packets disappear during a match in a very tame wireless environment, how much TCP bottlenecking (and packet loss) is really going on impacting the Smart Dashboards and TCP based web cameras?
*** That is less than one packet per second. If TCP is having issues and retransmitting, that will likely impact the UDP and the FTA or others would look into it.

It's not clear to me that the driver's station is using the Windows API to collect the lost packet information.
*** It is not. If you believe that information would be helpful instead or in addition, I'm sure it can be added.

Also which IP stack for VxWorks is in the cRIO: the BSD stack or the Interpeak stack?
*** I don't have a cRIO with me, so I can't answer your question.

Greg McKaskle

techhelpbb 28-08-2012 12:14

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Greg McKaskle (Post 1183324)
If those other aspects impact the control/status loop, then the CSA, inspector, or FTA will use other system tools to determine what is causing the problem. The DS monitors a few cRIO factors such as CPU.

What tools besides the Windows Performance/System Monitor and ping are available to everyone to diagnose such a situation?

Traceroute won't do much good considering the robot is bridged.

Ping works to some extent because it's ICMP echo at layer 3; while it's wrapped in IP, it's not really TCP or UDP, which sit at layer 4. So if you start a ping toward the cRIO you'll see the congestion that is impacting TCP and UDP. Unfortunately, if you ping the cRIO from the driver's station you'll see the congestion but not necessarily the point in the communications path where the congestion exists. In fact, ICMP can not only detect congestion, it can also throttle inbound traffic that congests the local receive buffer, via the source quench message, which, if the sender responds to it (and it should), causes it to back off.

To my knowledge current Microsoft Windows TCP/IP stacks honor source quench requests if they are doing the sending but do not generate them when they receive. Instead it's common for devices that have filled their input buffers to simply drop packets. This behavior appears to be the same in the older VxWorks stack(s).

VxWorks BSD TCP stack
Code:

/*
 * When a source quench is received, close congestion window
 * to one segment.  We will gradually open it again as we proceed.
 */

If someone were to create some ICMP source quench packets, they could throttle the remote senders back when their receive buffer is almost full or completely full, or just because they need to alter the state of the network communications: for example, to force a sender with a large window to reduce it ASAP so it doesn't impact other traffic. (There's nothing stopping a DiffServ QoS as described directly below from using ICMP source quench in its own way. No idea which specific hardware FIRST might use for the QoS function, so no idea if ICMP source quench will be present or how it will be used.)
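
A rough sketch of building such a packet (Python with a raw socket, so it needs administrator privileges; per the ICMP spec, the payload is the offending packet's IP header plus its first 8 data bytes):

Code:

import socket
import struct

def inet_checksum(data):
    """Standard Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    total = (total >> 16) + (total & 0xFFFF)
    total += total >> 16
    return ~total & 0xFFFF

def send_source_quench(offending_packet, sender_ip):
    """Ask sender_ip to back off: ICMP type 4 (source quench), code 0."""
    payload = offending_packet[:28]            # IP header (20) + 8 data bytes
    header = struct.pack("!BBHI", 4, 0, 0, 0)  # type, code, checksum, unused
    checksum = inet_checksum(header + payload)
    header = struct.pack("!BBHI", 4, 0, checksum, 0)
    sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    sock.sendto(header + payload, (sender_ip, 0))
    sock.close()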

Quote:

Originally Posted by Greg McKaskle (Post 1183324)
In 2012, default settings were used. The report indicates that QOS may be configured in coming seasons.

I presume FIRST will implement DiffServ, which is stateless? I suppose one could use IntServ, but that requires reservations (RFC 2210) to operate, and while it's possible VxWorks in the cRIO could pull that off, the Axis cameras do not support it. Axis themselves did confirm the cameras support differentiated services code points (DSCP - RFC 2474) per function (audio, video, alarm...).

DiffServ has some end to end issues that are worth noting:

1. DiffServ doesn't track all statistics for all open flows (plus side: it needs less space to operate; downside: it isn't as aware of long term quality issues).

2. DiffServ in high packet loss situations tends to give you the choice of scalping one class to get additional bandwidth for another class (but you can't be sure that in the long term the scalped class is assured bandwidth either).

3. DiffServ has a compensation class but that only helps you if you have some idea of the limits of the uncertainty in the network and if the devices can handle that behavior. In short, if you give the compensation class a large amount of bandwidth to pull from it'll allow more flow to another class to make up for a shortage, but if the packet loss is high you need to make the compensation class larger and that still won't assure that high packet loss over a short period won't reduce the ability of traffic to flow at all.

This is only made worse because the TCP sliding windows and window scaling I noted above will very likely not be smart enough to differentiate between a congestion issue and a radio-level packet loss. This congestion issue with TCP has existed for a very long time. It's the equivalent of having a hole the size of a dime and needing to pass a half dollar: sure, if you grind the half dollar down long enough it'll fit through the hole, but it's going to be unpleasant. The solutions to this problem are called TCP congestion avoidance algorithms, and the choice of algorithm can have a dramatic impact on your network performance. TCP-Vegas, as implemented in DD-WRT, OpenWRT, Linux and BSD, can respond more effectively to packet loss from congestion and packet loss from the radio layer on largely unidirectional links (i.e., the TCP video is a large amount of bandwidth headed one way). It is unclear to me at this time whether Axis cameras, Microsoft Windows, or VxWorks support TCP-Vegas in their TCP/IP stacks. Keep in mind that sometimes you need to tune the queues with TCP-Vegas.

On the one hand, a DiffServ QoS unit sitting on the field side will throttle back the senders it can actually communicate with over a short duration, which should limit their maximum flow rate over the longer duration. On the other hand, when packet loss due to WiFi issues pops through (or someone causes packets to be dropped at the radio level), the senders can't see the QoS unit on the field side, so they'll resort to their TCP congestion avoidance algorithm of choice. With TCP-Reno/New Reno (we are certainly using one of these now), depending on how that sits, it could still cause flooding moments after a packet loss. A handy example:

Performance Evaluation of TCP Variants In WiFi Network Using Cross Layer Design Protocol and Explicit Congestion Notification

Quote:

Originally Posted by Greg McKaskle (Post 1183324)
Nothing except for overall monitoring of the control/status loop. If that is working poorly, the CSA, inspector, or FTA may decide to look at the TCP configuration, but honestly, that is getting pretty obscure.

Why do an analysis at all? Why not just set up the field, optimize the TCP/IP parameters as a baseline for the acceptable Windows OS for the driver's stations and the cRIO, and then distribute those settings in a simple 'registry file' export from RegEdit or RegEdt32 for installation or comparison?

Probably should also note the following:

1. It is very unlikely the FIRST driver's station will need to become the local master browser. One could turn that feature off in the LanMan parameters in the Windows registry. Even if the students take that laptop off the field and use it elsewhere, it would only really be an issue on a network in which it is the only Windows computer and no Samba is running. This NetBIOS feature serves no purpose in the current field and robot systems, but it'll generate a handy election and quite likely use NetBIOS over TCP/IP to do it. As an alternative, one could turn off NetBIOS over TCP/IP, which does have a GUI option to change. If anyone is interested in more details, look at the Samba project.

2. IPV6 ought to be turned off, especially in Windows 7. Windows 7 has a perverse tendency to try IPV6 first and IPV4 later, and not only does IPV6 have enough security concerns to fill a book, I don't think any device we have supports it unless FIRST is using the InterPeak stack configured for it in the cRIO. I'd be interested to know if there is IPV6 usage in the FIRST ecosystem.

One could easily take this a step further: write a simple program, or even a script, to back up the relevant local Windows registry entries, make these changes in preparation for the driver's station function, then restore the original settings when the driver's station is done. In point of fact one could even set a System Restore point, but that might get a bit out of hand with regard to storage, since you only need to alter a trivial number of keys. On the plus side, a program could easily locate the keys to disable the IPV6 protocol for the wired adapter you'll be using.
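
A minimal sketch of that backup-and-restore idea for one such key (Python using the standard winreg module; assumes administrator rights, and note the IPV6 DisabledComponents value only takes effect after a reboot):

Code:

import winreg

IPV6_KEY = r"SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters"

def backup_and_disable_ipv6(backup):
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, IPV6_KEY, 0,
                        winreg.KEY_READ | winreg.KEY_WRITE) as key:
        try:
            backup["DisabledComponents"] = winreg.QueryValueEx(
                key, "DisabledComponents")[0]
        except FileNotFoundError:
            backup["DisabledComponents"] = None   # absent = default behavior
        # 0xFF disables IPV6 on all interfaces (loopback excepted).
        winreg.SetValueEx(key, "DisabledComponents", 0, winreg.REG_DWORD, 0xFF)

def restore_ipv6(backup):
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, IPV6_KEY, 0,
                        winreg.KEY_WRITE) as key:
        old = backup.get("DisabledComponents")
        if old is None:
            winreg.DeleteValue(key, "DisabledComponents")
        else:
            winreg.SetValueEx(key, "DisabledComponents", 0,
                              winreg.REG_DWORD, old)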

Quote:

Originally Posted by Greg McKaskle (Post 1183324)
That is less than one packet per second. If TCP is having issues and retransmitting, that will likely impact the UDP and the FTA or others would look into it.

In a 135 second long match (2 minutes, 15 seconds) you're absolutely correct that if the UDP packet loss of 100 packets per match were distributed evenly, that would be less than one packet per second. However, as we agree, the enable/disable timer in the cRIO will time out in 100ms. So you can lose four driver's station generated UDP packets, with the enable/disable state in them, in a row before the robot runs the 100ms timer out and disables. In 1 second you have fifty 20ms intervals. In theory, if the timing is perfect (and let's face it, the Windows TCP/IP stack will not reliably send those UDP packets precisely every 20ms, and the latency of the link to the cRIO will impact the timing), you can lose 40 of those UDP packets per second and still not be disabled, and you have 135 seconds in which that could happen.
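
The arithmetic, worked through (all numbers come from this thread):

Code:

PACKET_PERIOD_MS = 20     # DS control packet every 20ms
DISABLE_TIMEOUT_MS = 100  # cRIO disables after 100ms of silence
MATCH_LENGTH_S = 135

slots_per_second = 1000 // PACKET_PERIOD_MS                          # 50
max_consecutive_losses = DISABLE_TIMEOUT_MS // PACKET_PERIOD_MS - 1  # 4

# Worst case that never trips the watchdog: lose 4, receive 1, repeat.
lost_per_second = slots_per_second * max_consecutive_losses \
    // (max_consecutive_losses + 1)                                  # 40
print(lost_per_second, lost_per_second * MATCH_LENGTH_S)  # 40/s, 5400/match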

The reason for the math is that, if you look at the link below again with my concerns in mind, the TCP sliding window and window scaling functions have their effect over a duration that can easily be in seconds (see Figure 2-5 in the link below). So it's possible for the trouble to start, build, drop a UDP packet or a bunch, cycle back, then start, build, and drop another UDP packet or a bunch. Meanwhile, the entire time, packets are dropping and devices are making their choices in their congestion avoidance process (each algorithm has a set of processes at work) during and after each packet drop. Not just because of radio level issues but also because of congestion, and with TCP-Reno/New Reno there really is no way to tell the difference unless some mitigation like ICMP source quench is inserted.

Characteristics of UDP Packet Loss: Effect of TCP Traffic


Quote:

Originally Posted by Greg McKaskle (Post 1183324)
It is not. If you believe that information would be helpful instead or in addition, I'm sure it can be added.

I think the best visual representation of that data is the difference from data point to data point, from the original start values of the TCP statistics to the final values over time. Windows System Monitor pretty much already provides this facility. At least it's something to help people diagnose their own issues. Perhaps highlighting its value will be helpful to some people.

Unfortunately, neither the driver's station charts tab nor the Windows local TCP/IP stack statistics show end users where precisely along the communications path congestion or momentary packet loss occurs. In the same way, showing the average round trip doesn't represent the instantaneous round trip time, or even the time to get to the robot versus the time to get back to the driver's station. The devices currently best placed to determine whether packet loss is due to the radio layer or to congestion at the wired sides of the radio links are the APs. There is no facility in TCP-Reno/New Reno itself to calculate round trip time (RTT), which would illuminate that data is disappearing at any given moment (it uses a timer, usually 200ms-500ms, hence the several second escalation). Even if there were an IntServ QoS unit on the field side of the communications path, it wouldn't be able to determine the cause of packet loss from that vantage point (IntServ QoS does actually keep information about the flows through it over time).

In a way, the DS<->robot UDP traffic with its round trip timer represents an addition over TCP-Reno/New Reno that could better arbitrate issues, but that driver's station UDP traffic is both too slow to really force down the TCP sliding windows and window scaling for its own benefit (I write this in relation to the UDP/TCP link I posted above, so if it's not apparent please reread that link) and doesn't implement ICMP source quench that I can see. If the driver's station implemented ICMP echo (aka 'ping') and ICMP source quench, you could interleave the UDP packets and ICMP echo requests and monitor congestion at layer 3 and layer 4 to every IP device on the robot (to which you could then send individual ICMP source quench messages).

Of course, if DD-WRT or OpenWRT were on the robot AP, we could not only send the statistics back to the driver's stations, and through that the field, we could also use TCP-Vegas as the TCP congestion avoidance algorithm under the right circumstances, which would need to include support on the Cisco end. The big drawback of TCP-Vegas that I know of shouldn't matter to a FIRST system: TCP-Vegas doesn't play well in live routed environments if you change the routes, because that action invalidates the round trip data.

Windows has had Compound TCP (CTCP) as a congestion avoidance algorithm since Windows Vista, and it's been partially backported; Linux has had support since 2.6.17, but I'm not sure it works on Linux at this time. So far as I know it's disabled by default in Windows Vista and up. It can be enabled like this:
How to enable CTCP
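
For reference, on Vista/7 that boils down to one elevated netsh command (shown here driven from Python's subprocess; "default" restores the stock provider):

Code:

import subprocess

# Run from an elevated prompt; this is a machine-wide setting.
subprocess.check_call(["netsh", "interface", "tcp", "set", "global",
                       "congestionprovider=ctcp"])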

CTCP helps pull down the TCP-Reno congestion avoidance algorithm by maintaining 2 windows. Again, something else to consider on the driver's stations. Also please take note that you might have to enable more than CTCP, and the timestamps might be an issue for the devices on the robot. VxWorks' older stack seems to support them, so the newer InterPeak stack should as well. The bad part of CTCP is that while it would appropriately mitigate the Windows driver's station's effect on the network, it wouldn't necessarily impact all the other devices, whatever congestion avoidance algorithms they use, and whatever issues those cause. Unlike the field and robot APs, this CTCP algorithm can only affect the communications between the Windows driver's station and whatever else it talks to, so usually the cRIO and FMS. So in effect it takes one source of trouble out of the picture but leaves the rest, which will use whatever TCP congestion avoidance algorithm they like, with whatever consequences might follow.

I would have suggested TCP-Cubic, which is implemented in Linux kernel 2.6 and backported to 2.4, but there's an issue with it that concerns me regarding its remaining ability to burst after a packet is lost over the radio. Without testing it's hard to say, but the nature of all the video data going back to the driver's station might favor TCP-Cubic. I'm just concerned that it's not going to behave as well as it could if TCP-Reno/New Reno remains in operation on the same network, and that may not be entirely avoidable. Mind you, TCP-Reno/New Reno on the same network as TCP-Vegas will still be unfair to TCP-Vegas, but from what I've seen not quite as bad (of course that's on wire).


*SUPER SIMPLE VERSION:*

You have a hole big enough for a dime but you want to put a half dollar through it. So you grind up the half dollar and put it through that hole little by little. You have a choice of processes to make sure that as much of the half dollar gets through that hole as quickly as possible. I'm merely suggesting a different way to react to losing little pieces of the half dollar, which you eventually discover. I think the way it's handled now is slower than it needs to be to get just as much of the half dollar through that hole.

Just to make that even more interesting, more than one person is trying to send their own half dollar through that same hole at the same time.

The half dollar is your data.
The hole is your network.
The multiple people are the multiple network devices.
There are lots of reasons you all lose some of the ground-up half dollars (which are the packets).
There are different solutions (TCP congestion avoidance algorithms).

Now I'm hiding all my half dollars before someone wants a demo.

Brian

techhelpbb 29-08-2012 12:23

Re: Team 548 Einstein Statement
 
I'm no longer able to edit the post above. So I'll append additional information like this:

If someone reads the section above where I described DiffServ, they'll probably wonder why I keep suggesting field side QoS. The D-Link AP supports some QoS in the form of WMM. The Cisco 1252 supports some QoS in the form of 802.11e. As far as I can tell neither was turned on this year or any previous year. The reason I've discounted them as an option is that a full DiffServ implementation already ignores a lot of information to reduce the resources it requires to perform QoS; 802.11e and WMM ignore even more. In a wire based DiffServ implementation there are technologies that look at packet flows not tagged with DSCP by the sending devices, or that can guess at the proper QoS class for traffic from the source or destination information in the packets (like Cisco NBAR). This information is not acted upon with 802.11e or WMM because it would require serious stateful packet inspection and the resources that usually demands.

So even if someone turned on 802.11e and WMM, the devices that send over the AP link would need to tag their packets with DSCP so the APs on either end would know what class the traffic is supposed to be. Such tagging is supported by the Axis camera, as noted above, and I see no reason VxWorks can't do DSCP tagging either. However, any other network devices that might send traffic over the wireless link would be questionable. For example, the COTS rules allow a laptop on the robot, and that laptop might be sending video, images, or even streaming from VideoLAN back to the driver's station. Such a laptop generally wouldn't have an easily added ability to tag its traffic with DSCP. The 802.11e and WMM devices would have to default that untagged traffic to classless, and the contention mitigation process they use in the QoS might therefore be unfair to the people who implement such solutions.

Worse, it's hard for FIRST to know whether or not you'll use certain types of traffic. For example, you might not use a video feed back to the driver's station. So it would be harder on FIRST to explain why they might blindly impact the performance of your robot devices to enforce a QoS policy with 802.11e and WMM that considers devices you don't actually have. FIRST could alter the QoS parameters on the robot AP and the field AP if your robot doesn't use certain classes, but that adds further convolution to the configuration process for both (and remember, a lot of people cycle through the fields with their robots).
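
For what it's worth, DSCP tagging from an ordinary program is just a socket option (Python sketch; the DSCP value, address, and port are placeholders, and note that Windows tends to ignore IP_TOS unless policy settings allow it):

Code:

import socket

DSCP_EF = 46  # Expedited Forwarding, the usual class for latency-critical UDP

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# DSCP occupies the top six bits of the old TOS byte, hence the shift.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)
sock.sendto(b"status", ("10.5.48.2", 1234))  # placeholder address and port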

To extend my concerns about DiffServ to 802.11e and WMM: in the most abstract sense, 802.11e and WMM collapse the classes defined by DSCP tagging into a smaller number of categories. Those categories exist before the radio layer, so the radio layer doesn't really change; it's just that the data presented to the radio layer is prioritized differently than it would have been with a single queue going into the radio layer. The radio can then spend more time trying to effectively communicate certain classes of traffic over others. Again, if the radio layer is interrupted, or has a hard time finding time to send certain classes of traffic, the QoS function of the APs has to decide what to do with all the data coming into it that it has no way to get rid of. Most devices, as noted above, will simply start ignoring, and therefore dropping, inbound packets.

That packet dropping behavior triggers one of the TCP congestion avoidance algorithms mentioned above. Which in turn means that if you have several devices in different DiffServ classes lumped into a smaller number of categories, the devices within each category will impact each other's access to that category of service going over the radio link. Essentially the QoS process will really only help some devices rise above others when the communications link actually can pass traffic. Since DiffServ isn't completely aware of all flow statistics, it may not notice that it's robbing from one class to give priority to another once you account for packets lost in the radio layer (someone who knows exactly what I'm talking about will consider MAC parameter tuning to mitigate that, but that's tuning, and that's a lot of extra considerations for a lot of different robot designs). These caveats justify why the TCP variant (TCP-Reno/New Reno, TCP-Vegas, TCP-Cubic, Compound TCP) still matters even with 802.11e and WMM.

So basically I figure implementing 802.11e and WMM represents way more work than FIRST is willing to put up with during field operations (and worse, really tuning it would be a field by field, robot by robot process, so it's even more painful than it sounds). More of that work could be absorbed in a field side, wired implementation of DiffServ for QoS. Of course, if something at the application layer (the driver's station software) were smart enough to look back down the links to all the network devices (which it can, because the teams configure it as such), one might be able to craft a flow control process that works even without any QoS device. Such a flow control process could use ICMP source quenches, as outlined above, and could be designed to be aware of the priority of communications for specific network device functions on the robot as dictated by the application layer uses (so a driver's station would carry all the tailor made TCP/IP traffic priorities for its robot, and teams could start to figure out those priorities without a field to work with).

The process of using ICMP source quench is a primitive form of Explicit Congestion Notification (ECN). For about 20 years it's been known that messages such as ICMP source quench can be used to cause issues for a device that supports them; from an Internet security perspective, allowing anonymous ICMP source quenches to slow down your ability to send data is obviously not the best idea. Most firewalls and routers for the Internet allow one to block ICMP entirely, including both echo (aka 'ping') and source quench (there's no reason it can't be used in a private network, or by FIRST, as long as it works). In comparison, there's a newer standard, less available to the application layer, defined in RFC 3168 and called, specifically, Explicit Congestion Notification. I did not suggest that anyone use the newer ECN even though it's been around for more than 10 years (though my instructions to turn on CTCP above mention how to enable the newer ECN in Windows Vista and up). There are a bunch of issues with the newer ECN, and some of them impact performance and connectivity with D-Link devices and certain websites. Opening a hole in a driver's station firewall for ICMP source quench on the local network is one thing (odds are good your local Internet routers will block attempts to pass it to a driver's station while it may be connected to the Internet). Leaving strange behavior in the driver's station that might pseudo-randomly appear in other use is quite another matter, and that might happen with the newer ECN. Another upside of implementing ICMP in the driver's station software is that you get both echo and source quench just by doing that, so it's likely to be more compatible and quicker to develop.

On a separate topic from what I just wrote about but still related to my post directly above:

It might benefit FIRST to log the information from the local Windows TCP/IP stack in the driver's station software, as I suggested and provided information for above. This would give FIRST an authoritative set of entries in the match log, synchronized with the driver's station software's communications with the field and robot, in a log retained by the same process as the current DS charts and logs. In my previous post I only considered the visualization of that data, but it really does matter that you can get that data in a reliable way long after the matches end.

Again on a separate topic from what I just wrote about but still related to my post directly above:

Also, as a bit of a footnote: I suggested disabling IPV6 on the driver's station above. Please be aware that if you disable IPV6 and try to create a new Windows 7 Home Group, it'll fail. Windows 7 Home Groups use the Peer Name Resolution Protocol (PNRP), and that requires IPV6. You can disable and re-enable IPV6; it'll just break Home Group's ability to resolve names while it's off (there's still the cache). I doubt any device in the FIRST ecosystem depends on a Windows 7 Home Group, and there are plenty of better ways to move your stuff without resorting to that.

Brian

Greg McKaskle 30-08-2012 07:56

Re: Team 548 Einstein Statement
 
Sorry it took so long to reply. Had to read all the links ...

Backing up a few steps, I think the first thing is to monitor and log what goes on. At an event, things get a bit simpler since FIRST owns the AP and accompanying management SW and also has an Airtight. Channel level monitoring was already common, and I suspect the displays will be enhanced to include bandwidth monitoring per team.

In a build situation, it is more difficult, but if the team AP on the robot is instrumented, that info can be used when not on the field.

Is there a TCP bottleneck problem? It doesn't seem common. Perhaps with better measures we will know for sure. Until then, if it isn't broke, let's not fix it.

For QOS, FIRST is still looking at options, and allowing experts to help with the selection. I'll try to get the experts to include your input.

Greg McKaskle

techhelpbb 30-08-2012 12:02

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by Greg McKaskle (Post 1183600)
Sorry it took so long to reply. Had to read all the links ...

Backing up a few steps, I think the first thing is to monitor and log what goes on. At an event, things get a bit simpler since FIRST owns the AP and accompanying management SW and also has an Airtight. Channel level monitoring was already common, and I suspect the displays will be enhanced to include bandwidth monitoring per team.

In a build situation, it is more difficult, but if the team AP on the robot is instrumented, that info can be used when not on the field.

I presume you mean with SNMP support on the robot AP and a suitable MIB to identify the OIDs (I know I've been asked about that MIB a few times as people try to work this out). For those reading this who don't know: the Object Identifiers (OIDs) are fields in a table of collected status data about the device, and the Management Information Bases (MIBs) define what those fields are. I don't think the DAP-1522 supports RMON (custom traps) even if it does support SNMP and telnet. On the plus side, using SNMP would at least give you some statistical information about what the robot AP can see, unless you lose access to that interface during the polling.

With the matches being 135 seconds long, and some devices having limits on polling speed, I would assume one would try to poll SNMP frequently enough that the inability to do custom internal polling isn't an issue and that short packet losses wouldn't leave the entire match with no data collected, while not slowing the network with its own resource demands. If you did have an SNMP (RMON, if you have it) trap for the OIDs indicating radio link losses, you'd only be telling yourself that there's trouble with the radio after the radio link recovers. On the plus side, such a delayed trap notification might be better than waiting for a long SNMP external polling timer to expire; on the downside, if the SNMP external polling is frequent enough, it's probably not vital to waste the robot AP's time doing that. One could also expand the existing DS<->Robot communications systems that use UDP to send only the data they want, when they want it, how they want it. Looking back at my previous 2 posts above, I had suggested collecting much of the same data in a custom fashion with a script on routers that support that.
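
As a sketch of that external polling (Python driving the net-snmp command line tools; the AP address, community string, and OID are placeholders you'd swap for whatever the DAP-1522's MIB actually exposes):

Code:

import subprocess
import time

AP = "10.5.48.1"              # placeholder robot AP address
OID = "IF-MIB::ifInErrors.1"  # placeholder counter to watch

# Poll once a second: slow enough not to load the link, fast enough
# that a 135 second match still yields a usable trace.
while True:
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", "public", AP, OID])
    print(time.time(), out.decode().strip())
    time.sleep(1.0)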

Without even modifying the DS software one could already enable the SNMP service in Windows and use one of the many SNMP managers/monitors like Dart.

SNMP is a UDP based protocol at layer 4. Turning this on has the benefit/downside of generating UDP traffic that contains far more data than the DS<->robot UDP packets. Just doing that SNMP polling will help push down the TCP congestion from the TCP sliding windows and window scaling going toward the robot AP from the field side. Just don't poll too much, or you'll basically flood the network and the congestion on the robot side going into the robot AP will get worse. Of course, if you had SNMP to the devices on the robot, then the effect of pushing back TCP would extend into the robot, because this traffic is bidirectional and it sends more data back to the field than is required to start the process.

Though again: the upside of sending more UDP traffic is that the congestion has less effect on the UDP packets you send (per that link I keep pointing back to). If you send more traffic for the critical UDP functions in the DS<->Robot communications, they'll get through a congested link more often. One can send more traffic by making the UDP packets larger (say, filling them with statistical data, minding the observations about performance and larger UDP packets) or by sending them more often (reduce the 20ms timer between the UDP packets to 5-10ms; you could still use autonomous/teleop modes the same way and still time out the enable in 100ms). So while SNMP does offer the same capability, this is a distinguishing point to make.

The only additional concerns I would add to that is to make sure someone changes their SNMP community password (string) on the robot AP. On the field it's not a big issue but off the field it could be used to craft a DoS by over-polling.

Additionally on this subject: even if the robot AP doesn't support RMON, the Cisco 1252 does (there may be caveats to this support; for instance, it might not work in the older VxWorks firmwares).

Quote:

Originally Posted by Greg McKaskle (Post 1183600)
Is there a TCP bottleneck problem? It doesn't seem common. Perhaps with better measures we will know for sure. Until then, if it isn't broke, let's not fix it.

That's fair enough. I do see a number of people reporting issues with SmartDashboard and webcam performance, both of which are TCP, but there's no absolute proof that TCP congestion caused by issues on the field makes that happen more on the field than in private environments. Proper data collection would go a very long way toward figuring this out.

I should also note that reducing the number of packets lost in the radio layer will seriously improve the situation. If the radio link quality improves, there will be more immunity to the sort of short jamming interruptions in communication that I know are possible and that AirTight won't alert about. From AirTight's perspective, a short interruption is not what most people perceive as a denial of service (DoS) attack. Normally if you lose a few packets your web page loads a little slower or your video quality goes down. Most people don't expect to use the full bandwidth of their radio link in 135 second intervals. Even expertly configured radio links lose information; it's not generally a desired trait, and as all this shows, it can be very complicated to deal with the consequences of losing that information. So in this regard, efforts on the part of FIRST to tighten the detection net for the radio link and to improve the link quality (antennas, robot AP placement, etc.) go a long way toward reducing the unusual and obscure network efforts needed to compensate. It's not like we can ask the robots to stop moving while we make adjustments during a match. (Attach antenna to arm and write a program to find the best signal: a robotic rabbit-ear adjuster.)

Obviously, with so many TCP/IP implementers following the IETF RFC standards and thereby implementing TCP-Reno/New Reno, they are probably doing it for a reason (Microsoft defaults to TCP-SACK in XP, which is very close, and Vista and up offer TCP-TSACK as well; Mac OS X and BSD default to it; Linux is the notable exception in going beyond it with TCP-Cubic). The reason is that for a wide range of bidirectional network traffic carried on wire, in a wide variety of circumstances, TCP-Reno/New Reno behavior is a good compromise when congestion occurs (TCP-Westwood/Westwood+ is just a tweak to TCP-New Reno, TCP-SACK adds selective acknowledgment to reduce retransmission, and TSACK uses timestamps (not to be confused with CTCP)). The only reason I'm suggesting otherwise is that this is a specific set of circumstances. Personally I dislike seeing people turn this stuff on without a good idea of what it might help and what it might hurt in their specific circumstances. (Once in a while there's a fuss about how DD-WRT, OpenWRT and Tomato advertised TCP-Vegas, and how people used it without a clue about its specifics just because it was a hot topic.)

As backup for my advocacy of TCP-Vegas for this mobile robot application, consider this:
VEGAS: Better Performance Than Other TCP Congestion Control Algorithms on MANETs

Quote:

Originally Posted by Greg McKaskle (Post 1183600)
For QOS, FIRST is still looking at options, and allowing experts to help with the selection. I'll try to get the experts to include your input.

I appreciate it.
Sorry about the length of the posts; there's a whole lot of detail in a small space.

BTW you might want to show someone this:
Cisco End of Sale / End of Life Announcement - 1250 Series

If FIRST is interested in replacing that Cisco 1252 here's a quick suggestion to consider:

6 individual APs of the same make and model as the robots use (nice and modular).
All the APs (field and robot) running DD-WRT.
Enable SSH to configure them all.
Configure TCP-Vegas across the wireless link between them (pay attention to where the TCP endpoints are).
Tune the queues (they often default to 1000 packets) using floods of UDP and TCP individually and together.
Keep in mind that small or huge queues are not a great idea, especially considering 802.11n packet aggregation.
Try disabling channel bonding (there are 6 robots, 4 channels in 5GHz, reduce the contention).
Consider that disabling channel bonding might impact the queue changes.
Instrument them all with custom code.
Make up for any missing features in the managed switch on the field side.

If someone would like me to demonstrate what I outlined above, it's easy enough; just ask.
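Since the queue-tuning step is the least self-explanatory item on that list, here's the sort of crude flood generator I mean (the address, port, payload size, and duration are arbitrary placeholders). Run it while a parallel TCP transfer is going and watch what each queue setting does to latency and loss:

Code:

import socket, time

# Crude UDP flood for queue tuning. Address, port, payload size, and
# duration are arbitrary placeholders. Run a parallel TCP transfer
# (scp, iperf, etc.) alongside it and observe how each queue length
# trades latency against loss under load.
DEST = ("192.168.1.1", 17771)
payload = b"\x00" * 1400              # near-MTU datagrams
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sent = 0
start = time.time()
while time.time() - start < 10:       # 10-second burst
    s.sendto(payload, DEST)
    sent += 1
print("sent %d datagrams (%.1f Mbit/s offered)" %
      (sent, sent * 1400 * 8 / 10 / 1e6))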

Brian

qnetjoe 31-08-2012 13:56

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by techhelpbb (Post 1183633)
BTW you might want to show someone this:
Cisco End of Sale / End of Life Announcement - 1250 Series

If FIRST is interested in replacing that Cisco 1252 here's a quick suggestion to consider:

6 individual APs of the same make and model as the robots use (nice and modular).
All the APs (field and robot) running DD-WRT.
Enable SSH to configure them all.
Configure TCP-Vegas across the wireless link between them (pay attention to where the TCP endpoints are).
Tune the queues (they often default to 1000 packets) using floods of UDP and TCP individually and together.
Keep in mind that small or huge queues are not a great idea, especially considering 802.11n packet aggregation.
Try disabling channel bonding (there are 6 robots, 4 channels in 5GHz, reduce the contention).
Consider that disabling channel bonding might impact the queue changes.
Instrument them all with custom code.
Make up for any missing features in the managed switch on the field side.

If someone would like me to demonstrate what I outlined above: it's easy enough so just ask.

Brian


I just want to caution everyone about going down this road with regards to the wireless system; there are a million tangents you can take in a wireless system design, but you will need to be very methodical and listen to the larger-scale requirements. The 2015 Control System RFP is a good read just to understand the larger issues that the field has to face.

Section WRC1 states "Capable of controlling 4 co-located active fields with up to 6 robots on each field." This means that we cannot have 6 independent access points per field, because even with 20 MHz channel widths there are only 20 non-overlapping channels, and the channels in the middle of that 5 GHz band are required to use dynamic frequency selection (DFS) and transmit power control (TPC) because that is the same band as weather radar and military applications. It would be unfair for any team to be using those channels when other teams can use non-DFS/TPC channels. There are only 9 non-DFS non-overlapping channels (in the US).
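To put numbers on the channel arithmetic, a quick sketch; the channel list reflects the US 5 GHz rules as of this writing and should be verified against current FCC allocations:

Code:

# Quick sanity check on the channel arithmetic. The channel list
# reflects US 5 GHz rules as of this writing; verify against current
# FCC allocations before relying on it.
non_dfs = [36, 40, 44, 48, 149, 153, 157, 161, 165]     # UNII-1 / UNII-3
fields, robots_per_field = 4, 6
print(len(non_dfs), "non-DFS 20 MHz channels")          # 9
print(fields * robots_per_field, "robot radios needed")  # 24
# 24 radios but only 9 non-DFS channels: one dedicated channel per
# robot AP is impossible without channel reuse or DFS/TPC channels.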

We all have to remember that there is a big difference between a concept/prototype and production. FIRST needs to have a production-grade wireless system. 10% of my job is prototyping and the other 90% is taking a prototype and turning it into something production grade. I recommend that if you are going to go down this road you need to have a good model, preferably something based on the OSI model. At the end of the day FIRST can and should do only two things:

* Provide a rock-solid production-grade media layer (OSI Layers 1-3)
* Provide a method for detecting issues in the host layers (OSI Layers 4-7)

I really think this thread has moved away from the original purpose of 548's Einstein Statement into a topic about wireless design. If this is something you would like to talk about further, could you create a new thread?

techhelpbb 31-08-2012 14:10

Re: Team 548 Einstein Statement
 
Quote:

Originally Posted by qnetjoe (Post 1183805)
I just want to caution everyone about going down this road with regards to the wireless system; there are a million tangents you can take in a wireless system design, but you will need to be very methodical and listen to the larger-scale requirements. The 2015 Control System RFP is a good read just to understand the larger issues that the field has to face.

Section WRC1 states "Capable of controlling 4 co-located active fields with up to 6 robots on each field." This means that we cannot have 6 independent access points per field, because even with 20 MHz channel widths there are only 20 non-overlapping channels, and the channels in the middle of that 5 GHz band are required to use dynamic frequency selection (DFS) and transmit power control (TPC) because that is the same band as weather radar and military applications. It would be unfair for any team to be using those channels when other teams can use non-DFS/TPC channels. There are only 9 non-DFS non-overlapping channels (in the US).

We all have to remember that there is a big difference between a concept/prototype and production. FIRST needs to have a production-grade wireless system. 10% of my job is prototyping and the other 90% is taking a prototype and turning it into something production grade. I recommend that if you are going to go down this road you need to have a good model, preferably something based on the OSI model. At the end of the day FIRST can and should do only two things:

* Provide a rock-solid production-grade media layer (OSI Layers 1-3)
* Provide a method for detecting issues in the host layers (OSI Layers 4-7)

Fair enough but:

802.11n implements Clear Channel Assessment (CCA) to mitigate busy channels. The robots move, so their proximity to the other end of the radio link changes; that can be ignored (a bad thing) or exploited to improve matters. One could also use more directional antennas to adjust some of this. Not to mention that in DD-WRT you can sometimes change the radio output power, sometimes without even rebooting the AP (it depends on the manufacturer). 802.11n uses multipath, so if the antenna placements were better, perhaps FIRST wouldn't need so much transmit power, because of the improvement in the ability to receive. Moreover, DD-WRT allows you to adjust the threshold for the CCA (assuming device support for this adjustment). That would be handy if you know you've got channel overlap, and in this case we are lucky enough to know it might be there.

In the current 802.11n system, the maximum data rate of 300 Mbps at 5 GHz (actual throughput will be 60-70% of that) is achieved with radio channel bonding (which I advised turning off). I should have been clearer above about why there are only 4 such 300 Mbps channels available: bonding pairs two adjacent 20 MHz channels into one 40 MHz channel, and the 9 non-DFS channels yield only 4 such pairs.

The only way to avoid overlap with channel bonding on is to use a maximum of 2 radio channels per 4 fields, with multiple SSIDs. Then you have contention at the network level, because the radio layer will be time-shared between 6 robots. This is the tradeoff FIRST made already, but all radio configurations were available to them, as they control both ends during a match. The layer 3 and 4 network traffic beyond the UDP DS<->Robot link is beyond FIRST's control to a much greater degree. Field-side QoS will help to a point. Robot-side QoS will just restrict what can be sent and when, but in a standalone environment things might be very different, so how is anyone to test? I would think someone could design their robot to be much less fair to the others by exploiting that contention for the radio resources, with just multiple SSIDs on a dual-channel radio layer, especially as it is now (even if there are VLAN bandwidth limits). Even with multipath, in the current environment with the robot APs as they are (badly placed), there's a risk of hidden nodes (one robot checks to see if another is transmitting, doesn't get a clear reception, and transmits at the same time, causing a collision).

Also, if so much as one additional network is created that uses channel bonding, you have overlap. Never mind adjacent-channel interference, which you will have if you use 8 of the 9 radio channels. 802.11ac with quad-channel bonding isn't going to improve this situation either. Something running 802.11ac as a hidden node, ignoring 'good neighbor' behavior because it can't receive the transmissions from a moving robot, would be a real pain.

In that regard:

What happens when you use 802.11n radio channels next to each other in the radio spectrum and physically too close together:
Reinvestigating Channel Orthogonality - Adjacent Channel Interference in IEEE 802.11n Networks

How close together is close, what the effect on UDP is (important for DS<->Robot), and what happens if you turn down the radio output power (the results of which can be used to mitigate the issue in the link at the top of this list):
Understanding the Effects of Output Power Settings When Evaluating 802.11n Reference Designs

Why channel bonding is not always the best idea:
The Impact of Channel Bonding on 802.11n Network Management

One can mitigate the issue of proximity to a wireless radio and overlapping channels (not to mention adjacent channels) with nearby similar networks (well within the distances FIRST is subject to) by reducing the output power of the radios (manually, by script, or on the fly by code, all of which are options with DD-WRT). It works for 802.11n on 5 GHz, and if you start reading the Rutgers thesis linked below, you'll save me a lot of typing, because all the justification for my statement is basically there (start on page 45 to save yourself time). The graphs show that in the tests the author saved a fair amount of power (always handy for a battery-powered robot) and still maintained radio throughput with UDP traffic (the traffic FIRST's DS<->Robot communications depend upon), while the effective range of the communications was reduced (handy if you have fields near each other).

By extension, even in an environment that is not adapting, my point is this: with proper antenna placement one should be able to reduce the radio power and address the concerns you've presented (in fact, with such controls it's highly probable you could have even more than 4 fields). After all, even with these channel limits, the reason these devices sell as they do is that the signal doesn't extend so far that you couldn't litter these units through nearby homes and not even notice them from the perspective of one home to the next.

Adaptive Transmit Power Control Based on Signal Strength and Frame Loss Measurements For WLANs
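To show the shape of the "by code" option, here is a hypothetical sketch of an adaptive transmit power loop driven over SSH. It assumes a Broadcom-based DD-WRT build exposing the wl CLI; the exact wl flags, RSSI thresholds, and power bounds are assumptions to verify on the actual hardware, not a tested recipe:

Code:

import time
import paramiko  # pip install paramiko

# Hypothetical adaptive TX power loop for a DD-WRT AP, driven over SSH.
# Assumes a Broadcom build exposing the 'wl' CLI; the 'wl' flags and
# the thresholds below are illustrative assumptions, not a tested recipe.
HOST, USER, PASSWORD = "192.168.1.1", "root", "changeme"

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(HOST, username=USER, password=PASSWORD)

def run(cmd):
    _, stdout, _ = ssh.exec_command(cmd)
    return stdout.read().decode().strip()

power_dbm = 17
while True:
    rssi = int(run("wl rssi") or "-100")  # signal strength of the link
    if rssi > -50 and power_dbm > 8:      # strong signal: back off power
        power_dbm -= 1
    elif rssi < -70 and power_dbm < 20:   # weak signal: step power up
        power_dbm += 1
    run("wl txpwr1 -o -d %d" % power_dbm)
    time.sleep(1)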

Which is worse? What I suggested isn't all that hard to test.

Additionally, there's nothing stopping anyone from using multiple SSIDs on a single device with DD-WRT either. One could work up a fingerprint of each robot's bandwidth and pair teams or alliances off to fewer than 6 field APs based on that metric. In fact, such an automatically field-tested bandwidth fingerprint might be handy for a bunch of reasons beyond its value for that (instead of leaving everyone looking at each other... just push the report to their driver's station for later review).
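A sketch of that pairing idea: measure each robot's peak bandwidth during a field-test window, then greedily pack robots onto a smaller number of APs without exceeding a per-AP budget (first-fit decreasing). All the team numbers and figures below are invented for illustration:

Code:

# Greedy packing of robots onto fewer APs by measured peak bandwidth.
# All numbers are invented for illustration.
AP_BUDGET_MBPS = 20.0

measured = {            # team -> peak Mbit/s seen during a field test
    111: 7.2, 222: 1.1, 333: 5.8,
    444: 0.4, 555: 9.6, 666: 3.3,
}

aps = []                # each AP: [remaining budget, [teams]]
for team, mbps in sorted(measured.items(), key=lambda kv: -kv[1]):
    for ap in aps:
        if ap[0] >= mbps:          # first AP with room (first-fit)
            ap[0] -= mbps
            ap[1].append(team)
            break
    else:                          # no AP had room: open a new one
        aps.append([AP_BUDGET_MBPS - mbps, [team]])

for i, (left, teams) in enumerate(aps, 1):
    print("AP %d: teams %s (%.1f Mbit/s spare)" % (i, teams, left))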

Quote:

I really think this thread has moved away from the original purpose of 548's Einstein Statement into a topic about wireless design. If this is something you would like to talk about further, could you create a new thread?
I'm fine with that as well. I'm even open to this conversation in private.

I just want to add: as of this post we are now discussing exactly the sort of balance I already wrote to FIRST about in private. So while this has ventured into great detail and engineering matters, it did not venture away from the issue of what happened on Einstein, or for that matter what *may* have happened elsewhere. I hope that if someone has issues with my points or my point of view, they'll discuss it with me; it's better to be respectfully challenged than to be above all challenges. Also, my apologies for the crazy way I've had to edit all this. I did not originally intend to present a formal thesis of my own, and it took some time to adjust the presentation of my ideas. I am still quite happy to demonstrate that this works, and I'm going to leave this topic at this point. Thanks for your time.

