I didn’t write the article on ni.com, but perhaps I can help decode it.
My takeaway: If you open a TCP or UDP port on VxWorks and someone writes to it, your program should read from it, or the communications buffer will eventually fill and interfere with communication even on other ports and protocols. This is not true on the other OSes LV runs on, and not true of other VxWorks versions, but it is true of the current version of VxWorks that NI supports on the cRIO.
It sounds like the original bug report may have involved an unexpected arrival of datagrams. The author opened the port to use for outgoing datagrams. The author did not expect it to receive datagrams, never read from the port, and discovered that this would eventually lead to the symptoms listed. The suggested workaround in the article is to read from the port or close and reopen to flush unexpected datagrams even on ports you assume to be write-only.
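As a rough illustration of the "read from it anyway" workaround, here is a minimal Python sketch. It is not cRIO or LabVIEW code and is not taken from the ni.com article; the function name `drain_datagrams` and its parameters are made up for this example. The idea is simply to discard whatever has queued up on a UDP socket you only intended to send from.

```python
import socket


def drain_datagrams(sock, max_datagrams=1000, bufsize=2048):
    """Discard datagrams already queued on a UDP socket.

    Hypothetical helper: returns the number of datagrams thrown away.
    max_datagrams bounds the loop so a busy sender cannot keep us here forever.
    """
    previous_timeout = sock.gettimeout()
    sock.setblocking(False)  # never wait for data that is not already queued
    dropped = 0
    try:
        while dropped < max_datagrams:
            try:
                sock.recvfrom(bufsize)  # read and ignore one datagram
            except BlockingIOError:
                break  # receive queue is empty
            dropped += 1
    finally:
        sock.settimeout(previous_timeout)  # restore the caller's blocking mode
    return dropped
```

Calling something like this periodically from the loop that does the sending keeps the receive queue empty, which is the same effect the article gets by closing and reopening the port.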
Since FIRST robots are generally on a controlled network, I don’t think the suggestion is necessary. In reference to Einstein, what took place a few years ago involved a coprocessor intentionally writing data to a UDP port on the cRIO. The thread responsible for reading from that port was sometimes spinning, waiting for a sensor value to stabilize. The unattended UDP port filled the buffer and blocked communication on the other ports that would have allowed access to the cRIO, including the ability to reboot it. There is of course no way to know that this was exactly what took place on Einstein on that particular robot. But the code would loop indefinitely with a bad or disconnected sensor. It fit the symptoms, and was determined to be the most likely explanation for what was observed on that particular robot.
Back to the original topic, the original SD protocol was even more complex and was quite difficult to implement. In fact, I decided not to release the LV implementation because I wasn’t comfortable with its reliability. The next year, we removed a number of features, simplifying the implementation, and released all three languages. SD offers an alternative to sockets or TCP/UDP. Teams may choose any of these forms of communication on open ports, and since port 80 is open, they could use other forms such as web services.
The issue that affected the field last year in week one was caused by a flood of tiny single-byte TCP packets from the C++ implementation. The short-term solution was to allow the OS to buffer the writes using the Nagle algorithm. I don’t know if this is still enabled or if the writes were refactored to transmit larger transaction buffers the way the LV implementation does.
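To make the two options concrete, here is a small Python sketch. It is not the WPILib C++ code or the LV implementation; `CoalescingWriter`, `set_nagle`, and the 1024-byte threshold are names and values I made up for illustration. The first part gathers small writes into one larger send in the application; the second toggles the standard `TCP_NODELAY` socket option, which is how Nagle is turned off or left on.

```python
import socket


class CoalescingWriter:
    """Hypothetical helper: collect small writes and send them as one buffer."""

    def __init__(self, sock, flush_threshold=1024):
        self._sock = sock
        self._buf = bytearray()
        self._threshold = flush_threshold  # assumed value, tune as needed

    def write(self, data):
        # Append to the pending buffer instead of sending each tiny piece.
        self._buf += data
        if len(self._buf) >= self._threshold:
            self.flush()

    def flush(self):
        # Hand everything collected so far to the socket in a single call.
        if self._buf:
            self._sock.sendall(bytes(self._buf))
            self._buf.clear()


def set_nagle(sock, enabled):
    """Enable or disable the Nagle algorithm on a TCP socket.

    TCP_NODELAY=1 disables Nagle; TCP_NODELAY=0 lets the OS coalesce small writes.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if enabled else 1)
```

With the writer, the caller decides where a transaction ends and calls `flush()` there, so a whole update goes out together rather than byte by byte. Leaving Nagle on gets a similar effect without code changes, at the cost of some added latency on small writes.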
I was in San Antonio this weekend, and we saw lockup issues with one C++ team making heavy use of SD and a Java DB. The team chose to disable SD usage and their symptoms seem to have disappeared. Plenty of other teams use SD in C++, Java, and LV in various DB combinations. I’m not aware of other lockup reports from San Antonio. This will be investigated further. I’m sure Brad and the WPI folks appreciate the help with the C++ implementation.
Greg McKaskle