#1
Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
A bug report has been filed on the WPILib tracker here: http://firstforge.wpi.edu/sf/go/artf1719
The TL;DR is basically this: if you are connected via SmartDashboard and you disconnect the connected computer's wireless while the robot is writing a value to the SmartDashboard, the robot may hang until the write times out, which can take a few minutes. I'm working on identifying a good fix, but I fear the best way to fix it is to use non-blocking I/O, which would be a rather large rewrite.
#2
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
I think we've probably seen this. In the pits, we often have the driver station connected to the robot, with a LabVIEW dashboard using SmartDashboard/NetworkTables. When there is a programming change, the programmers plug in a second computer and run SmartDashboard. As we heavily use the Preferences class and other uploaded files, we don't often reboot the robot. Sometimes when we disconnect the programming computer's Ethernet cable, we see a hang in NetworkTables on the driver station.
We use Java on the robot, but I assume the implementation is similar to C++.
#3
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
Good to know. Does your robot stop responding also, or is it just SmartDashboard that stops?
In the last few years, we've definitely had a robot exhibit odd behaviors where it isn't responding to controls, but we've never directly been able to associate it with NetworkTables until now. We've always heavily used NetworkTables, particularly last year.
#4
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
It was just SmartDashboard that stopped, except that we needed data from SmartDashboard to function properly. We've been trying to remove those constraints from our code this year, wherever possible.
#5
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
Interesting. I've been moving more towards putting robot control in the SmartDashboard (well, a custom UI using NetworkTables), because there's a lot you can do with a touchscreen -- in particular, implementing toggle buttons using a UI is much easier than wiring up toggle buttons to attach to the DS I/O. Once the bugs in NetworkTables get ironed out, it should be a pretty good solution, and worked pretty well for us last year despite the bugs.
#6
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
Quote:
#7
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
We punted the NetworkTables stuff early in the season and went with straight-up UDP sockets. There are a lot of misconceptions in the WPILib code regarding network communications. E.g., if you have a source of UDP packets talking to the cRio, you need to have a consumer running in your teleop disabled code to toss the UDP data away or the bot will hang.
This is because the implementation in WPILib tries to buffer all network traffic and deliver it regardless of whether it should or not. UDP traffic without a listener should just be tossed on the floor according to the specification. But that's not what WPILib does. In fact, WPILib apparently keeps allocating RAM for the network comms until the bot runs out of memory. Thank goodness this isn't a safety-critical application. So, if it's possible for you, drop back to good old UDP sockets (not TCP, which requires a connection to be maintained). Just remember to create a thread on the cRio to read and throw away the packets if you're not in an operational mode. HTH, Mike
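For readers who want to see the "read and throw away" thread in concrete form, here is a minimal sketch. It is written in Python for brevity and is not robot code; the class name `UdpDrain`, the `discarded` counter, and the `poll_interval` parameter are all hypothetical names made up for this illustration. It only shows the general pattern of a background thread draining a bound UDP socket so datagrams do not pile up while nothing else is consuming them.

```python
import socket
import threading


class UdpDrain:
    """Hypothetical helper: reads and discards datagrams on a bound UDP socket."""

    def __init__(self, sock: socket.socket, poll_interval: float = 0.1):
        self._sock = sock
        # A short receive timeout lets the thread notice stop() promptly.
        self._sock.settimeout(poll_interval)
        self._stop = threading.Event()
        self.discarded = 0  # number of datagrams thrown away so far
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                self._sock.recvfrom(65535)  # read one datagram and ignore it
            except socket.timeout:
                continue  # nothing arrived; check the stop flag again
            except OSError:
                break  # socket was closed or failed; give up quietly
            self.discarded += 1
```

In this sketch you would call `start()` when entering a disabled (non-operational) mode and `stop()` before the real consumer takes over the socket.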
#8
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
Quote:
#9
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
Hmm... Perhaps, but I've been using VxWorks for 25 years and never had a problem with UDP traffic before. Of course, I wouldn't rule out that something that NI added has changed the network implementation. It's a moot point at this juncture as next year's control system is embedded Linux with the PREEMPT_RT patch in place. It will be a completely different beast.
Mike
#10
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
Quote:
However, despite all those problems, I *do* really like the idea of being able to use SmartDashboard, and I really like the simple API that is exposed on the robot. I'm loath to reimplement SmartDashboard and NetworkTables itself, and my hope is that they'll fix up the implementations for next year -- so until then, I'll keep patching it for the Python interpreter.
PS: In case you're interested, I found another obscure bug in NetworkTables tonight that causes buffer overflows on my Linux box. If you've ever wondered why you see gibberish in Netconsole when a NetworkTables client disconnects, I found out why.
#11
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
For those interested, I've posted an updated patch to the bug report. Without completely rewriting NetworkTables, I think the best solution (in addition to the previous fix) is to make the sockets non-blocking, and use select with a 1-second timeout on writes.
My thought is that anything that blocks a write for more than a second is going to be useless anyways, and NetworkTables has provisions for reconnecting when the connection dies. Better than hanging permanently. If anyone has feedback on the patch, I'd welcome it. Our team successfully used the first part of the patch without issues in a week zero event, but we don't have competition until mid-March, so I won't have any hard testing of the patch until then. However, I've tested it extensively on Linux/Windows, and on a cRio-II that was disconnected from actual robot hardware.
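The idea described in this post (non-blocking socket, `select` with a 1-second timeout on writes) can be sketched roughly as follows. This is not the patch attached to the bug report, and the NetworkTables code under discussion is C++/Java; it is only a small Python illustration of the technique, and `send_with_timeout` is a hypothetical name. Note that the timeout here applies to each wait for buffer space, not to the whole message.

```python
import select
import socket


def send_with_timeout(sock: socket.socket, data: bytes, timeout: float = 1.0) -> bool:
    """Try to send all of `data` without ever blocking longer than `timeout` per wait.

    Returns True if everything was handed to the kernel, False if the socket
    stayed unwritable for `timeout` seconds or the connection failed.
    """
    sock.setblocking(False)
    view = memoryview(data)
    while len(view) > 0:
        # Wait for room in the send buffer, but never longer than `timeout`.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return False  # stalled: caller should close and let the client reconnect
        try:
            sent = sock.send(view)
        except BlockingIOError:
            continue  # buffer filled between select() and send(); wait again
        except OSError:
            return False  # e.g. broken pipe or connection reset
        view = view[sent:]
    return True
```

A caller in this sketch would treat a `False` return as "connection is dead", close the socket, and rely on the reconnect logic rather than sitting in a blocked write.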
#12
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
Quote:
[anecdote] Before last season I was helping the author clean it up and finish it so it could ship for the 2013 season. Towards the end of November I looked at the code and sent him a laundry list of suggestions, and among the responses were "Yes I created this in java. I did it in c++ to mirror the java api" and "Yah I know as I said I wasn't the right person to do this". As it was already the end of November, rewriting it was out of the question, so I attempted to clean it up a bit. I managed to clean a few things up, like removing the custom UTF16 string class, among other things. Then I moved over to making SFX, so I never got a chance to clean it up further. I was hoping to with the C++11 project, but SFX took over my time, and there are not many good C++ devs. Sigh... [/anecdote] Anyway, I will attempt to move these patches along, though no guarantees.
#13
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
Quote:
#14
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
Quote:
Code:
write error: : read error: : S_errno_EPIPE S_errno_ETIMEDOUT [[NNTT]] IIOOEExxcceeppttiioonn mmeessssaaggee:: CEorurlodr noont FwDrIiOt er eaaldl [[bNyTt]e s0 xt2o0 ef8dd 1s8t reenatme r[eNdT ]c o0nxn2e8cet8ido1n8 setnatteer:e dS EcRoVnEnRe_cEtRiRoOnR state: SERVER_ERROR [NT] Close: 0x28e8d18 |
#15
Re: Serious bug identified in SmartDashboard/NetworkTables -- robot hangs
Quote: