Quote:
Originally Posted by Greg McKaskle
A couple of things that may help with reading the logs.
1. A spike at the beginning of auto is a glitch in the measurement. The DS was modified to zero the lost-packet count when the match begins. As a result, the delta loss reported in the logs will sometimes show a high spike at the very beginning of auto. You cannot trust this point.
2. If the communication with the robot fails, the voltage will disappear, and other lines such as CPU will go flat, showing cached values. Lost packets and trip time will often look high for a time when a drop occurs.
3. The lost-packets number is typically "out of 25". So 10 lost packets over a half second means 10 out of 25 were lost. If communication is lost entirely, the timeout is 1 second, so the number may go above 25 and be "out of 50".
4. When packets are lost, it is common for CPU to drop. The code for teleop and disabled typically waits for a new DS packet, so there is less to do when fewer packets come in.
5. If the DS line suddenly shows disabled but the robot line seems to ignore it, this indicates a watchdog disable due to a communications drop. It means the robot outputs were disabled because more than 5 consecutive packets failed to arrive. Note that it is possible for occasional packet arrivals to keep the robot enabled, and this is typical.
6. If you see periodic spikes in the trip time and/or the lost packets, this may very well be due to the DS. If you observe these when cabled, that is supporting evidence that something on the DS is a likely cause. Opening the Task Manager may show a process bumping to the top in time with the blip.
7. Trip time doesn't necessarily mean latency. The trip time includes the trip to the robot and back again. If the packet is delayed on the way to the robot, it impacts driving. If it is delayed on the robot, on the return trip, or (as is often the case) within the DS laptop, the trip delay doesn't imply that the robot is hard to drive due to lag.
8. Similarly, packets counted as lost may have made it to the robot and kept the watchdog alive, only to be lost on the return trip.
Greg McKaskle
Greg,
I am one of the mentors for Team 1280. Thank you for the guidance you provided on reading the log files; it is helpful to know what is normal.
I've looked carefully at our log files for both our successful matches and the matches where we stopped running. Unfortunately, there is no discernible pattern in CPU, dropped packets, or trip time that correlates with when we lose the ability to control our robot on the field.
We do have voltage and CPU readings in all our log files for the duration of the run. It makes sense that if we are getting these readings, we still have communication with our robot even when we cannot control it from the driver station.
For the line on the log files that shows the robot mode, what does this really tell us? In our successful matches, the lines that make up the robot mode clearly show the transition from disabled to autonomous, to a short disabled period, to teleoperated, and then back to disabled. Occasionally we see a little blip of disabled in the middle of teleoperated, but not often. While we see the expected CPU drop when this blip occurs, the robot picks up where it left off once the robot mode returns to teleoperated. The driver station mode line is always complete.
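(For other teams reading along: if I understand your point 5 correctly, that blip is the watchdog disable from a brief communications drop. Greg is describing the system-level watchdog, but the same timeout idea is exposed at the user level through WPILib's motor safety. A minimal sketch in Java, assuming a RobotDrive-based drivetrain; the channel numbers and the expiration value here are illustrative assumptions, not taken from our robot:)

```java
import edu.wpi.first.wpilibj.RobotDrive;

public class DriveSetup {
    // PWM channels 1 and 2 are placeholders, not our actual wiring.
    private final RobotDrive drive = new RobotDrive(1, 2);

    public DriveSetup() {
        // With motor safety enabled, the drive outputs are stopped if no new
        // drive call (tankDrive/arcadeDrive) arrives within the expiration
        // window -- which is what happens when the periodic loop blocks
        // waiting on a DS packet that never comes.
        drive.setSafetyEnabled(true);
        drive.setExpiration(0.1); // seconds; 5 missed packets at 50 Hz = 0.1 s
    }
}
```

That would also explain why the robot simply picks up where it left off: once packets resume, the drive calls resume and the outputs re-enable.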
For our unsuccessful matches, the robot mode line ends abruptly in whatever mode it was in at the time of the failure. We had robot failures in the middle of autonomous, at the beginning of autonomous, in the middle of teleoperated, and at the beginning of teleoperated. I'd love to know whether you think this is just a symptom or whether it reflects a problem with the processing.
We saw very similar behavior from our robot during practice matches when we were communicating with the driver station wirelessly and had the radio configured to automatically select its channel. The main cause seemed to be interference on the wireless network: when we analyzed the channels used by the school network's access points and switched to a free channel with no access points on it, we had no further problems in our practice matches. We also saw similar behavior when we were writing too many Log() commands to the SmartDashboard; when we reduced the number of SmartDashboard Log() commands, the problem cleared up.
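In case it helps other teams hitting the same thing: one way we could cut dashboard traffic without losing the data entirely is to decimate the updates rather than remove them. A minimal sketch, assuming a standard ~50 Hz periodic loop; the kPublishEveryN constant and the key names are our own illustrations, not the Log() calls from our actual code:

```java
import edu.wpi.first.wpilibj.smartdashboard.SmartDashboard;

public class ThrottledDashboard {
    // Hypothetical decimation factor: publish once every 10 iterations,
    // roughly every 200 ms when the loop runs at ~50 Hz.
    private static final int kPublishEveryN = 10;
    private int loopCount = 0;

    /** Call once per periodic loop instead of writing to the dashboard directly. */
    public void publish(double leftSpeed, double rightSpeed) {
        if (++loopCount % kPublishEveryN == 0) {
            SmartDashboard.putNumber("Left Speed", leftSpeed);
            SmartDashboard.putNumber("Right Speed", rightSpeed);
        }
    }
}
```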
For competition, we are displaying very limited data on the SmartDashboard. For San Jose we are going to run a version of our code with all SmartDashboard logic removed.
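Rather than physically deleting the SmartDashboard calls, we may route them through a single switch so we can turn them back on for practice. A hedged sketch of that idea; the Telemetry wrapper and DASHBOARD_ENABLED flag are hypothetical names, not from our code:

```java
import edu.wpi.first.wpilibj.smartdashboard.SmartDashboard;

/** Thin wrapper so every dashboard write in the code goes through one switch. */
public class Telemetry {
    // Hypothetical constant: flip to false for a competition build so that
    // no SmartDashboard traffic is generated at all.
    public static final boolean DASHBOARD_ENABLED = false;

    public static void putNumber(String key, double value) {
        if (DASHBOARD_ENABLED) {
            SmartDashboard.putNumber(key, value);
        }
    }
}
```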
For competition we also set the radio back to automatic channel selection, since that is how the instructions indicate it should be configured.
It feels to us that the more crowded the venue became with spectators, the more likely our robot was to have a problem operating properly. Thursday we ran in many practice matches with no failures. Friday we failed in 3 of 7 matches. Saturday morning we failed in 3 of 4 matches. We declined to participate in the elimination rounds because we were not performing consistently.
I'm attaching a document with screen prints of the log viewer for our failed matches, as well as the log files themselves. In the screen prints, I expanded the display so you can more clearly see where the robot mode line ends in relation to any dropped packets and the drop in CPU.
We are planning to work on the field with Jakub Fiedorowicz tomorrow morning in San Jose before the practice matches begin to see what more we can discern about our failures.