Chief Delphi

2012 Field Comm. Issue Logs

Greg McKaskle 28-03-2012 06:50

Re: 2012 Field Comm. Issue Logs
 
If it can play video games, it is probably realtime enough to be a DS. The rate of both is defined by human perception and reaction time.

As image processing and other code written by teams migrates to the DS laptop, it will impact the comms. Throw in a Kinect or two, and ...

But at this point, I think some teams have overloaded a component of their robot -- the DS laptop -- not unlike how they are capable of overloading a given circuit or mechanical component. Task Manager alone will give good enough feedback to learn about and avoid the issue.

Greg McKaskle

dkearle 28-03-2012 23:58

Quote:

Originally Posted by Greg McKaskle (Post 1150440)
A couple things that may help with reading the logs.

1. A spike at the beginning of auto is a glitch in the measurement. The DS was modified to zero the lost packets when the match begins. As a result, the delta loss reported into the logs will sometimes show a high spike at the very beginning of auto. You cannot trust this point.

2. If the communication with the robot fails, the voltage will disappear, and other lines such as CPU should be flat and cached. Lost packets and trip time will often look high for a time when a drop occurs.

3. The lost packet numbers are typically "out of 25". So 10 lost packets over a half second means 10 out of 25 were lost. If communication is lost entirely, the timeout is 1 second, so the number may go above 25 and be "out of 50".

4. When packets are lost, it is common for CPU to drop. The teleop and disabled code typically waits for new DS packets, so there is less to do when fewer packets come in.

5. If the DS line suddenly shows a disabled line, but the robot line seems to ignore it, this indicates a watchdog due to communications drop. This means the robot outputs were disabled due to more than 5 consecutive packets not arriving. Note that it is possible for occasional packet arrival to keep the robot enabled, and this is typical.

6. If you see periodic spikes in the trip time and/or the lost packets, this may very well be due to the DS. If you observe these when cabled, that is supporting evidence that something on the DS is a likely cause. Opening the Task Manager may show a process bumping to the top in time with the blip.

7. Trip time doesn't necessarily mean latency. The trip time includes the trip to the robot and back again. If delayed to the robot, it impacts driving. If delayed on the robot or on the return trip, or often within the DS laptop, the trip delay doesn't imply that the robot is hard to drive due to lag.

8. Similarly, lost packets may make it to the robot and keep the watchdog alive and be lost on the return trip.

Greg McKaskle
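Tips 3 and 5 can be checked with a little arithmetic. The sketch below is only an illustration, not the DS or log viewer's actual logic: it assumes 25 packets per half-second log sample (50 when the sample stretches to the 1 second timeout) and a watchdog that disables outputs after more than 5 consecutive missing packets, as described above. The function names and the per-packet arrival list are hypothetical, since the log only records totals per sample.

```python
PACKETS_PER_HALF_SECOND = 25  # tip 3: a normal log sample covers 25 packets
WATCHDOG_LIMIT = 5            # tip 5: more than 5 consecutive misses disables outputs


def loss_fraction(lost: int) -> float:
    """Fraction of packets lost in one log sample.

    Counts above 25 mean the sample stretched to the 1 s timeout,
    so the denominator becomes 50 (tip 3).
    """
    if lost <= PACKETS_PER_HALF_SECOND:
        expected = PACKETS_PER_HALF_SECOND
    else:
        expected = 2 * PACKETS_PER_HALF_SECOND
    return min(lost, expected) / expected


def watchdog_would_trip(arrivals: list[bool]) -> bool:
    """True if more than WATCHDOG_LIMIT consecutive packets are missing.

    `arrivals` is a hypothetical per-packet record (True = arrived);
    the real log only stores per-sample totals.
    """
    run = 0
    for arrived in arrivals:
        run = 0 if arrived else run + 1  # reset the miss counter on any arrival
        if run > WATCHDOG_LIMIT:
            return True
    return False
```

For example, a pattern of five lost packets followed by one that arrives loses over 80% of the traffic but never trips the watchdog, which is the "occasional packet arrival" case from tip 5.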

Greg,

I am one of the mentors for Team 1280. Thank you for the guidance on reading the log files you provided. It is helpful to know what is normal.

I've looked carefully at our log files for both our successful matches and our matches where we stopped running. Unfortunately there is no discernible pattern in CPU, dropped packets, or trip time that correlates to when we lose the ability to control our robot on the field.

We do have voltage and CPU readings in all our log files for the duration of the run. It makes sense that if we are getting these readings, we do still have communication with our robot even if we cannot control the robot from the driver station.

For the line on the log files that shows the Robot mode, what does this really tell us? When we run successful matches, the lines that make up the Robot mode clearly show the transition from disabled to autonomous to a short disabled to teleoperated then back to disabled. Occasionally we see a little blip of disabled in the middle of teleoperated but not often. While we see the expected CPU drop when this blip occurs, the robot picks up where it left off once robot mode returns to teleoperated mode. The driver station mode line is always complete.

For our unsuccessful matches, the robot mode line ends abruptly in whatever mode it was in at the time of the failure. We had robot failures in the middle of autonomous, at the beginning of autonomous, in the middle of teleoperated and at the beginning of teleoperated. I'd love to know if you think this is just a symptom or reflects a problem with the processing.

We saw very similar behavior in our robot during practice matches when we were communicating with the driver station wirelessly and when we had the radio configured to automatically select the channel on which it would communicate. The main cause seemed to be interference in the wireless network. When we analyzed the channels being used by the school network's access points, and then switched to a free channel that had no access points, we had no further problems in our practice matches. We also saw similar behavior when we were trying to write too many Log() commands to the SmartDashboard. Again, when we reduced the number of SmartDashboard Log() commands, the problem cleared up.
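The channel survey described above boils down to counting access points per channel and choosing the least crowded one. A minimal sketch, assuming you already have the channel numbers of nearby access points from whatever Wi-Fi scanning tool you use; `pick_free_channel` and its inputs are hypothetical. The optional `overlap` parameter also counts neighboring channel numbers as interference, which matters in the 2.4 GHz band where adjacent channels overlap.

```python
from collections import Counter


def pick_free_channel(observed: list[int], candidates: list[int], overlap: int = 0) -> int:
    """Return the candidate channel with the least nearby access-point activity.

    observed   -- channel numbers of access points seen in a scan (hypothetical input)
    candidates -- channels the radio is allowed to use
    overlap    -- channel numbers on either side that also count as interference;
                  0 checks only the exact channel
    Ties go to the lowest channel number.
    """
    counts = Counter(observed)  # missing channels count as 0

    def interference(ch: int) -> int:
        return sum(counts[c] for c in range(ch - overlap, ch + overlap + 1))

    return min(candidates, key=lambda ch: (interference(ch), ch))
```

With `overlap=0` only exact matches count; something like `overlap=4` roughly reflects 2.4 GHz channels fewer than five numbers apart overlapping each other.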

For competition, we are displaying very limited data on the SmartDashboard. For San Jose we are going to have a version of our code where we take out all SmartDashboard logic.

For competition we also reset the radio to automatically select the channel again, since this is how the instructions indicate the radio should be configured.

It feels to us that the more crowded the venue became with spectators, the more likely we were to have a problem with our robot operating properly. Thursday we ran in many practice matches with no failures. Friday we failed 3 of 7 matches. Saturday morning we failed 3 of 4 matches. We declined to participate in the elimination rounds because we were not performing consistently.

I'm attaching a document that has screen prints of the log viewer for our failed matches, as well as the log files. For the screen prints, I expanded the display so you can more clearly see where the robot mode line ends in relation to any dropped packets and drop in CPU.

We are planning to work on the field with Jakub Fiedorowicz tomorrow morning in San Jose before the practice matches begin to see what more we can discern about our failures.

slijin 29-03-2012 01:11

Quote:

Originally Posted by slijin (Post 1148734)
I can say with certainty that we ran into lag issues with computer CPU consumption by the SmartDashboard; keeping it closed solved apparent lag problems, although the NYC FTA (Mark Mcleod) noted that we retained high trip times (~175 ms, as opposed to a standard 5-6 ms) throughout the entire event.
  • Which event? NYC
  • Wireless bridge radio HW revision? (A1, A2, or B1) A1
  • Radio firmware rev? 1.21
  • Programming language? Java
  • Using a dashboard app? n/a
  • Using vision with Axis Camera and cRIO processing? Ran a live stream with no processing at default fps
  • Using vision with driver station processing? DS display; nothing else.
  • Did you have the radio mounted near motors/large metal structure? Near two Black Jaguars, above the 12V/5V converter*.
  • Using classmate as driver station? 2go E11 Classmate, not the KoP one
  • 4 or 8 slot cRIO? (FRC-CRIO2 or Old version) 4-slot
  • CAN Jaguars? What FW version? Yes, rev99.

*This issue is something I realized post-NY. As Alan has pointed out, this could possibly produce issues; it will be fixed when we go to CT.

Quote:

Originally Posted by Joe Ross (Post 1149601)
While probably not the cause of your issues, [R61] requires version 99 or higher. The current version is 101.

Oops, I made a mistake there. That should be rev99.

Thanks for pointing that out.

plnyyanks 29-03-2012 21:34

At CT today, things seemed pretty good. I don't think I saw too many connection problems (but then again, I wasn't hanging around the field much). For us today, we were only dropped by the field in one early match this morning. I had a long talk with the FTA, who was pretty helpful, and we decided it was an isolated incident. We made it the rest of the day with no issues at all, so I think we'll be okay. But I'll fill this out and attach a screenshot of the log viewer just for the sake of another data point.

- Which event? CT
- Wireless bridge radio HW revision? (A1, A2, or B1) A1
- Radio firmware rev? (1.21, 1.4, or maybe something else) 1.21
- Programming language? LabVIEW
- Using a dashboard app? Labview, SmartDashboard, or custom? custom LV dash
- Using vision with Axis Camera and cRIO processing? yes
- Using vision with driver station processing? no
- Did you have the radio mounted near motors/large metal structure? no
- Using classmate (or similarly slow computer) as driver station? yes
- 4 or 8 slot cRIO? (FRC-CRIO2 or Old version) 4 slot
- CAN Jaguars? What FW version? no

Alan Anderson 29-03-2012 23:29

Quote:

Originally Posted by waialua359 (Post 1148673)
Alan,
the only puzzling part here is that even after switching to 1.21 as you noted BEFORE eliminations, the problem still existed for teams AND in matches that we weren't even in.

One other team at the Hawaii Regional was identified as having 1.4, but not until teams were preparing to pack up at the end of the event. I wasn't aware at the time that the WPA programming kiosk kept records of D-Link firmware revisions. Looking at its logs would have been a much quicker way to find the potentially problematic 'bots.

The larger issue is still mysterious, but it does seem to involve the order connections are established.

Blackphantom91 30-03-2012 15:19

During elimination matches we had 1985 connected last and we didn't have problems. The connection order really seemed to matter; also, when they did a reset, there were no problems. How do teams correct the problem if it's all about revisions, though? We could go out to the field at Champs and it could all be messed up. Hope we can get to the bottom of this.

catsylve 30-03-2012 18:06

During the St. Louis Regional, 1985 experienced what seemed to be connection issues throughout the entire first part of the competition. Those issues turned out to be a hardware problem. We went through the process of replacing our Classmate first, then our entire power distribution block. When these did not fix it, we had to replace the cRIO, although the one we had been using was brand new. This completely fixed our problems, and we returned the cRIO to NI, who replaced it under warranty. It made for an incredibly frustrating day, but our connection issues really did turn out to be from a bad part. All of this happened after the firmware issues with the radio were corrected.

dkearle 31-03-2012 01:09

We (Team 1280) had another similar crash in one of our matches at Silicon Valley today. Back in the pit we crashed again, this time with NetConsole running and were able to capture the following exception and error messages on NetConsole:

Quote:

Exception current instruction address: 0x02beb868
Machine Status Register: 0x0008b012
Condition Register: 0x20000084
Task: 0x2b30980 "FRC_NetworkTablesWriteTask"
0x2b30980 (FRC_NetworkTablesWriteTask): task 0x2b30980 has had a failure and has been stopped.
0x2b30980 (FRC_NetworkTablesWriteTask): fatal kernel task-level exception!
task 0x2b247e0 (FRC_NetworkTablesWatchdogTask) deleted: errno=0 (0) status=0 (0)

>>>>ERROR: A timeout has been exceeded: NetworkTables watchdog expired... disconnecting ...in WatchdogTaskRun() in C:/WindRiver/workspace/WPILib/NetworkTables/Connection.cpp at line 565

>>>>ERROR: A timeout has been exceeded: NetworkTables watchdog expired... disconnecting ...in WatchdogTaskRun() in C:/WindRiver/workspace/WPILib/NetworkTables/Connection.cpp at line 565

>>>>ERROR: A timeout has been exceeded: NetworkTables watchdog expired... disconnecting ...in WatchdogTaskRun() in C:/WindRiver/workspace/WPILib/NetworkTables/Connection.cpp at line 565
task 0x259d568 (FTP Server Connection Thread) deleted: errno=0 (0) status=0 (0)
We use SmartDashboard to display some data from our robot using the Log() method. We do not access the NetworkTables class independently of SmartDashboard.

Since we are not able to run NetConsole during matches, we have no idea if we get this same exception every time we crash, but we'll keep looking for it when we are tethered in the pit. Because of this error, we have removed our use of SmartDashboard from our robot code.
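When NetConsole can be run in the pit, a quick scan of the captured output helps check whether each crash names the same task. A minimal sketch, assuming the output has been saved as text; the regular expressions are written only against the lines quoted above and may miss differently worded messages. `summarize_netconsole` is a hypothetical helper.

```python
import re

# Patterns taken from the NetConsole lines quoted above; other exceptions may be worded differently.
FAILED_TASK = re.compile(r"\((\w+)\): task 0x[0-9a-fA-F]+ has had a failure")
WATCHDOG = re.compile(r"NetworkTables watchdog expired")


def summarize_netconsole(text: str) -> dict:
    """Return the names of tasks that failed and the count of watchdog expiries."""
    return {
        "failed_tasks": FAILED_TASK.findall(text),
        "watchdog_expiries": len(WATCHDOG.findall(text)),
    }
```

Comparing these summaries across several pit crashes would show whether FRC_NetworkTablesWriteTask is always the one that fails.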

Gray Adams 31-03-2012 02:23

I don't have many details right now, but we had to sit out 5 matches at SVR today. I'll try to post back with some details tomorrow.

Greg McKaskle 31-03-2012 07:05

Quote:

We (Team 1280) had another ...
I'm glad you were able to capture this. I wasn't able to post last night, but the logs you posted seemed to indicate a crash. If the battery trace doesn't go away, the robot is connected to the field. The CPU goes low because the thread the code was in was terminated. Hopefully you can use practice matches and some code walkthroughs or some clever debugging to track it down.

Greg McKaskle
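The reasoning above, together with tip 2 earlier in the thread, gives a rough way to tell a communication drop from a code crash in the logs. This is a hypothetical sketch: the `LogSample` fields stand in for values read off the log viewer by eye, and the CPU threshold is an arbitrary illustrative number, not an official one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogSample:
    # Hypothetical per-sample fields read off the log viewer by hand.
    battery_volts: Optional[float]  # None when the voltage trace is missing
    cpu_percent: float
    robot_mode_present: bool        # False once the robot mode line has ended


def classify(sample: LogSample, cpu_floor: float = 10.0) -> str:
    """Rough triage of one sample, following the heuristics in this thread.

    cpu_floor is an arbitrary illustrative threshold, not an official value.
    """
    if sample.battery_volts is None:
        return "comm lost"  # voltage disappears when communication fails
    if not sample.robot_mode_present and sample.cpu_percent < cpu_floor:
        return "code likely crashed"  # still connected, but the code's thread stopped
    return "ok"
```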

Tristan Lall 31-03-2012 10:46

Although probably a different issue than is being experienced by the others, 253 at SVR has been consistently unable to connect to the field. The robot works correctly on the tether, but will not communicate with the FMS.

So far, the cRIO-II has been imaged (v. 43, 2012-01-20 imaging tool) and replaced, code has been recompiled and reloaded, the radio has been reflashed and replaced, and all of the cRIO modules have been either checked or replaced (a sidecar and bad 37-pin cable were replaced). The robot is running C++ code and Jaguars over PWM.

The system has been tested with two driver station computers: a regular laptop and a loaner classmate. Neither works on the field.

I have a couple more wild theories: can anyone sanity-check them?

Firstly, what would happen if the WPA key given in the kiosk and the one from the FMS were inconsistent? Would we see red indications for code and communications on the dashboard?

Secondly, wasn't there something in the LV code (and maybe C++) where you could run code from the laptop, but it wouldn't be installed on the cRIO? I haven't seen that in a while, but maybe the team has an incorrect build option set? (I don't think this is it, though, because the robot flash codes seem to indicate it's running code.)

RufflesRidge 31-03-2012 11:10

Quote:

Originally Posted by Tristan Lall (Post 1151861)
Firstly, what would happen if the WPA key given in the kiosk and the one from the FMS were inconsistent? Would we see red indications for code and communications on the dashboard?

With an incorrect WPA key, the bridge could never link to the field, and you would see red for code and communications and red for Bridge on the diagnostics tab.

Quote:

Secondly, wasn't there something in the LV code (and maybe C++) where you could run code from the laptop, but it wouldn't be installed on the cRIO? I haven't seen that in a while, but maybe the team has an incorrect build option set? (I don't think this is it, though, because the robot flash codes seem to indicate it's running code.)
You can do that for both LabVIEW and C++ but in both cases you would see green for the Communication light and red for Robot Code.
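The two cases described in this reply can be written as a small lookup from indicator states to likely causes. A sketch only: it covers just the combinations mentioned here, and `diagnose` is a hypothetical helper, not part of the DS.

```python
def diagnose(bridge_ok: bool, comms_ok: bool, code_ok: bool) -> str:
    """Map DS indicator states to the likely causes mentioned in this thread.

    Only the two cases discussed here are covered; anything else is left open.
    """
    if not bridge_ok and not comms_ok and not code_ok:
        return "bridge never linked (e.g. WPA key mismatch)"
    if comms_ok and not code_ok:
        return "code not running on the cRIO (e.g. run from laptop, not deployed)"
    if comms_ok and code_ok:
        return "indicators normal"
    return "unknown; check further"
```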

Greg McKaskle 31-03-2012 13:01

Does the robot connect but not stay connected, or does it never connect to or through the field? Are the cabled tests in the pits practice matches? What rev radio and what FW version?

Greg McKaskle

Timz3082 31-03-2012 21:19

Which event? 10,000 Lakes
Wireless bridge radio HW revision? (A or B) A
Programming language? Labview
Using a dashboard app? Is it SmartDashboard? No
Using vision with Axis Camera and cRIO processing? Passing image, no processing
Using vision with driver station processing? No
Did you have the radio mounted near motors/large metal structure? No
Using classmate (or similarly slow computer) as driver station? Yes

During our last qualification match, we were all running at the beginning, then two robots on our alliance dropped at the same time. 30 seconds later we dropped connection, causing our alliance to lose the match! We talked with the staff and they blamed it on each team's code, which makes no sense since they were all working in every other match without any code change. The languages were different too: one was LabVIEW, the other C++. Is this just a fluke with our robots (the only match we dropped in), or would it have been some sort of field problem?

thephpdev 31-03-2012 23:34

We're not sure what the problem was exactly for our robot during competition (Team 2502) and we didn't get to find the exact reason it didn't work, but we did fix it.

First, the radio was set to auto rather than bridge. For competition it needs to be on bridge, but it seems that for other testing (such as in the pits) the radio should be on auto.

Second, the radio was right next to our drive train motor and surrounded by a lot of wires, so we moved it higher so that it could be seen from over the ramp.

Last, we made a slight hack to allow faster operator control loops: we put only the drive train in the operator control loop and created a new thread to handle everything else that would have been in operator control.

A note on the last change: before it, our graph was showing very long loop times (green line), and after the change the loop time decreased drastically (as expected). I'm not sure which of these things was the issue, but I would double-check all of them.
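The loop split described above can be illustrated with a plain thread. This is a Python sketch for illustration only, not the team's code or any robot framework's API: `drive_step` and `slow_tasks` are hypothetical callables, and the fixed sleep stands in for waiting on the next DS packet.

```python
import threading
import time


def run_split_loops(drive_step, slow_tasks, duration_s: float, slow_period_s: float = 0.05):
    """Run drive_step in a tight loop while slow_tasks runs on its own thread.

    drive_step and slow_tasks are hypothetical callables standing in for the
    drivetrain code and "everything else". Returns (drive_iterations, slow_iterations).
    """
    stop = threading.Event()
    slow_count = 0

    def background():
        nonlocal slow_count
        while not stop.is_set():
            slow_tasks()              # e.g. mechanisms, sensors, dashboard updates
            slow_count += 1
            stop.wait(slow_period_s)  # sleep, but wake immediately when stopped

    worker = threading.Thread(target=background, daemon=True)
    worker.start()

    drive_count = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        drive_step()                  # only the drivetrain runs in the fast loop
        drive_count += 1
        time.sleep(0.005)             # placeholder for waiting on the next DS packet

    stop.set()
    worker.join()
    return drive_count, slow_count
```

The fast loop stays short because slower work no longer delays it; the trade-off is that any data shared between the two loops now needs care, since both run concurrently.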


All times are GMT -5.

Copyright © Chief Delphi