DS reliability

I’m doing some tests on the reliability of the Driver Station. The first one logs the timing jitter of the DS packets.
Jitter is the uncontrolled variation in a timing source. The Driver Station sends a packet every 40ms. If it did this with complete reliability, it would have zero jitter. Here’s a graph of how often the robot receives a packet from the DS.
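To make the measurement concrete, here's a rough sketch (in Python, not the LabVIEW VI actually used for this test) of how jitter can be computed from a list of packet-arrival timestamps, assuming the nominal 40ms period:

```python
# Sketch only (Python, not the actual LabVIEW VI): compute inter-arrival
# jitter from packet arrival timestamps, given the nominal 40 ms period.
NOMINAL_MS = 40

def jitter_ms(arrival_times_ms):
    """Return the deviation of each inter-packet gap from the 40 ms nominal."""
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    return [g - NOMINAL_MS for g in gaps]

# A perfectly regular source has zero jitter:
print(jitter_ms([0, 40, 80, 120]))   # [0, 0, 0]
# A late packet shows up as a positive spike followed by a negative one:
print(jitter_ms([0, 40, 95, 120]))   # [0, 15, -15]
```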

This test took place over about 5 hours and 30 minutes.

The VI I’m using is a modified Robot Main, which you should be able to pull into any normal LabVIEW FRC Robot Project and run. It logs to a file called “jitter.txt” in the main directory of the cRIO. You can look at the data manually, or you can use the attached “extract jitter data.vi” to make a graph like the one above.

The reason for testing this with a direct ethernet connection is to establish a baseline for a similar test done over wireless, in order to then test the reliability of the wireless connection. Many of us have seen robots stop dead on the field, so I’m initiating some testing to aid in troubleshooting and diagnostics. The issue is commonly blamed on the placement of the robot radio or the quality of ethernet cables, but I’ve seen no testing to back this up.
Here’s a general idea of what I’m trying to answer:

The attachment is missing.

It would be interesting to see the same graph with a kwikbyte DS, as well as the DS software run on a more powerful computer.

What’s also missing (to completely figure out the whole path) is what the FMS does in error conditions. I’m not sure that we’d ever be able to figure that out.

Sorry about the attachments. (I was sure I uploaded them!)

You’re right, the Field Management System is another story. It might be possible to do an approximation with FMS-lite… but that’s not the competition FMS.

For the moment, I’m going to stick to the current control system. Whether the Classmate PC performs better than the kwikbyte isn’t as important as whether the Classmate PC provides a reliable method of control over the robot.

Are you displaying the delta time between processed packets in RobotMain?

Like Joe, I’m curious what is running on the Classmate, perhaps what the power and wifi settings are.

I instrumented the code last year to get a sense of the latency and dropped packets, but the code was removed before shipping. The reason for the DS messages about watchdog counts was to detect and report short WD glitches, since lost and/or latent packets will, after enough time, result in a system watchdog. This wasn’t as informative as a heavily instrumented framework, but it allowed the most obvious symptom of latent packets to be noticed and, hopefully, a cause and effect established.

You may also want to note the CPU usage of the Classmate and the DS mode. When in disabled mode, the DS does extra work to enumerate joysticks and detect changes, and if no estop button is present, it is also looking for the button. When enabled, these tasks no longer take place, and you should see an effect in the timings or in the CPU load. There is also a difference if the Cypress board is attached.

Please post with questions or observations.
Greg McKaskle

Yes, this is the time between iterations on the main loop in Robot Main.

I’m running in driver mode, but I left the wireless on (it is on by default when the laptop boots up).
For the first couple minutes, it is in Teleop disabled, but the remainder is Teleop enabled.
Stop button is connected, although the PSoC is not.

I seem to be getting a lot of errors about the raw DS IO, though. When I had 8MB of jitter log, there were 86MB of WPI errors. (I’m not sure how that fits in 64MB of flash memory.) I’m guessing this could affect the performance of the cRIO.

See this post about error logging: http://www.chiefdelphi.com/forums/showthread.php?t=83492

Well, it looks like it is normal behavior, so I’ll let it be. Many teams didn’t have their Cypress board hooked up during competition.

If your interest is in investigating what was occurring on people’s robots, then the data plot is a reasonable representation. I don’t know how applicable the five hours of data is, but you are seeing some interesting patterns that it would be nice to understand. I’d suggest adding some statistics that are updated for each five-minute window of data, and adding some instrumentation to the Start Communications VI to distinguish between latent packets and lost packets.

I can already tell you that some of the nondeterminism is due to how the call library nodes are configured on the diagram of Start Comms. It was decided that it wasn’t worth a midseason patch to improve, but the edits are simple and safe if you’d like to try it at some point. PM me for more details.

Greg McKaskle

Okay, I have some better statistics now, and a more applicable recording period. I’ve been using the “practice” option on the Driver Station with fairly normal options. (0s countdown, 15s auto, 2s disabled, 100s Teleop, 20s endgame)
For the visualization, I used an advanced histogram, with bins in milliseconds from 0 to 29, 30 to 49, 50 to 199, 200 to 499, and 500 to 1500. The count for each bin is on a logarithmic scale so the lower values can be seen.
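For anyone reproducing the binning outside LabVIEW, here’s a sketch in Python (the sample data is made up; the bin edges are the ones described above):

```python
from bisect import bisect_right

# Bin edges in ms, matching the post: 0-29, 30-49, 50-199, 200-499, 500-1500.
EDGES = [30, 50, 200, 500]
LABELS = ["0-29", "30-49", "50-199", "200-499", "500-1500"]

def bin_counts(deltas_ms):
    """Tally inter-packet times into the histogram bins above."""
    counts = [0] * len(LABELS)
    for d in deltas_ms:
        counts[bisect_right(EDGES, d)] += 1
    return dict(zip(LABELS, counts))

# Hypothetical inter-packet times in ms:
print(bin_counts([20, 25, 40, 180, 600]))
# {'0-29': 2, '30-49': 1, '50-199': 1, '200-499': 0, '500-1500': 1}
```

When plotting, putting the count axis on a log scale (as in the screenshots) keeps the rare long-delay bins visible next to the dominant 0–29ms bin.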
Here’s the results for wireless:

And straight ethernet (using a wireless router as a bridge)

I’ve attached my files, but I recommend you back up your DriverStation.lvlib before opening the project. I believe having identically-named VIs in a project and the LabVIEW vi.lib can really mess up your other projects, so I would recommend zipping your current DriverStation.lvlib and then replacing it with the copy I’ve provided.

Anyways, I use a queue to record the packet index and the ms timer, but everything else is processed in Finish.vi when the program ends.
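As a sketch of the post-processing idea (in Python, rather than the Finish.vi from the attachment), dropped packets can be counted from gaps in the logged packet indices:

```python
def dropped_packets(indices):
    """Count packets missing from a sequence of received DS packet indices.
    Assumes indices increase monotonically (no counter-wraparound handling)."""
    return sum(b - a - 1 for a, b in zip(indices, indices[1:]))

def loss_rate(indices):
    """Fraction of expected packets that never arrived."""
    expected = indices[-1] - indices[0] + 1
    return dropped_packets(indices) / expected

# Example: indices 3 and 4 never arrived, so 2 of 6 packets were lost.
print(dropped_packets([1, 2, 5, 6]))      # 2
print(round(loss_rate([1, 2, 5, 6]), 2))  # 0.33
```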

DS Reliability.zip (277 KB)



Are there any ideas as to why half the Driver Station Packets are dropped?

These are some very interesting results. There seems to be more than pure randomness at play in the time-to-arrive measurements you have here.

It will be interesting to see whether this improves/becomes more reliable in the 2011 control system code, if that is indeed a factor.

(Sorry, I know, oldish thread)

Another thing I wonder about is the difference between the SOHO wireless routers that they provide to us and the enterprise class wireless access point that they are using with the field.

If I also recall correctly, each end of the field where the driver stations are located has a network switch, and a single connection is fed back from there to the field table. While minimal, this probably also has the ability to add additional factors that should be considered into the mix.

It’d be nice to be able to do tests with an actual field.
Unfortunately, I can’t.

I do have results for the Beta driver station this year. I’m double-checking to see that I can share those.

Here’s a screenshot of the test for the new driver station. The code is the same as last time.

There’s still a packet that took longer than 500ms (though my cutoff should now be 100ms due to the new safety VIs.)

Overall, however, this is a huge improvement! Packet loss is 3%, down from 50%. Most packets arrive in 20ms.

Note that I’m not running this on the Classmate PC right now (I’ve returned it to the team), but I think the results are comparable.

There are a lot of things that can change when switching computers. Ethernet card, firewall, processor speed, background tasks, and operating system could all have an effect on the timing of the DS packets. For the test to be valid, you should run both versions of DS software on the same computer and compare the results. I assume it wouldn’t be too hard for you to run the old DS software on your current computer.

That’s a good idea.
I’ll try to get to that on Sunday.

Joe, you’re very right. I just did a test of the old driver station:

(I apologize, I forgot to show the histogram last time)

Do you have any suggestions for how I can have two versions of the DS installed on the same computer? This uninstalling and reinstalling is cumbersome for what should be a quick test.

EDIT:
Something seems funny about this data. The test took 2 minutes, not 2 hours.

EDIT2:
The data was correct, but my time elapsed calculation was not.
Here’s a screenshot of another test of the same thing:

Just install the old version, save off the driver station .exe as a different name, then install the new version. You can then just run either one (one at a time).

-Joe