Comments/Complaints on NI control system

Comments/Complaints/Annoyances about NI. I am not trying to complain for its own sake, but it seriously bugs me when there are 10 minutes to get ready for the next match and I have to boot the robot, tether, download new code, reboot it, test it, and change a battery and possibly bumpers. The NI/WPI code doesn't totally suck, but it has many issues that make it clear the authors have never been on a pit crew at a FIRST competition. They have no idea what sort of timing pressure we are under, and I am sure they would have designed their software differently if they did.

I. Timing:
A. Boot-up: The old IFI system could boot and establish radio communication in under 5 seconds. The new system takes approximately 1.5 minutes to establish communication, begin code execution, and be ready to use.
B. Code: The old IFI+MPLAB system could build, deploy, and reboot in under a minute. The new system takes approximately 2-3 minutes to do a full build, plus another 2-3 minutes to download, and then must completely reboot, which is another 1.5 minutes.
C. Network delay: The old IFI system had a simple processor on the OI communicating over 900 MHz radios directly to another simple processor, with one level of communication via RS422. The new system has a PC running software, communicating via Ethernet to the FMS, which communicates wirelessly to the robot, which translates it back to Ethernet and then unbundles the giant data packet. This leads to much network delay, as the entire alliance station must share a single Ethernet line.

II. Operator Interface:
A. Boot times: The old OI could boot and establish comm in 5 seconds. The new system must do a full boot, plus run several large applications which take a long time to load. If you have to reboot your operator console on the field, you are seriously screwed.
B. Cypress board and related communication: The Cypress board runs on 3.3 V, which is annoying since everything else runs on 5 V. In addition, should the Cypress board lose connection, the Classmate will continue to send old data to the robot, making it impossible to detect the loss of the box. On top of all this, booting the Classmate while connected to the FMS will lead to a loss of the Cypress board, which is only fixable with TWO reboots of the Classmate, meaning teams faced with this must choose between risking it and possibly not playing because setup took too long, or going without their Cypress board. On Friday of our first district we encountered this scenario twice, and went without the Cypress board.
C. FMS Lockout: On the old IFI system, removing the competition cable would leave a robot that was happy and ready for use. Now, after connecting to the FMS, the Classmate must be logged out and logged back in (to re-launch the Driver Station), which takes too long.

III. Radio communication:
A. The IFI system had RS422 radios that communicated on 900 MHz on a DEDICATED CHANNEL!! The new system shares bandwidth between teams, and we have found the 802.11 communication to be unreliable during matches, although that could be related to all of the FMS problems at Kettering this year. In addition, on loss of the radio, you are screwed for the rest of your match. If the IFI system were to hang and lose comm, it would reboot in 5 seconds and be happy again.

IV. Robot Processor:
A. The old IFI controller had a PIC which was powerful enough to do complex things without using much program space at all. The new system has a giant, heavy cRIO with a 400 MHz PowerPC which has far more power than we could ever possibly need, like trying to gently tap something with a sledgehammer. It's far too big for its application.
B. Download times: They suck. It takes far too long to download code.
C. Boot time: Too long. The people who wrote the code that runs on boot up obviously have never been on a pit crew and had to reset a robot in only a few minutes.

V. LabVIEW:
A. Probing things while the code is initializing causes a LabVIEW crash, which is really annoying. The same goes for downloading before the already-deployed code initializes: it will lock you out and you will have to reboot the robot, which is also really annoying.
B. Build times: Really annoying again. We have encountered many instances where we cannot fix a small code issue because that would require waiting 10 minutes to boot the robot, build, download, and reboot again.

Anything else?

If you’d like to condense your list of complaints by removing duplicates, I’d be happy to comment on them one by one. As it is, I’d end up unduly agreeing or disagreeing with you if I addressed them as they currently stand.

As Alan stated, it is difficult to respond to this post point by point, but I’ll try instead to get us to the productive stage of understanding what is going on and coming up with solutions.

But first, while it is natural and healthy to make comparisons to the previous system(s), let's use the correct terms. NI donates a few key elements that make up the control system, but there are many additional suppliers and developers involved. It is called the FRC Control System for a reason. Other NI folks and I are involved in support, but we don't deserve all the credit, good or bad.

My timings are with the default code, camera to dashboard and vision processing on. Classmate was used as the development laptop.

I.A. 1.5 minutes seems long. While tethered, these are the times I see. At 8 seconds, the cRIO TCP stack is up, ping succeeds, and the yellow LED goes off. At 20 seconds, the DS Connection light goes off, meaning that the communications task is sending back Status packets as a reply to the control packets. At 30 seconds, the Robot Code DS light is good, meaning that the Robot Main is running and communicating with the DS. The odd thing I cannot explain is why the vision to the PC will often take another fifteen seconds or so to go live, but at thirty seconds, the robot can be enabled and I can control a solenoid and other I/O.

I.B. Using the Classmate as the laptop, I get a bit over two minutes to build. I really really wish I could speed this up, and traditional LV builds are really quite fast, but with the introduction of libraries, the build times went way up and haven’t improved yet. In my measurements, it took about 2:40 to build with the Robot Main VI window open, but 2:10 to build with it closed. Also note that the robot isn’t needed for building. If you have known changes, you can build while the robot powers up, while it is coming back from the field, etc. Once the app is built, my deploy takes eighteen seconds. This means that a boot and deploy should take less than a minute, and a reboot and test of the code should take another minute.

I.C. The DS measures the round trip time to send control packets to the robot and get a status return. The robot typically receives the packet about 2 to 4 ms after it is sent. The “giant” data packet sent to the robot is only about 1KB, mostly padded zeroes, as wifi frames have a minimum size that is larger than the packet size. This is the same technology used for Skype and plenty of other near-real-time uses. I’m not sure what speed the 422 ran over the 900MHz modems, but the N speed is likely 1000 times higher bandwidth. If you have actual measurements of latency, please share.
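
(For anyone who wants to collect numbers, a crude UDP echo test at roughly the DS packet size will do. The sketch below is plain POSIX C++ and is emphatically not the DS protocol; the host address and port are placeholders, and it assumes you run a trivial echo server on the robot side.)

```cpp
// udp_rtt.cpp - hypothetical round-trip latency probe, NOT the DS protocol.
// Assumes an echo server is listening at HOST:PORT on the robot side.
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>
#include <cstring>

int main() {
    const char *HOST = "10.0.0.2";   // placeholder robot-side address
    const int PORT = 7777;           // placeholder echo port

    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    inet_pton(AF_INET, HOST, &addr.sin_addr);

    char buf[1024] = {0};            // ~1 KB, roughly the control packet size
    for (int i = 0; i < 50; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        sendto(sock, buf, sizeof(buf), 0, (sockaddr *)&addr, sizeof(addr));
        recv(sock, buf, sizeof(buf), 0);           // wait for the echo back
        auto t1 = std::chrono::steady_clock::now();
        std::printf("rtt %2d: %6.2f ms\n", i,
                    std::chrono::duration<double, std::milli>(t1 - t0).count());
        usleep(26000);               // pace like the old ~26 ms packet cycle
    }
    close(sock);
    return 0;
}
```

A real measurement tool would set a receive timeout and discard outliers, but even this crude loop will distinguish a 4 ms link from a 3-second one.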

II.A. What is the process you are following? I’d expect that the DS would be in sleep mode which takes about ten seconds to awaken. Yes, rebooting the computer should be avoided when you can.

II.B. This seems to be the biggest issue that you faced. But for what it is worth, the DS doesn’t send data if the FirstTouch board is removed. The WPI robot code that wraps the comms does return back the latest known good values. It would be possible to return timestamps or other status data if this is really necessary. I don’t really know of interference between FMS and Cypress recognition. This hasn’t been reported before, but it is certainly something that we’ll test more thoroughly. The indication of whether the board is plugged in is the LED on the DS I/O button and LEDs on the I/O board itself.

II.C. This feature was requested and implemented in beta to provide additional safety. Since the laptops are self-powered, it helps prevent a team from being able to drive a robot at the field until the FMS allows it. It does mean exiting and restarting the Driver Station again, but that took only 45 seconds in my test.

III. I know that the FMS does quite a bit of monitoring and logging. The 5 GHz channel being used for N is not heavily occupied. As for rebooting during a match and being up again, I'd avoid that unless directed to do so by a coach or FTA.

IV.A. This system is indeed far more capable than the previous one and yet, it is one of the smallest and slowest systems NI makes. I’m pretty sure your metaphor could be used for every cell phone, laptop, and electronic device at the event. All of them were once done with less computational power, but does that make them sledgehammers?

IV.B. Since my stopwatch isn’t marked with “They suck”, what numbers are we talking about?

IV.C. Some people have been on pit crews, some haven’t. Can you be more specific about what you’d fix first?

V. If you can describe how to reproduce the probe crash, please post it in the LV section. As for times, I know of some things that annoy me, but not in the 10 minute range. Try to measure your times rather than guess; exaggerations cause confusion.

Greg McKaskle

At this point, most of my complaints would be pretty nit-picky.

Overall I think the system is very good and has the potential to be great. Having the PowerPC on board is a big step up for FIRST. In the real world of embedded programming you have to worry a bit more about optimizing for a smaller processor, but I think it's good that people who are new to programming don't have to worry about stuff like that anymore.

I also think the LabVIEW option is a big step forward for FIRST. We were able to get an 8th grader to do about 75% of our tele-op software this year. I know that wouldn’t have been likely with C++ or Java (let me clarify this - if you have a kid who’s been programming on his own for years, that’s one thing. But to take a middle-schooler with little to no programming experience and have them learn C++ enough to program the robot during the build season - very unlikely.)

Having said all of that, I think there are some BIG improvements that can be made to the LabVIEW hardware interfaces to make them more user friendly and intuitive. Once you understand how the current interface works, it's pretty user friendly, but when I first saw it I found it very counter-intuitive. When I get some time I'll write some simple VIs to show an example of what I think would make it highly intuitive for first-timers.

We use the C++ side of the world and have been happy with download times. We've found that flash downloads take much longer when there is code running on the cRIO than when there is none. It almost seems like whatever does the installing on the cRIO has a lower priority than what is already running. Could this apply to the LabVIEW side too?

Something that I would like to see improved is safety at the driver station. As it currently is, the driver station remembers the teleop/autonomous state between robot resets and between enable/disable cycles. This scares me tremendously for teams that aren’t careful about the way they write autonomous. These apply to runs in the pit and practice field, not necessarily the real field. Here are my concerns (the theme is a robot should not move unless expected to):

  1. If a team doesn’t protect their autonomous code with a flag to tell whether it has already run, they can accidentally rerun their autonomous if they go from auto-disable -> auto-enable -> auto-disable -> auto-enable (see the sketch after this list). PROPOSED FIX: Always move the mode to teleop when disabling. Yes, if the team wants to run auto again they’ll have to change the mode back to autonomous, but I think that’s a fair thing to do to protect people from a runaway robot.

  2. If a team doesn’t protect their autonomous code with some sort of lock-in procedure (i.e., a switch that, if not set at the OI, will cause the robot to not move), they can accidentally run autonomous if they go through this sequence: auto-disable -> auto-enable -> auto-disable -> robot reset (say, for a download). After the reset, if they want to run teleop and simply click enable, they will actually be in auto-enable since the mode hasn’t changed. PROPOSED FIX: The driver station already automatically changes from enabled to disabled across a reset/comm loss. It should also force the state back to teleop.

  3. There is no fast way to disable a robot with the Classmate. In years past, we’ve always had an external disable switch that we could physically keep a hand on when testing in close quarters. Now, we have to move the mouse to the disable button and click. This reaction time could mean the difference between safety and tragedy. We have implemented an external software disable switch that neutralizes all of our outputs. Once that is switched, we can worry about moving the mouse to the disable button. Yeah, I know there’s the E-stop button, but I think it’s ridiculous that when pressed, the robot needs to be completely rebooted to regain operation. PROPOSED FIX: Change the way the E-stop works. Rather than completely disabling the robot, allow the user to re-enable it at the driver station (a simple disable-E-stop button would work). This button could be made available only when not connected to the FMS to prevent people from messing with it on the field.
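
For what it’s worth, the first and third protections are only a few lines of user code. Here is a minimal sketch against the 2010 WPILib C++ API (the SimpleRobot template); the channel numbers and kKillButton are placeholders I made up, and a LabVIEW team would build the same guards with a shift register:

```cpp
// Hypothetical sketch against the 2010 WPILib C++ API (SimpleRobot template).
// Shows (1) a run-once guard on autonomous and (2) a software kill switch.
// Channel numbers and kKillButton are made-up placeholders.
#include "WPILib.h"

class GuardedRobot : public SimpleRobot {
    RobotDrive drive;                  // PWM channels 1 and 2 (placeholders)
    Joystick stick;                    // USB port 1
    bool autoHasRun;                   // cleared only by a reboot
    static const UINT32 kKillButton = 2;

public:
    GuardedRobot() : drive(1, 2), stick(1), autoHasRun(false) {}

    void Autonomous() {
        if (autoHasRun) return;        // guard: auto runs at most once per boot
        autoHasRun = true;
        drive.Drive(0.5, 0.0);         // drive forward at half speed...
        Wait(2.0);                     // ...for two seconds
        drive.Drive(0.0, 0.0);
    }

    void OperatorControl() {
        while (IsOperatorControl() && IsEnabled()) {
            if (stick.GetRawButton(kKillButton)) {
                drive.Drive(0.0, 0.0);     // software disable: outputs neutral
            } else {
                drive.ArcadeDrive(stick);  // normal driving
            }
            Wait(0.005);
        }
    }
};

START_ROBOT_CLASS(GuardedRobot);
```

Of course, the point of the proposed fixes is that the DS should enforce this for the teams that never write such a guard.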

There are far too many teams that veteran teams will be helping in the pits that won’t have any of the three protections above. For our safety and theirs, please do something about this.

All three of these issues seem like they could be easily handled (at least on the surface) in a much welcomed DS update.

A bit of an aside here, since I don’t have comments on the majority of the thread topic…

During our first competition this past weekend, we found this particular feature to be quite intrusive, especially during elimination matches where you can have as little as 6 minutes to remove your robot from the field, reset it to a starting condition, and get it back out there and re-connected to the field.

In our experience 45 seconds probably describes a mean, but we’ve seen upwards of 2 minutes; the DS software has to exit, Windows logs out the Driver user, the user logs back in as Driver, then the DS software has to load again, and finally Cypress recognition and USB enumeration complete.

Perhaps in the future, field management architecture could be revised to place the burden of safety and field control actually on FMS, rather than relying on the DS.

In the short term, seems like having a way to restart the DS without logging out the Windows account and logging it back in again might help shorten the cycle time somewhat.
Or… perhaps having a way to clear FMS locked when Ethernet physical connection (not robot comms, physical connection) is lost for more than a few seconds?

By the way, a team (thanks 397) showed me this weekend that you can enable your robot on a tether even during FMS lockout by using the F1 key. You can then disable with the spacebar. I haven’t tried it yet with our robot, but I saw them enable their robot with the big FMS Lockout on the screen. I asked how they did it, and they told me about the F1/spacebar thing.

And Dave: don’t you guys have the STOP button? Also, the spacebar works as a fast disable, but you have to train yourself to do it so you don’t forget when the important time comes.

That’s fantastic - never thought to try the shortcuts there! Thanks!

My primary concern with the current system is the long reboot time for the DS.

On a couple of occasions a reboot of the DS was necessary during our build season. At FLR, not being able to reboot or power cycle the DS in a reasonable time caused us to sit dead in the water during teleop for two practice matches. Auton executed, but during teleop the robot did not respond to either gamepad. At the time we assumed the issue was either FMS, a problem with our robot code, or possibly CAN. The symptoms were as follows:

  • Auton executed as expected (the robot moved and obtained a ball)
  • The FMS performed its transition to teleop. The robot did not respond to input from either gamepad during the entire teleop period, but all non-gamepad-dependent systems performed (compressor tasks).
  • Both gamepads showed up in the dashboard view.
  • All Jaguar LEDs were solid orange during the teleop period.
  • The 2CAN status LED indicated the plugin had loaded properly and that comm with the cRIO had been established.
  • The robot was rebooted between practice matches but the DS was not. The matches were back to back.

After all practice matches were finished, the FTA (Liz) allowed us and several other teams that were having different issues to connect to the FMS and pinpoint the issue. We regressed our robot code three versions from what was on the robot during the failed practice matches. We performed four tests with the last test using the code that was loaded during the failed matches. All four tests produced the same results.

-The Robot moved in auton and in teleop without issue.

This result led me to conclude that the problem was most likely an issue with USB. We were not able to reproduce this issue after several attempts. The next day (Friday), while on the practice field, the symptoms appeared again: good comm, good CAN, solid orange LEDs on the Jaguars, both gamepads showing in the dashboard view, and auton executing as expected. We rebooted the DS, re-enabled, and both auton and teleop executed as expected. The DS was not hibernated between matches; I know this because we disabled hibernation after experiencing Ethernet issues when waking. Our workaround was as follows.

Before each match the robot was tethered and a full system check was performed as part of our normal pre-match check list. Instead of powering down the DS, we kept it in the EXACT state as when the system check was performed.

I do not know if anyone else has experienced this EXACT issue; if they have, I would like to know. It would be better for FIRST if, when this happens during a match, a timely power cycle or re-enumeration of the USB ports were possible. This is one of many issues that presents itself when using a DS that requires a bulky OS to operate. The current DS provides some nice features such as camera feedback and a customizable dashboard. I would trade all of these features for robustness and expeditious, deterministic behavior. Camera feedback is nice if you are driving your robot through a building from a remote location or trying to defuse a bomb, but I feel it has little value in a FIRST competition. As I tell my drivers, if you are looking at the dashboard during a match you are looking in the wrong place.

Something that I would like to see improved is safety at the driver station…

There are many cases where the robot auto-disables, but avoiding a double auto never came up during the beta. If you feel strongly, please post or discuss directly with FIRST. Clearly safety is a priority, and this is their call. BTW, I don’t get the distinction between suggestion one and two.

As mentioned by another response, the spacebar acts as the Disable button. It is polled like a joystick button and should work no matter the key focus. When attached to a field, field controls take precedence, and F1 serves as a way to redo joystick enumeration and is the only key shortcut. As for the Stop button, it initially acted as a disable, but again, during beta requests were made to simulate what would happen on the field.

Similarly, requesting the FMS Lock have another way out is something that should be made on the FIRST forums. With encrypted radios, I don’t really see a way to wirelessly connect to the robot anyway, but again, it is their call. I think I see the issue with the code that can allow for unlocking. I’ll notify FIRST and see what they want to do about it.

Greg McKaskle

My primary concern with the current system is the long reboot time for the DS…

When this happens, do the joystick LEDs and list respond to the gamepads? Pressing any button on the gamepad should turn the LEDs blue.

As mentioned, F1 was added in an update to reenumerate the joysticks for teams that find they need to swap or plug in a joystick after the auto period. That will close all handles to joysticks, redo the enumeration, and reopen according to the list. Even if this fixes the issue, if at all possible, please help document how it gets in that state.

If anyone else finds a way to provoke joystick loss, please post steps. Everyone wants a robust DS.

Greg McKaskle

I did not notice the LEDs. I will be sure to check if this happens again.

I was not aware that F1 also re-enumerated USB. Good to know. As I am sure you know, reproducing this kind of issue deterministically will be very difficult.

I do not know if this will be possible using a Windows-based solution. Windows is not intended for target-specific or mission-critical tasks such as a FIRST competition. No matter how much you slim it down, it will still take an excessive amount of time to reboot. At events, teams are given a very limited amount of time to troubleshoot between matches. Every second of every minute counts when you are only provided three minutes between semifinals and finals. Add the reboot time of the DS (up to 2 minutes) to the time the cRIO takes to boot (40-plus seconds) and the time needed to deploy code, and you do not have enough time to make last-minute changes to auton or other robot functions that may be critical to an effective strategy. The part that was really frustrating was the fact that no one from NI was at FLR to provide support; that was something you did not have to worry about with IFI.

Sure we do, and it is a perfectly valid way to stop the robot. My problem with it is if you press the STOP button you have to power cycle the robot to get it running again. If the STOP button mimicked the old disable switch I’d be perfectly happy.

Also, the spacebar works as a fast disable, but you have to train yourself to do it so you don’t forget when the important time comes.
I was unaware of that. That’s good to know.

Suggestion 1 was coming from a use case where I run autonomous, then disable, then re-enable without changing to teleop. Our team protects against running accidental autonomous, but there are a lot of teams that don’t.

Suggestion 2 was coming from the initialization standpoint. If the last operation from the DS was to run autonomous, it is very easy to forget to change the state back to teleop. My point here was to default to teleop-disabled on startup (or initial link connection).

We have taken preventative measures on our end to try to make our robot as safe as possible with the way things currently work. I’m more worried about the teams that don’t. It’s far too easy to enable without looking at the run mode.

You could always use my Virtual DS for testing… it’s done that from the start. :rolleyes:

Mike,

1310 experienced a similar USB input device problem twice during the build season. We were never able to repeat it deterministically; however, we saw all the same symptoms.

We use one Logitech Dual Action Gamepad and one Logitech Attack 3 Joystick (w/ modified internals) to control our robot.

On both occurrences of the problem, the devices appeared in the list of USB input devices in the DS. I do not recall if pressing buttons on them caused the LEDs to turn blue. However, we printf’ed the GetRawAxes values on both joysticks and received nothing.
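
For anyone who wants the same diagnostic, something like the following against the 2010 WPILib C++ API will do; treat it as a hedged sketch rather than our exact code (the axis range in particular is an assumption about the gamepads):

```cpp
// Hypothetical diagnostic, 2010 WPILib C++ API: dump raw axis values so a
// "devices enumerate but report nothing" failure shows up immediately.
// Call it from OperatorControl() while testing.
#include <cstdio>
#include "WPILib.h"

void DumpJoysticks(Joystick &pad1, Joystick &pad2) {
    for (UINT32 axis = 1; axis <= 6; ++axis) {   // WPILib axes are 1-indexed
        printf("pad1 axis %u = %+.2f   pad2 axis %u = %+.2f\n",
               axis, pad1.GetRawAxis(axis),
               axis, pad2.GetRawAxis(axis));
    }
    // All zeros here, with both devices still listed in the DS, matches the
    // symptom above: enumeration OK, but no data flowing.
}
```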

Simply replugging the joysticks did not solve the problem.

Ultimately, it required us to quit the DS app, unplug the entire supplied USB hub, plug one of our controllers directly into the Classmate, verify that the OS re-installed the device driver, unplug the device, restart the DS app, plug the hub back in, and reorder our input devices - the enumeration doesn’t always put them back in the order that we need them in.

Right now, I’m a little wary of the Targus USB hub that was supplied. I’d like to know if you’re using it as well? Beyond that, I’m a bit stumped as to the cause of the problem.

The first time the event occurred we were using the hub. The second time the devices were plugged directly into the Classmate.

You know me - I’m just giving you a hard time, Dave.

As far as we’re concerned, our autonomous is set up so that it can only run once no matter what. Once it starts, you’re forced to cycle power to get it to go again. So for us, the STOP button is just as good as the disable since we’re going to have to reboot to do another test anyway. I guess if we test using LabVIEW’s play button, we can just push the LabVIEW stop and then play again to clear the memory and that is much quicker than a full reboot, but with deployed code it doesn’t matter how we end auton.

@Greg:

1A: 30s is death in the pits. The IFI system could be up and running in under a second on tether, 5s on radios. 30s or more is waaaay too much time.

1B: My download times appear much greater than you stated, although I could attribute that to the 5 people standing behind me saying “WHEN CAN WE TEST THE ROBOT!!! GET IT DOWNLOADED NOWWWWW!!! GET IT TETHERED AND MOVE THE ARM NOWWWWW!!!” while waiting for my code to download.

1C: The IFI system sent packets every ~26 ms. The RS422 900 MHz radios sent exactly the data that was needed in fairly small packets to and from the robot. Data to the robot was 26 bytes in one packet; data back was two 26-byte packets (most of the data back was used exclusively by the Dashboard). I do not know the latency, but it wasn’t much. As for the new radios, we had noticeable control lag when using the camera on the Dashboard, but only on the field. This lag was approx. 3 seconds. I can’t say for sure what the cause was, but it appeared when we added the camera and was fixed after we removed the camera, leading me to believe it was the camera.

IIA: We have to restart the Driver Station after every match to clear the FMS lock (another annoying feature), and sometimes we reboot the whole Classmate. If it goes onto the field shut down, it will not find the Cypress board and we are dead.

IIB: This is why we are looking for a solution to the Cypress board, and this almost cost us 2 matches.

IIC: We come off the field with back-to-back matches and have to change all four bumpers, tether the robot to move the arm, change the battery, and get it ready to play again in 5 minutes. 45 seconds is DEATH.

III: Our radio had a loose terminal during one match (duct tape fixed this), and after losing comm once it had no possibility of ever regaining it during that match. We tied that match after playing 18 seconds.

IV: The cRIO has a 400 MHz processor, and while it shows approx. 30% free space during Deploy, it is still not using nearly the power it has unless doing vision processing. We had a crab drive last year using this system and needed to do some trig calculations. Even with the IFI processor, back when we built a crab drive in 2005, we had plenty of power to do trig calculations with lookup tables, and it worked great. That was an 8-bit PIC with far less power and code space than the cRIO, but it performed the same task just as well.
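
The lookup-table trick is worth spelling out, since it is how 8-bit controllers did trig for steering math. A minimal sketch (plain C++; the table scale and step size are my own choices, and on the PIC the table would be precomputed into program memory):

```cpp
// Minimal sketch of lookup-table trig: trade a little memory for sin() calls.
// On an 8-bit PIC the table would be baked into ROM; here it is filled once
// at startup so the example stays short and runnable.
#include <cmath>
#include <cstdio>

static int sinTable[91];   // sin(0..90 deg) scaled by 1024 (Q10 fixed point)

void InitSinTable() {
    const double PI = 3.14159265358979;
    for (int deg = 0; deg <= 90; ++deg)
        sinTable[deg] = (int)(std::sin(deg * PI / 180.0) * 1024.0 + 0.5);
}

// Integer sine for any whole-degree angle, using quadrant symmetry.
int FastSin(int deg) {
    deg = ((deg % 360) + 360) % 360;     // normalize to 0..359
    if (deg <  90) return  sinTable[deg];
    if (deg < 180) return  sinTable[180 - deg];
    if (deg < 270) return -sinTable[deg - 180];
    return -sinTable[360 - deg];
}

int main() {
    InitSinTable();
    // e.g. crab-drive steering: scale a wheel command by sin(angle)
    std::printf("sin(30 deg) ~= %d/1024\n", FastSin(30));   // prints 512
    return 0;
}
```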

IVB and IVC: See IIC

V. If I place a probe on a VI before the “Robot Code” light on the Classmate comes up, LV will crash on my PC. Guaranteed. Always.

@Dave: I would like to see it reset to teleop after comm or code loss, but not after disable. I often put controls on my VIs for commonly changed parameters (e.g. distance or kick force) and change them while the code is running. I often run autonomous many times in a row when debugging it, without ever changing to teleop or downloading new code.
Spacebar disables the robot without E-stopping it. The E-stop button killing code annoys me too, but I have actually used it before (the Toyota bug), so I wouldn’t get rid of it.

On the Stop button acting as an E-stop: Sometimes the robot spontaneously acts weird and needs to be stopped, and then it is nice to be able to probe everything to see what it is trying to do and find your problem. We have an arm on our robot geared 800:1 off a CIM, and it has enough torque to damage many parts of our robot. If it starts destroying something, we want to be able to probe it to see where our problem is. We had this exact issue, and found that the analog module had actually ejected from the cRIO.

@Greg again: F1 re-enumerates joysticks? Does it find the Cypress board again?

@Everyone: Did anyone notice how nobody EVER debugged comm or firmware things with the IFI system, and how it just always worked? I do not see the cRIO or the entire control system as designed to handle the rigorous environment that is FIRST, nor do I see it as better than the IFI system. Everything that we are doing now, except vision, could be done just as well on the IFI processor. Vision could be done too, using the CMUcam. Now we are doing vision and other complex processing using many times more code space on many times more expensive equipment, yet achieving the same result, sometimes worse. I am not against using the cRIO as a robot controller; most cRIO-based things, except for download and boot-up times, are quite nice to work with. But the whole system is not designed to work together; it is a bunch of off-the-shelf stuff that people are trying to make work together. None of it is designed for competition robotics, and it can’t handle the demands we place on it like the dedicated robot control system provided by IFI could. This all leads to another problem: since the control system is not made by any one company, it is hard to lay blame for a failure on any one company.

I never noticed that, seeing as I was involved in three or four IFI emergency beta firmware debugging sessions in the middle of the season…

So soon we forget how many IFI Master Code upgrades we performed over the years. I remember version 15d.
Selective memory is a killer.

No system is perfect. What you should look for is corporate willingness to recognize and react quickly to solving issues when they do arise.

Dave, this is Dave. Thought you two should meet…

(sorry, I just HAD to)