For teams having trouble downloading code

Toward the end of build season we started having trouble connecting to our cRIO. It turned out that when our cRIO booted and ran its deployed code, that code used so much of the processor that the cRIO could not also maintain a proper connection to our programming computer. So we started flipping the “no app” DIP switch on the cRIO every time we wanted to download code to it, and all of our connection issues were solved! Essentially, setting this DIP switch stops the cRIO from running code when it boots, which prevents the connection issues. (Of course, you’ll want to reset the switch before you go to play a match.) I didn’t want to post this until we had been to a competition and I was sure there wasn’t anything wrong with our code.

If you are having similar issues, this procedure may help.
Good Luck FRC! :cool:

Luke

We were having problems doing a “run as startup”. I can’t remember the exact error message, but we were getting several lines that said something to the effect that it couldn’t complete with the current dialog. Clicking NEXT for each error cleared the message and sometimes let the load complete. Usually, though, it failed with an error about failing to download configuration settings. The only way we found to get around the problem was to re-image the cRIO.
Doug Norman from NI was at the AZ regional, looked at the problem, and couldn’t come up with a solution. He showed us the “no app” switch procedure. We now set the “no app” switch, press “reset” on the cRIO, build, run as startup, clear the “no app” switch, and reset the cRIO again. I made sure that all of the drive team knows to check for the switch in the wrong position, and the coach carries a small screwdriver in case the programming team forgets to reset it.

While we’re talking about odd bugs: we get one where, when we try to run (not permanently deploy) code, it will not run. It gets to the point where it is trying to run the project file, and then we lose connection to the cRIO.

Our solution has been to format the cRIO and leave it clean. Then we can run to our heart’s content, and when we finally want to deploy permanently, we can do so.

The next time we want to “run”, we re-image the cRIO so that the code is wiped, and go back to “running” the code rather than permanently deploying it.

That seems like a pretty drastic approach, though it should work. I would think there are faster ways to do this, such as the no app switch. You could also set the application not to run at startup (as long as you didn’t tell the imaging tool to “always run as startup”), or FTP to the cRIO and delete the startup application.

I never have this problem anyway… perhaps the code you are deploying is overly processor-hungry? Or perhaps you are cancelling when you don’t need to? There are a number of poorly worded dialogs in LabVIEW RT that can mislead you into thinking there is a problem and lure you into aborting an operation that would have completed successfully, so read them carefully.

-Joe

That sounds very much like the informational message one expects when about to overwrite a program that is already running on the cRIO. It’s not an error; it’s just letting you know that if you continue, the existing program will be halted. Since you’re trying to put a new program in place, halting the old one is exactly what you want to do.

Doug gave us the same advice. We were having random success with our deployment, with more failures than successes. IMO, the No App switch is a better solution than a cRIO re-image for a few reasons, the primary one being the chance of lobotomizing the cRIO if the re-image goes bad.

Here are some steps I posted in the “Deployment Issues” thread …

  1. On the cRIO, set the No App DIP switch to TRUE (this prevents user code from running when the cRIO boots, regardless of the software settings).

  2. Reset the cRIO and feel free to deploy/run/build/set as startup, whatever… As long as that DIP switch is set, the cRIO will never run the startup apps when it boots, and deployments (or just hitting RUN in RobotMain.VI) work every time.

  3. When you are done debugging and/or troubleshooting and want to try it out in a “realistic” fashion, do a final build and Set as Startup.

  4. When the upload to the cRIO is complete, there’s a window to close, and then it asks you to reboot the cRIO. Before you click the Reboot button, UNSET that DIP switch.

He said that they are looking into why this is occurring, but apparently they’ve got a pretty good workaround until we get a patch.

After I learned this trick, I never saw those errors again, and we were slinging code onto our ’bot all day today (practice day at the AZ regional).

We have this problem often, but only when fully deploying (not when running Main). Our solution is to reformat, because the cRIO was not put in a position where it is easy to reach the No App switch. We never had a problem in the past that required us to press the buttons on the RC or cRIO, so we buried it in a place with barely enough room to get a tool in. It works fine, and we can plug things into it; we just can’t flip the switch every other time we need to download. The errors seem to occur after a day of not downloading code, so formatting once gets us a day of happy downloading. We found out that the No App switch works after talking to team 27 at Kettering; it seems they have a similar problem sometimes. We actually spent half an hour of our unbag time trying to download code (and then 45 minutes figuring out why Arcade Drive in autonomous wasn’t driving, only to find out that it scales inputs without telling us). I always assumed this bug was just another LabVIEW bug, since I have found several others (probing while the code loads will crash LabVIEW, this bug involving failed downloads, loss of the cRIO while downloading, etc.).

This is exactly the problem we had. Setting your cRIO to “no app” before booting will allow you to download and run code on it without having to worry about formatting it :smiley:

Today I tried doing a Run from Main on the practice robot. It worked fine yesterday, and it worked fine Saturday, but it didn’t work today. While downloading, it would hit Buzz15.lvproj, wait for a while, then say “Waiting for Real-time target” with the option to “stop waiting and disconnect”. It eventually times out and says “Lost connection to real-time target”, and when I look at the run box, the last thing it tried was “RT CompactRIO target (failed to deploy target settings)” followed on a new line by “Deploy completed with errors.” It’s really, really annoying. I tried everything: new battery, new tether, rebooting several times. Then I flipped the No App switch and booted it again, and all was well. It’s really annoying to have to reach deep into the robot where the cRIO is. And this is the practice robot; the real robot is even worse, since the bumpers make the cRIO much harder to see. If we had known this during week 1, we would have put the cRIO somewhere else so we could at least flip the No App switch when we needed to. Having to flip the No App switch at all is really annoying.

That’s the /exact/ issue we see, on the same file, apalrd.

Since formatting takes just slightly longer than building if you start both at the same time, we simply format before uploading each build, then format again before we run code.

We have a standard “hands off” policy regarding the cRIO. Things do NOT get plugged and unplugged; everything goes through an adapter. DIP switches do NOT get changed, and no one touches it.

The last thing on earth I want is some over-zealous screwdriver-jockey putting one of those DIP switches through the back of the cRIO because they don’t understand how fragile some things are. (Nothing against screwdriver-jockeys, of course…) Even worse would be damaging the Ethernet port to the point where we can’t program at all.

That’s just our way of doing it. It sounds like the DIP switches work well for lots of folks, and that’s cool!

I have never personally had to use the No App switch. If it is helping you to get things deployed, then by all means use it.

On the other hand, I suspect the need for using this indicates that the code is using so much CPU that the protocol methods are being starved.

If using LV, you should turn off the global for disk logging of errors. Doug Norman posted good directions for doing this. It involves changing one global variable and it is a very good thing for LV teams to do.

The next thing an LV team might want to do is open Tools>>Real Time>>System Manager. This is basically the Task Manager for the LV cRIO. To get an accurate reading of CPU usage, click the VIs tab and turn off Track VI States. Then click back to the first tab and click Start. The bottom chart shows CPU usage, and it should help you understand what the cRIO is being asked to do. Using it, you can run different modes, turn off different features, even comment out some code, and learn what the CPU cost of each feature is.
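The System Manager shows one overall CPU number, so when you turn features on and off it can help to reason about where that number comes from. As a rough back-of-the-envelope sketch (plain Python, not an NI tool), the share of one CPU a periodic loop needs is its busy time per iteration divided by its period. The loop names and per-iteration times below are hypothetical placeholders, not measurements from any robot.

```python
from dataclasses import dataclass


@dataclass
class PeriodicLoop:
    name: str
    period_ms: float  # how often the loop runs (20 ms = 50 Hz)
    exec_ms: float    # time one iteration spends computing (hypothetical value)

    @property
    def cpu_fraction(self) -> float:
        # Fraction of one CPU this loop keeps busy: busy time / period.
        return self.exec_ms / self.period_ms


def total_cpu_fraction(loops: list[PeriodicLoop]) -> float:
    """Sum the CPU fractions of all loops (ignores scheduling overhead)."""
    return sum(loop.cpu_fraction for loop in loops)


# Hypothetical loops and timings, for illustration only.
loops = [
    PeriodicLoop("drive PID", period_ms=20.0, exec_ms=2.0),
    PeriodicLoop("arm PID", period_ms=20.0, exec_ms=1.0),
    PeriodicLoop("sensors", period_ms=50.0, exec_ms=5.0),
]

# List the most expensive loops first, which is where trimming pays off.
for loop in sorted(loops, key=lambda lp: lp.cpu_fraction, reverse=True):
    print(f"{loop.name}: {loop.cpu_fraction:.0%}")
print(f"total: {total_cpu_fraction(loops):.0%}")
```

The same arithmetic shows why doubling a loop’s rate roughly doubles its CPU cost, which is one knob to try before reaching for the No App switch.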

Feel free to post questions if you have trouble shrinking the CPU usage. My assumption is that by cutting the CPU usage even a bit, you will no longer need to use the No App switch.

Sorry I don’t have as much detail for non-LV teams. The profiling tools exist; I’m just not as familiar with them. I suspect the download issues there would have the same sorts of causes.

Greg McKaskle

@rwood359
That is common. There is just an old version of code on the cRIO.

@Greg:
So, you are saying the cRIO isn’t powerful enough? I will turn off the global logging and see if it helps. I’m not even using anything camera-related (no camera on the dashboard, no target tracking), so I can’t imagine how many problems teams using the camera must be having. It would have been nice to know about this back in week 2, when we mounted the cRIO deep in the belly of the robot…

@Joe:
We must run as startup, since it has to load the code when we go onto the field. That LabVIEW RT dialog that says “Waiting for Real-time target” with the “stop waiting and disconnect” option during Run and Deploy had me thinking it had failed, when it hadn’t. However, even when I let it wait, it sometimes really does fail, and I have to find a little screwdriver, reach deep into the belly of the robot, flip the No App switch, try again, and flip the switch back.

I do not believe my code is overly processor-hungry. It has two threads processing PID loops at 50 Hz (slightly faster than the IFI processor) and a few other threads with open-loop controls running at 50 Hz or 20 Hz. My autonomous code has a single loop at 100 Hz (the same rate as the Victor updates), although it is not running when I am downloading. I have checked, and none of my VIs run without waits. Last year we never had this problem. What has changed to cause this?
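Why waits matter: a loop that blocks between iterations leaves CPU time for background work such as the RT communication and download handling, while a loop with no wait spins and starves it. In LabVIEW this is done graphically with Wait (ms) or Wait Until Next ms Multiple; the Python sketch below only illustrates the same idea in text form and is not FRC code. The name `run_periodic` is hypothetical. It schedules against absolute deadlines so a single slow iteration doesn’t permanently shift the loop rate, and it counts overruns instead of trying to catch up.

```python
import time
from typing import Callable


def run_periodic(body: Callable[[], None], period_s: float, iterations: int) -> int:
    """Run body() at a fixed rate, sleeping (not spinning) between iterations.

    Returns how many deadlines were missed because body() overran its period.
    """
    missed = 0
    next_deadline = time.monotonic() + period_s
    for _ in range(iterations):
        body()
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            # A blocking sleep gives the CPU to other tasks until the deadline.
            time.sleep(remaining)
        else:
            missed += 1
            # Overran: resynchronize rather than running back-to-back to catch up.
            next_deadline = time.monotonic()
        next_deadline += period_s
    return missed


# Example: a 50 Hz (20 ms) loop whose body does almost no work.
calls: list[float] = []
missed = run_periodic(lambda: calls.append(time.monotonic()), period_s=0.02, iterations=10)
```

A body that takes longer than its period will show up as missed deadlines, which is a sign the loop rate (or the work inside it) needs to come down.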

I understand that. The problem is that the new code will not load without re-imaging the cRIO or using the no app startup.

Powerful is relative. As I said, I’ve never needed to use the switch. If the download fails, my assumption is that it is because the helper task that does the RT protocol, including downloading code, is not given enough CPU and is timing out. It could also be that it is failing for a reason unrelated to CPU.

Based upon the other posts you have made, it seems like something odd is definitely happening. Have you run the System Manager yet? I’m curious to see what it says, and in fact, you should be able to leave it up while you try to download. Even more informative would be to use the serial cable and the command line profiling.

I’d also be interested in running your project, looking at the performance, and determining why you are getting the dialog. I’m leaving for the Dallas event tomorrow, so I may be slow getting back to you, but you can post it here, or PM me and I’ll give you an email address.

As for processor power, think of it as the immovable object/irresistible force conundrum. Computers do what you tell them, and the RT System Manager will help you figure out what the cRIO is being asked to do.

Greg McKaskle

Does anyone have benchmark data for what a “normal” processor load is? Running the default code puts the cRIO CPU load at about 92%. Running our “full featured” code (ten motors, four encoders, two analog inputs, lots of global variable use), the load is about 95%. Even shutting down most of the VIs doesn’t seem to have a significant impact. With that level of insensitivity, I don’t see much to be gained by using the System Manager as an optimization tool. Are there a lot of housekeeping tasks running in the background that drive the CPU load so high?

The System Manager is indeed a very high-level tool, very easy to use, but it gives you a single number, a single speedometer. Before going any further, make sure to go to the VIs tab and turn off the Monitor VI State feature. That tab is also useful for seeing what is running, but with so many VIs on the cRIO, this monitoring will raise the CPU number by quite a bit.

If/when you are ready for more detail, try using the Performance Profiler. The basics are to open it using Tools>>Profile>>Performance and Memory. You probably don’t need to profile what is on the PC, so you can change the Targets to profile if you want.

The most trustworthy way to use it is to Start the Profiler, then run your app, then stop your app, then stop the profiler. Once you know what you are doing, you can use the profiler while your app is running, but until you know what is slow, this is the best way to start.

Once you stop the profiler, you can click on a column to sort, and you can double-click on a row to see which subVIs were called by that VI and contributed to its time. Feel free to post profile images or data if you want some input.

Note that the profiler only reports VIs that started and finished while the profiler was on. Also note that the profiler works at the level of the VI. It will not tell you which loop or node within the VI took time unless that node is a VI.

Greg McKaskle

My team has also had exactly the same problems with downloading and deploying, and we are currently using the DIP switch method. During our two days of troubleshooting the cRIO we gathered a vast number of data points, but because they are so inconsistent, we have no idea what to do.

These data points include all of the symptoms described above, but in addition we noticed some very inconsistent code speed. Sometimes we reboot the robot and things just lag; this may not seem like much, but when we power cycled it, the code ran at normal speed. Also, when we attempt to run the code as startup while there is already code on the cRIO, it fails about 99% of the time. When it fails and the robot reboots, the status lights on the cRIO blink orange 4 times. Further research on the NI website showed that this can mean “The controller software has crashed twice without rebooting or cycling power between crashes. This usually occurs when the controller runs out of memory. Review your RT VI and check the controller memory usage. Modify the VI as necessary to solve the memory usage issue.”

We then thought it was a coding problem. As a troubleshooting step, we re-imaged and then attempted to run the FIRST default code as startup. As we expected, the default code deployed the target settings successfully and worked flawlessly. Then we attempted to download our code. This failed, which surprised me: if something code-related on the cRIO was using too much memory during deployment, why would it fail when deploying our code over the FIRST default code? Our code isn’t modified much from the FRC structure. Basically, we read the drive encoders in a 20ms loop, then send those values to another 50ms loop. The 50ms loop performs some calculations based on the global variables passed into and out of the loop (e.g., joystick, I/O, and encoder values). It then drives the motors within the same 20ms loop. Furthermore, more complex code was downloaded to the robot with 100% success about two weeks earlier, which points away from a code problem. Also, when running the default FIRST code, we get constant watchdog errors when disabling and enabling the robot.
Another of these bugs: sometimes one of the PWM outputs being controlled by a PID loop will drop out, but only when the encoder’s process variable reaches a specific point. To diagnose the problem, I coded an emulation of that control loop, and there was indeed one point where the PID loop seemed to just stop outputting an updated error. After further investigation, this drop-out point is also a crossover point for the encoder, where it jumps from -2.5 to 2.5. I am unsure whether this would cause an error or just make the PID loop stop working. The final data point we’ve gotten is that sometimes changing a single number by 0.1 or some other small value will allow a successful deployment.
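One hedged guess about the drop-out: if the sensor reading wraps from +2.5 to -2.5 (or the reverse), a PID loop that computes error as setpoint minus measurement sees a sudden jump of about 5 units at the crossover, which can saturate or stall the output. A common remedy is to wrap the error into the shortest signed distance before feeding it to the controller. The Python sketch below assumes the reading spans a continuous range from -2.5 to 2.5 (taken from the numbers in the post) whose two ends are the same physical position, as with an angle sensor; `wrapped_error` is a hypothetical helper, and this is an illustration of the technique, not a diagnosis of the problem above.

```python
def wrapped_error(setpoint: float, measured: float,
                  low: float = -2.5, high: float = 2.5) -> float:
    """Shortest signed distance from measured to setpoint on a wrapping range.

    Assumes low and high represent the same physical position, so a reading
    that passes high reappears at low (and vice versa).
    """
    span = high - low
    error = setpoint - measured
    # Shift, take the modulo (Python's % is always non-negative for span > 0),
    # and shift back, mapping the error into [-span/2, span/2).
    return (error + span / 2) % span - span / 2


# Near the seam the naive error would be 4.8; the wrapped error is about -0.2.
print(wrapped_error(2.4, -2.4))
# Away from the seam it matches the naive error (0.5).
print(wrapped_error(1.0, 0.5))
```

This only helps if the controlled quantity really is continuous across the wrap; for a sensor that simply overflows, the fix belongs in how the raw value is read instead.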

Also, I was wondering whether a MAX reformat would be in order. As far as I know, this would return the cRIO to its default, off-the-shelf state. Would there be any repercussions besides having to reconfigure the cRIO as if it had just shipped? If not, we could try building and running the default code as startup twice, and then determine whether it truly is the code or something else.

Finally, I’ve heard the idea of turning off the FTP server, the HTTP server, or the error logging on the cRIO. Does anyone have any wisdom about this? I’ve heard conflicting views; does it actually make a noticeable difference?

Unfortunately at this point to my knowledge it’s like finding hay in a needle stack. Any help would be greatly appreciated.

You indeed have lots of observations, and I can’t make sense of all of them either.

I’d start by turning off the error logging – if using LV. To do so, open ErrorsGlobal.vi, drop the global icon into the diagram of Begin.vi and set it to FALSE. Turning off error logging will not hurt anything, as unhandled errors are still being sent to the DS and logged. This global should have been off all along and was an oversight.

While your cRIO is running your code, open the Tools>>Real Time>>System Manager and observe the memory usage over time and the CPU usage over time.

Another thing to point out is that the VIs tab is another way to selectively terminate VIs. It is likely to be faster to kill the Robot Main and deploy rather than flip the DIP, reboot, deploy, flip, reboot.

Greg McKaskle

Thanks, I appreciate the procedure for turning off the error logging. We currently have the DIP switch procedure down, but I feel like it’s just bandaging the problem, and for future reference I’d like to know exactly what the problem is so we can avoid it or quickly fix it. I can’t test with error logging turned off until tonight, but I’ll post back with the results of the deployment and, if possible, the CPU and memory usage. Does anybody know what the typical CPU usage is on a consistently working system?

In addition, would anyone advise a full system format using the MAX tool? This would provide a low-level format and a clean slate. Am I correct?

One last thing. I doubt this would affect it, but I’ve run out of places to look: would changing the loop timings in Periodic Tasks to 20ms for both loops have any effect on thread scheduling or the watchdog? Perhaps it’s using too much memory, or periodically stopping one process to run another deterministic one. Does anybody have any ideas? If it is a processing issue, any tips on making code more efficient?

Once again thanks.

In my previous post, I reported 92-95% regardless of the complexity of the code. I’m not currently having any download issues (at present) although I do still see the occasional “Invoke Node” problem with BUILD (as discussed in the “Labview Build Errors” thread, and still a very open issue from my perspective).

I’ve got all of our asynchronous loops (we don’t use “periodic tasks.vi” per-se) throttled at 50ms, FWIW.