LabVIEW corrupt?

While many would have ideas, I would most appreciate hearing from actual NI employees and similarly knowledgeable folks.

We had working code, and then I noticed that our shooter was no longer rotating to aim. This aiming is controlled by a loop in Periodic Tasks.vi. When I probed the wire coming out of the global boolean (which goes “TRUE” to make the shooter aim), I did not get “FALSE”; I got “Not Executed”. What? So I put a probe on the wire between the constant and the Wait (ms) that controls the loop rate (see figure 1). The value of the probe was zero!

In desperation, I deleted the constant and the wait, and re-drew them. No help: still zero milliseconds. So I deleted them again, and also removed the While loop (removed, not deleted, preserving the code within). When I re-drew the loop and the timing bits, things seemed to work…for about five minutes.

Then we started getting “Watchdog not fed” messages and the controls became entirely unresponsive. So I started probing other loops in Periodic Tasks. When I placed a probe as in figure 2, I got a value of “Not Executed”. That is just not right. The value of the global being probed in figure 2 is obtained (from the dashboard) in a loop that is also in Periodic Tasks (shown in figure 3). The value of the probe shown in figure 3 was 200 (the correct value). Please explain.

I “solved” this problem by just disabling the entire loop shown in figure 2. Again, things worked for a few minutes. Then our flywheel stopped operating (along with more watchdog errors and unresponsiveness). When I probed the timing for that control loop (just as in figure 1), I got a value of some 11 million and change (!). I tried the trick of removing the while loop and timing, and then re-drawing them. Same problem, but a value of 12000344 when I probed between the constant 20 and the Wait vi (refer to figure 1 to see why 20 is the expected value).

I should mention that all of these problems occurred while using a cable to connect the laptop directly to the radio. No wireless was involved; these problems never surfaced until we were in the pit for this weekend’s competition.

My solution? I unpacked our last saved archive from before these problems started. It was mostly the code we wanted, with only a few minor issues left unfixed. We made no changes. We did not even open the code, only built it and deployed it. And it works as before FOR ONE MATCH ONLY. Our robot will only function if we re-image our cRIO and re-deploy the build after every match! And some things are no longer working properly, as if the build is slowly becoming corrupted…

Another thing to make clear: between having working code and the first of these problems, we made no changes to Periodic Tasks.vi. The few changes we did make were all in Teleop and Autonomous.

I’ve pretty much had it. Unless someone can tell me how to fix this problem and prevent similar absurdities from occurring in the future, we’re using C next year.

figure 1.png
figure 2.png





I fit the description. Can you attach your code, all of it, or PM me and I’ll give you my NI email address. I don’t have a cRIO at home, but I can look at the code and see if that helps. Also, what controller are you using? Are you potentially running out of memory? VxWorks is a lean OS and from what I understand, doesn’t take much pity on programming errors like running out of stack or memory.

Clearly, that is not normal.

Greg McKaskle

Greg-

Thank you for the quick reply. I’m attaching a zip to this post (~850 KB). I can tell you that this is problematic code. I can’t tell you exactly which of the problems go with this code, because we were frantically trying to get our robot functioning in time for our first time on the field, and a lot of people were all yelling at the programmer at the same time…

This is everything: our robot code, our test code, and our dashboard code. Our CPU runs at about 75%. My recollection from downloading the code (vs. deploying) is that we use less than half of the available memory. There’s a message line that says something about using xx kB of yy available. yy is ~60 kB if I remember right, and the loading is done before xx gets to 30 kB. But again, memory from those pit conditions is poor.

Regards,
-Geoff

RoboArchive-2012-03-24-07_45_38.zip (894 KB)



Your dashboard sends UDP packets every 20 milliseconds. Your UDP receive loop on the robot will wait 20 milliseconds between checking for new packets. This is a recipe for slowly degrading performance, as slight timing discrepancies will cause packets to be produced slightly faster than the robot will consume them.
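To put rough numbers on that backlog, here is a small Python model (not LabVIEW code). It assumes, hypothetically, that the robot loop reads at most one packet per iteration and that each iteration costs the 20 ms wait plus 0.5 ms of other work; the 0.5 ms figure is made up purely for illustration.

```python
def backlog_after(seconds, send_period_ms=20.0, loop_wait_ms=20.0, work_ms=0.5):
    """Packets still queued after `seconds` of running.

    Hypothetical model: the dashboard sends one packet every send_period_ms,
    and the robot loop reads at most one packet per iteration, where each
    iteration takes loop_wait_ms plus work_ms.
    """
    duration_ms = seconds * 1000.0
    sent = int(duration_ms // send_period_ms)
    reads = int(duration_ms // (loop_wait_ms + work_ms))
    return max(0, sent - reads)


for minutes in (1, 5, 10):
    print(minutes, "min:", backlog_after(minutes * 60), "packets queued")
```

With these illustrative numbers the queue grows by about 74 packets per minute, which at one packet per 20 ms is roughly 1.5 s of added lag per minute of driving. A real socket buffer is finite, so eventually new packets are dropped instead, but either way the robot is acting on old data.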

This might not be the only thing that is causing issues with your code, but it is an obvious one that jumped out at me quickly. Remove the 20 ms delay in the UDP receive loop, and let the UDP receive function itself control the loop execution rate. By the way, there’s no need for such a short timeout on the UDP receive. The default of 25 seconds would be fine.
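The same idea expressed as a Python sketch (an analogue of the LabVIEW advice, not the actual robot code): there is no sleep in the loop, and the blocking receive call with its timeout is what paces it. The function name, buffer size, and the loopback demo are assumptions for illustration only.

```python
import socket


def receive_loop(sock, handle, max_packets=None, timeout_s=25.0):
    """Let the blocking receive pace the loop; no extra delay is added.

    recvfrom() returns as soon as a datagram is waiting, so any backlog is
    drained at full speed instead of one packet per fixed wait. On timeout
    this sketch simply stops; a real robot loop might keep waiting.
    """
    sock.settimeout(timeout_s)
    count = 0
    while max_packets is None or count < max_packets:
        try:
            data, _addr = sock.recvfrom(1024)
        except socket.timeout:
            break
        handle(data)
        count += 1
    return count


# Loopback demo: port 0 lets the OS pick a free port.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for i in range(5):
    tx.sendto(f"packet {i}".encode(), rx.getsockname())

received = []
n = receive_loop(rx, received.append, max_packets=5, timeout_s=1.0)
print(n, "packets received")
tx.close()
rx.close()
```

Because the five datagrams are already queued, the loop consumes them back to back rather than waiting a fixed interval between each one.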

At the top of teleop, there is a loop that runs forever. It has a 20 ms timeout in it, but because the loop never exits, teleop will not return to the framework to process the next teleop packet.

The Robot Code State instrumentation VI should just be on the diagram, not in a loop.

Speaking of which, the rest of teleop shouldn’t be in a loop either. The template had a big green comment on the left explaining why it was important to return within 20 ms. It has been deleted, but if you hover over the teleop icon, the description still says it. You can also show the kids in RobotMain that teleop will be called each time the DS/FMS gives you a teleop packet, along with the comments there about how it works.
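As a rough illustration of that calling contract, here is a toy Python model (not the WPI framework itself). It assumes packets arrive every 20 ms and that the framework can only start a new teleop call after the previous one returns; the function name and numbers are hypothetical.

```python
import math


def teleop_calls_started(window_ms, teleop_ms, period_ms=20.0):
    """Toy model: how many teleop calls begin within window_ms.

    Assumes one packet per period_ms and that the framework starts the next
    teleop call only after the previous one returns. A teleop that never
    returns (its own infinite loop) is modeled as teleop_ms = float("inf").
    """
    if teleop_ms == float("inf"):
        return 1 if window_ms > 0 else 0  # the first call starts and never ends
    per_call = max(teleop_ms, period_ms)  # cannot run faster than packets arrive
    return math.ceil(window_ms / per_call)


print(teleop_calls_started(1000, 5))             # returns quickly
print(teleop_calls_started(1000, 40))            # takes too long each call
print(teleop_calls_started(1000, float("inf")))  # never returns
```

In this model, a teleop that returns within the packet period is called for every packet, one that runs long is called for only some of them, and one that never returns is called exactly once.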

If you see something that misled them into writing it this way, let me know and we’ll try to improve the template. Feel free to post or have them post questions.

I second Alan’s comment about the UDP read in periodic. Read is a blocking call anyway, no need to throttle the loop in parallel, especially when it can cause a backlog.

Other than that, I thought the code actually looked well written. The state machine in auto is nice, but it would typically use a shift register or a feedback node for the state variable.
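A shift register passes a value from one loop iteration to the next without a global. A Python analogue is to carry the state in a local variable: each iteration takes the current state in and hands the next state back. The states and timings below are hypothetical and are not taken from the team’s autonomous code.

```python
from enum import Enum, auto


class AutoState(Enum):
    # Hypothetical autonomous states, for illustration only.
    DRIVE = auto()
    SHOOT = auto()
    DONE = auto()


def next_state(state, elapsed_s):
    """One iteration of the state machine: current state in, next state out."""
    if state is AutoState.DRIVE and elapsed_s >= 2.0:
        return AutoState.SHOOT
    if state is AutoState.SHOOT and elapsed_s >= 5.0:
        return AutoState.DONE
    return state


def run_auto(times):
    state = AutoState.DRIVE  # like the initial value wired into a shift register
    history = []
    for t in times:  # each loop iteration
        state = next_state(state, t)  # output feeds the next iteration's input
        history.append(state)
    return history


print([s.name for s in run_auto([0.0, 1.0, 2.0, 3.0, 5.0, 6.0])])
```

Keeping the state local to the loop means no other loop can read or overwrite it unexpectedly, which is the usual argument for a shift register or feedback node over a global.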

Hope that clears everything up. Be sure to build and run as startup.
Greg McKaskle

It must have helped. They are in the final match at Lenape right now!

I will be very sad if this thread ends like this, because I don’t think the problem is resolved. It is true that we made it to the Lenape finals (where we were crushed by DuPont Engineering), but we did so for two reasons:
1. The absolutely absurd ritual of never recompiling the code as described in my first post.
2. Exactly BECAUSE of the while loop in Teleop that Greg objects to.

We know that you are not supposed to do that, and at our first competition we didn’t. But the actual reality of communications on the FRC field is such that we had a totally undrivable robot: the robot stops responding to the controls because it stops getting packets. Then they all arrive, and the robot leaps ahead, and extra far because of all the joystick action the driver made when the robot stopped responding in the first place. It’s a mess. But put your Teleop in a while loop, and all those problems simply disappear.

I am convinced that our corrupt data problem is a completely separate issue. Unfortunately (from this perspective) we can’t try to resolve this on the actual robot. But we have a second cRIO (both are the old 8-slot versions, at least 3 years old, but probably more), and I am going to load the code onto that and see if I can reproduce the problem. I will definitely be taking the wait out of the loop that receives UDP packets. I see why that could be a problem.

I like the idea of some kind of stack overflow or memory violation causing the corruption. Is it possible for my code to get “out of bounds” and compromise the deployed code? This seems unlikely to me, but it would explain the need to redeploy between every match…

Something doesn’t make sense, and the logs of your matches should help make sense of it. I’ll also be able to run the code on a cRIO tomorrow. But honestly, I know what an infinite loop does, and it is not going to solve anything. I know that it gets crazy at a competition, and I applaud you for walking the tightrope that allowed your team to do so well. But again, it just doesn’t make sense.

I’ll send you a PM.

Greg McKaskle

Where are those logs written? Whenever I’ve pressed the “View Log” button on the Driver Station I’ve just gotten an error message.

DS logs are stored in Users\Public\Documents\FRC\Log Files. The viewer is in Program Files\FRC Driver Station.

Be sure to post the solution when you find it. Many of us learn loads from these kinds of threads.