How do I know if I'm overloading the RC program?

I’m an experienced embedded programmer, but I’m still getting to know the IFI RC. I’m getting the feeling that I’m on the hairy edge of what the RC is capable of.

Despite my best efforts to partition the program into neat modular sections, I’m starting to see erratic behaviour that is hard to explain as simple bugs.

I get different behavior just from adding a simple (fully tested) function call. Printf calls stop working for no apparent reason. I’m now able to crash the program simply by flicking a switch, and I can even put the RC into a fault mode that is recoverable only by powering it down (resetting it won’t fix the problem).

So, I have some basic debugging questions:

  1. What causes the Program State light to flash? Is it detecting that my code is not doing something in particular (like servicing the shared memory)?

  2. Could I be exceeding my stack size (from nesting my calls too deep)? Is there an easy way to increase the stack allocation, or will it just use available memory until it runs into data? I see the size set to 0x0100 in the linker script, but I’m not sure how to change it.

  3. I’ve limited my use of floats to only the essential functions, but I assume that I could be burning all my CPU time in PID loops etc. Will the program crash if I don’t service all my code in the allotted 28ms click time?

  4. I’m using 4 analog inputs and the Gyro and the Camera… Any way to tell if I’m getting too many interrupts or other critical elements?

Thanks for any advanced “generic” debugging help.

Phil.

Quickly, regarding 4: I am using all 6 interrupts, the camera, the gyro and the accelerometer, not to mention an ultrasonic sensor and some limit switches, pots, etc… I don’t think that this is the problem.

This sounds weird. Can you post a copy of your code in a zip? Or, if you don’t want to give it away, can you give us an idea of the program flow? I doubt that you are running too much code unless you are doing a SERIOUS amount of math. Even then, the printfs would just be slow, not stopped, and the Program State light would not be blinking either O.o…

Jacob

I’d be wary about anything floating point. The 18F’s don’t have any hardware float processing capability, it’s all software emulated, so accuracy, speed, etc are all less than spectacular.

If you can, I’d suggest rewriting it to use integer math (fixed-point values scaled with bit shifts) or simplifying the results (less precision). Otherwise, I’d suggest using a coprocessor. We have a nice Python implementation using GUMSTIX. If you like, you can see more details on our website ( www.adambots.com , click on “Co-processor” ) or PM me.
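Here is a minimal sketch of the integer approach. It assumes gains are stored as Q8 fixed point (real gain = value / 256) and uses the usual 0 to 255 PWM range with 127 as neutral. KP_Q8, mul_q8 and p_control_pwm are hypothetical names. The sketch uses C99 `<stdint.h>` types for clarity. With MPLAB C18, `int` is 16 bits and `long` is 32 bits, so substitute accordingly. The single 32-bit multiply per call should still be much cheaper than a software-emulated float multiply on the 18F, but confirm that with a scope.

```c
#include <stdint.h>

/* Gains are stored as Q8 fixed point: real_gain = stored_value / 256.
   KP_Q8 = 192 stands for a hypothetical Kp of 0.75. */
#define Q8_SHIFT 8
#define KP_Q8    192

/* Multiply by a Q8 gain using a 32-bit intermediate so the product cannot
   overflow, then shift back down. The magnitude is shifted so negative
   results truncate toward zero like positive ones (right-shifting a
   negative signed value is implementation-defined in C). */
static int16_t mul_q8(int16_t x, int16_t gain_q8)
{
    int32_t product = (int32_t)x * gain_q8;
    if (product < 0)
        return (int16_t)-(-product >> Q8_SHIFT);
    return (int16_t)(product >> Q8_SHIFT);
}

/* Proportional-only controller mapped onto the 0..255 PWM range,
   with 127 as neutral and the output clamped to the valid range. */
static uint8_t p_control_pwm(int16_t setpoint, int16_t measured)
{
    int16_t error  = (int16_t)(setpoint - measured);
    int16_t output = (int16_t)(127 + mul_q8(error, KP_Q8));
    if (output < 0)
        output = 0;
    if (output > 255)
        output = 255;
    return (uint8_t)output;
}
```

For example, an error of 100 with Kp = 0.75 gives a correction of 75, so the PWM command is 202.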

I have run into a similar situation with the RC. Try commenting out ALL of your printfs and see if you still get the red light of death. Also, the processor hates the use of long data types (don’t ask me why, but replacing all longs with ints magically made a program of ours work). I think that under certain circumstances the user code can corrupt the SPI data stream that links the user and master processors, confusing the master. It would probably help to have the source code for IFI’s inter-processor link to figure out what is happening, but I don’t think there is any way to obtain it! :)

As a side note, did you try protecting GPR14 (the old 8.2V bug fix)?

When you say you used all 6 interrupts, did you put encoders on all six of these pins? Make sure you watch your interrupt rates…
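One cheap way to watch interrupt rates is to count them. Below is a sketch with hypothetical names (encoder_isr, take_isr_count, encoder_isr_max). The handler increments a counter. Once per main loop, the code reads the counter, records the worst case and resets it. Printing the worst case every so often shows how close you are to trouble.

```c
#include <stdint.h>

/* Hypothetical counters: one per interrupt source you care about. */
static volatile uint16_t encoder_isr_count;
static uint16_t encoder_isr_max;   /* worst single loop seen so far */

/* Add one line like this to the interrupt handler for each source. */
static void encoder_isr(void)
{
    encoder_isr_count++;
}

/* Call once per main loop: returns how many interrupts arrived since the
   previous call, updates the worst case, and resets the counter.
   On the 8-bit PIC the 16-bit counter takes more than one instruction to
   read, so briefly disable interrupts around the read and reset (the
   global enable bit in INTCON). That is not shown here because the host
   build has no real interrupts. */
static uint16_t take_isr_count(void)
{
    uint16_t n = encoder_isr_count;
    encoder_isr_count = 0;
    if (n > encoder_isr_max)
        encoder_isr_max = n;
    return n;
}
```

Multiplying the per-loop count by the number of loops per second gives the interrupt rate.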

If you’re wondering how long a routine takes to run, you can always set a pin at the beginning and clear it at the end, then time the pulse with a scope. Or, if you want to see how often your processor is idle, just toggle a pin in process_data_from_local_IO() and compare how much of the time you see a steady square wave with how long the pin sits constantly high or low (that means you’re doing something else, not toggling your pin).
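Both pin techniques are sketched in C below. `debug_pin`, `debug_pin_write` and `pid_update` are hypothetical stand-ins. On the RC you would drive a spare digital output configured as an output. Here the pin is a plain variable so the sketch runs on a PC, and `debug_edges` exists only to make the transitions testable.

```c
#include <stdint.h>

/* Hypothetical stand-in for a spare digital output with a scope probe on it. */
static volatile uint8_t debug_pin;
static unsigned debug_edges;   /* host-only: counts pin transitions */

static void debug_pin_write(uint8_t level)
{
    if (debug_pin != level)
        debug_edges++;
    debug_pin = level;
}

/* Hypothetical routine whose run time you want to measure. */
static void pid_update(void)
{
}

/* Technique 1: the width of the high pulse on the scope is the time
   spent inside pid_update(). */
static void timed_pid_update(void)
{
    debug_pin_write(1);
    pid_update();
    debug_pin_write(0);
}

/* Technique 2: call this on every pass through the fast part of the main
   loop. A steady square wave means the processor is mostly idle; long
   flat stretches show where it is busy elsewhere. */
static void idle_toggle(void)
{
    debug_pin_write(debug_pin ? 0 : 1);
}
```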

I had an issue with encoders generating too high an interrupt rate myself, so we switched to a different encoding technology. Remember that the interrupt entry/exit code the compiler inserts also takes time; I can’t quite remember how long.
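Before adding encoders, it helps to do the arithmetic. The sketch below assumes one interrupt per encoder count, which depends on how the encoder is wired to the interrupt pins. The function names are hypothetical. The per-interrupt cost is a parameter because it has to be measured, for example with the pin-toggle method.

```c
#include <stdint.h>

/* Interrupts per second from one encoder, assuming one interrupt per
   count: counts per revolution times revolutions per second. */
static uint32_t encoder_irq_rate(uint32_t counts_per_rev, uint32_t rev_per_sec)
{
    return counts_per_rev * rev_per_sec;
}

/* Share of CPU time spent in interrupt handlers, in tenths of a percent.
   isr_cost_us is the total cost of one interrupt in microseconds,
   including the compiler's context save/restore. */
static uint32_t isr_load_permille(uint32_t irq_per_sec, uint32_t isr_cost_us)
{
    /* irq/s * us/irq = microseconds of ISR time per 1,000,000 us;
       dividing by 1000 converts that fraction to per-mille. */
    return irq_per_sec * isr_cost_us / 1000u;
}
```

Here is an example with made-up numbers. Six encoders at 128 counts/rev spinning at 10 rev/s give 7680 interrupts/s. At an assumed 50 µs each, that is 384 per-mille, about 38% of the CPU, before any of your own code runs.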

Remember that floating-point operations use the same two registers, so if you interrupt a floating-point operation and start another one in the interrupt handler, you’ll get junk data when you return from the interrupt. On rare occasions a stack overflow will not crash the processor, but usually it will cause the ‘code error’ light to flash. :ahh:
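The float-in-interrupt hazard can be modelled on a PC. In the sketch below, `math_scratch` is a hypothetical stand-in for the compiler's shared math temporaries. `interrupt_hook` lets the code "fire an interrupt" between the two steps of a main-line calculation. The unprotected handler corrupts the result. The protected one saves and restores the scratch area around its own math. On the RC, the equivalent fix is to make the interrupt handler save those temporaries, or more simply to keep floats out of interrupt handlers. MPLAB C18's interrupt pragmas take a `save=` list; check the C18 User's Guide for the sections your math library uses.

```c
#include <stdint.h>

/* Stand-in for the compiler's shared math scratch area: every routine
   that does a multi-step float (or long) operation reuses it. */
static int32_t math_scratch;

/* Host-only hook that lets the test "fire an interrupt" mid-calculation. */
static void (*interrupt_hook)(void);

/* Main-line calculation done in two steps through the shared scratch. */
static int32_t scaled_product(int32_t a, int32_t b)
{
    math_scratch = a * b;     /* step 1: partial result parked in scratch */
    if (interrupt_hook)
        interrupt_hook();     /* an interrupt can land between the steps */
    return math_scratch / 4;  /* step 2: reads the scratch back */
}

static int32_t isr_result;

/* Math inside the ISR also goes through the scratch area. */
static void isr_math(void)
{
    math_scratch = 7 * 7;
    isr_result = math_scratch;
}

/* Unprotected ISR: silently overwrites the main line's partial result. */
static void unsafe_isr(void)
{
    isr_math();
}

/* Protected ISR: saves and restores the scratch area around its own math. */
static void safe_isr(void)
{
    int32_t saved = math_scratch;
    isr_math();
    math_scratch = saved;
}
```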

If you need anything I haven’t listed here, just send me a PM/email.

-q

Thanks for the good insights… (I’ll have to track down that “protecting GPR14” fix… we do/did have the 8.2 bug).

I found part of my real bug. It was a very nasty case of code reentrancy gone wild. It wasn’t planned reentrancy, but a certain state transition was causing A to call B, which called A, which called B, etc., until all memory was used up. Not very smart.
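Here is a sketch of a guard against this kind of runaway mutual recursion. Each handler tracks how deep the chain is and refuses to go past a hypothetical MAX_STATE_DEPTH. It records each refusal so you can print it. `state_a`, `state_b` and the limit are made-up names. The cleaner long-term fix is usually to have each handler return the next state and let a single loop dispatch it, so nothing nests at all.

```c
#include <stdint.h>

#define MAX_STATE_DEPTH 4u   /* hypothetical limit on nested handler calls */

static uint8_t state_depth;  /* how many handlers are currently active */
static uint8_t depth_trips;  /* how often the guard refused to go deeper */

static void state_b(int keep_going);

static void state_a(int keep_going)
{
    if (state_depth >= MAX_STATE_DEPTH) {
        depth_trips++;       /* report this instead of silently overflowing */
        return;
    }
    state_depth++;
    if (keep_going)
        state_b(keep_going); /* the unplanned A -> B -> A ... chain */
    state_depth--;
}

static void state_b(int keep_going)
{
    if (state_depth >= MAX_STATE_DEPTH) {
        depth_trips++;
        return;
    }
    state_depth++;
    if (keep_going)
        state_a(keep_going);
    state_depth--;
}
```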

Now that’s fixed I can continue to look for the odd behavior bug.

Phil.

Oh Boy, that was a great tip. It took me a while to find the reference but I did… Here is the link to the actual file.

I have discovered that I am using the V2.0 version of Kevin’s camera code, which does not protect GPR14. I see that V2.1 does protect it. That will teach me not to be afraid of the “latest version” of code.

This explains a LOT of the strange behaviour that I was seeing (inputs on the wrong channels etc)

Maybe I’ll get some sleep now :)

Also be aware that the hardware call (PC address) stack is limited. The 256 bytes of stack space allocated in RAM (bank memory) are used for automatic variables at function-call time; return addresses go on the separate hardware stack.

“After the PC is pushed onto the stack 31 times (without popping any values off the stack), the STKFUL bit is set. The STKFUL bit is cleared by software or by a POR (Power-On-Reset).” [PIC Spec]

Typically the processor will reset when the STKFUL bit is set, but the behavior depends on the state of the STVREN (Stack Overflow Reset Enable) configuration bit. This should be enabled to allow the resets in FIRST environments. Runaway re-entrant functions will cause bad behavior, including resets from stack overflow. Moreover, part of the stack depth must be reserved for the calls made by the low- and high-priority interrupts. If the chip isn’t being reset because STVREN is clear, you should be able to check the STKFUL bit in the STKPTR hardware register, because the bit stays set until a POR or software clears it.
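Here is a sketch of decoding STKPTR. The bit positions come from the PIC18 register layout: STKFUL in bit 7, STKUNF in bit 6, and the 5-bit stack pointer in bits 4:0. The macro, type and function names are hypothetical. The raw register value is passed in so the code runs on a PC. On the chip you would read the register itself through your C18 device header and clear the flag bits in place.

```c
#include <stdint.h>

/* Bit layout of the PIC18 STKPTR register. */
#define STKPTR_STKFUL  0x80u  /* bit 7: stack full (set after 31 pushes) */
#define STKPTR_STKUNF  0x40u  /* bit 6: stack underflow */
#define STKPTR_SP_MASK 0x1Fu  /* bits 4:0: current return-stack depth */

typedef struct {
    int      full;
    int      underflow;
    unsigned depth;
} stack_status;

/* Decode a raw STKPTR value into its flags and the current depth. */
static stack_status decode_stkptr(uint8_t stkptr)
{
    stack_status s;
    s.full      = (stkptr & STKPTR_STKFUL) != 0;
    s.underflow = (stkptr & STKPTR_STKUNF) != 0;
    s.depth     = stkptr & STKPTR_SP_MASK;
    return s;
}
```

Printing the decoded flags at the start of each loop (when STVREN is clear) shows whether an overflow happened since the last power-on.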

The high-priority interrupt uses a two-deep stack on the slave processor:
     . (1) InterruptHandlerHigh
         . (2) Prep_SPI_4_First_Byte
         . (2) Handle_Spi_Int
The low priority interrupt is at least another 4 stack levels in the MPLAB default environment:
     . (1) InterruptHandlerLow
         . (2) CheckUartInts
              . (3) Handle_Panel_Data
              . (3) Process_TX
                   . (4) DisableXmitInts
              . (3) Serial_Char_Callback

So at LEAST 6 stack frames must be reserved for interrupts, maybe more.
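To see how much of the remaining budget your own code uses, you can instrument the call chain with a high-water mark. The sketch below uses hypothetical ENTER()/LEAVE() macros and example functions. The budget constants follow the numbers above: 31 hardware levels minus 6 reserved for interrupts. Every instrumented function needs ENTER() at the top and LEAVE() before each return.

```c
#include <stdint.h>

#define HW_STACK_LEVELS     31u  /* PIC18 return-address stack depth */
#define ISR_RESERVED_LEVELS  6u  /* 2 high-priority + 4 low-priority levels */
#define USER_STACK_BUDGET   (HW_STACK_LEVELS - ISR_RESERVED_LEVELS)

static uint8_t call_depth;
static uint8_t call_depth_max;   /* high-water mark: print it now and then */

#define ENTER() do { if (++call_depth > call_depth_max) call_depth_max = call_depth; } while (0)
#define LEAVE() do { --call_depth; } while (0)

/* Hypothetical example call chain: user_loop -> run_pid -> read_sensors. */
static void read_sensors(void) { ENTER(); LEAVE(); }
static void run_pid(void)      { ENTER(); read_sensors(); LEAVE(); }
static void user_loop(void)    { ENTER(); run_pid(); LEAVE(); }
```

This counts only the functions you instrument. main(), the IFI library and printf itself add levels of their own, so leave a margin below USER_STACK_BUDGET.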

Bud

I’m convinced that there are a lot of gremlins in RC, since in the years since 2003 I’ve seen a lot of bizarre stuff. The biggest one was the “8.2V” data alignment bug, which basically killed our bot for the entirety of BAE last year (and a bizarre scene in which one of the IFI guys insisted it was our code at fault when we had the default code loaded).

This year, editing one portion of code has twice caused a completely different section of code to stop working. For example, today we added a single check of a limit switch in our non-autonomous code (test rc_dig_in10 and limit a single pwm, either with our own code or with the IFI-supplied routine), and that completely broke the PID loop in autonomous mode, which doesn’t even call that limit switch code. All I can think of is that somewhere in the code something is trampling on another variable’s memory, like a 16-bit value being written on top of an 8-bit value.
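One common way this happens in C is a variable defined as `unsigned char` in one file but declared `extern int` in another. Nothing checks the two against each other, so writes through the wider declaration spill into the neighbouring byte. A guard byte next to the suspect variable, checked every loop, can catch it. The sketch below models a patch of RAM as a byte array so the overwrite is well-defined on a PC. The layout, names and guard value are hypothetical. On the RC, grouping the suspect variable and its guards into one struct is one way to get a fixed layout.

```c
#include <stdint.h>
#include <string.h>

#define GUARD_VALUE 0xA5u

/* Host model of a small patch of RAM, laid out back to back the way a
   linker might place globals:
   [0] limit_flag (8-bit), [1] guard byte, [2] pid_state (8-bit). */
static uint8_t ram[3];

static void init_ram(void)
{
    ram[0] = 0;
    ram[1] = GUARD_VALUE;
    ram[2] = 42;
}

/* Buggy store from code that believes limit_flag is 16 bits wide,
   e.g. because of a mismatched extern declaration in another file. */
static void buggy_store16(uint16_t value)
{
    memcpy(&ram[0], &value, sizeof value);
}

/* Check once per loop: a changed guard means something wrote past the
   end of limit_flag. */
static int guard_intact(void)
{
    return ram[1] == GUARD_VALUE;
}
```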

Similarly, we had an issue during the Aces High scrimmage where random spurious values would appear on a variable that was tied directly to p1_wheel, even though I could monitor p1_wheel directly and verify it was constant.