PIC18F8722 Errata & High Priority Interrupts

I wanted to make sure that people who are using a lot of interrupts see the following thread on the IFI forums: http://www.ifirobotics.com/forum/viewtopic.php?t=498

I was also curious if this was the bug that Kevin mentioned here: http://www.chiefdelphi.com/forums/showpost.php?p=430290&postcount=10

We think we’ve seen some anomalies at high interrupt rates, although it’s been hard to quantify.

Any chance this could cause serial communications failures / overflows?

Could this issue possibly cause a problem like this?

(I’m probably just grasping at straws, but I’m going crazy over here.)

Does anyone know if this could be the cause of the following? You download your program and everything is working fine. Then sometime later in the night… the robot is DEAD: no controls, no CPU executing. We have to download the program into the robot again to get it to live again…

Does anyone else have that problem?

Yeah, I’ve been staring at that errata entry for over a week now and trying to figure out which of the wacky things teams have been experiencing can be attributed to it. Basically, the fast register save functionality of the high-priority interrupt is completely hosed. Some people have noticed wacky things like joystick data getting swapped. Well, this bug could certainly cause some of the weird things some folks have seen. The good news is that the fast save functionality can be turned off. The bad news is that the high-priority ISR is buried in IFI’s library and we’ll need to get it re-compiled. Now that this problem has been “outed”, I’ll give IFI a call tomorrow and see if we can get the new library sooner rather than later.

No, that’s a completely different %#&! bug that caused me a significant amount of grief for a few days. %#&! buggy silicon.

-Kevin

Team 386 spent way too many hours tracking this one down. When we finally isolated it, we reported it immediately to IFI with complete details in their forum. We had developed a workaround that disables high priority interrupts in the low priority interrupt exit code, protecting the MOVFF instructions that were taking a hit. It conclusively proved that the issue was as documented in Microchip’s errata.

Lynn (D) - Team Voltage 386

Yes, you guys did an awesome job tracking this down. IFI sent me your detailed write-up, which you might consider posting here.

-Kevin

Ask and you shall receive. Here’s the writeup from IFI Robotics’ FAQ:

We have been experiencing intermittent data corruption and random logic failures in our 2006 controller code. We have found that a higher interrupt frequency makes it worse. We believe we have finally nailed the actual cause and have a user-level workaround, but the actual correction needs to be made to IFI’s high priority interrupt handler.

First, please refer to the Microchip PIC18F8722 errata sheet at http://ww1.microchip.com/downloads/en/DeviceDoc/80221b.pdf

According to this, the high priority interrupt fast return shadow registers have an incorrect interaction with the W, BSR, and STATUS registers when used as the destination of a MOVFF instruction. We have examined the assembly code generated by the MCC18 compiler and found the following:

  1. The high priority interrupt routine (located in ifi_library.c for which we have no source code) uses the RETFIE FAST instruction that causes the issues described in the errata.

  2. The low priority interrupt exit code generated by MCC18 in user_routines_fast.c as supplied in the default code uses the MOVFF instruction to restore the BSR and STATUS registers from the stack.

This combination of code, according to the errata, is subject to BSR and STATUS register corruption if a high priority interrupt occurs during the MOVFF instructions restoring BSR or STATUS on exit from a low priority interrupt. This situation has a higher chance of occurring as the interrupt count increases.

Our workaround, which seems to prove the issue, was to pull the interrupt exit code as generated by MCC18 and insert calls to disable and enable global high priority interrupts before and after the MOVFF instructions that restore the BSR and STATUS registers.
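For anyone who wants to see the race and the workaround side by side, here is a toy model in plain C that runs on a PC. It is only a sketch and not real PIC18 code: the ToyCpu struct, the toy_* function names, and the bank values are all made up for illustration. It also assumes the failure mode is that the shadow register latches the value BSR held before the MOVFF write, so RETFIE FAST puts that stale value back. Check the errata sheet for the exact wording.

```c
#include <stdint.h>

/* Toy model only: not cycle-accurate, not real PIC18 code.
   All names here are hypothetical. */
typedef struct {
    uint8_t bsr;         /* live Bank Select Register */
    uint8_t bsr_shadow;  /* fast-return shadow copy of BSR */
    int     gieh;        /* 1 = high priority interrupts enabled */
} ToyCpu;

/* A well-behaved high priority interrupt: BSR is copied to the shadow
   on entry and copied back by RETFIE FAST on exit. */
static void toy_high_irq(ToyCpu *cpu)
{
    cpu->bsr_shadow = cpu->bsr;  /* hardware save on entry */
    cpu->bsr = 0x0F;             /* ISR body switches banks */
    cpu->bsr = cpu->bsr_shadow;  /* RETFIE FAST restore */
}

/* Models "MOVFF saved, BSR". If a high priority interrupt is pending and
   enabled, it is taken in the middle of the instruction.
   ASSUMPTION: the shadow then holds the pre-MOVFF value of BSR. */
static void toy_movff_to_bsr(ToyCpu *cpu, uint8_t saved, int irq_pending)
{
    if (irq_pending && cpu->gieh) {
        cpu->bsr_shadow = cpu->bsr;  /* stale value latched */
        cpu->bsr = saved;            /* MOVFF write lands */
        cpu->bsr = 0x0F;             /* ISR body switches banks */
        cpu->bsr = cpu->bsr_shadow;  /* RETFIE FAST undoes the MOVFF */
    } else {
        cpu->bsr = saved;            /* normal case */
    }
}

/* Low priority ISR exit. With guard != 0, high priority interrupts are
   masked around the MOVFF and the pending one is serviced afterwards. */
uint8_t toy_low_isr_exit(uint8_t isr_bank, uint8_t saved_bank,
                         int irq_pending, int guard)
{
    ToyCpu cpu = { isr_bank, 0, 1 };

    if (guard) {
        cpu.gieh = 0;                /* disable before the restore */
    }
    toy_movff_to_bsr(&cpu, saved_bank, irq_pending);
    if (guard) {
        cpu.gieh = 1;                /* re-enable after the restore */
        if (irq_pending) {
            toy_high_irq(&cpu);      /* deferred interrupt runs cleanly */
        }
    }
    return cpu.bsr;                  /* what the main code sees */
}
```

In this model, toy_low_isr_exit(0x02, 0x05, 1, 0) hands back the ISR's bank 0x02 instead of the saved 0x05, which is the kind of corruption described above, while the same call with guard set to 1 hands back 0x05. On the real part the guard would presumably be clearing and setting the GIEH bit around the MOVFF instructions.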

Before the workaround, our mean time to failure was 473 seconds with a variance of 210 seconds. With the workaround in place, we have run over 4,240 seconds without a single detectable corruption. In all situations, over 4,000 interrupts per second were executing. No, this is not the norm, but it was used to exacerbate the problem.

The actual correction, as suggested in the errata, is to manually save and restore the W, BSR, and STATUS registers and use the non-FAST RETFIE from the high priority interrupt service routine. They even recommend the correct #pragma to do this in MCC18.
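As a rough idea of what that correction could look like in MCC18 source, here is a minimal sketch. It assumes the #pragma in question is interruptlow, which makes the compiler save context in software and return with a plain RETFIE instead of RETFIE FAST. The real high priority ISR lives in IFI's library, so none of this is their code: high_isr, high_vector_stub, tick_count, the choice of the INT0 flag, and the 0x08 vector address are all illustrative assumptions (the IFI linker setup may place the vector elsewhere). The #else branch is just a stand-in so the sketch also compiles on a PC.

```c
/* Sketch only: a high priority ISR that avoids the fast-return
   shadow registers. All names are hypothetical. */
#ifdef __18CXX                      /* predefined by MCC18 */
#include <p18f8722.h>
#else
/* Host stand-in so the sketch compiles off-target. */
static struct { unsigned INT0IF : 1; } INTCONbits;
#endif

volatile unsigned int tick_count = 0;   /* hypothetical counter */

void high_isr(void);

#ifdef __18CXX
/* ASSUMPTION: standard PIC18 high priority vector address. */
#pragma code high_vector = 0x08
void high_vector_stub(void)
{
    _asm GOTO high_isr _endasm
}
#pragma code                        /* back to the default code section */

/* interruptlow: context saved on the software stack, plain RETFIE. */
#pragma interruptlow high_isr
#endif
void high_isr(void)
{
    if (INTCONbits.INT0IF) {        /* hypothetical interrupt source */
        INTCONbits.INT0IF = 0;      /* acknowledge it */
        tick_count++;
    }
}
```

The trade-off is a slower interrupt entry and exit, since the registers are saved and restored by instructions rather than by the hardware shadow copy.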

We are willing to test a new default code version (should only need a new ifi_library.lib) if you would like us to. Additional details on how to detect this type of corruption can also be supplied at your request.

Answer: We have revised the FRC library files on the Robot Controller web page, http://www.ifirobotics.com/rc.shtml#Programming. You need to recompile your code using these updated library files.

Addendum:

IFI did provide us with the new library to test on Monday afternoon. As you can imagine, we declined to actually put it into our robot before it shipped. On Thursday, we finally set up our 2005-upgraded-to-2006 controller and the 4,000+ interrupt per second test version of our control code. The library proved solid, so IFI released it to the world. Record response, if you ask me!

Lynn (D) - Team Voltage 386 Software