Quote:
|
Originally Posted by Kevin Watson
Yes, you guys did an awesome job tracking this down. IFI sent me your detailed write-up, which you might consider posting here.
|
Ask and you shall receive. Here's the writeup from IFI Robotics' FAQ:
We have been experiencing intermittent data corruption and random logic failures in our 2006 controller code. We found that the failures get worse as interrupt frequency increases. We believe we have finally nailed down the actual cause and have a user-level workaround, but the actual correction needs to be made to IFI's high priority interrupt handler.
First, please refer to the Microchip PIC18F8722 errata sheet at
http://ww1.microchip.com/downloads/e...Doc/80221b.pdf
According to this, the shadow registers used by the high priority interrupt fast return interact incorrectly with the W, BSR, and STATUS registers when one of those registers is the destination of a MOVFF instruction. We examined the assembly code generated by the MCC18 compiler and found the following:
1) The high priority interrupt routine (located in ifi_library.c for which we have no source code) uses the RETFIE FAST instruction that causes the issues described in the errata.
2) The low priority interrupt exit code generated by MCC18 in user_routines_fast.c as supplied in the default code uses the MOVFF instruction to restore the BSR and STATUS registers from the stack.
This combination of code, according to the errata, is subject to BSR and STATUS register corruption if a high priority interrupt occurs during the MOVFF instructions restoring BSR or STATUS on exit from a low priority interrupt. This situation has a higher chance of occurring as the interrupt count increases.
Our workaround, which seems to prove the issue, was to pull the interrupt exit code as generated by MCC18 and insert calls to disable and enable global high priority interrupts before and after the MOVFF instructions that restore the BSR and STATUS registers.
Before the workaround, our mean time to failure was 473 seconds with a variance of 210 seconds. With the workaround in place, we have run over 4,240 seconds without a single detectable corruption. In all situations, over 4,000 interrupts per second were executing. No, this is not the norm, but it was used to exacerbate the problem.
The actual correction, as suggested in the errata, is to manually save and restore the W, BSR, and STATUS registers and use the non-FAST RETFIE from the high priority interrupt service routine. They even recommend the correct #pragma to do this in MCC18.
We are willing to test a new default code version (should only need a new ifi_library.lib) if you would like us to. Additional details on how to detect this type of corruption can also be supplied at your request.
Answer: We have revised the FRC library files on the Robot Controller web page,
http://www.ifirobotics.com/rc.shtml#Programming. You need to recompile your code using these updated library files.
Addendum:
IFI did provide us with the new library to test on Monday afternoon. As you can imagine, we declined to actually put it into our robot before it shipped. On Thursday, we finally set up our 2005-upgraded-to-2006 controller with the 4,000+ interrupts per second test version of our control code. The library proved solid, so IFI released it to the world. Record response, if you ask me!
Lynn (D) - Team Voltage 386 Software