Re: paper: Low Latency Interrupt Framework
Another update to the document, V0.6.
I added a brief section on how a device driver writer would make something for the framework.
I've changed how I'm invoking the user-registered device drivers. I got rid of the RAM function tables because they seemed wasteful of a limited resource. I'm now setting aside program memory and flashing it as needed: a block of program memory is reserved for each of the 20 interrupts in the framework. When a device driver is bound to a particular interrupt, the driver routine's address is flashed into the program-memory jump table as part of setup. The next time that interrupt fires, the supplied device driver is branched to via the dispatch table. This has a couple of positives and negatives.
The negative is that you don't want to be dynamically changing which drivers are attached to which interrupts. The good news is this shouldn't happen: a robot's configuration is pretty stable while running (no one runs alongside it swapping I/O wires). Although it is possible you'd like to share an I/O pin between a couple of different drivers, the best way to handle something like that is to write your own software mux routine to multiplex between the two drivers. It just seems a bit odd-ish, so I didn't worry about it too much; there are ways of programming around it within the custom drivers you'd need anyway. Another negative is that the programmable jump table eats up chunks of program space. I thought about reserving a flash block for each interrupt per service layer, but that's something like 2k instructions with the 64-byte flash blocks of the 8722. No matter how you work the issue, though, a chunk of program memory gets used; currently a little under 1k bytes of program space goes to the dispatch tables. The code to set up and flash the jump table entries is a bit, ummm, interesting, which is also a negative. The time required to flash the tables is in the ms range, but since this only has to be done once per code image the overhead isn't too bad. The second time the flash is done to bind a driver to an interrupt, the code finds the appropriate address already there and doesn't have to do anything. Yeah, it borders on self-modifying code, but what the heck, you only live once.
The positives of the program-memory jump tables are improved latency and reduced RAM use. The jump tables bring the framework back toward being more like compile-time bindings. The three PCLAT registers don't need to be saved either, since the code no longer jumps through RAM; this saves context time within the ISR, and every bit helps (a total of 12 instruction cycles). The jump code is also cleaner, with a few less instruction cycles - only 1 pipeline break vs. the three when using RAM function pointers. I'm holding back on the final step of putting the jump tables in-line within the ISR proper; that would save another 2 instruction cycles but overly complicate the flash code. With these changes the maximum execution path is just under 100 cycles, and the average ISR time was about 20 cycles less than that. So for common interrupts the latency is under 10 usec, which was my original goal.
Along the way I found the compiler being "helpful": it took logic that was optimally laid out and compiled it so it ran 2.5x longer than it should by reorganizing the code in program memory. I haven't found which optimization did it, but sticking in an asm nop turned it all off and generated the code I expected. I keep looking at coding the main chunk of the physical hardware ISR layer in assembly. It's not that big, and I'm getting tired of fighting the compiler.
Bud