02-02-2008, 15:15
dcbrown (AKA: Bud), Team Role: Mentor, no team, Hollis, NH (Join Date: Jan 2005, Rookie Year: 2005)
Re: Team 166 Has Open Sourced Its Code

Quote:
Compilers, not programmers, should do optimizations
Correct. Too bad we don't have a decent optimizing compiler available.

But is it better to have code that reads:

Code:
    for (samples_received=0; samples_received<10; samples_received++)
    {
           :
Or code that reads:

Code:
    samples_left_to_receive = 10;
    do {
           :
    } while (--samples_left_to_receive != 0);
Both loops do the same job, yet even the best optimizing compiler generally won't turn one into the other (for one thing, the counter finishes with a different value). The second is much more efficient because decrementing the control variable also sets the STATUS flags the loop test needs, which avoids a second comparison. On the PIC18F, the second form is more than 3x as efficient simply because the programmer chose a different but equivalent construct.
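A quick way to convince yourself the two loops run the body the same number of times is to count passes with a desktop compiler. This is a plain C sketch, not PIC code, and the function names are made up for illustration. It also shows a trap: starting a `for` loop at 10 with the pre-decrement in the condition runs the body only nine times, which is why the count-down version here is written as a `do`/`while` that decrements and tests at the bottom.

```c
#include <stdint.h>

/* Count-up form: the counter is compared against 10 on every pass. */
unsigned count_up_passes(void)
{
    uint8_t samples_received;
    unsigned passes = 0;

    for (samples_received = 0; samples_received < 10; samples_received++)
    {
        passes++;               /* stands in for the real loop body */
    }
    return passes;
}

/* Count-down form: the decrement itself yields the zero/non-zero
   result the loop test needs, so there is no compare against 10. */
unsigned count_down_passes(void)
{
    uint8_t samples_left_to_receive = 10;
    unsigned passes = 0;

    do
    {
        passes++;               /* stands in for the real loop body */
    } while (--samples_left_to_receive != 0);
    return passes;
}

/* Pitfall: pre-decrementing in the for-condition from 10 skips a pass. */
unsigned predecrement_for_passes(void)
{
    uint8_t n;
    unsigned passes = 0;

    for (n = 10; --n != 0; )
    {
        passes++;
    }
    return passes;
}
```

On the PIC18 the `do`/`while` shape maps naturally onto a decrement-and-skip-if-zero instruction (DECFSZ); whether a particular compiler actually emits it is worth checking in the disassembly listing.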

Does this really matter? In most cases, no. But if the code happens to be executing at interrupt level, or is executed thousands of times per second, then yes, it generally does. A few changes like this can usually cut the interrupt code's footprint to between 1/2 and 1/3 of its original size. Typically I scrutinize only about 10% of the code I write for efficiency, yet the run-time impact can be huge.

Knowing the underlying architecture of the processor you're working on also affects which code constructs you should use. Yeah, it would be great if the compiler figured this stuff out for you... but often it doesn't, or, as above, it can't. On the PIC, for example, computing RAM addresses at run time isn't exactly efficient, so in some cases it makes sense to unroll "for(...)" loops. For example:

Code:
    for(ndx=0; ndx<10; ndx++)
    {
          sample[ndx] = 0;
    }
is terribly inefficient. For each element it takes 14-16 instructions to calculate the RAM address, plus roughly 13 instructions of overhead to manage the loop variable. Unrolling the loop into the following is a huge win: the code runs about 10x faster on average with little change in code size.

Code:
    sample[0] = 0;
    sample[1] = 0;
    :
    sample[9] = 0;
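Written out in full, the unrolled version leaves the array in exactly the same state as the loop; the difference is that every address is a compile-time constant. This sketch assumes `sample` is a ten-element array of `unsigned char` (the post doesn't give its type), and the function names are hypothetical. On a desktop both versions are trivially fast; the point is the shape of the code the PIC compiler is handed.

```c
#define NUM_SAMPLES 10

unsigned char sample[NUM_SAMPLES];   /* assumed element type and size */

/* Loop version: every pass must compute &sample[ndx] at run time. */
void clear_samples_loop(void)
{
    unsigned char ndx;

    for (ndx = 0; ndx < NUM_SAMPLES; ndx++)
    {
        sample[ndx] = 0;
    }
}

/* Unrolled version: each address is fixed at compile time, so on the
   PIC18 each store can be a single clear of a known RAM location. */
void clear_samples_unrolled(void)
{
    sample[0] = 0;
    sample[1] = 0;
    sample[2] = 0;
    sample[3] = 0;
    sample[4] = 0;
    sample[5] = 0;
    sample[6] = 0;
    sample[7] = 0;
    sample[8] = 0;
    sample[9] = 0;
}
```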
Anyway, my point is that a programmer should care about how code constructs map to the underlying architecture. If that didn't matter, the way to go would be to change everything to extended-precision floating point and do all calculations that way; the "smarts" in the compiler would figure out when we needed integers vs. floats and 8-bit vs. 32-bit values and hide all that nonsense from the programmer.

PS
Another common trick on the PIC applies when loading data from hardware registers, for example:

Code:
    timer_count = TMR1H;
 06730    50CF     MOVF 0xfcf, W, ACCESS       ; TMR1H
 06732    6F17     MOVWF 0x17, BANKED
 06734    6B18     CLRF 0x18, BANKED

    timer_count <<= 8;
 06736    C517     MOVFF 0x517, 0x518
 06738    F518     NOP                         ; second word of the MOVFF
 0673A    6B17     CLRF 0x17, BANKED

    timer_count += TMR1L;
 0673C    50CE     MOVF 0xfce, W, ACCESS       ; TMR1L
 0673E    6E2B     MOVWF 0x2b, ACCESS
 06740    6A2C     CLRF 0x2c, ACCESS
 06742    502B     MOVF 0x2b, W, ACCESS
 06744    2717     ADDWF 0x17, F, BANKED
 06746    502C     MOVF 0x2c, W, ACCESS
 06748    2318     ADDWFC 0x18, F, BANKED

    timer_count -= offset;
 0674A    5119     MOVF 0x19, W, BANKED
 0674C    5F17     SUBWF 0x17, F, BANKED
 0674E    511A     MOVF 0x1a, W, BANKED
 06750    5B18     SUBWFB 0x18, F, BANKED
A simple change produces much more efficient code: overlay the 16-bit value with its two bytes, using a union that contains an anonymous struct:

Code:
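typedef union u_U16
{
	unsigned int data;          /* unsigned int is 16 bits on PIC18 compilers */
	struct {
		unsigned char b0;       /* low byte (the PIC18 is little-endian) */
		unsigned char b1;       /* high byte */
	};
} u_U16;

u_U16 timer_drift;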


    timer_drift.b1 = TMR1H;
 06752    CFCF     MOVFF 0xfcf, 0x516          ; TMR1H -> high byte
 06754    F516     NOP                         ; second word of the MOVFF

    timer_drift.b0 = TMR1L;
 06756    CFCE     MOVFF 0xfce, 0x515          ; TMR1L -> low byte
 06758    F515     NOP                         ; second word of the MOVFF

    timer_drift.data -= offset;
 0675A    5119     MOVF 0x19, W, BANKED
 0675C    5F15     SUBWF 0x15, F, BANKED
 0675E    511A     MOVF 0x1a, W, BANKED
 06760    5B16     SUBWFB 0x16, F, BANKED
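Another quick 3x improvement in both code size and execution time: loading the timer now takes 4 program words and 4 instruction cycles (two MOVFFs) instead of 13 of each, while the subtraction is unchanged.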
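The two approaches can also be compared in portable C. This sketch uses `uint16_t`/`uint8_t` rather than `unsigned int`/`unsigned char`, because `unsigned int` is 16 bits on the PIC18 but usually 32 bits on a desktop, and it assumes a little-endian target (as the PIC18 is) so that `b0` overlays the low byte. Anonymous struct members are standard since C11 and were a common compiler extension before that. The type and function names are hypothetical; both functions should return the same value.

```c
#include <stdint.h>

/* Same idea as u_U16, with fixed-width types so it also works off-target. */
typedef union
{
    uint16_t data;
    struct {
        uint8_t b0;   /* low byte on a little-endian target */
        uint8_t b1;   /* high byte on a little-endian target */
    };
} u16_bytes;

/* Shift-and-add: builds the value arithmetically, like the first listing. */
uint16_t drift_shift_add(uint8_t hi, uint8_t lo, uint16_t offset)
{
    uint16_t timer_count = hi;
    timer_count <<= 8;
    timer_count += lo;
    timer_count -= offset;      /* wraps modulo 2^16, like the PIC code */
    return timer_count;
}

/* Byte overlay: stores each byte straight into place, like the second listing. */
uint16_t drift_union(uint8_t hi, uint8_t lo, uint16_t offset)
{
    u16_bytes timer_drift;
    timer_drift.b1 = hi;
    timer_drift.b0 = lo;
    timer_drift.data -= offset;
    return timer_drift.data;
}
```

On the PIC the win comes from each byte assignment becoming a single MOVFF, as the listing above shows; on a desktop compiler the two functions will likely compile to similar code.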