Chief Delphi > Technical > Programming
#1  21-01-2007, 22:25
DonRotolo (FRC #0832, Mentor; Atlanta GA)
CPU Load in FRC RC

It has been years - no, decades - since my last programming class. (We used punchcards, if that's a clue...).

My question has to do with the relative load on the CPU for the various programming operations that can be performed. I was always under the impression that an IF statement took a lot more CPU time than a series of 3 or 4 simple additions, but a thread near here made me re-think that belief.

So, can anyone explain to me the load on the CPU of the various operations, particularly Math (integer and quasi-floating [bit shifts?], if there's any difference), Logical comparisons (<=), Conditionals, and I/O statements (and what else did I miss?).

Also, my assumption is that if I do a certain calculation, it takes as long to do it in a 'subroutine' (or whatever we call them today - a called function?) as it would in the main programming loop. (Ignore proper programming practice; I'm only worried about CPU load.)

Not looking for exact numbers; a relative comparison would be fine. Even better would be a reference where I could just read up on it myself...

Thanks,
Don
__________________

I am N2IRZ - What's your callsign?
#2  21-01-2007, 22:32
X-Istence, aka Bert JW Regeer (no team, Alumni; Montville)
Re: CPU Load in FRC RC

Sure, each operation takes a certain number of clock ticks to complete; however, none of the operations you mentioned would need more than the processor can handle, not even close.

I have no references, but each C operation compiles to some number of assembly instructions, and each of those takes a certain number of CPU ticks to complete.

Is there any specific reason you are asking for this information? Personally, I don't see a reason to worry about how many ticks a particular operation takes.

As for whether a calculation takes longer in a subroutine (a function, these days) or in the main loop: the calculation itself takes the same time either way. Calling a function does cost a few more ticks than having all your code in the main loop, but the difference is so VERY small that it is negligible, and for the sake of being able to read your own code it is better to keep it in functions.
#3  21-01-2007, 22:38
Kevin Sevcik (FRC #0057 The Leopards, Mentor; Houston, Texas)
Re: CPU Load in FRC RC

Alright, at the very least, I can state that IF statements won't cost much processing time. The PIC has only a short two-stage pipeline, so a branch costs at most one extra instruction cycle, which isn't particularly costly. Secondly, doing a calculation in a function/subroutine incurs a fairly significant processing cost compared with doing it in place. This is mostly because when you call a function, the program has to build a stack frame, saving the current processor state and possibly passing arguments, so that the instructions in the function can operate in a clean environment. It's not a huge hit, something on the order of 10-20 instructions depending on the situation, but it does take time.

Edit: X-Istence, computation time can be very important if you're coding an interrupt service routine or any other code where you want things to happen very quickly. Calling 20 different functions in an interrupt just to keep your code clean isn't a good idea at all. If you want to use functions to keep code clean, I think you can make them inline functions. An inline function tells the compiler to paste the function's body in place wherever the function is called, avoiding the call overhead. I'm not positive the C18 compiler supports them out of the box, but I'll look into it.
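A minimal C sketch of the three ways to package a small helper, assuming a C99 desktop compiler. The names `clamp_pwm`, `clamp_pwm_inline`, and `CLAMP_PWM` and the 0..255 range are hypothetical, and as noted above, whether C18 accepts `inline` is unconfirmed; the macro form works on pre-C99 compilers.

```c
#include <assert.h>

/* Hypothetical helper: clamp a motor command to a 0..255 PWM range. */

/* 1. Ordinary function: every call pays for CALL/RETURN plus copying the
      argument in and the result out. */
int clamp_pwm(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* 2. C99 inline function: a request (not a guarantee) that the compiler
      paste the body at each call site instead of emitting a CALL. */
static inline int clamp_pwm_inline(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* 3. Macro fallback for pre-C99 compilers: the preprocessor always pastes
      the text in place. The argument is evaluated more than once, so never
      pass an expression with side effects such as x++. */
#define CLAMP_PWM(v) ((v) < 0 ? 0 : ((v) > 255 ? 255 : (v)))

/* Usage: all three forms give the same answers; only the generated code
   (and therefore the call overhead) differs. */
void check_clamps(void)
{
    int x;
    for (x = -300; x <= 600; x++) {
        assert(clamp_pwm(x) == clamp_pwm_inline(x));
        assert(clamp_pwm(x) == CLAMP_PWM(x));
    }
}
```

The trade-off is code space: an inlined or macro body is duplicated at every call site, so it suits short helpers called from time-critical code such as an interrupt routine.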
__________________
The difficult we do today; the impossible we do tomorrow. Miracles by appointment only.

Lone Star Regional Troubleshooter

Last edited by Kevin Sevcik : 21-01-2007 at 22:45.
#4  21-01-2007, 23:31
Joe Ross (FRC #0330 Beachbots, Engineer; Los Angeles, CA)
Re: CPU Load in FRC RC

You can look at the listing file to see exactly how many assembly instructions different C operations take.

Last edited by Joe Ross : 21-01-2007 at 23:53.
#5  21-01-2007, 23:50
Alan Anderson (FRC #0045 TechnoKats, Mentor; Kokomo, Indiana)
Re: CPU Load in FRC RC

Quote:
Originally Posted by Don Rotolo
...My question has to do with the relative load on the CPU for the various programmatical operations that can be performed...
The PIC processor is designed so that every operation takes one instruction cycle. Branch instructions are an exception; they take two. But the architecture of the PIC CPU is unusual in that read/write data and program code are not in the same memory space, and the ALU has some odd restrictions on where the results end up. This means that what looks like a simple operation in C might end up being a simple operation on the PIC, or it might end up being a page of assembly language to implement. Floating point arithmetic is particularly costly, as the PIC ALU doesn't support floating point in hardware.
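A common way to avoid the software floating point routines is fixed-point arithmetic: store a fractional gain as a scaled integer and do only integer math. A minimal sketch, assuming a hypothetical Q8 format (gain multiplied by 256) and a hypothetical gain of 0.75; `scale_float`, `scale_fixed_q8`, and `GAIN_Q8` are illustrative names only. The 32-bit `long` multiply probably still uses a library routine on the PIC, but an integer one rather than a floating point one.

```c
#include <assert.h>

/* Floating point version: on the PIC this goes through software float routines. */
int scale_float(int x)
{
    return (int)(x * 0.75f);
}

/* Fixed-point version. Hypothetical Q8 format: a gain g is stored as the
   integer round(g * 256), so 0.75 becomes 192. The long intermediate matters
   on the PIC, where int is 16 bits: x * 192 overflows an int once |x| > 170. */
#define GAIN_Q8 192

int scale_fixed_q8(int x)
{
    return (int)(((long)x * GAIN_Q8) / 256);
}

/* Usage: both agree over a typical sensor range. They match exactly here
   because 0.75 is a power-of-two fraction; other gains round slightly
   differently in Q8. */
void check_scaling(void)
{
    int x;
    for (x = -1000; x <= 1000; x++)
        assert(scale_fixed_q8(x) == scale_float(x));
}
```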

If you're interested in the relative cost in program space (which translates almost directly to execution time) for various operations, I suggest you try compiling a program which uses those operations and then inspect the listing file to see how the compiler translates them into PIC assembly language.
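Following that suggestion, a small test file might look like the sketch below: one operation per function, with operand types mirroring the cycle-count table later in this thread, so each operation's generated code is easy to find in the listing. The function names are hypothetical, and the assertions are for running on a desktop compiler; for the robot controller you would simply compile and read the listing.

```c
#include <assert.h>

/* One operation per function, so each shows up as its own block in the
   listing file. */
unsigned int op_mul_u8(unsigned char a, unsigned char b) { return (unsigned int)a * b; }
short        op_mul_s16(short a, short b)                { return (short)(a * b); }
short        op_div_s16(short a, short b)                { return (short)(a / b); }
long         op_div_s32_s16(long a, short b)             { return a / b; }
float        op_mul_f(float a, float b)                  { return a * b; }
float        op_add_f(float a, float b)                  { return a + b; }

/* Calls each routine once and checks the results against what C specifies. */
void exercise_ops(void)
{
    assert(op_mul_u8(200, 200) == 40000u);
    assert(op_mul_s16(100, -3) == -300);
    assert(op_div_s16(-7, 2) == -3);            /* C division truncates toward zero */
    assert(op_div_s32_s16(100000L, 7) == 14285L);
    assert(op_mul_f(1.5f, 2.0f) == 3.0f);
    assert(op_add_f(0.5f, 0.25f) == 0.75f);
}
```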
#6  22-01-2007, 12:40
Mike Bortfeldt (FRC #1126 & 1511, Mentor; Rochester, NY)
Re: CPU Load in FRC RC

Don,

As Alan mentioned, branch instructions by themselves take only two instruction clock cycles. (An instruction clock is 4 oscillator clock cycles, so the 40 MHz PIC processor really has only a 10 MHz instruction clock. I will be talking strictly about instruction clock cycles in this note.) It's the instructions that have to be executed within the IF statement that take most of the time, not the branching itself.

In general, addition and subtraction occur in-line: the compiler generates separate assembly code for each add/sub operation, right in the routine where the operation appears, and that code is quick. For more complex operations (multiplication, division, trig, and all floating point operations), the operands are passed to a math subroutine (function) that performs the operation and returns the result to the calling routine. The source code for the math routines can be found in the mcc18 directory if you selected the appropriate option when you originally installed the software. Some of these routines document their minimum, maximum, and mean execution times in clock cycles. Based on a couple of observations, these times do not include the time to copy your arguments into the math variables (minimum 2 clock cycles per byte) or to copy the result back into another variable. They also don't include the necessary CALL/RETURN (4 clock cycles). Here are some samples, covering the math operation only, not the call/return or the passing of arguments:

unsigned char * unsigned char = 6 clock cycles
signed short * signed short = 35 clock cycles
signed short / signed short = 85 clock cycles average (min 28, max 149)
signed long / signed short = 376 clock cycles average (min 84, max 421)
any floating point multiply/divide = 1835 clock cycles average
any floating point addition/subtraction = 80 clock cycles average
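Several of the slow entries above are divisions. When the divisor is a constant power of two, a shift can replace the division routine, which ties in with Don's question about bit shifts. A minimal C sketch; the function names are hypothetical, and whether C18 already makes this substitution on its own is something the listing file would show.

```c
#include <assert.h>

/* Unsigned: a right shift by 3 is exactly x / 8 and needs no division routine. */
unsigned int div8_unsigned(unsigned int x)
{
    return x >> 3;
}

/* Signed: C's x / 8 truncates toward zero, but right-shifting a negative
   value is implementation-defined and typically rounds toward minus infinity
   (e.g. -9 >> 3 is usually -2, while -9 / 8 is -1). Shifting the magnitude
   keeps the result identical to x / 8. */
int div8_signed(int x)
{
    if (x < 0) {
        /* Unsigned negation avoids overflow when x is the most negative int. */
        unsigned int mag = 0u - (unsigned int)x;
        return -(int)(mag >> 3);
    }
    return x >> 3;
}

/* Usage: both match ordinary division over a test range. */
void check_div8(void)
{
    int x;
    for (x = -1000; x <= 1000; x++)
        assert(div8_signed(x) == x / 8);
    for (x = 0; x <= 1000; x++)
        assert(div8_unsigned((unsigned int)x) == (unsigned int)x / 8u);
}
```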

It should be noted that the floating point figures came from the C18 compiler version 2.2 and may not be the same in version 2.4 (I believe the floating point storage format changed between these two versions). Trig routines generally perform multiple floating point operations; for lack of a better number, I assume around 6 multiplies/divides (from a quick scan of the source, so I may be way off base). Based on that assumption, a single SINE call could consume upwards of 11,000 clock cycles (6 × 1835), or about 0.11% of one second of CPU time. Doing one of these in the main loop of your code (approximately 38 times per second) would consume over 400,000 clock cycles per second (more than 4% of the CPU), which is very costly.
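The arithmetic in that estimate can be written out as a small C helper, which makes it easy to redo with other cycle counts or loop rates. A sketch only: `cpu_percent` and the constant names are hypothetical, and the figures are the ones quoted in this post, including the guessed 6 floating point operations per sine.

```c
#include <assert.h>

/* Figures quoted in this post; SINE_FP_OPS is a rough guess, not a measurement. */
#define OSC_HZ             40000000.0       /* 40 MHz oscillator */
#define INSTR_CLOCK_HZ     (OSC_HZ / 4.0)   /* 4 clocks per instruction cycle */
#define MAIN_LOOPS_PER_SEC 38.0             /* approximate main-loop rate */
#define FP_MULDIV_CYCLES   1835.0           /* average, C18 v2.2 */
#define SINE_FP_OPS        6.0              /* assumed multiply/divides per sine */

/* Percentage of one second of instruction cycles consumed by an operation
   costing cycles_per_call, performed calls_per_sec times each second. */
double cpu_percent(double cycles_per_call, double calls_per_sec)
{
    assert(cycles_per_call >= 0.0 && calls_per_sec >= 0.0);
    return 100.0 * cycles_per_call * calls_per_sec / INSTR_CLOCK_HZ;
}

/* Usage: reproduces the estimate above. */
void check_sine_budget(void)
{
    double sine_cycles = SINE_FP_OPS * FP_MULDIV_CYCLES;               /* 11,010 cycles */
    double once = cpu_percent(sine_cycles, 1.0);                       /* about 0.11 % */
    double every_loop = cpu_percent(sine_cycles, MAIN_LOOPS_PER_SEC);  /* about 4.18 % */
    assert(once > 0.110 && once < 0.111);
    assert(every_loop > 4.18 && every_loop < 4.19);
}
```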

I'm not sure if this answered your question or not, but hopefully it helped.

Mike
#7  22-01-2007, 14:28
dcbrown, aka Bud (no team, Mentor; Hollis, NH)
Re: CPU Load in FRC RC

Thought I'd add some pictures from the PIC16 architecture guide. It describes the two-stage pipeline used within the processor, with each instruction cycle made up of 4 clock cycles, which they name Q1..Q4. You can see in the diagram that the instruction fetched after a branch/call has to be flushed, since it came from the wrong PC.

Quote:
The clock input (from OSC1) is internally divided by four to generate four non-overlapping quadrature clocks, namely Q1, Q2, Q3, and Q4. Internally, the program counter (PC) is incremented every Q1, and the instruction is fetched from the program memory and latched into the instruction register in Q4. The instruction is decoded and executed during the following Q1 through Q4.
Quote:
An “Instruction Cycle” consists of four Q cycles (Q1, Q2, Q3, and Q4). Fetch takes one instruction cycle while decode and execute takes another instruction cycle. However, due to Pipelining, each instruction effectively executes in one cycle. If an instruction causes the program counter to change (e.g. GOTO ) then an extra cycle is required to complete the instruction.

The instruction fetch begins with the program counter incrementing in Q1. In the execution cycle, the fetched instruction is latched into the “Instruction Register (IR)” in cycle Q1.

This instruction is then decoded and executed during the Q2, Q3, and Q4 cycles. Data memory is read during Q2 (operand read) and written during Q4 (destination write). The diagram shows the operation of the two stage pipeline for the instruction sequence shown.

At time TCY0, the first instruction is fetched from program memory. During TCY1, the first instruction executes while the second instruction is fetched. During TCY2, the second instruction executes while the third instruction is fetched. During TCY3, the fourth instruction is fetched while the third instruction (CALL SUB_1) is executed. When the third instruction completes execution, the CPU forces the address of instruction four onto the Stack and then changes the Program Counter (PC) to the address of SUB_1. This means that the instruction that was fetched during TCY3 needs to be “flushed” from the pipeline. During TCY4, instruction four is flushed (executed as a NOP) and the instruction at address SUB_1 is fetched. Finally during TCY5, instruction five is executed and the instruction at address SUB_1+1 is fetched.
So: the PC is incremented and latched in Q1, and the next instruction fetch is initiated during Q1. The instruction returned from program memory is latched in Q4 for execution during the next instruction cycle. If the current instruction changes the PC by writing new data in Q3, as a branch or call does, the new PC isn't used until Q1 of the following instruction cycle, but by then the fetch of the next sequential instruction is already under way. That stale instruction is therefore flushed by executing it as a no-operation (NOP).
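To make the flush concrete, here is a small C model of the two-stage pipeline described above. It is a sketch only, not a cycle-accurate PIC simulator: the `Insn` record and `simulate` function are hypothetical, the return-address push onto the stack is not modeled, and every instruction is assumed to be one word. It captures just the rule that an instruction which changes the PC leaves a stale fetch behind that executes as a NOP.

```c
#include <assert.h>
#include <stdio.h>

#define SLOT_EMPTY   (-1)  /* nothing latched yet */
#define SLOT_FLUSHED (-2)  /* stale fetch: executes as a NOP */

/* Hypothetical instruction record for the model. */
typedef struct {
    const char *name;
    int target;            /* new PC if this instruction changes the PC, else -1 */
} Insn;

/* Runs `cycles` instruction cycles (TCY0, TCY1, ...) starting at address 0.
   Returns how many real instructions executed and stores the number of
   flushed (NOP) cycles in *flushed. Prints a trace when `trace` is nonzero. */
int simulate(const Insn *prog, int len, int cycles, int *flushed, int trace)
{
    int pc = 0;
    int latched = SLOT_EMPTY;   /* instruction register contents */
    int executed = 0;
    int tcy;

    assert(prog != NULL && len > 0 && flushed != NULL);
    *flushed = 0;

    for (tcy = 0; tcy < cycles; tcy++) {
        int current = latched;

        /* Fetch stage: overlaps with execution of `current`. */
        latched = (pc < len) ? pc : SLOT_EMPTY;
        pc++;

        /* Execute stage. */
        if (current == SLOT_FLUSHED) {
            (*flushed)++;
            if (trace)
                printf("TCY%d: NOP (flushed fetch)\n", tcy);
        } else if (current >= 0) {
            executed++;
            if (trace)
                printf("TCY%d: execute %s\n", tcy, prog[current].name);
            if (prog[current].target >= 0) {
                /* PC changed: the instruction fetched this cycle came from
                   the old, sequential address, so throw it away. */
                pc = prog[current].target;
                latched = SLOT_FLUSHED;
            }
        }
    }
    return executed;
}

/* The sequence from the quoted example, with SUB_1 placed at address 5
   (an arbitrary choice for the model). */
static const Insn example[] = {
    { "instruction 1", -1 },  /* 0 */
    { "instruction 2", -1 },  /* 1 */
    { "CALL SUB_1",     5 },  /* 2 */
    { "instruction 4", -1 },  /* 3: fetched in TCY3, then flushed */
    { "(not reached)", -1 },  /* 4 */
    { "SUB_1",         -1 },  /* 5 */
    { "SUB_1+1",       -1 },  /* 6 */
};
```

Running `simulate(example, 7, 6, &flushed, 1)` walks through TCY0 to TCY5 and reproduces the quoted timeline: the CALL executes in TCY3, TCY4 is the flushed slot, and SUB_1 executes in TCY5, so four real instructions complete in six cycles.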
Attached images: Pipeline.JPG (two-stage pipeline diagram), Clocks.JPG (Q1-Q4 clock phases)

Last edited by dcbrown : 22-01-2007 at 14:30.
#8  22-01-2007, 21:12
DonRotolo (FRC #0832, Mentor; Atlanta GA)
Re: CPU Load in FRC RC

OK, thanks for the details. Different from what I was thinking, and so very helpful.

The reason for the question is to help my programmers generate more efficient code. As everyone knows, there's more than one way to skin a cat in software, so some careful choices can bring us faster execution. This isn't a really significant concern yet, because we're nowhere near the limit of code space or CPU cycles in the loop. But with some of the things being planned, we may come close to one, the other, or both, and I want to be prepared when we do.

Part of this whole exercise is to let kids understand what 'efficient code' means. Again, I come from the punch card era, when a 1 MHz processor and 100 MB of online disk were the marks of a very large mainframe; by comparison, today's bloated software, while nifty, is misleading. I showed a kid a copy of TinyEd, a word processor for DOS that's something like 6k of .EXE. He didn't believe it: how could a word processor be only 6k? Compare that to MS Word.

Thanks again, I have enough information and references to move forward on my own.

Don
All times are GMT -5.