Extracting Individual Bits in C

Since I didn’t know how to do this, I thought some people might want to know.

First off, the code:


int some_var=5;         /* the variable we will be extracting a bit from. */
int n=3;                /* the position of the bit we want */

int the_bit = (( some_var & (1 << (n-1) ) ) ? 1 : 0 );

That’s it!
Yes folks, a left shift, some binary logic, and the ternary operator, all in one line! Looks impressive, huh? (Does anyone know if the PIC supports the ternary operator? If not, it’s simple to convert to an if-else, but this is so clean!)

Now, some explanation:

We’ll start by talking about the << operator.
This shifts whatever is on its left to the left by the number of places given by the expression on its right. In the code above, it shifts 1 left by 2 (n-1 = 3-1 = 2) places, filling the vacated places on the right with zeroes. This gives us 00000001 => 00000100 (assuming an 8-bit int; I didn’t check how large an int on the PIC is, but it doesn’t matter for this demonstration). An expression like this is known as a mask, and can be used with the & operator to test a single bit.
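To make the shift concrete, here is a minimal sketch that builds the mask for a few 1-based positions. The helper name mask_for_position is made up for illustration and does not appear in the original post. It uses an unsigned constant so the shift never touches a sign bit.

```c
/* Hypothetical helper (not from the original post): build a mask with a
   single 1 at the given 1-based position, counting from the right. */
unsigned int mask_for_position(unsigned int n)
{
    /* The caller must keep n between 1 and the bit width of unsigned int. */
    return 1u << (n - 1u);
}

/* mask_for_position(1) is 1   (binary 00000001)
   mask_for_position(3) is 4   (binary 00000100)
   mask_for_position(8) is 128 (binary 10000000) */
```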

Now you see that we combine the mask (1 << (n-1) ) with the variable to be tested using the bitwise AND operator (&). It checks each bit position of one operand against the same position of the other. Any place where both digits are 1 evaluates to 1. All other places evaluate to zero.

So, what we are really looking at in the example is
(00000101 & 00000100)
Which will evaluate to
(00000100)
Now, we plug this value (4) back into our ternary expression.

Time for a quick lesson in ternary! The ternary operator ? looks at the expression that precedes it. If that expression is true (non-zero), the expression between the ? and the : is evaluated and becomes the result. If it is false, that expression is skipped and whatever follows the : is evaluated instead.
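For anyone whose compiler balks at the ternary operator, the if-else conversion mentioned earlier might look something like this. Both function names are invented for the comparison, and the two should behave identically for any int input.

```c
/* Both helpers are hypothetical names used only for this comparison. */
int to_bit_ternary(int masked)
{
    return masked ? 1 : 0;
}

int to_bit_if_else(int masked)
{
    int result;

    if (masked) {       /* any non-zero value counts as true */
        result = 1;
    } else {
        result = 0;
    }
    return result;
}
```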

In our example, we get an expression that looks like
4 ? 1 : 0;

Four is non-zero, so it evaluates to true. This means that the statement evaluates to 1. So, in our example,


int some_var=5;         /* the variable we will be extracting a bit from. */
int n=3;                /* the position of the bit we want */

int the_bit = (( some_var & (1 << (n-1) ) ) ? 1 : 0 );
printf("the_bit: %d", the_bit);

OUTPUT: 
the_bit: 1

A slightly more generalized look at the whole ternary statement now:

the_bit = (( some_var & (1 << (n-1) ) ) ? 1 : 0 );

First we create a mask with a 1 in the place of the bit we want to find and zeroes elsewhere. The zeroes elsewhere mean that the & on those bits will always evaluate to zero. The real comparison, therefore, only takes place on one bit, somevar.bit[n]. If this bit is one, then the expression returns a non-zero value, which is true, which then returns one from the ternary. If this is false, then the whole thing works out to zero, or false, so the ternary evaluates to 0.
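The same logic can be wrapped in a small function. This is only a sketch: the name extract_bit and the choice of unsigned parameters are assumptions, not something from the post above. Positions are 1-based, as in the one-liner.

```c
/* Hypothetical function form of the one-liner above.
   n is 1-based: n == 1 selects the least significant bit. */
int extract_bit(unsigned int value, unsigned int n)
{
    unsigned int mask = 1u << (n - 1u);  /* single 1 at position n */

    return (value & mask) ? 1 : 0;       /* collapse any non-zero result to 1 */
}
```

With some_var = 5 (binary 101), extract_bit(5, 3) gives 1 and extract_bit(5, 2) gives 0.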

And there you have it folks, how to fetch bits in C.

You could round out the set by providing macros to set a bit value and clear a bit value. This would be useful if you have a byte with individual flag bits. var is a value and bit is the bit position you’re working with. Bit 0 is the least significant bit (on the far right-hand side).


//get a bit from a variable
#define GETBIT(var, bit)	(((var) >> (bit)) & 1)

//set a bit to 1
#define SETBIT(var, bit)	var |= (1 << (bit))

//set a bit to 0
#define CLRBIT(var, bit)	var &= (~(1 << (bit)))

Dave had a good point, added parens around arguments. Left the parens out of SETBIT and CLRBIT since var needs to be a left hand argument. Will leave the conversion to functions to you all, but logic is the same.
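For anyone who wants the function versions, one possible conversion is sketched here. The function names and the unsigned char type are assumptions, picked to suit a byte of flags. Unlike the macros, these cannot modify the caller's variable in place, so they return the new value instead.

```c
/* Sketch of function equivalents of GETBIT, SETBIT and CLRBIT.
   bit is 0-based and must be in the range 0..7. */
int get_bit(unsigned char var, unsigned int bit)
{
    return (var >> bit) & 1;
}

unsigned char set_bit(unsigned char var, unsigned int bit)
{
    return (unsigned char)(var | (1u << bit));
}

unsigned char clear_bit(unsigned char var, unsigned int bit)
{
    return (unsigned char)(var & ~(1u << bit));
}
```

Usage would be along the lines of flags = set_bit(flags, 2); rather than SETBIT(flags, 2);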

We have a union to match the PBASIC BYTE variable type that is bit-addressable. It’s not portable but works with the PIC controllers. You can pull out the 8-bit value using the byte field or get a single bit using b0 - b7. Makes a handy way to track a set of flags.


//byte, addressable bits
//size: 8 bits (1 byte)
//range: 0 to 255
typedef union {
    struct {
        unsigned b0:1;
        unsigned b1:1;
        unsigned b2:1;
        unsigned b3:1;
        unsigned b4:1;
        unsigned b5:1;
        unsigned b6:1;
        unsigned b7:1;
    };
    uchar byte;
} byte;
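A rough sketch of how such a union might be used to track flags follows. It is not the exact type above: flag_byte is a hypothetical stand-in that uses unsigned char in place of uchar and gives the inner struct a name (bits), so it compiles without compiler extensions. Which bit-field lands on which physical bit of the byte is implementation-defined, so the sketch only reads back the fields it wrote.

```c
/* Hypothetical stand-in for the union above. */
typedef union {
    struct {
        unsigned b0:1;
        unsigned b1:1;
        unsigned b2:1;
        unsigned b3:1;
        unsigned b4:1;
        unsigned b5:1;
        unsigned b6:1;
        unsigned b7:1;
    } bits;
    unsigned char byte;
} flag_byte;

/* Returns 1 if the named flags hold the values that were assigned. */
int demo_flags(void)
{
    flag_byte flags;

    flags.byte = 0;       /* clear the whole byte at once */
    flags.bits.b2 = 1;    /* raise two flags by name */
    flags.bits.b5 = 1;
    flags.bits.b2 = 0;    /* lower one of them again */

    return flags.bits.b5 == 1 && flags.bits.b2 == 0;
}
```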


Sean

You have a good suggestion to make these macros. They seem to work to perfection; however, you made a common error.

When writing macros, it is a good idea to surround your arguments with parentheses in the definition.

Let’s walk through this example

#define PI 3.14
#define CIRCLE_AREA(r) (PI * (r) * (r))
area = CIRCLE_AREA(4)

expands to

area = 3.14 * (4) * (4)

Often times, though, you may have an expression that you’re passing to the macro

area = CIRCLE_AREA(i+2)

which would expand to

area = 3.14 * (i+2) * (i+2)

Now let’s see what happens when we remove the parentheses in the macro

#define CIRCLE_AREA(r) (PI * r * r)
area = CIRCLE_AREA(i+2)

expands to

area = 3.14 * i + 2 * i + 2

The order of operations that C follows causes the result to be clearly different from what you would have expected. It would evaluate as if it were grouped like this

area = (3.14 * i) + (2 * i) + 2

The moral of the story is to put parentheses around the parameters in the macro definition, which makes the macro safe to use with any argument expression. If I see a macro that calculates the area of a circle, I don’t want to have to double check it to see how the order of operations will play out. Same goes for the parentheses around the entire expression. For the most part, you will have no problems, but it never hurts to be safe.
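The two expansions can be compared side by side with a small sketch. The _SAFE and _UNSAFE suffixes and the wrapper functions are invented here so both variants can live in one file; the macro bodies are the ones from the example.

```c
#define PI 3.14

/* Arguments and the whole body are parenthesized. */
#define CIRCLE_AREA_SAFE(r)   (PI * (r) * (r))

/* Same formula without parentheses around r, kept only to show the bug. */
#define CIRCLE_AREA_UNSAFE(r) (PI * r * r)

/* Hypothetical wrappers so the two expansions can be compared. */
double area_safe(int i)
{
    return CIRCLE_AREA_SAFE(i + 2);     /* 3.14 * (i + 2) * (i + 2) */
}

double area_unsafe(int i)
{
    return CIRCLE_AREA_UNSAFE(i + 2);   /* 3.14 * i + 2 * i + 2 */
}
```

With i = 1 the safe version works out to 3.14 * 3 * 3, about 28.26, while the unsafe one works out to 3.14 + 2 + 2, about 7.14.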

On a side note, I try to use functions instead of macros when efficiency is less critical or when the routine to be run is complex.

Now, I don’t want to get into a whole debate about macros and functions, because each has their place. I find that it is easier to debug functions than macros because you can step through them in a debugger.

I think that you’ve got a slight problem here: I suspect that each (bit) in the macros needs to be replaced with (bit - 1)

Example: Try to access the 3rd bit of 20. 20 = 00010100
00010100 >> 3 == 00000010, but
00010100 >> 2 == 00000101, yielding the correct digit in the correct place.
However, without the ternary operation here, you are likely to return a number of non-binary answers: as you just saw, running GETBIT (20, 3) would yield 2 as you have it written, or five after the (bit-1) correction.

One other correction: the comments you used are C++ style. They are not valid in C89/C90 and only became part of standard C with C99, so watch out!

I believe that the corrected macros follow:


/* get a bit from a variable*/
#define GETBIT(var, bit)	(( var & (1 << (bit-1) ) ) ? 1 : 0 )

/* set a bit to 1 */
#define SETBIT(var, bit)	(var |= (1 << (bit-1)))

/* set a bit to 0 */
#define CLRBIT(var, bit)	(var &= (~(1 << (bit-1))))

The original macro is correct, you’re just not used to thinking like a computer person; when counting, always start with 0. Bit 0 is the least significant and bit 7 is the most.

Also, GETBIT cannot return 2 or 5, it can only return 0 or 1 since 'AND’ing anything with 1 will get rid of all but the least significant bit.
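Running the number 20 from the example above through the original macro shows both points. The wrapper function getbit_of_20 is made up for this check; the macro is the original 0-based one.

```c
#define GETBIT(var, bit) (((var) >> (bit)) & 1)

/* Hypothetical wrapper around the macro from the thread. */
int getbit_of_20(int bit)
{
    return GETBIT(20, bit);   /* 20 is 00010100 in binary */
}

/* getbit_of_20(2) is 1: 00010100 >> 2 is 00000101, and 00000101 & 1 is 1
   getbit_of_20(3) is 0: 00010100 >> 3 is 00000010, and 00000010 & 1 is 0 */
```

So the "3rd bit" counted from 1 is bit 2 counted from 0, and the & 1 keeps the result to 0 or 1 either way.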

Hey, can anyone give me an idea about the relative efficiency of the underlying assembly language code for using the shifting macros vs. using the union/structures method?

I suppose that most times the code ends up more or less identical but I wonder on this particular CPU with this particular compiler, if there is a major advantage one way or the other.

My main reason for asking is that I am dreaming about 100 million things I want to do in the interrupt service routine (ISR), and 88 million of them involve such bit piddling. Not being one who likes to stack up ISR upon ISR upon ISR, I am concerned about the efficiency of the code that does these things.

Your comments welcome…

Joe J.

Oops, I missed the significance of the parentheses around the shift operation… You’re right! It should work as you wrote it.

Well, let’s throw a sample together and see what we get:

First, the union method:


typedef struct bits {
	unsigned char b0:1;
	unsigned char b1:1;
	unsigned char b2:1;
	unsigned char b3:1;
	unsigned char b4:1;
	unsigned char b5:1;
	unsigned char b6:1;
	unsigned char b7:1;
	} BITS;

typedef union bit_char {
	unsigned char byte;
	BITS b;
	} BIT_CHAR;

main ()
{
	BIT_CHAR test1;
	test1.byte = 5;
	test1.b.b2 = 1;
}

And the corresponding assembly:


000802   cfd9     MOVFF     0xfd9,0xfe6                                                 
000806   cfe1     MOVFF     0xfe1,0xfd9                                                   
00080a   52e6     MOVF      0xe6,0x1,0x0           
00080c   0e05     MOVLW     0x5
00080e   6edf     MOVWF     0xdf,0x0    
000810   84df     BSF       0xdf,0x2,0x0
000812   52e5     MOVF      0xe5,0x1,0x0  
000814   52e5     MOVF      0xe5,0x1,0x0
000816   cfe7     MOVFF     0xfe7,0xfd9                                                  
00081a   0012     RETURN    0x0                                                              

And the macro version (using bit-shifting)


//get a bit from a variable
#define GETBIT(var, bit) (((var) >> (bit)) & 1)

//set a bit to 1
#define SETBIT(var, bit) var |= (1 << (bit))

//set a bit to 0
#define CLRBIT(var, bit) var &= (~(1 << (bit)))

main ()
{
	char test1;
	test1 = 5;
	SETBIT(test1,2);
}

ASM:


000802   cfd9     MOVFF     0xfd9,0xfe6    
000806   cfe1     MOVFF     0xfe1,0xfd9 
00080a   52e6     MOVF      0xe6,0x1,0x0
00080c   0e05     MOVLW     0x5
00080e   6edf     MOVWF     0xdf,0x0  
000810   84df     BSF       0xdf,0x2,0x0
000812   52e5     MOVF      0xe5,0x1,0x0
000814   52e5     MOVF      0xe5,0x1,0x0  
000816   cfe7     MOVFF     0xfe7,0xfd9   
00081a   0012     RETURN    0x0    


So it appears that the compiler optimizes both methods to the same ASM code. Therefore I’d say that which to use is a matter of personal preference.