Bit hacks

I recently started watching Harvard’s OpenCourseWare Performance Engineering of Software Systems class. The course was taught by Saman Amarasinghe, and Charles Leiserson, which should make it a worth going through. The second lecture was on bit hacks and here are my notes from the lecture. My notes are pretty sparse but feel free watch the lecture and check out Sean Anderson’s Bit Twiddling hacks at http://graphics.stanford.edu/~seander/bithacks.html

[youtube:https://www.youtube.com/watch?feature=player_embedded&v=xc9DDSbf0NQ%5D

Value Swapping

x = x ^ y
y = y ^ x
x = x ^ y
  • XOR is it’s own inverse
  • Poor performance compared to the swapping with a temporary symbol.

Minimum of two integers( x and y)

Common inefficient method

r = (x < y) ? x : y
  • This is slow, the compiler may not be able to avoid the unpredictable branch.
r = y ^ ((x ^ y) & -(x < y))
  • Finds the minimum value without a branch.

Modular Addition

Common inefficient method

(x + y) % n
  • Expensive, unless by a power of 2

Or

r = x + y
r = (z<n) ? z : z-n
  • Expensive, unpredictable branch

Better yet,

z = x + y
r = z - (n & -(z >= n))

Round up to a power of 2

--n
n |= n >> 1
n |= n >> 2
n |= n >> 4
n |= n >> 8
n |= n >> 16
n |= n >> 32
++n

Least significant bit

r = x & (-x)

DeBrijn sequence

const unit65_t deBrijn = 0x022fdd63cc953986d
const unsigned int convert[64] = [ 0, 1 ,2, 53, 3, 7, 54, 27, 4, 38, 41, 8, 34, 55, 48, 28, 62, 5, 39, 46, 44, 42, 22, 9, 24, 35, 59, 56, 49, 18, 29, 11, 63, 52, 6, 26, 37, 40, 33, 47, 61, 45, 43, 21, 23, 58, 17, 10, 51, 25, 36, 32, 60, 20, 57, 16, 50, 31, 19, 15, 30, 14, 13, 12 ]

r = convert[(x*deBrujin) >> 58]

Population Count

Problem: Count the number of 1 bits in a word x

for (r=0; x!= 0; ++r)
  x &= x - 1;
  • Repeatedly eliminate the least-significant bit
  • Fast if the population count is small, but in worst case, the running time is proportional to the number of bits in the word.

Population Count II

static const int count[256] = {0,1,1,2,1,2,2,3,1,...,8}; // #1's in index

for (r=0; x!=0; x>>=8)
  r += count[x & 0xFF];
  • Memory operations are much more costly than register operations:
    – Register: 1 cycle(6 ops issued per cycle per core)
    – L1-cache: 4 cycles (per 64-byte cache line)
    – L2-cache: 10 cycles (per 64-byte cache line)
    – L3-cache: 50 cycles (per 64-byte cache line)
    – DRAM: 150 cycles (per 64-byte cache line)

Population Count III – Parallel divide and conquer

// Create masks
B5 = (-1) ^ ((-1) << 32);
B4 = B5 ^ (B5 << 16);
B3 = B4 ^ (B4 << 8);
B2 = B3 ^ (B3 << 4);
B1 = B2 ^ (B2 << 2);
B0 = B1 ^ (B1 << 1);

// Compute popcount
x = ((x >> 1) & B0) + (x & B0);
x = ((x >> 2) & B1) + (x & B1);
x = ((x >> 4) + x) & B2;
x = ((x >> 8) + x) & B3;
x = ((x >> 16) + x) & B4;
x = ((x >> 32) + x) & B5;

Queens Problem

Problem: Place n queens on a n x n chessboard so that no queen attacks another, i.e, no two queens in any row, column, or diagonal.


Strategy: Try placing queens row by row. If you can’t place a queen in a row, backtrack.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s