To Cache or Not to Cache

Caching data can save a computer from doing unnecessary processing work and can be a simple way to speed up your programs. There are two types of caching you could use, in-memory(RAM) and out-of-memory(Physical Storage). Storing data onto a hard drive is generally the way to go if you’re working with larger data. The downsides of storing information within a system’s RAM is that won’t always know how much is available at run-time or how it will affect the system overall. The benefits on the other hand is that it’s a fast and simple way to allocate memory for a program when you need it.

I’m not going into too much detail on how to set up a physical cache, this post was created because I wanted to run some speed tests on a simple Java program I wrote using a mapping data structure. I made two different types of Fibonacci sequence generators. I’ll mention a few other designs considerations if you want to take this a step further but this was quick program I spent about 45 minutes on. I wrote these scripts pretty quickly and didn’t completely iron out a few details I found annoying so bare with me. The first version of a Fibonacci sequence generator I made was a basic recursive version that might look familiar.

package euler;

public class Recursion extends FibonacciSequencer {
	int sequenceRange;
	public Recursion(int sequenceRange) { 
		super(); 
		this.sequenceRange = sequenceRange;
	}

	@Override
	public long findFibonacci(int i) {
		if(i <= 1)
			return 1;
		else
			return(findFibonacci(i-1) + findFibonacci(i-2));
	}
	
	@Override
	public void printSequence() {
		//I'm sorry about this
		for(int i = 0; i < sequenceRange; i++)
			if(i == (sequenceRange -1))
				System.out.print(findFibonacci(i)+".\n");
			else
				System.out.print(findFibonacci(i)+", ");
	}
}

There wasn’t a lot of code I had to write, findFibonacci() will generate the sequence up to a given range. As I mention annoying details, I didn’t store any of the data generated to print out the data. I consider about storing every found Fibonacci number into a map so you could see when that number was generated but that would quickly grow into a large data set that wouldn’t print neatly at the console. Feel free to store or print each generated value if you want to see when a number is generated. You can also break down the logic into a process tree by hand if you’re filling up to it. Word of warning, if you’re not familiar with the Fibonacci sequence, it’s a repeating pattern found in natural and grows exponentially. Also not a reason not to bread rabbits. The next version I made was to take this code to store any new Fibonacci numbers into an map and retrieve them when necessary.

package euler;

import java.util.HashMap;
import java.util.Map;

public class MemCache extends FibonacciSequencer {

	Map<Integer,Long> memCache;

	public MemCache(){
		super();
		memCache = new HashMap<Integer,Long>();
	}
	
	@Override
	public long findFibonacci(int i) {
		//Index of 0 & 1 will have a Fibbonacii number of 1.
		if(i <= 1){
			memCache.put(i,1)
			return 1;
                }
	
		// Returns cached Fibonacii number.
		else if(memCache.containsKey(i))
			return(memCache.get(i));
	
		// Adds new Fibonacii number into the cache and returns the number.
		else{
			memCache.put(i, (findFibonacci(i-1) + findFibonacci(i-2)) );
			return(memCache.get(i));
		}
	}
	
	@Override
	public void printSequence() {
		for(int i = 0; i <= memCache.size(); i++){
			System.out.print( memCache.get(i)) ; 
			
			if(i == memCache.size() )
				System.out.print(".\n");
			else
				System.out.print(", ");
		}
	}
}

If you hadn’t notice I used an abstract class called FibonacciSequencer to build these two Java classes. The objects Recursion() and MemCache() only contain the Fibonacci algorithm but they both use the testing code within the FibonacciSequencer() object. There are some unused code here but I only executed the function, trials() outside this object. My trials() method takes two arguments, one for how many trials to run and the second is the range of the Fibonacci sequence. I was pretty lazy so I print out how well the sequence generated did I made throw the output from the trials() function into the printTrials(). The next part after this class if my test harness if you want to see how I used these two functions.

public abstract class FibonacciSequencer {		
	public FibonacciSequencer(){}
	public abstract long findFibonacci(int i);
	public abstract void printSequence();

//---Some inefficient code ;(
	protected boolean isLength(int len, long fib){
		if(len >= 0 && len == findDigitLength(fib) )
			return true;
		else
			return false;		
	}
	
	protected boolean isWithinRange(long i, long x, long y){
		if(i >= x && i <= y)
			return true;
		else
			return false;
	}
	
	protected int findDigitLength(long fib) {
		int count = 0;
		
		while(fib > 0){
			count++;
			fib /= 10;
		}
		
		return count;
	}

//-----Time Trials Code
	
	public long[] trials(int numTrials, int sequenceRange){
		long[] trials = new long[numTrials];
		
		for(int i = 0; i < numTrials; i++)
			trials[i] = timeIt(System.nanoTime(), findFibonacci(sequenceRange));
		
		return(trials);
	}
	
	protected long timeIt( long time, long fib ){
		return(System.nanoTime() - time);
	}
	
	public void printTrials(long[] trials){
		long avg = 0;
		
		for(long trial : trials)
			avg += trial;
		
		avg /= trials.length;
		
		System.out.println(trials.length+" trials where exicuted.");
		System.out.println("Average time of completion was, " + nanoToMicroseconds(avg) + " microseconds.");
		System.out.println("Time of completion as nanoseconds, " + avg + ".");
		System.out.println("Time of completion as milliseconds, " + nanoToMilliseconds(avg) + ".");
	}
	
	protected long nanoToMicroseconds(long nano){
		return((nano/1000));
	}
	
	protected double nanoToMilliseconds(long nano){
		return(nano*(Math.pow(10, -6)));
	}
}

How much better was the program with the hash map?


Well, averaging the results from ten trials and finding up to the sixth index of a Fibonacci sequence generated these results.

Memory Cache Test

10 trials where executed.
Average time of completion was, 87 microseconds.
Time of completion as nanoseconds, 87637.
Time of completion as milliseconds, 0.08763699999999999.
1, 1, 2, 3, 5, 8.

Recursion Test

10 trials where executed.
Average time of completion was, 4 microseconds.
Time of completion as nanoseconds, 4274.
Time of completion as milliseconds, 0.004274.
1, 1, 2, 3, 5, 8.

Ok, the recursion function won out because I used a small Fibonacci sequence. There was some small overhead in the hash map version that slowed down program but lets try running a slightly larger sequence. The next tests found a sequence of fifty values. Let’s see how long it takes only for the hash map.

Memory Cache Test

10 trials where executed.
Average time of completion was, 92 microseconds.
Time of completion as nanoseconds, 92246.
Time of completion as milliseconds, 0.092246.

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049, 12586269025.

It only took about 92 microseconds the reach this point not too bad. Lets see if the recursion method fare any better.

Recursion Test

10 trials where executed.
Average time of completion was, 95281927 microseconds.
Time of completion as nanoseconds, 95281927647.
Time of completion as milliseconds, 95281.92764699999.

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049,

Minus that little bug at the end of the sequence the program average around 95,281 milliseconds to complete, not microseconds. As you can see the recursion method breaks down quickly once you start increasing the scale. This is pretty much end of how far I took this simple script but here are few ideas you could try implementing to improve the performance.

Store the data into a file, simple and highly efficient. This way you’re not dependent on the scale of the Fibonacci sequence and you can still incorporate in-memory caching to tackle the problem in manageable blocks. You also should use some kind of compression on the file to keep it manageable, if you don’t want to code you’re own compression algorithm you can using an existing algorithm like gZip to do the heavy lifting. Also try working with binary files rather than text string files.

Here is my test harness if you want to try running my scripts. I have some time this weekend to work on another quick and simple programming exercise. Right now I’m leaning towards writing a Monty Carlo program in Python, but it would feel a little weird doing it in any other language besides lisp. Side note, why can’t languages support an non-crippled lambda expression. ;(

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s