Text Compression in Java

I’ve made this compression program in Java after watching this lecture from Harvard’s OpenCourseWare class, Information and Entropy. My code is still incomplete but it appears to be working as it should. I also took out some of my code and only included the example to compress a single string. I chose to solve this problem with iteration rather than recursion. As it is now it’s kinda of a mess of nested while loops. Later I’m going to try to encapsulate that code into an iterator and add some extra logic to build the dictionary faster.
 
Here is the video of the lecture, you can also visit the class site for more infomation:
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-050j-information-and-entropy-spring-2008/
[youtube:https://www.youtube.com/watch?feature=player_detailpage&v=mM9m9uDlHno%5D
 

import java.util.ArrayList;
import java.util.Hashtable;

public class Compress {
	public Hashtable<Object, Integer> dict; // Encryption Dictionary
	public int code; 			// Compression Code

	public Compress(){
		dict = new Hashtable<Object, Integer>();

		dict.put( true, 256 );  // Beginning of the message
		dict.put( false, 257 ); // End of the message
		code = 257;		// New codes should start at 258
	}

	// Compresses a single string.
	public ArrayList<Integer> scan(String line){
		ArrayList<Integer> msg = new ArrayList<Integer>();		
                int i = 0; // Index pointer

		msg.add( dict.get( true ) ); // Can also be a hard coded as 256.
		
		// Traverse the string
		while( i < line.length() ){
			int r = ( i + 1 ); // Range pointer of the slice

			// Peeks next characters after the index.
			while( r <= line.length() ){
//-------------------------------------------------------------------------
// Case 1: New dictionary keys with the length of 2.
//-------------------------------------------------------------------------
//  - Conditions: The range pointer must be less than the line length. 
//    + ( r < line.length() ) 
//  - Conditions: The range pointer must be set one index ahead.
//    + ( (r - i) == 1 )
//  - Conditions: The dictionary must not contain a string of the index and following character. 
//    + !dict.containsKey( line.substring(i, ( r + 1) ) )
//-------------------------------------------------------------------------
				if( r < line.length() && (r - i) == 1 && !dict.containsKey( line.substring(i, ( r + 1) ) ) ){
					dict.put( line.substring(i, (r + 1) ), ++code );
					msg.add( (int) line.charAt(i) );
					i = r;
					break;
				}
//-------------------------------------------------------------------------
// Case 2: New dictionary keys with the length of greater than 2.
//-------------------------------------------------------------------------
//  - Conditions: The range pointer must be less than the line length. 
//    + ( r < line.length() ) 
//  - Conditions: The range pointer must be at least two positions ahead of the index.
//    + ( (r - i) > 1 )
//  - Conditions: The dictionary must not contain the slice of the string.
//    + !dict.containsKey( line.substring(i, ( r + 1) ) )
//-------------------------------------------------------------------------
				else if( r < line.length() && (r - i) > 1 && !dict.containsKey( line.substring(i, (r + 1) ) ) ){
					dict.put( line.substring(i, (r + 1) ), ++code );
					msg.add( dict.get( line.substring( i, (r - 1) ) ) );
					i = r;
					break;
				}
//-------------------------------------------------------------------------
// Case 3: End of string, last known sequence. 
//-------------------------------------------------------------------------
//  - Conditions: The range pointer must be set as the line length. 
//    + ( r == line.length() ) 
//  - Conditions: The slice of the string must contain more than a single character.
//    + ( (r - i) > 1 )
//  - Conditions: The dictionary must contain the slice of the string.
//    + dict.containsKey( line.substring(i, r)
//-------------------------------------------------------------------------
				else if( r == line.length() && ( r - i ) > 1 && dict.containsKey( line.substring(i, r) ) ){
					msg.add( dict.get( line.substring(i, r) ) );
					i = r;
					break;
				}

//-------------------------------------------------------------------------
// Case 4: End of string, unknown prior sequence.
//-------------------------------------------------------------------------
//  - Conditions: The range pointer must be set as the line length. 
//    + ( r == line.length() ) 
//  - Conditions: The index must be set at the last available character.
//    + ( (r - i) == 1 )
//-------------------------------------------------------------------------
				else if( r == line.length() && ( r - i ) == 1 ){
					msg.add( (int) line.charAt(i) );
					i = r;
					break;
				}
				// Alternative: Advances the range
				else
					r++;
			} // !while( r <= line.length() )
		} // !while( i < ( line.length() - 1 ) )

		msg.add( dict.get( false ) ); // Can also be a hard coded as 257.
		return( msg );
	} // !scan()

//-------------------------------------------------------------------------
// Debuggers
//-------------------------------------------------------------------------
	// Prints contents of dictionary
	public void debugDict(){
		for(Object key : dict.keySet() ){
			if( dict.get(key) > 257 )
				System.out.println(dict.get(key) +" : " + key);
		}
	}

	// Prints ArrayList<Integer>
	public void debugMsg( ArrayList<Integer> msg ){
		for( int i : msg ){
			
			// Single chars
			if( i <= 255 )
				System.out.println( i +" -> "+ (char) i);
			
			// Sliced strings
			else {
				for(Object key : dict.keySet() ){
					if( i == dict.get(key) && i > 255 ){
						System.out.println( i + " -> " + key );
						break;
					}
				}
			}
		}
	}
}
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s