CSIS 3103 - Lab 3 Algorithm Analysis
Sept 29, 2010


You should work in pairs for this lab.

Objectives:

This exercise involves several algorithms for generating random permutations of the integers in 1..N. For example, when N = 5, [3, 1, 5, 4, 2] and [1, 5, 2, 4, 3] are legal permutations, but [4, 2, 1, 5, 1] is not because the integer 1 is duplicated and 3 is missing.

The files Permutations.java and PermutationTester.java contain most of the code for the exercise. Create a new Java project in Eclipse and load these files into it.

The Random class has methods for generating pseudo-random numbers. The method nextInt(n) returns a random integer in 0..n-1.

Tracing in debug mode

Open the PermutationTester class in an editor window. Double-click in the left margin next to the statement p.algorithm1(a);

This should display a blue dot next to the statement, which marks a "break point". When run in debug mode, the program will pause at this line. Click the bug icon to start execution of the program. The program will stop at the breakpoint. Eclipse will also change the the "Debug perspective" that shows panel with a list of all visible variables and their values in the upper right corner.

1) What values are stored in array perm?

Click the Step Over icon to execute the method call.

2) What values are stored in array perm now? It is not correct according to the problem specifications described above. What is wrong?

The error is in the statement inside the do-while loop in the algorithm1 method of the Permutations class. Fix the error (check the Javadoc for the nextInt method if necessary) and run the tester in debug mode again to make sure it is now correct.

Add two more statements to the main method to also run algorithm2 and algorithm3. Run the program in debug mode again and execution will pause at the breakpoint.

3) Advance one statement and write down the contents of array perm after algorithm1 executes. Do the same for algorithm2 and algorithm3.

Measuring Algorithm Performance

The most significant factor affecting the performance of these algorithms is the number of loop interations performed. The Permutations class has a variable stepCounter that will be used to count the loop iterations.

Algorithm 1

This algorithm fills the array as follows. To fill a[i], generate random integers until you get one that is not already in a[0..i-1]. The algorithm works correctly, but does not count the number of steps it performs. In this case, there are several nested loops, and adding a statement to increment stepCounter at the innermost point in the loops is what's needed. Notice that there is a call to a helper method which must be taken into account when looking for the deepest nested loop.

Algorithm 2

This is similar to Algorithm 1, but has an array that keeps track of which numbers have been used. When a random number r is put in array a, set used[r] = true. In this way, when filling a[i] you only need to check one position in the used array to determine if it is a new number or not.

Once again, there are nested loops that need to be considered when determining where to increment stepCounter. The loop that initializes the used array also needs to be counted by stepCounter.

Algorithm 3

This algorithm fills the array with all the numbers in order, and then "shuffles" them by swapping random pairs. Be sure to count the steps performed by both loops in stepCounter.

Gather some data

We're ready to see how efficient each of these algorithms is. Add a class with a main method to the project. Then copy the following to the main method:

Permutations p = new Permutations();
int reps = 5;
int[] sizes = {125,250,500,1000,2000,4000,8000,16000}; 
for (int i = 0; i < sizes.length; i++) { 
   long totalSteps = 0;
   double start = System.currentTimeMillis(); 
   for (int j = 0; j < reps; j++) { 
      int[] data = new int[sizes[i]]; 
      p.algorithm1(data); 
      totalSteps += p.getStepCounter();
   } 
   double stop = System.currentTimeMillis(); 
   System.out.println("" + sizes[i] + ", " + totalSteps/reps + ", " +(stop - start)/reps);
}

The experiment will be repeated 5 times. The sizes array defines how many numbers will be generated in each permutation. In this case we'll have 125 numbers, then 250, then 500, etc.

We'll be taking two measurements for each experiment - the count of the steps performed by the algorithm, and the elapsed time in milliseconds. Each experiment is repeated 5 times and the average taken to get more consistent results. The size of the permutation, the average steps, and the average time is printed after each experiment.

Run the program. The output should look something like this:

125, 35758, 0.6
250, 152667, 0.4
500, 932254, 2.0
  etc...

Copy and paste the output into a plain text file.

Change the sizes to the following:

int[] sizes = {12500, 25000, 50000, 100000, 200000, 400000,800000,1600000};

Also change the permutation method call to algorithm2. Then run the program again, and paste the output into the same text file as in the previous step. Note that the array sizes for this algorithm are a lot bigger than the previous one.

Finally, change the sizes to this:

int[] sizes = {50000, 100000, 200000, 400000, 800000, 1600000, 3200000, 6400000};

Also change the permutation method call to algorithm3. Run the program and paste the output into the text file. Again notice the increase in array sizes.

Graphing the results

Now save the text file with a name that ends with .csv. This is a "comma separated values" file. Open this file in a spreadsheet such as Excel.

Although we have both counts of steps taken by the algorithms and the elapsed times, we will not be using the times for the analysis that follows.

Select the cells in the first two columns (size and step count data) for algorithm1. Make a  Scatter chart (XY graph) with markers only (no lines).

Right click a data point on the chart and Add Trendline... to help visualize the results. Choose the type of trendline that looks like it will best fit your data. Studying the nature of the loops can be helpful when selecting a good trendline.

Format the chart to display the equation for the trend line and the R-squared value. The R-squared value  should be very close to 1.0. If it is not, then your choice of trendline isn't a very good fit and you should select a different one.

Repeat the above to draw graphs of the results from the other two algorithms.

You can do any other formatting on the graphs if you want.

Hand-in:

A. Written answers to questions 1), 2), and 3) above.

B. For each algorithm, hand in

  1. The table of output data
  2. The scatter chart
  3. The type of trendline you chose

C. Based on the experimental results, rank the algorithms in order according to their efficiency (be careful to notice that each uses different sizes which will have an effect on their performance).

D. Which algorithm ranks worst based on how efficiently it uses memory?