Assignment: Algorithm Design & Analysis
You have learned the data structure of red-black tree, which is a height balanced binary search tree. In this assignment, you will implement this data structure using a certain programming language based on the course survey during the first lecture. You will also compare the performance of your implementation (of the red-black tree) with a library class which implements a red-black tree. You are required to write a report about the result and running time of your program. You need to submit your source file(s) to Blackboard, and your report on the due day in class. See requirements below.
1. Your program runs on the Linux server (hopper/turing) at the Computer Science Department.
2. Implementation of the data structure -- Provide at least the following public member methods for the red-black tree class:
Insertion: insert a value in the tree. If the value already exists in the tree, this operation has no effect on the tree.
Search : return true if a given value exists in the tree; otherwise false. Height : return the height of the tree. The height of an empty tree is -1.
Size: return the number of elements in the tree. The size of an empty tree is 0.
1) It is best to design a system of base and derived classes of binary tree (Height, Size), binary search tree (Search, Binary-Search-Tree-Insertion), and red-black tree (Red-Black-Tree-Insertion) classes. But it is not required to do so. You will also need to design your Node class(es) accordingly.
2) You may find it relatively easy to implement most of above methods in a recursive fashion.
3. The library class that implements a red-black tree: C++ user: STL set
Java user: java.util.TreeSet
Python user: There is no built-in library class in python. You may need to install a separate package. Please contact the instructor if you want to switch to another programming language.
4. About the program:
i) The driver program takes two command line arguments in form of "[N=n] [L=Y/N]". The first argument specifies the number of elements to be inserted in the tree, i.e. the number of integers. You will use values 100, 1000, 10000, 100000, 1000000, ... etc. to test your program. The second argument specifies whether to use the library implementation of the red black tree (e.g. STL set in C++). ‘Y' means yes; ‘N' means no. "[]" indicates an argument is optional. By default, N is 1000, L is ‘N'. For example, invoking the following command
./your-program N=100000 L=Y
will generate 100000 integers to be added in the tree provided by the library. And the command
./your-program
will generate 1000 integers to be stored in the tree that you implement. I assumed a C++ program in above examples. If yours is a Java or Python program, you need to use proper command.
ii) Preparation of your input data: For the command line argument N, produce two collections of data, each has N random integers in the range of [1, 2N]. These two collections of random integers need to be generated by different seed values of the random generator (e.g. srand() and rand() for C++). One collection of integers (referred to as tree-data) is to be inserted into the red black tree. One collection of integers (referred to as search-data) is to be searched in the red black tree. Note that you will get the same sequence of random numbers given a same seed value for the random number generator. You want to use the same seed value for all tree-data, and the same seed value for all search-data to compare the performance of your implementation and the library class of the red-black tree.
iii) Implementation of the program:
a. Produce the tree-data and search-data described above;
b. Create an instance of the red-black-tree (your own implementation or the library class, depending on the command line argument);
c. Insert all tree-data to the red-black-tree;
d. Print out the height of the tree and the size of the tree. Since there may be duplicated values from the random generator, this value of size may be less than your input N. If the class library does not support certain operation, you can skip it. For example, the STL set does not have public method to display height. Do not print out the values of whole tree. (But you can print out these values during testing and debugging stage of your work.)
e. Search each value from the search-data in the red-black-tree, count the number of successful searches, and divide it by the total number of searches. Print the percentage of successful searches.
iv) When running your program, use the time command to record time spent on the program. E.g. time ./your-program N=100000
Again, this example assumes a C++ program. If yours is a Java or Python program, you need to use a proper command to run your program. When N is small, your program will finish in a very short time. You can gradually increase your input size to test your program. If your program has run for over a few seconds, kill the program and stop increasing the size.
5. Report: Summarize your discovery about the data structure you have implemented in respect to the relationship between the height and the problem size, the relationship between time cost and problem size, and compare the result and performance of your implementation and the library class. Your report needs to be typewritten. The next page contains a form you can use.