Detailed Question: This is a java programming assignment, all problems including graduate problems must be completed.
Part- Programming Problems
Overview
In Problem Set, you began to implement a simple relational DBMS by taking the code base that we gave you and adding support for INSERT commands and for SELECT * commands involving a single table.
In the current assignment, you will implement whatever additional functionality is needed to support SELECT commands that include projections and/or multiple tables. Adding this functionality is a nontrivial task, so please begin work as soon as possible.
Getting Started
You will not need any additional files for this part of the project. You should do your work in the same directory that you used for Problem Set, and you should continue to consult the resources provided in that assignment.
Problems for this Assignment
Adding support for a crossproduct iterator
In order to execute a SELECT command involving more than one table, your DBMS needs to be able to iterate over the tuples in the cross product (Cartesian product) of the tables in the FROM clause. In this problem, you will add support for an iterator that does just that by implementing the methods of the CrossIterator class. Like the table iterator that you implemented in Problem Set 2, you will be able to associate a WHERE clause with your crossproduct iterator, in which case it will only visit rows that satisfy the WHERE clause. Your crossproduct iterator should maintain a collection of table iterators, one for each table in the Cartesian product, and it should use the table iterators to iterate over the rows in the individual tables.
Your crossproduct iterator will be similar to the iterator for nestedloop join outlined in the lecture notes on the logical tophysical mapping.
To understand how your iterator should work, it helps to consider a concrete example. Let's say that we want to iterate over the full Cartesian product of the following three tables:
table R
table S
table T
f
|
g
|
"about"
|
"above"
|
"before"
|
"behind"
|
To get the first row of the Cartesian product, we position the table iterators on the first row of each table:
a |
b |
c |
d |
e |
f |
g |
1 |
2 |
10 |
20 |
30 |
about |
above |
To get the next row, we advance only the iterator for the last table (table T):
1
|
2
|
10
|
20
|
30
|
"before"
|
"behind"
|
Next, we try to advance T's iterator again, but we receive a return value telling us that there are no additional rows in T. Thus, we need to reset T's iterator to the first row and advance the iterator for the second table (table S):
1
|
2
|
40
|
50
|
60
|
"about"
|
"above"
|
To get the next row, we advance the iterator for the last table (table T). Because we get a return value indicating success, we don't advance the iterators of the other tables:
1
|
2
|
40
|
50
|
60
|
"before"
|
"behind"
|
Next, we try to advance T's iterator again, but we receive a return value telling us that there are no additional rows in T. Thus, we reset T's iterator to the first row and try to advance the iterator for the secondtolast table (table S). We again get a return value telling us that there are no additional rows, so we reset S's iterator and advance the iterator for R, which succeeds.
3
|
4
|
10
|
20
|
30
|
"about"
|
"above"
|
This process continues until there are no remaining tuples in the Cartesian product. It's worth noting that in each of the cases above, we started with the iterator for the rightmost table and worked from right to left as needed. If we were able to advance a given table's iterator, we didn't need to advance the iterators of the tables to its left. However, if there were no additional rows in a table, we reset its iterator to the first row and then tried to advance the iterator of the table to its left.
Below are descriptions of each of the methods that you will need to implement for this problem. In addition, you should read the comments that appear before each of the methods in the code file.
a. You should begin by implementing the CrossIterator constructor, which should initialize all of the object's fields. We have given you the following suggested set of fields:
tableIter, which should hold a reference to an array of table iterators, one for each table in the FROM clause of the SQL statement that is passed in as an argument. You may assume that all of the tables have already been opened, but you will need to create the TableIterator object for each table. Note that you should create table iterators that do not pay attention to the WHERE clause, because you will be testing the WHERE clause at the level of the cross product, rather than at the level of the individual tables. Review the TableIterator constructor to see how to specify that the WHERE clause should be ignored.
columns, which should hold a reference to an array of Column objects, one for each column in the Cartesian product. columns[i] should be a Column object for the ith column in the Cartesian product, where column 0 is the leftmost column. Note that you should not need to create any Column objects. Rather, you should create an array of type Column, and store in it references to the Column objects that are obtained from the Table objects associated with the FROM clause.
where, which should hold a reference to an object that represents the WHERE clause for the statement with which the iterator is associated. If there is no WHERE clause in the statement, your constructor should set the where field to an instance of the TrueExpression class; its isTrue() method always returns true. This will allow your next() method to always invoke where.isTrue() to determine if the iterator should visit the current tuple.
numTuples, which will be used to keep track of the number of tuples in the result of the SELECT command associated with this CrossIterator
b. close(): This method should simply close all of the table iterators used by the crossproduct iterator.
c. next(): This method should advance the iterator to the next tuple from the Cartesian product that satisfies the WHERE clause. The lecture notes on the logicaltophysical mapping outline what this method would look like for an iterator that performs a nestedloop join of only two tables. You will need to extend the logic to allow this method to work for an arbitrary number of tables. To do so, you should use the example provided above as a model for what your code will need to do. Use some sort of iteration construct (e.g., a for loop) to iterate over the underlying table iterators, starting from the rightmost table's iterator and working left as needed. When it is no longer necessary to go any further left, you can break out of the loop to stop the iteration. Note that a given execution of next() may need to need to repeat this righttoleft iteration until it reaches a tuple from the Cartesian product that satisfies the WHERE clause. If the method succeeds in finding an appropriate tuple, it should increment the count of the number of tuples and return true; if there are no remaining tuples that satisfy the WHERE clause, the method should return false. Note that because we are using the table iterators to iterate over the rows in each table, this method will not need to invoke any Berkeley DB methods itself.
d. getColumn(i): This method should return a Column object representing the ith column in the cross product, where column 0 is the leftmost column. In our example above, column 2 of the cross product would be column c from table S. If you set up the columns array correctly in the constructor, this method should be very simple!
e. getColumnVal(i): This method should return the value of the ith column in the tuple from the Cartesian product on which the iterator is currently positioned. This method does not need to include any new unmarshalling code. Instead, you should find a way to make use of the TableIterator.getColumnVal() method that you wrote for Problem Set 2 either by calling it directly on the appropriate TableIterator for the ith column, or by calling another method that calls getColumnVal() for you.
7. Adding support for SELECT * commands with multiple tables
Once you have implemented the CrossIterator class, you should modify the execute() method in the SelectStatement class so that it correctly handles SELECT * commands with multiple tables. The revised method should still be able to handle cases in which there is only one table in the FROM clause.
Notes:
In order to take advantage of polymorphism, the variable that you use for the iterator should be of type RelationIterator. That will allow you to assign to it a reference to either a TableIterator (for SELECT * commands with only one table) or a CrossIterator (for SELECT * commands with multiple tables). Having done so, a command like
iter.printAll(System.out);
will invoke the appropriate method for the type of iterator to which iter refers.
All tables should be opened before you construct the TableIterator or CrossIterator. This will allow you to check that all of the tables actually exist, and to handle cases of nonexistent tables. In such cases, you should throw an exception with an appropriate message, as we did in Problem Set 2.
In addition to handling tables that do not exist, you should also check for column names in the WHERE clause that do not belong to one of the tables in the FROM clause. Here again, you should throw an exception as needed.
You do not need to check for any other problematic cases. For example, you may assume that column names are unambiguous i.e., that they are prepended with the corresponding table name as needed.
8. Adding support for a basic projection iterator
In this problem, you will implement a third type of relation iterator: a projector iterator. A projection iterator will use either a table iterator or crossproduct iterator to iterate over the tuples in the underlying table or Cartesian product, but it will also provide the added functionality needed to support projections. In particular, it will allow us to obtain the columns in the projection in the order in which they are specified in the SELECT clause.
Below are descriptions of each of the methods that you will need to implement for this problem. In addition, you should read the comments that appear before each of the methods in the code file.
a. You should begin by implementing the ProjectionIterator constructor, which takes two parameters: a reference to the object representing the SELECT statement, and a reference to the TableIterator or CrossIterator that will be used to iterate over the underlying relation. This underlying relation is referred to as the "subrelation" of the projection iterator. It is the relation to which the projection is applied. You should store a reference to the subrelation iterator in the field called subrel so that the ProjectionIterator methods can use it as needed. You will also need to initialize the remaining fields of the object. We have given you the following suggested additional fields:
columns, which should hold a reference to an array of Column objects, one for each column in the projection. columns[i] should be a Column object for the ith column in the projection, where column 0 is the leftmost column. Note that you should not need to create any Column objects. Rather, you should create an array of type Column, and store in it references to the appropriate Column objects from the object that represents the SELECT statement.
checkDistinct, which you should set to false because we are not attempting to support SELECT DISTINCT queries in this problem.
numTuples, which will be used to keep track of the number of tuples in the result of the SELECT command associated with this ProjectionIterator.
b. close(): This method should simply close the subrelation iterator.
c. next(): This method should advance the projection iterator to the next tuple in the projection. This mainly entails advancing the underlying subrelation iterator, although you should also update the count of the number of tuples visited by the iterator when appropriate.
d. getColumn(i): This method should return a Column object representing the ith column in the projection, where column 0 is the leftmost column. Here again, if you set up the columns array correctly in the constructor, this method should be very simple!
e. getColumnVal(i): This method should return the value of the ith column in the projection tuple on which the iterator is currently positioned. As with the CrossIterator version of this method, you will not need to implement any new unmarshalling code for this method.
In your work on this problem, it is helpful to remember that the code that we gave you for the TableIterator constructor already connects each column in the SELECT clause to the corresponding table iterator. These connections should simplify your projectioniterator code. See the TableIterator constructor for more detail.
9. Adding support for SELECT commands with projections
Once you have implemented the ProjectionIterator class, you should modify the execute() method in the SelectStatement class so that it correctly handles SELECT commands with columns in the SELECT clause. The revised method should still be able to handle the types of SELECT commands that were supported in earlier versions of the method.
Notes:
You should continue to take advantage of polymorphism by using a variable of type RelationIterator for the iterator. That will allow you to assign to it a reference to either a TableIterator (for SELECT * commands with only one table), a CrossIterator (for SELECT * commands with multiple tables), or a ProjectionIterator (for SELECT commands with columns in the SELECT clause). Remember that a ProjectionIterator takes either a TableIterator or a CrossIterator as a "subrelation" iterator.
In addition to performing the types of errorchecking that are already supported, you should also check for column names in the SELECT clause that do not belong to one of the tables in the FROM clause. Here again, you should throw an exception as needed.
You may assume that all column names in the SELECT clause are unambiguous (i.e., that they are prepended with the corresponding table name as needed).
10. Gradcredit problem: Adding support for SELECT DISTINCT commands (10 points; required of gradcredit students; may be completed by other students for "partial" extra credit)
In this problem, you will extend the projection iterator so that it eliminates duplicates from the projection if the user specifies the DISTINCT keyword in the SELECT command. (Note: you do not need to handle SELECT DISTINCT * commands i.e., SELECT DISTINCT commands without a projection.)
There are different techniques that can be used to eliminate duplicate tuples from the results of a projection. The one that we recommend involves creating an inmemory BDB database and using it to detect duplicate tuples. Every time that the ProjectionIterator.next() method advances to the next tuple in the projection, you should concatenate all of the values in the tuple to create a single value that can serve as the key of a key/data pair; the corresponding data item should be empty. You should then attempt to add this key/data pair to the inmemory database, and you should specify that the new key/data pair shouldn't overwrite any existing key/data pair. If the BDB method used to add the key/data pair returns an error value indicating that there is an existing item with the same key, that means that the current tuple is a duplicate and should be skipped. In such cases, the next() method should advance to the next tuple and try again.
Notes:
You should modify the ProjectionIterator constructor to perform whatever additional initialization is needed to support duplicate elimination. For example, you should set the checkDistinct field to either true or false depending on whether the user included the word DISTINCT in the SELECT clause. There is a method in the SelectStatement class that you can use to determine whether this is the case.
Review the Berkeley DB Database, DatabaseEntry, and TupleOutput classes, and the Berkeley DB methods used to create a new database. You may want to consult the code in CreateStatement.execute() for a model of how to create a database. You can specify that you want a memoryonly database by using null values for both the fileName and databaseName parameters of the Environment.openDatabase() method.
Although the detection and elimination of duplicates should happen in the context of the ProjectionIterator.next() method, we encourage you to put the bulk of the new code in one or more separate methods that are invoked as needed. Make sure that you add comments explaining the purpose of each new method that you add.
Attachment:- dbms_source.zip