Advanced Database Topics Assignment
This will give you a chance to apply everything you learned in the class. We will use two different databases (Postgres and Oracle are easiest because you already have access to them and the data is already loaded). You can use MySQL, but that would require an additional guess at the estimated cost or measuring real runtimes. Of course, you are welcome to install or use your own (relational) database engine if you want.
In the interest of finishing quickly, we will not measure "real" runtimes for the queries and will focus on EXPLAIN costs instead.
Baseline results
Start this part by collecting the baseline performance (estimated query runtimes with no additional structures in the database) for all 13 queries in the SSBM workload. We will use these numbers as our reference point to estimate improvement. If you have any indexes from previous assignments, drop them first.
You should have estimated numbers (for MySQL you can use real runtime or your own estimate) for all 13 queries for two different databases. You need to submit all 26 estimated cost numbers, but you only have to submit one query screenshot from each database showing the EXPLAIN plan output.
Please be sure to specify which database / version / computer hardware you are using.
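When collecting the 26 baseline numbers, it can help to pull the estimated cost out of the plan text rather than copy it by hand. The sketch below assumes Postgres-style text EXPLAIN output, where each plan node carries a `(cost=startup..total ...)` annotation and the first node's total is the cost of the whole query. The function name `total_cost` and the sample plan are illustrative only, not real results. Oracle reports cost in a different layout and would need its own parser.

```python
import re

# Matches the "(cost=<startup>..<total>" annotation that Postgres prints
# on every plan node in text-format EXPLAIN output.
COST_RE = re.compile(r"\(cost=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)")


def total_cost(explain_text):
    """Return the total estimated cost of the top plan node.

    The top node is the first line that carries a cost annotation;
    its second number is the estimated cost of the whole query.
    """
    for line in explain_text.splitlines():
        match = COST_RE.search(line)
        if match:
            return float(match.group(2))
    raise ValueError("no cost annotation found in EXPLAIN output")


# Made-up plan text, used only to show the expected input shape.
sample_plan = """Aggregate  (cost=431.00..431.01 rows=1 width=32)
  ->  Seq Scan on lineorder  (cost=0.00..400.00 rows=6200 width=8)"""

print(total_cost(sample_plan))
```

Running the same function over the saved plan of each query gives one number per query, which can go straight into the baseline table.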
You should provide an answer for each database you chose. That answer can be the same for both databases, but only if you verified that it works in the second database. Your index/MV should improve the query cost (that's the point, after all). However, if your suggested index/MV does not get picked up, do not worry about it. If that happens, be sure to describe what you have attempted and why you think your index was appropriate.
Please do not forget to include the SQL code for index creation (and query rewrite for MVs in Postgres) for any of your answers.
1: Indexes
A. Create a secondary index for Q1.2
B. Create a secondary index for Q2.2
C. Create a secondary index for Q3.2
D. Create a secondary index for Q4.3 (Include a screenshot of EXPLAIN here for both databases in addition to the estimate number)
E. Create a secondary index for Flight3 (Q3.1, Q3.2, Q3.3, Q3.4). Report the EXPLAIN costs for all 4 queries.
F. Create a clustered index for Q3.2. Remember that in Oracle you have to create an IOT (Index Organized Table) and cannot re-cluster an existing table like you can in Postgres. Include a screenshot here in addition to reporting the estimated EXPLAIN cost.
G. Create a clustered index for Flight2 (Q2.1, Q2.2, Q2.3).
2: Materialized Views
A. Create a materialized view (no pre-filtering, that is only GROUP BY and JOIN is allowed, but no predicates in the MV) for Q1.3
B. Create a materialized view (no pre-filtering, that is only GROUP BY and JOIN is allowed, but no filter predicates may be used) for Q3.4
C. Create a materialized view for Flight4 (Q4.1, Q4.2, Q4.3)
D. Add an index to your MV answer in 2-C and re-evaluate all three queries (submit a screenshot of explain for one of the three queries for each database you are using)
3: Database Physical Design
Using the structures you have already created, put together a physical design, specifying the size of the combined structures and the estimated cost for all 13 queries. For example, if you choose an index from 1-A and an MV from 2-C, you would report the size of these structures and the (estimated EXPLAIN) runtimes that can be achieved if both of them were used. Do this for each database -- you do not need to re-run anything for this.
Which DBMS achieved a better (size-to-improvement) ratio?
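The assignment does not fix a formula for the size-to-improvement ratio, so the sketch below is one possible reading: total estimated cost saved across the workload divided by the combined size of the structures used. The function name `improvement_per_mb`, the choice of megabytes as the unit, and the numbers in the usage line are all assumptions for illustration. State whichever definition you actually use in your report.

```python
def improvement_per_mb(baseline_costs, tuned_costs, structure_sizes_mb):
    """Estimated cost saved per megabyte of added structures.

    baseline_costs and tuned_costs hold one EXPLAIN cost per query,
    in the same query order; structure_sizes_mb holds the size of
    each index or MV in the physical design.
    """
    if len(baseline_costs) != len(tuned_costs):
        raise ValueError("need one tuned cost for every baseline cost")
    total_size = sum(structure_sizes_mb)
    if total_size <= 0:
        raise ValueError("combined structure size must be positive")
    cost_saved = sum(baseline_costs) - sum(tuned_costs)
    return cost_saved / total_size


# Made-up two-query example: 150 cost units saved using 50 MB of structures.
print(improvement_per_mb([100, 200], [50, 100], [30, 20]))
```

Because EXPLAIN cost units differ between engines, comparing the ratio across Postgres and Oracle is more meaningful if the saving is first expressed as a fraction of each engine's own baseline.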
4: Further database optimization
For one database, evaluate the benefits of compression for any one previous structure. That is, compress a structure and check the resulting difference in cost. In Oracle, it is as simple as right-clicking on the structure (table/MV/etc.), then choosing "Storage -> Compress..."
5: Automatic Index Builder
Using your code from previous homework assignments, create an automatic index builder. That is, given a query such as the ones you parsed in HW1, generate the CREATE INDEX AutoIndex ON TABLE ... SQL code that can be pasted into a database. I suggest assuming simplified queries (simple equality or range predicates only) but using your query parser and statistics to estimate selectivity. You can assume that all selectivities are independent, so you would create an index on a set of columns as long as the product of their selectivities is low enough (e.g., lo_discount at 0.1 and lo_quantity at 0.05 give 0.1 × 0.05 = 0.005, which is low enough and would produce CREATE INDEX AutoIndex on Lineorder(lo_quantity, lo_discount);).
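The final step of the index builder can be sketched as follows, assuming your HW1 parser and statistics have already produced a per-column selectivity for each predicate. The function name `build_auto_index`, the dictionary input, and the 0.01 threshold are hypothetical choices for illustration. The assignment only says the product must be "low enough". Columns are ordered from most to least selective, which matches the column order in the example above.

```python
from math import prod


def build_auto_index(table, selectivities, threshold=0.01):
    """Return a CREATE INDEX statement, or None if no index is worthwhile.

    selectivities maps each predicate column to its estimated selectivity
    (fraction of rows kept, between 0 and 1). Selectivities are assumed
    independent, so the combined selectivity is their product.
    """
    if not selectivities:
        return None
    for column, value in selectivities.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"selectivity for {column} must be in [0, 1]")
    # Independence assumption: multiply the per-column selectivities.
    combined = prod(selectivities.values())
    if combined > threshold:  # hypothetical cutoff, tune for your data
        return None
    # Most selective column first; ties broken by name for stable output.
    columns = sorted(selectivities, key=lambda c: (selectivities[c], c))
    return f"CREATE INDEX AutoIndex ON {table}({', '.join(columns)});"


print(build_auto_index("Lineorder", {"lo_discount": 0.1, "lo_quantity": 0.05}))
```

A query whose predicates are not selective enough, such as a single column with selectivity 0.5, yields `None`, meaning no index is suggested.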
Attachment:- Assignment File.rar