Advanced Database Topics Assignment
Q1) a) Describe the difference between optimistic and pessimistic concurrency control mechanisms.
b) Define transaction deadlock (what it is and what causes it).
c) Why do we always have to record the log entry before we update an attribute?
d) Consider the following schedule of transactions, with checkpoints and the crash point marked on the same schedule (the left edge of each rectangle is when the transaction starts and the right edge is when it commits). During recovery after the crash, which transactions need to be rolled back? Which transactions need to be redone?
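As a way to reason about part d), here is a minimal sketch of the usual textbook classification rule. It assumes a checkpoint forces all updates of already-committed transactions to disk, so only the last checkpoint before the crash matters. The `Txn` type, the `classify` function, and all timestamps are hypothetical illustrations and are not taken from the schedule in the assignment.

```python
from typing import List, NamedTuple, Optional, Tuple


class Txn(NamedTuple):
    name: str
    start: int
    commit: Optional[int]  # None means the transaction never committed


def classify(txns: List[Txn], last_checkpoint: int,
             crash: int) -> Tuple[List[str], List[str], List[str]]:
    """Split transactions into (redo, undo, no_action) lists.

    Assumption: the checkpoint flushed every update made by transactions
    that had committed by then, so those need no work during recovery.
    """
    redo, undo, no_action = [], [], []
    for t in txns:
        if t.start >= crash:
            continue  # never ran before the crash, nothing to recover
        committed = t.commit is not None and t.commit < crash
        if not committed:
            undo.append(t.name)        # still active at the crash: roll back
        elif t.commit > last_checkpoint:
            redo.append(t.name)        # committed after the checkpoint: redo
        else:
            no_action.append(t.name)   # committed before the checkpoint
    return redo, undo, no_action


# Hypothetical schedule: last checkpoint at time 10, crash at time 20.
schedule = [
    Txn("T1", 1, 5),      # commits before the checkpoint
    Txn("T2", 3, 12),     # spans the checkpoint, commits before the crash
    Txn("T3", 8, None),   # never commits
    Txn("T4", 11, 18),    # starts after the checkpoint, commits before crash
    Txn("T5", 15, 25),    # would have committed after the crash
]
redo, undo, no_action = classify(schedule, last_checkpoint=10, crash=20)
```

With these made-up times, T2 and T4 land in the redo list, T3 and T5 in the undo list, and T1 needs no action. A recovery scheme with different checkpoint semantics would need a different rule.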
Q2) a) Name at least one failure type/cause that is specific to distributed databases.
b) Under what circumstances is semi-join preferable to traditional execution of a distributed join?
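For part b), one common way to compare the two strategies is by the number of bytes shipped between sites. The sketch below assumes a simplified model: relation R lives at site A, relation S at site B, the join result is needed at site A, and only network transfer is counted. The function names and all sizes are hypothetical.

```python
def full_ship_cost(s_rows: int, s_row_bytes: int) -> int:
    """Bytes transferred when all of S is shipped to R's site."""
    return s_rows * s_row_bytes


def semijoin_cost(r_distinct_keys: int, key_bytes: int,
                  s_matching_rows: int, s_row_bytes: int) -> int:
    """Bytes transferred by a semi-join.

    Step 1: ship the distinct join-key values of R to S's site.
    Step 2: ship back only the S rows that match one of those keys.
    """
    return r_distinct_keys * key_bytes + s_matching_rows * s_row_bytes


# Hypothetical sizes: S has 100,000 rows of 200 bytes each,
# R has 1,000 distinct 8-byte join keys, and 5,000 rows of S match.
traditional = full_ship_cost(100_000, 200)
selective = semijoin_cost(1_000, 8, 5_000, 200)

# Same sizes, but every row of S matches, so the key shipment is pure overhead.
unselective = semijoin_cost(1_000, 8, 100_000, 200)
```

Under this model the semi-join wins only when the key projection plus the matching rows is smaller than the whole relation, which is the case when the join filters out most of S.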
Q3) In this homework you will use Oracle to load data and execute some SSBM queries. Please DO NOT load too much data into your DePaul CDM Oracle account. I have created Oracle accounts on my personal server that you can use to load data; instructions are included below.
You can also use your own Oracle installation if you prefer (but not the DePaul CDM account because you won't be able to load that much data). I am attaching a separate document with instructions on how to install Oracle on Windows.
a) Create the SSBM tables and load Scale1 data (please see a section below that discusses your data-loading options with Oracle).
b) Get the baseline performance for Q2.2 and Q2.3 (just these two queries) by noting down the "real" runtime and the "estimated" query cost (EXPLAIN, or F10 in SQL Developer). The real time might vary since you'll be sharing the server, so report whatever numbers you get.
Include a screenshot of the result from running one of the queries (either one).
c) Create an index for Q2.2 and report the estimated query cost using your index. Include a screenshot of the resulting query plan in SQL Developer.
d) Create an index usable by both Q2.2 and Q2.3. Is this index the same as the one from part c) or different? Report the estimated query costs for Q2.2 and Q2.3.
e) Now, let's try using some materialized views. Create a materialized view (MV) that will benefit both queries. It is up to you whether you only pre-join all columns or also pre-aggregate the MV. Do not use any filter predicates (WHERE Column = 'XXX') in this MV yet. Report the estimated query costs for Q2.2 and Q2.3.
f) Next, add an index to the previously created MV to improve query performance. Report the estimated query costs for Q2.2 and Q2.3.
g) Now create another MV that pre-filters the rows by including predicates from the original queries. Report the estimated query costs for Q2.2 and Q2.3.
h) Re-evaluate one of the queries (your choice) and add any query optimization hint. It is up to you what kind of query change you make - you can try forcing a different join, a different index or a different MV. Include a screenshot of the "before" and "after" query plan.
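The before/after comparison in parts b) and c) can be rehearsed on a small scale. The sketch below uses SQLite from the Python standard library as a stand-in, not Oracle, so the plan text and cost model differ from what SQL Developer shows. The table `lineorder_demo`, its columns, the index name, and the rows are all made up for illustration and are not the SSBM schema.

```python
import sqlite3


def plan_details(conn: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the plan text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineorder_demo (brand TEXT, order_year INTEGER, revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO lineorder_demo VALUES (?, ?, ?)",
    [("B1", 1997, 100), ("B2", 1997, 250), ("B2", 1998, 300), ("B3", 1998, 50)],
)

query = "SELECT SUM(revenue) FROM lineorder_demo WHERE brand = 'B2'"

# Baseline: no index exists yet, so the plan is a full table scan.
plan_before = plan_details(conn, query)

# Add an index on the filtered column and look at the plan again.
conn.execute("CREATE INDEX idx_demo_brand ON lineorder_demo (brand)")
plan_after = plan_details(conn, query)

total = conn.execute(query).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
```

The same habit applies in Oracle: capture the plan, make one change (index, MV, or hint), capture the plan again, and compare the two.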
Attachment: Assignment.rar