Advanced Database Topics Assignment
1) Recall that an equi-width histogram splits the value range into X ranges of equal width and records, for each bucket, the count of values that fall within that range. An equi-height histogram instead adjusts the bucket boundaries so that every bucket contains the same number of values (or as close to the same number as the data allows).
Given the following data: [1, 5, 6, 7, 8, 12, 28, 29, 30, 36, 37, 39, 42, 50]
a) Construct an equi-width histogram (with 3 buckets).
You can report your answers in text notation, e.g., ranges and counts like this: {1-10}: 5, {11-20}: 12, {21-30}: 2
b) Construct an equi-height histogram (also with 3 buckets).
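A minimal Python sketch of both constructions may help when checking your work by hand. It assumes buckets span the observed minimum and maximum, that the maximum value is assigned to the last equi-width bucket, and that an equi-height histogram with n values and k buckets gives the first n mod k buckets one extra value. Other conventions (for example, putting the larger buckets last) are equally valid. The function names equi_width and equi_height are illustrative only.

```python
def equi_width(values, buckets):
    """Split [min, max] into `buckets` equal-width ranges; return (edges, counts)."""
    lo, hi = min(values), max(values)
    counts = [0] * buckets
    for v in values:
        if hi == lo:
            i = 0  # all values identical: everything lands in the first bucket
        else:
            # Scale v's position in [lo, hi] to a bucket index; hi maps to the last bucket.
            i = min(int((v - lo) * buckets / (hi - lo)), buckets - 1)
        counts[i] += 1
    edges = [lo + (hi - lo) * k / buckets for k in range(buckets + 1)]
    return edges, counts


def equi_height(values, buckets):
    """Split the sorted values into `buckets` groups whose sizes differ by at most one.

    Returns (first value, last value, count) per bucket. Assumes buckets <= len(values)
    and that no value is duplicated across a bucket boundary (true for this data).
    """
    data = sorted(values)
    n = len(data)
    result, start = [], 0
    for k in range(buckets):
        # The first n % buckets buckets receive one extra value.
        size = n // buckets + (1 if k < n % buckets else 0)
        group = data[start:start + size]
        result.append((group[0], group[-1], len(group)))
        start += size
    return result


data = [1, 5, 6, 7, 8, 12, 28, 29, 30, 36, 37, 39, 42, 50]
print(equi_width(data, 3))
print(equi_height(data, 3))
```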
2) Consider the following histogram, which represents the Hours column of SleepTable:
a) What is the answer to SELECT AVG(Hours) FROM SleepTable?
b) What is the answer to SELECT COUNT(Hours) FROM SleepTable?
c) What is the answer to SELECT COUNT(*) FROM SleepTable WHERE Hours = 4?
d) What is the answer to SELECT COUNT(*) FROM SleepTable WHERE Hours BETWEEN 6 AND 9? (BETWEEN is inclusive.)
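The histogram itself is not reproduced in this text, so the Python sketch below uses made-up bucket counts purely for illustration; substitute the counts from the figure. It assumes each bucket is an inclusive integer range (low, high, count) and that values are spread uniformly within a bucket (the usual uniformity assumption). When each bar covers a single value, the estimates are exact. AVG treats each bucket's values as its midpoint. Note that COUNT(Hours) skips NULLs, whereas COUNT(*) counts every row, so a histogram of non-NULL values answers COUNT(Hours) directly. The helper names and HOURS_HIST are hypothetical.

```python
# Hypothetical histogram: (low, high, count) per bucket, inclusive integer ranges.
# These counts are made up for illustration; they are NOT the assignment's figure.
HOURS_HIST = [(4, 4, 2), (6, 6, 3), (7, 7, 5), (8, 8, 4), (10, 10, 1)]


def estimate_count(hist):
    """COUNT(Hours): total number of non-NULL values recorded in the histogram."""
    return sum(count for _, _, count in hist)


def estimate_avg(hist):
    """AVG(Hours), treating every value in a bucket as the bucket midpoint."""
    return sum(count * (low + high) / 2 for low, high, count in hist) / estimate_count(hist)


def estimate_between(hist, a, b):
    """COUNT(*) WHERE Hours BETWEEN a AND b, assuming integer values spread uniformly per bucket."""
    total = 0.0
    for low, high, count in hist:
        lo, hi = max(low, a), min(high, b)  # overlap of this bucket with [a, b]
        if lo <= hi:
            total += count * (hi - lo + 1) / (high - low + 1)
    return total


def estimate_equal(hist, v):
    """COUNT(*) WHERE Hours = v is the special case BETWEEN v AND v."""
    return estimate_between(hist, v, v)


print(estimate_avg(HOURS_HIST), estimate_count(HOURS_HIST))
print(estimate_equal(HOURS_HIST, 4), estimate_between(HOURS_HIST, 6, 9))
```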
3) Using the MySQL DBMS, download and load the SSBM benchmark, then execute and time some of its queries. You can use any MySQL installation you wish. I am going to provide instructions on how to create an account with Amazon EC2, but you can use your own setup in Linux, Windows, or Mac. I am also including instructions below on how to set up MySQL in Linux (assuming Amazon Linux); installing MySQL on Windows is a very straightforward process if you choose to go that route.
Here are the queries: https://rasinsrv07.cstcis.cti.depaul.edu/CSC553/SSBM_queries.sql
a) Time and report how long it takes to populate each of the tables with data. (MySQL should report the timing of each command; in Linux you can also precede each of your commands with "time" if you follow my load instructions below.)
b) Time the running of SSBM Q1.1
c) Time Q1.1 again. Was the runtime similar to part b) or not? Why or why not?
d) Time queries Q1.2 and Q1.3 (once) and report their runtimes.
e) What is the selectivity of the following predicates:
i) Q1.1 lo_discount between 1 and 3
ii) Q1.1 lo_quantity < 25
iii) Q1.1 d_year = 1993
iv) Q1.2 d_yearmonth = 'Jan1994'
v) Q1.2 lo_quantity between 36 and 40
f) Create and evaluate (based on runtime) an index for Q1.1. Include a screenshot for this part.
g) Create and evaluate an index for Q1.2
h) Create and evaluate an index for Q1.3
i) Drop all previously created indexes and create a new "shared" index that will work for all 3 queries (Q1.1, Q1.2, Q1.3). Time all of the 3 queries with this new index.
You can verify that the index is being used by prefixing the query with EXPLAIN, i.e., running EXPLAIN [Q1.1 SQL].
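For parts e) through i), a small Python sketch of two helpers may be useful. Selectivity is the fraction of a table's rows that satisfy a predicate, so it can be measured with two COUNT(*) queries. For the d_year and d_yearmonth predicates, the denominator is the row count of the date dimension table (its name depends on your load scripts), not lineorder. The second helper runs EXPLAIN and checks whether your index name appears in the key column of MySQL's EXPLAIN output. The cursor is assumed to be a DB-API cursor, such as the one returned by a mysql.connector connection's cursor() method. All function names and any index name you pass in are hypothetical.

```python
def selectivity(matching_rows, total_rows):
    """Fraction of a table's rows that satisfy a predicate, between 0.0 and 1.0."""
    if total_rows == 0:
        raise ValueError("cannot compute selectivity of an empty table")
    return matching_rows / total_rows


def count_sql(table, predicate):
    """The two COUNT(*) queries needed: rows matching the predicate, and all rows."""
    return (f"SELECT COUNT(*) FROM {table} WHERE {predicate}",
            f"SELECT COUNT(*) FROM {table}")


def measure_selectivity(cursor, table, predicate):
    """Run both counts through a DB-API cursor and return their ratio."""
    matched_sql, total_sql = count_sql(table, predicate)
    cursor.execute(matched_sql)
    matched = cursor.fetchall()[0][0]  # fetchall() consumes the whole result set
    cursor.execute(total_sql)
    total = cursor.fetchall()[0][0]
    return selectivity(matched, total)


def index_used(columns, rows, index_name):
    """True if any EXPLAIN row names index_name in its `key` column."""
    key_pos = list(columns).index("key")
    for row in rows:
        value = row[key_pos]
        if isinstance(value, (bytes, bytearray)):  # some drivers return raw bytes
            value = value.decode()
        if value == index_name:
            return True
    return False


def explain_uses_index(cursor, query, index_name):
    """Prefix the query with EXPLAIN and check which index MySQL chose."""
    cursor.execute("EXPLAIN " + query)
    columns = [d[0] for d in cursor.description]  # column names from DB-API metadata
    return index_used(columns, cursor.fetchall(), index_name)
```

For example, with a connection conn from mysql.connector.connect(...), measure_selectivity(conn.cursor(), "lineorder", "lo_quantity < 25") would return the fraction of lineorder rows with lo_quantity below 25. Keep in mind that counting over lineorder may require a full table scan and can take a while on larger scale factors.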
4) Write code that reads queries from a .sql file (assume the queries are separated by semicolons), connects to the MySQL database, and runs each query, reporting the time it took. I will post some example code showing how to connect to a MySQL installation in Python. If you are using another language, let me know and I'll see about posting examples for that.
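One possible shape for this program is sketched below in Python, assuming the mysql-connector-python package is installed (imported as mysql.connector). The splitting is deliberately naive: it drops whole-line "--" comments and splits on every semicolon, so a semicolon inside a string literal or a DELIMITER block would break it. Timing uses time.perf_counter() around both the execute call and fetching the result rows, so it measures client-observed wall-clock time rather than MySQL's internal execution time. The function names and connection parameters are placeholders, not the example code I will post.

```python
import time


def split_queries(sql_text):
    """Split .sql text into statements on ';', dropping blanks and whole-line '--' comments.

    Naive on purpose: a ';' inside a string literal or a DELIMITER block is not handled.
    """
    kept = [ln for ln in sql_text.splitlines() if not ln.strip().startswith("--")]
    return [s.strip() for s in "\n".join(kept).split(";") if s.strip()]


def run_timed(cursor, queries):
    """Execute each statement, fetch any rows it returns, and record elapsed seconds."""
    timings = []
    for q in queries:
        start = time.perf_counter()
        cursor.execute(q)
        if cursor.description is not None:  # statement produced a result set
            cursor.fetchall()               # read it all so the timing includes transfer
        timings.append((q, time.perf_counter() - start))
    return timings


def main(path, **connect_args):
    """Read `path`, connect with mysql.connector, and print one timing line per query."""
    # Imported here so the helpers above can be used without the driver installed.
    import mysql.connector

    with open(path) as f:
        queries = split_queries(f.read())
    conn = mysql.connector.connect(**connect_args)
    try:
        cursor = conn.cursor()
        for q, seconds in run_timed(cursor, queries):
            print(f"{seconds:9.3f} s  {q.splitlines()[0]}")
        conn.commit()  # keep the effect of any data-changing statements
    finally:
        conn.close()
```

A call might look like main("SSBM_queries.sql", host="localhost", user="...", password="...", database="ssbm"), where the database name ssbm is hypothetical and should match whatever you created when loading the benchmark.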
Attachment: Assignment File.rar