Consider the instance of the Sailors relation shown in Table 1.
1. Write SQL queries to compute the average rating, using AVG; the sum of the ratings, using SUM; and the number of ratings, using COUNT.
2. If you divide the sum computed above by the count, would the result be the same as the average? How would your answer change if the above steps were carried out with respect to the age field instead of rating?
sid
18
|
sname
jones
|
rating
3
|
age
30.0
|
41
|
jonah
|
6
|
56.0
|
22
|
ahab
|
7
|
44.0
|
63
|
moby
|
null
|
15.0
|
Table 1: An Instance of Sailors
3. Consider the following query: Find the names of sailors with a higher rating than all sailors with age <>21. The following two SQL queries attempt to obtain the answer to this question. Do they both compute the result? If not, explain why. Under what conditions would they compute the same result?
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS ( SELECT *
FROM Sailors S2
WHERE S2.age 21
AND S.rating = S2.rating )
SELECT *
FROM Sailors S
WHERE S.rating > ANY ( SELECT S2.rating
FROM Sailors S2
WHERE S2.age <>21 )
4. Consider the instance of Sailors shown in Figure 5.1. Let us define instance S1 of Sailors to consist of the first two tuples, instance S2 to be the last two tuples, and S to be the given instance.
(a) Show the left outer join of S with itself, with the join condition being sid=sid.
(b) Show the right outer join of S with itself, with the join condition being sid=sid.
(c) Show the full outer join of S with itself, with the join condition being sid=sid.
(d) Show the left outer join of S1 with S2, with the join condition being sid=sid.
(e) Show the right outer join of S1 with S2, with the join condition being sid=sid. (f) Show the full outer join of S1 with S2, with the join condition being sid=sid.
|
sid
18
|
sname
jones
|
rating
3
|
age
30.0
|
sid
18
|
sname
jones
|
rating
3
|
age
30.0
|
4. (a)
|
41
|
jonah
|
6
|
56.0
|
41
|
jonah
|
6
|
56.0
|
|
22
|
ahab
|
7
|
44.0
|
22
|
ahab
|
7
|
44.0
|
|
63
|
moby
|
null
|
15.0
|
63
|
moby
|
null
|
15.0
|
1. SELECT AVG (S.rating) AS AVERAGE
FROM Sailors S
SELECT SUM (S.rating)
FROM Sailors S
SELECT COUNT (S.rating)
FROM Sailors S
2. The result using SUM and COUNT would be smaller than the result using AV- ERAGE if there are tuples with rating = NULL. This is because all the aggregate operators, except for COUNT, ignore NULL values. So the first approach would compute the average over all tuples while the second approach would compute the average over all tuples with non-NULL rating values. However, if the aggregation is done on the age field, the answers using both approaches would be the same since the age field does not take NULL values.
3. Only the first query is correct. The second query returns the names of sailors with a higher rating than at least one sailor with age <>21. Note that the answer to the second query does not necessarily contain the answer to the first query. In particular, if all the sailors are at least 21 years old, the second query will return an empty set while the first query will return all the sailors. This is because the NOT EXISTS predicate in the first query will evaluate totrue if its subquery evaluates to an empty set, while the ANY predicate in the second query will evaluate to false if its subquery evaluates to an empty set. The two queries give the same results if and only if one of the following two conditions hold:
- The Sailors relation is empty, or
- There is at least one sailor with age > 21 in the Sailors relation, and for every sailor s, either s has a higher rating than all sailors under 21 or s has a rating no higher than all sailors under 21.