Suppose we have a stream of tuples with the schema Grades(university, courseID, studentID, grade).
Assume universities are unique, but a course ID is unique only within a university (i.e., different universities may have different courses with the same ID, e.g., "CS101"). Likewise, student IDs are unique only within a university (different universities may assign the same ID to different students). Suppose we want to answer certain queries approximately from a 1/20th sample of the data. For each of the queries below, indicate how you would construct the sample; that is, state what the key attributes should be.
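Sampling "by key" means a tuple is kept or dropped according to its key attributes alone, so that every tuple sharing a key is either kept or dropped together. The sketch below is one common way to do this. It is not taken from the exercise: the use of SHA-256, the names `in_sample` and `sample_stream`, and the convention that bucket 0 is the kept bucket are all illustrative assumptions. The idea is to hash the key into 20 buckets and keep the tuple only if the key lands in bucket 0.

```python
import hashlib
from typing import Callable, Hashable, Iterable, Iterator, Tuple

# (university, courseID, studentID, grade), matching the Grades schema.
Grade = Tuple[str, str, str, str]


def in_sample(key: Hashable, buckets: int = 20) -> bool:
    """Keep a key iff it hashes to bucket 0, i.e. roughly 1/buckets of all keys.

    SHA-256 is used (an assumption) instead of Python's built-in hash(),
    which is salted per process for strings and so is not reproducible.
    """
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets == 0


def sample_stream(
    stream: Iterable[Grade],
    key_fn: Callable[[Grade], Hashable],
    buckets: int = 20,
) -> Iterator[Grade]:
    """Yield only tuples whose key (as extracted by key_fn) falls in the sample."""
    for t in stream:
        if in_sample(key_fn(t), buckets):
            yield t
```

For example, `sample_stream(grades, lambda t: (t[0], t[2]))` keeps about 1/20 of the (university, studentID) pairs, along with every grade for each kept student. The key function includes the university because, as the exercise notes, IDs alone are not globally unique.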
(a) For each university, estimate the average number of students in a course.
(b) Estimate the fraction of students who have a GPA of 3.5 or more.
(c) Estimate the fraction of courses where at least half the students got "A."
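The sketch below shows one reasonable choice of key for each query: (university, courseID) for (a) and (c), and (university, studentID) for (b). It is a hedged illustration, not an official solution. The letter-grade-to-points table `POINTS` is a hypothetical assumption, because the exercise does not say how GPA is computed. GPA is taken here as an unweighted mean of grade points, and the sketch assumes one tuple per student per course. All function names are illustrative.

```python
import hashlib
from collections import defaultdict

# Hypothetical grade-point mapping; the exercise does not specify one.
POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def in_sample(key, buckets=20):
    """Keep a key iff it hashes to bucket 0 (about 1/buckets of keys)."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets == 0


def avg_students_per_course(stream, buckets=20):
    """(a) Key = (university, courseID): a sampled course keeps all its students."""
    students = defaultdict(set)  # (univ, course) -> set of studentIDs
    for univ, course, student, _ in stream:
        if in_sample((univ, course), buckets):
            students[(univ, course)].add(student)
    sizes = defaultdict(list)  # univ -> list of sampled course sizes
    for (univ, _), ids in students.items():
        sizes[univ].append(len(ids))
    return {u: sum(s) / len(s) for u, s in sizes.items()}


def frac_high_gpa(stream, buckets=20, threshold=3.5):
    """(b) Key = (university, studentID): a sampled student keeps all grades."""
    grades = defaultdict(list)  # (univ, student) -> list of grade points
    for univ, _, student, grade in stream:
        if in_sample((univ, student), buckets):
            grades[(univ, student)].append(POINTS[grade])
    if not grades:
        return None
    high = sum(1 for g in grades.values() if sum(g) / len(g) >= threshold)
    return high / len(grades)


def frac_courses_half_a(stream, buckets=20):
    """(c) Key = (university, courseID): a sampled course keeps all its grades."""
    counts = defaultdict(lambda: [0, 0])  # (univ, course) -> [num A, num total]
    for univ, course, _, grade in stream:
        if in_sample((univ, course), buckets):
            counts[(univ, course)][1] += 1
            if grade == "A":
                counts[(univ, course)][0] += 1
    if not counts:
        return None
    return sum(1 for a, n in counts.values() if 2 * a >= n) / len(counts)
```

The design point is that each aggregate must be computed over complete groups. Suppose you instead sampled individual tuples for (a). A sampled course would then appear to have only about 1/20 of its students, and the average would be biased low by roughly that factor. Keying on courseID without the university would wrongly merge same-named courses, such as "CS101", across universities.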