Question 1. Consider the dataset in the table below:
TID | Items
10 | Beer, Nuts, Diapers
20 | Beer, Coffee, Diapers
30 | Beer, Diapers, Eggs, Milk
40 | Nuts, Eggs, Milk
50 | Beer, Coffee, Milk
60 | Diapers, Eggs, Milk
70 | Beer, Coffee, Diapers, Eggs
80 | Beer, Nuts, Coffee, Diapers, Eggs, Milk
and the frequent itemset {Beer, Diapers, Eggs} (minimum support count of 3).
Considering a minimum confidence threshold of 7.5%, which of the following association rules are strong? (Select all that apply)
a) {Eggs, Beer} -> Diapers
b) {Diapers, Eggs} -> Beer
c) {Diapers, Beer} -> Eggs
d) Beer -> {Diapers, Eggs}
e) Diapers -> {Eggs, Beer}
f) Eggs -> {Diapers, Beer}
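The confidence of a rule A -> B is support_count(A ∪ B) / support_count(A). A minimal sketch, using the Question 1 transactions, that enumerates every rule derivable from {Beer, Diapers, Eggs} and compares each confidence with a threshold. The function names are illustrative, and the threshold is a plain parameter set to the value printed in the question.

```python
from itertools import combinations

# Transactions from the Question 1 table (TID -> items).
TRANSACTIONS = {
    10: {"Beer", "Nuts", "Diapers"},
    20: {"Beer", "Coffee", "Diapers"},
    30: {"Beer", "Diapers", "Eggs", "Milk"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Beer", "Coffee", "Milk"},
    60: {"Diapers", "Eggs", "Milk"},
    70: {"Beer", "Coffee", "Diapers", "Eggs"},
    80: {"Beer", "Nuts", "Coffee", "Diapers", "Eggs", "Milk"},
}

def support_count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for items in TRANSACTIONS.values() if itemset <= items)

def rules_from(itemset):
    """Yield (antecedent, consequent, confidence) for every non-empty split."""
    itemset = frozenset(itemset)
    total = support_count(itemset)
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            lhs = frozenset(lhs)
            yield lhs, itemset - lhs, total / support_count(lhs)

if __name__ == "__main__":
    min_conf = 0.075  # threshold as printed in the question (7.5%)
    for lhs, rhs, conf in rules_from({"Beer", "Diapers", "Eggs"}):
        flag = "strong" if conf >= min_conf else "weak"
        print(sorted(lhs), "->", sorted(rhs), f"{conf:.2%}", flag)
```

Changing `min_conf` re-evaluates which rules count as strong without recomputing any support counts.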
Question 2.
Given the decision tree in the image, which of the following are rules extracted from it? (Select all that apply)
a) IF Age<=30 AND Income=low THEN Buys_computer=no;
b) IF Age=31...40 AND Income=medium THEN Buys_computer=yes;
c) IF Age=31...40 THEN Buys_computer=yes;
d) IF Age<=30 AND Credit_rating=excellent THEN Buys_computer=yes;
e) IF Age>40 AND Credit_rating=excellent THEN Buys_computer=no;
f) IF Age<=30 AND Student=yes THEN Buys_computer=no;
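A rule is read off a decision tree by following one root-to-leaf path: the tests along the path, joined with AND, form the IF part, and the leaf label forms the THEN part. Because the tree image is not reproduced here, the sketch below uses a hypothetical buys_computer tree (an assumption, not necessarily the tree in the question) encoded as nested tuples and dicts.

```python
# Hypothetical tree (assumption): an internal node is (attribute, {value: subtree});
# a leaf is the class label of Buys_computer.
TREE = ("Age", {
    "<=30": ("Student", {"no": "no", "yes": "yes"}),
    "31...40": "yes",
    ">40": ("Credit_rating", {"excellent": "no", "fair": "yes"}),
})

def condition(attr, value):
    """Format one test; values such as '<=30' or '>40' already carry an operator."""
    return f"{attr}{value}" if value[0] in "<>" else f"{attr}={value}"

def extract_rules(node, conditions=()):
    """Return one IF-THEN rule per root-to-leaf path of the tree."""
    if isinstance(node, str):  # leaf reached: the path so far is the antecedent
        return [f"IF {' AND '.join(conditions)} THEN Buys_computer={node};"]
    attr, branches = node
    rules = []
    for value, child in branches.items():
        rules += extract_rules(child, conditions + (condition(attr, value),))
    return rules

if __name__ == "__main__":
    for rule in extract_rules(TREE):
        print(rule)
```

A tree with n leaves yields exactly n rules, and each candidate rule can be checked by asking whether its conditions match one complete path.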
Question 3.
Consider a cube defined on the following dimension hierarchies:
{customer < customer_city < customer_state} {supplier < supplier_city < supplier_state} {product < product_group}.
Which of the following are possible cuboids of this cube?
a) (customer, customer_state, supplier, product)
b) (customer, customer_city, customer_state)
c) (customer_state, supplier, product)
d) all answers are correct.
e) (customer_state, supplier, supplier_state, product, product_group)
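Assuming the usual convention that a cuboid keeps at most one level of each dimension hierarchy (or aggregates that dimension away entirely), the possible cuboids can be enumerated by choosing one level, or "all", per dimension. That gives (3 + 1)(3 + 1)(2 + 1) = 48 cuboids for this cube. The helper names below are hypothetical.

```python
from itertools import product

# Levels of each dimension hierarchy, finest first.
HIERARCHIES = {
    "customer": ["customer", "customer_city", "customer_state"],
    "supplier": ["supplier", "supplier_city", "supplier_state"],
    "product": ["product", "product_group"],
}

def all_cuboids():
    """Every cuboid: pick one level per dimension, or None to aggregate it away."""
    choices = [levels + [None] for levels in HIERARCHIES.values()]
    return [frozenset(level for level in combo if level is not None)
            for combo in product(*choices)]

def is_cuboid(attrs):
    """True if attrs names at most one level from each dimension hierarchy."""
    return frozenset(attrs) in set(all_cuboids())

if __name__ == "__main__":
    print(len(all_cuboids()))  # 48 = (3 + 1) * (3 + 1) * (2 + 1)
    print(is_cuboid(("customer_state", "supplier", "product")))
```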
Question 4.
By applying the Apriori algorithm to the dataset in the table below:
TID | Items
10 | Beer, Nuts, Diapers
20 | Beer, Coffee, Diapers
30 | Beer, Diapers, Eggs, Milk
40 | Nuts, Eggs, Milk
50 | Beer, Coffee, Milk
60 | Diapers, Eggs, Milk
70 | Beer, Coffee, Diapers
80 | Beer, Nuts, Coffee, Diapers, Eggs, Milk
with the minimum support count for frequent patterns set at 3, the set of frequent 3-itemsets, L3, is:
a) L3 = {Beer, Diapers, Milk}
b) L3 = {Beer, Diapers, Milk}, {Beer, Diapers, Eggs}
c) L3 = {Diapers, Eggs, Milk}
d) L3 = {Beer, Coffee, Diapers}, {Beer, Diapers, Eggs}
e) L3 = {Diapers, Eggs, Milk}, {Beer, Coffee, Diapers}
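Apriori alternates a join step (build candidate (k+1)-itemsets from the frequent k-itemsets) and a prune step (discard any candidate that has an infrequent k-subset), then counts support for the survivors. The sketch below uses a simplified join that unions any two frequent k-itemsets whose union has k + 1 items, instead of the classic sorted-prefix join; after pruning, both variants leave the same candidates. The transactions are those of the Question 4 table.

```python
from itertools import combinations

# Transactions from the Question 4 table.
TRANSACTIONS = [
    {"Beer", "Nuts", "Diapers"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Diapers", "Eggs", "Milk"},
    {"Nuts", "Eggs", "Milk"},
    {"Beer", "Coffee", "Milk"},
    {"Diapers", "Eggs", "Milk"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Nuts", "Coffee", "Diapers", "Eggs", "Milk"},
]

def apriori(transactions, min_count):
    """Return {k: {frozenset: count}} for every level k that has frequent itemsets."""
    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {i for t in transactions for i in t}
    levels = {1: count({frozenset([i]) for i in items})}
    k = 1
    while levels[k]:
        prev = set(levels[k])
        # Join step (simplified): any two frequent k-itemsets forming a (k+1)-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        k += 1
        levels[k] = count(candidates)
    del levels[k]  # the last level computed is empty
    return levels

if __name__ == "__main__":
    for itemset, n in apriori(TRANSACTIONS, 3)[3].items():
        print(sorted(itemset), n)
```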
Question 5.
Given the training set below:
Age | Income | Student | Credit_rating | Buys_computer
<=30 | high | no | fair | no
<=30 | high | no | excellent | no
31...40 | high | no | fair | no
>40 | medium | no | fair | yes
<=30 | low | no | fair | yes
>40 | high | no | fair | no
>40 | low | yes | fair | yes
>40 | low | yes | excellent | no
31...40 | low | yes | excellent | yes
<=30 | medium | no | fair | no
<=30 | low | yes | fair | yes
>40 | medium | yes | fair | yes
<=30 | medium | yes | excellent | no
31...40 | medium | no | excellent | no
31...40 | high | yes | fair | yes
>40 | medium | no | excellent | yes
The information gain for attribute Student is:
a) 0.066
b) 0.863
c) 0.918
d) 0.053
e) 0.106
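Information gain is Gain(A) = Info(D) - Info_A(D), where Info(D) = -Σ p_i log2(p_i) over the class proportions and Info_A(D) is the size-weighted entropy of the partitions induced by attribute A. A minimal sketch over the Question 5 training set (Age values written as 31...40; only Student and Buys_computer affect this particular gain):

```python
from collections import Counter
from math import log2

COLUMNS = ("Age", "Income", "Student", "Credit_rating", "Buys_computer")
ROWS = [dict(zip(COLUMNS, r)) for r in [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "no"),
    (">40", "medium", "no", "fair", "yes"),
    ("<=30", "low", "no", "fair", "yes"),
    (">40", "high", "no", "fair", "no"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "no"),
    ("31...40", "medium", "no", "excellent", "no"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "yes"),
]]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target="Buys_computer"):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder

if __name__ == "__main__":
    print(f"Gain(Student) = {info_gain(ROWS, 'Student'):.3f}")
```

Here Student = yes covers 7 rows (5 yes, 2 no) and Student = no covers 9 rows (3 yes, 6 no), so the two partition entropies are about 0.863 and 0.918 bits, weighted by 7/16 and 9/16.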
Question 6.
Given the dataset in the table below:
TID | Items
10 | Beer, Nuts, Diapers
20 | Beer, Coffee, Diapers
30 | Beer, Diapers, Eggs, Milk
40 | Nuts, Eggs, Milk
50 | Beer, Coffee, Milk
60 | Diapers, Eggs, Milk
70 | Beer, Coffee, Diapers
80 | Beer, Nuts, Coffee, Diapers, Eggs, Milk
What is the support and confidence of the rule: Nuts -> Beer?
a) [support=25%, confidence=40%]
b) [support=37.5%, confidence=75%]
c) [support=25%, confidence=66.66%]
d) [support=37.5%, confidence=60%]
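For a rule A -> B over a dataset D, support = count(A ∪ B) / |D| and confidence = count(A ∪ B) / count(A). A short sketch over the Question 6 transactions (function names are illustrative):

```python
# Transactions from the Question 6 table.
TRANSACTIONS = [
    {"Beer", "Nuts", "Diapers"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Diapers", "Eggs", "Milk"},
    {"Nuts", "Eggs", "Milk"},
    {"Beer", "Coffee", "Milk"},
    {"Diapers", "Eggs", "Milk"},
    {"Beer", "Coffee", "Diapers"},
    {"Beer", "Nuts", "Coffee", "Diapers", "Eggs", "Milk"},
]

def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def support_confidence(lhs, rhs):
    """Return (support, confidence) of the rule lhs -> rhs as fractions."""
    both = count(lhs | rhs)
    return both / len(TRANSACTIONS), both / count(lhs)

if __name__ == "__main__":
    s, c = support_confidence({"Nuts"}, {"Beer"})
    print(f"support={s:.2%}, confidence={c:.2%}")
```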
Question 7.
Consider the following snowflake database schema:
Tb_Supplier(Supp_ID, Name, City_ID)
Tb_Consumer(Con_ID, Name, City_ID)
Tb_Cities(City_ID, City_Name)
Tb_Product(Prod_ID, Name, MU)
Tb_Transactions(T_ID, Supp_ID, Con_ID, Prod_ID, Quantity, Price)
where Tb_Supplier, Tb_Consumer, Tb_Cities, Tb_States, and Tb_Product are dimension tables and Tb_Transactions is the measures table.
Given the query:
"What are the sales of suppliers from Madison to consumers in Toronto?"
which of the candidate cuboids below is best suited to compute the query?
a) (supplier, consumer-city, product)
b) (supplier, consumer, product)
c) (consumer, product)
d) (supplier-city, consumer, product)
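A cuboid can answer a query when, for every level the query needs, it stores the same dimension at that level or a finer one, so the result follows by rolling up. The sketch below assumes the query reports sales per supplier (suppliers kept individually and filtered to Madison through their city, consumers filtered to Toronto), so the needed levels are supplier and consumer-city; that reading, and the helper names, are assumptions. Among the cuboids that pass, the most aggregated one is usually preferred because it has the fewest cells to scan.

```python
# Each dimension's hierarchy, listed from finest to coarsest level.
HIERARCHIES = {
    "supplier": ["supplier", "supplier-city"],
    "consumer": ["consumer", "consumer-city"],
    "product": ["product"],
}

def level_of(attr):
    """Return (dimension, depth) of a hierarchy level; depth 0 is the finest."""
    for dim, levels in HIERARCHIES.items():
        if attr in levels:
            return dim, levels.index(attr)
    raise KeyError(attr)

def can_answer(cuboid, needed):
    """True if every needed level is stored in the cuboid at that depth or finer."""
    stored = dict(level_of(a) for a in cuboid)
    for attr in needed:
        dim, depth = level_of(attr)
        if dim not in stored or stored[dim] > depth:
            return False
    return True

if __name__ == "__main__":
    needed = ("supplier", "consumer-city")  # assumed reading of the query
    for cuboid in [("supplier", "consumer-city", "product"),
                   ("supplier", "consumer", "product"),
                   ("consumer", "product"),
                   ("supplier-city", "consumer", "product")]:
        print(cuboid, can_answer(cuboid, needed))
```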
#1
a) Based on the tables in the database described below:
Tb_Supplier(Supp_ID, Name, City, State)
Tb_Consumer(Con_ID, Name, City, State)
Tb_Product(Prod_ID, Name, Product_Category, Product_Line, Product_Packaging)
Tb_Offers(Supp_ID, Prod_ID, Quantity, Price)
Tb_Requests(Con_ID, Prod_ID, Quantity, Price)
Tb_Transactions(Tran_ID, Supp_ID, Con_ID, Prod_ID, Quantity, Price)
Use SQL with GROUP BY, CUBE and ROLLUP to create a cube with the following characteristics:
The dimensions of the cube are Tb_Supplier and Tb_Product. The measure group table is Tb_Offers.
Measure aggregates: SUM(Quantity), SUM(Quantity*Price), MAX(Price), MIN(Price).
Dimension hierarchies:
Tb_Supplier: State > City > Name
Tb_Product: Product_Packaging > Name
Product_Category > Product_Line > Name
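SQLite, used here only because Python ships with it, has no GROUP BY ROLLUP or CUBE. On such an engine a ROLLUP over a hierarchy can be emulated as a UNION ALL of one GROUP BY per level, with NULL marking a rolled-up column (the same convention ROLLUP uses). The sketch below builds that statement for the Tb_Supplier hierarchy State > City > Name and runs it on a few hypothetical rows; `rollup_sql` and the sample data are illustrative. On engines that support it (for example PostgreSQL, SQL Server, Oracle), GROUP BY ROLLUP(s.State, s.City, s.Name) over the same join is the direct equivalent, and listing further ROLLUPs for the Tb_Product hierarchies in the same GROUP BY clause yields their combinations.

```python
import sqlite3

def rollup_sql(levels, measures, source):
    """Emulate GROUP BY ROLLUP(levels) as a UNION ALL of one GROUP BY per level."""
    selects = []
    for keep in range(len(levels), -1, -1):
        cols = levels[:keep] + ["NULL"] * (len(levels) - keep)
        group = f" GROUP BY {', '.join(levels[:keep])}" if keep else ""
        selects.append(f"SELECT {', '.join(cols + measures)} FROM {source}{group}")
    return "\nUNION ALL\n".join(selects)

LEVELS = ["s.State", "s.City", "s.Name"]  # Tb_Supplier: State > City > Name
MEASURES = ["SUM(o.Quantity)", "SUM(o.Quantity * o.Price)", "MAX(o.Price)", "MIN(o.Price)"]
SOURCE = "Tb_Offers o JOIN Tb_Supplier s ON s.Supp_ID = o.Supp_ID"

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Tb_Supplier(Supp_ID INTEGER, Name TEXT, City TEXT, State TEXT);
        CREATE TABLE Tb_Offers(Supp_ID INTEGER, Prod_ID INTEGER, Quantity INTEGER, Price REAL);
        -- Hypothetical sample rows, for illustration only.
        INSERT INTO Tb_Supplier VALUES (1, 'S1', 'Madison', 'WI'), (2, 'S2', 'Wausau', 'WI');
        INSERT INTO Tb_Offers VALUES (1, 10, 5, 2.0), (1, 11, 3, 4.0), (2, 10, 2, 2.5);
    """)
    return conn.execute(rollup_sql(LEVELS, MEASURES, SOURCE)).fetchall()

if __name__ == "__main__":
    for row in demo():
        print(row)
```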
b) Given the cube created in part a), solve the following queries using SQL:
1. Value of products offered by supplier and by product packaging?
2. Volume of milk offered by each supplier in Wisconsin?
3. Find the maximum price for each product offered in Madison?
4. For each supplier city find the product offered in largest quantity?
5. For each product find the city where it is offered at the lowest price?
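Queries 4 and 5 are "argmax/argmin per group" problems: aggregate per (group, product), then keep the rows whose value equals the group's maximum or minimum. A sketch of query 4 under the assumption that "largest quantity" means the largest total offered quantity across all suppliers in the city; the sample rows are hypothetical, and ties would return several products for one city.

```python
import sqlite3

QUERY = """
WITH CityProd AS (
    SELECT s.City, p.Name AS Product, SUM(o.Quantity) AS TotQty
    FROM Tb_Offers o
    JOIN Tb_Supplier s ON s.Supp_ID = o.Supp_ID
    JOIN Tb_Product p ON p.Prod_ID = o.Prod_ID
    GROUP BY s.City, p.Prod_ID, p.Name
)
SELECT City, Product, TotQty
FROM CityProd AS cp
WHERE TotQty = (SELECT MAX(c2.TotQty) FROM CityProd AS c2 WHERE c2.City = cp.City)
ORDER BY City, Product
"""

def run():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Tb_Supplier(Supp_ID INTEGER, Name TEXT, City TEXT, State TEXT);
        CREATE TABLE Tb_Product(Prod_ID INTEGER, Name TEXT, Product_Category TEXT,
                                Product_Line TEXT, Product_Packaging TEXT);
        CREATE TABLE Tb_Offers(Supp_ID INTEGER, Prod_ID INTEGER, Quantity INTEGER, Price REAL);
        -- Hypothetical sample rows, for illustration only.
        INSERT INTO Tb_Supplier VALUES (1, 'S1', 'Madison', 'WI'), (2, 'S2', 'Madison', 'WI'),
                                       (3, 'S3', 'Wausau', 'WI');
        INSERT INTO Tb_Product(Prod_ID, Name) VALUES (10, 'milk'), (11, 'cheese');
        INSERT INTO Tb_Offers VALUES (1, 10, 5, 2.0), (2, 11, 4, 6.0),
                                     (2, 10, 1, 2.0), (3, 11, 7, 6.0);
    """)
    return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    print(run())
```

Query 5 follows the same pattern with MIN(Price) per product over the city grouping.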
a) Based on the tables in the database described below:
Tb_Supplier(Supp_ID, Name, City, State)
Tb_Consumer(Con_ID, Name, City, State)
Tb_Product(Prod_ID, Name, Product_Category, Product_Line, Product_Packaging)
Tb_Offers(Supp_ID, Prod_ID, Quantity, Price)
Tb_Requests(Con_ID, Prod_ID, Quantity, Price)
Tb_Transactions(Tran_ID, Supp_ID, Con_ID, Prod_ID, Quantity, Price)
Use SQL with GROUP BY, CUBE and ROLLUP to create a cube with the following characteristics:
The dimensions of the cube are Tb_Consumer and Tb_Product. The measure group table is Tb_Transactions.
Measure aggregates: SUM(Quantity), SUM(Quantity*Price), MAX(Price), MIN(Price).
Dimension hierarchies:
Tb_Consumer: State > City > Name
Tb_Product: Product_Packaging > Name
Product_Category > Product_Line > Name
b) Given the cube created in part a), solve the following queries using SQL:
1. Value of products purchased by consumer and by product?
2. Volume of gas purchased by each consumer in Wausau?
3. Find the minimum purchase price for each product sold in Wausau?
4. For each consumer find the cheapest product he/she purchased?
5. Name of all consumers and volume of oil and milk each purchased (columns: consumer name, total quantity of oil - 0 if none, total quantity of milk - 0 if none)?
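Query 5 is a pivot with a "0 if none" default: a LEFT JOIN keeps consumers with no purchases, conditional aggregation (SUM over a CASE expression) splits the quantities into one column per product, and COALESCE turns the resulting NULLs into 0. A sketch run through Python's sqlite3 module; it assumes the products are stored with the names 'oil' and 'milk', and the sample rows are hypothetical.

```python
import sqlite3

QUERY = """
SELECT c.Name,
       COALESCE(SUM(CASE WHEN p.Name = 'oil'  THEN t.Quantity END), 0) AS Oil_Qty,
       COALESCE(SUM(CASE WHEN p.Name = 'milk' THEN t.Quantity END), 0) AS Milk_Qty
FROM Tb_Consumer c
LEFT JOIN Tb_Transactions t ON t.Con_ID = c.Con_ID
LEFT JOIN Tb_Product p ON p.Prod_ID = t.Prod_ID
GROUP BY c.Con_ID, c.Name
ORDER BY c.Name
"""

def run():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Tb_Consumer(Con_ID INTEGER, Name TEXT, City TEXT, State TEXT);
        CREATE TABLE Tb_Product(Prod_ID INTEGER, Name TEXT, Product_Category TEXT,
                                Product_Line TEXT, Product_Packaging TEXT);
        CREATE TABLE Tb_Transactions(Tran_ID INTEGER, Supp_ID INTEGER, Con_ID INTEGER,
                                     Prod_ID INTEGER, Quantity INTEGER, Price REAL);
        -- Hypothetical rows: Ann buys oil and milk, Bob only gas, Cid nothing.
        INSERT INTO Tb_Consumer VALUES (1, 'Ann', 'Wausau', 'WI'), (2, 'Bob', 'Wausau', 'WI'),
                                       (3, 'Cid', 'Madison', 'WI');
        INSERT INTO Tb_Product(Prod_ID, Name) VALUES (1, 'oil'), (2, 'milk'), (3, 'gas');
        INSERT INTO Tb_Transactions(Tran_ID, Con_ID, Prod_ID, Quantity)
            VALUES (1, 1, 1, 4), (2, 1, 2, 2), (3, 2, 3, 5);
    """)
    return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for row in run():
        print(row)
```

Grouping on Con_ID as well as Name keeps two consumers who share a name in separate rows.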