Write the answer under each question.
Questions:
1- Consider the data in the following table:
CID | TID | Items Bought
10 | 100 | {Milk, Butter, Diapers}
10 | 115 | {Milk, Eggs, Coke, Diapers}
20 | 85 | {Milk, Eggs, Butter, Diapers}
20 | 125 | {Milk, Coke, Butter, Diapers}
30 | 90 | {Eggs, Coke, Diapers}
30 | 130 | {Eggs, Butter, Diapers}
40 | 155 | {Coke, Butter}
40 | 160 | {Milk, Eggs, Coke}
50 | 60 | {Milk, Butter, Diapers}
50 | 75 | {Milk, Eggs, Diapers}
a) Find the support for itemsets {Diapers}, {Eggs, Butter}, and {Eggs, Butter, Diapers} by treating each transaction (TID) as a market basket.
b) Based on the results from part (a), find the confidence of the association rules {Eggs, Butter} → {Diapers} and {Diapers} → {Eggs, Butter}. Is confidence a symmetric measure?
c) Find the support for itemsets {Diapers}, {Eggs, Butter}, and {Eggs, Butter, Diapers} by treating each customer (CID) as a market basket. Treat each item as a binary variable (1 if the item appears in at least one transaction bought by the customer, and 0 otherwise).
d) Based on the results from part (c), find the confidence of the association rules {Eggs, Butter} → {Diapers} and {Diapers} → {Eggs, Butter}.
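Hand-computed answers to question 1 can be checked with a short script. The following is a minimal sketch in Python, assuming the usual definitions: support is the fraction of baskets containing every item of the itemset, and the confidence of X → Y is support(X ∪ Y) / support(X). The names `ROWS`, `support`, `confidence`, `baskets_by_transaction`, and `baskets_by_customer` are illustrative and not part of the assignment.

```python
from fractions import Fraction

# (CID, TID, items) rows copied from the table in question 1.
ROWS = [
    (10, 100, {"Milk", "Butter", "Diapers"}),
    (10, 115, {"Milk", "Eggs", "Coke", "Diapers"}),
    (20, 85, {"Milk", "Eggs", "Butter", "Diapers"}),
    (20, 125, {"Milk", "Coke", "Butter", "Diapers"}),
    (30, 90, {"Eggs", "Coke", "Diapers"}),
    (30, 130, {"Eggs", "Butter", "Diapers"}),
    (40, 155, {"Coke", "Butter"}),
    (40, 160, {"Milk", "Eggs", "Coke"}),
    (50, 60, {"Milk", "Butter", "Diapers"}),
    (50, 75, {"Milk", "Eggs", "Diapers"}),
]


def baskets_by_transaction(rows):
    """One basket per TID (parts a and b)."""
    return [set(items) for _cid, _tid, items in rows]


def baskets_by_customer(rows):
    """One basket per CID: the union of that customer's transactions (parts c and d)."""
    merged = {}
    for cid, _tid, items in rows:
        merged.setdefault(cid, set()).update(items)
    return list(merged.values())


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return Fraction(hits, len(baskets))


def confidence(antecedent, consequent, baskets):
    """support(antecedent union consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Print the quantities asked for in parts (a) to (d) as exact fractions.
for label, baskets in [("per TID", baskets_by_transaction(ROWS)),
                       ("per CID", baskets_by_customer(ROWS))]:
    for itemset in [{"Diapers"}, {"Eggs", "Butter"}, {"Eggs", "Butter", "Diapers"}]:
        print(label, "support", sorted(itemset), "=", support(itemset, baskets))
    print(label, "conf {Eggs, Butter} -> {Diapers} =",
          confidence({"Eggs", "Butter"}, {"Diapers"}, baskets))
    print(label, "conf {Diapers} -> {Eggs, Butter} =",
          confidence({"Diapers"}, {"Eggs", "Butter"}, baskets))
```

`Fraction` is used instead of floats so that the printed values are exact. Comparing the two confidence values printed for each basket definition also gives a direct check for the symmetry question in part (b).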
2- A database has ten transactions. Let min_sup = 30%.
TID | Items Bought
100 | {A, B, D, E}
200 | {B, C, D}
300 | {A, B, D, E}
400 | {A, C, D, E}
500 | {B, C, D, E}
600 | {B, D, E}
700 | {C, D}
800 | {A, B, C}
900 | {A, D, E}
1000 | {B, D}
a) Apply the Apriori algorithm to the above data set.
b) Show the FP-tree that would be constructed for the data set.
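Both parts of question 2 can be cross-checked programmatically. The sketch below, in Python, is one plain reading of the two procedures rather than a reference implementation: `apriori` runs a level-wise join and prune loop, and `build_fp_tree` inserts each transaction's frequent items in descending order of global count. The function names, the nested `[count, children]` node layout, and the alphabetical tie-break between equally frequent items are assumptions made for illustration. The header table and node links of a full FP-tree are omitted.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Transactions copied from the table in question 2.
TRANSACTIONS = {
    100: {"A", "B", "D", "E"},
    200: {"B", "C", "D"},
    300: {"A", "B", "D", "E"},
    400: {"A", "C", "D", "E"},
    500: {"B", "C", "D", "E"},
    600: {"B", "D", "E"},
    700: {"C", "D"},
    800: {"A", "B", "C"},
    900: {"A", "D", "E"},
    1000: {"B", "D"},
}
MIN_SUP = Fraction(30, 100)  # min_sup = 30%


def apriori(transactions, min_sup):
    """Return {frequent itemset (frozenset): absolute support count}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    current = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those that meet min_sup.
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        level = {c: m for c, m in counts.items() if Fraction(m, n) >= min_sup}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def build_fp_tree(transactions, min_sup):
    """Return the root's children as {item: [count, children]}, nested recursively."""
    baskets = [set(t) for t in transactions]
    counts = Counter(item for basket in baskets for item in basket)
    keep = {i for i, m in counts.items() if Fraction(m, len(baskets)) >= min_sup}
    root = {}
    for basket in baskets:
        # Descending global count; ties broken alphabetically (an assumption).
        ordered = sorted(basket & keep, key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            entry = node.setdefault(item, [0, {}])
            entry[0] += 1
            node = entry[1]
    return root


def show(children, depth=0):
    """Print the tree with one 'item:count' node per line, indented by depth."""
    for item, (count, sub) in sorted(children.items()):
        print("  " * depth + f"{item}:{count}")
        show(sub, depth + 1)


frequent = apriori(TRANSACTIONS.values(), MIN_SUP)
for itemset, count in sorted(frequent.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
show(build_fp_tree(TRANSACTIONS.values(), MIN_SUP))
```

A different tie-break between items with equal counts gives a differently shaped but equally valid FP-tree, so a hand-drawn tree may differ from the printed one in the order of those items.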