Cind - data organization for data analysts data mining


Data Organization for Data Analysts

Data Mining Concepts

1. On describing discovered knowledge using association rules

One of the major techniques in data mining is the discovery of association rules. An association rule correlates the presence of one set of items with a range of values for another set of variables. In this context the database is regarded as a collection of transactions, each involving a set of items, as shown below.

Trans ID          Items Purchased
101               milk, bread, eggs
102               milk, juice
103               juice, butter
104               milk, bread, eggs
105               coffee, eggs
106               coffee
107               coffee, juice
108               milk, bread, cookies, eggs
109               cookies, butter
110               milk, bread

1.1 Apply the Apriori algorithm on this dataset.

Note that the set of items is {milk, bread, cookies, eggs, butter, coffee, juice}. You may use 0.2 for the minimum support value.
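As a starting point for 1.1, here is a minimal Python sketch of the Apriori join, prune, and count steps on the ten transactions above. The function and variable names (apriori, baskets, frequent) are illustrative only, and support is assumed to mean the fraction of transactions that contain the itemset.

```python
from itertools import combinations

# The ten transactions from the table, one set of items per Trans ID.
baskets = [
    {"milk", "bread", "eggs"},             # 101
    {"milk", "juice"},                     # 102
    {"juice", "butter"},                   # 103
    {"milk", "bread", "eggs"},             # 104
    {"coffee", "eggs"},                    # 105
    {"coffee"},                            # 106
    {"coffee", "juice"},                   # 107
    {"milk", "bread", "cookies", "eggs"},  # 108
    {"cookies", "butter"},                 # 109
    {"milk", "bread"},                     # 110
]


def apriori(baskets, min_support):
    """Return {frequent itemset: support} for all itemset sizes."""
    n = len(baskets)

    def support(itemset):
        # Assumption: support = fraction of transactions containing the itemset.
        return sum(itemset <= b for b in baskets) / n

    items = sorted(set().union(*baskets))
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent,
        # then keep only candidates that meet the minimum support.
        previous = set(current)
        current = [c for c in candidates
                   if all(frozenset(s) in previous
                          for s in combinations(c, k - 1))
                   and support(c) >= min_support]
    return frequent


frequent = apriori(baskets, 0.2)
for itemset, value in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

With a minimum support of 0.2, an itemset must appear in at least 2 of the 10 transactions to be kept.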

1.2 Show two rules that have a confidence of 0.7 or greater for an itemset containing three items.
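For 1.2, the confidence of a rule X => Y is commonly defined as support(X and Y together) divided by support(X). The sketch below, under that assumption, enumerates every rule that can be formed from one three-item itemset and keeps those meeting a confidence threshold. The helper names (count, rules_from) and the choice of {milk, bread, eggs} as the example itemset are illustrative.

```python
from itertools import combinations

# The ten transactions from the table in question 1.
baskets = [
    {"milk", "bread", "eggs"},             # 101
    {"milk", "juice"},                     # 102
    {"juice", "butter"},                   # 103
    {"milk", "bread", "eggs"},             # 104
    {"coffee", "eggs"},                    # 105
    {"coffee"},                            # 106
    {"coffee", "juice"},                   # 107
    {"milk", "bread", "cookies", "eggs"},  # 108
    {"cookies", "butter"},                 # 109
    {"milk", "bread"},                     # 110
]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= b for b in baskets)


def rules_from(itemset, min_confidence):
    """List (antecedent, consequent, confidence) for rules built from itemset."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            lhs = frozenset(antecedent)
            # confidence(X => Y) = count(X and Y) / count(X)
            confidence = count(itemset) / count(lhs)
            if confidence >= min_confidence:
                rules.append((sorted(lhs), sorted(itemset - lhs), confidence))
    return rules


rules = rules_from({"milk", "bread", "eggs"}, 0.7)
for lhs, rhs, confidence in rules:
    print(lhs, "=>", rhs, confidence)
```

Any two of the printed rules would answer the question; the antecedent must be a non-empty proper subset of the itemset, which is why the loop stops one short of the full size.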

2. On describing discovered knowledge using classification

Classification is the process of learning a model that describes different classes of data; the classes must be predetermined. Consider the following set of data records:

RID    Age       City    Gender    Education      Repeat Customer
101    20..30    NY      F         College        YES
102    20..30    SF      M         Graduate       YES
103    31..40    NY      F         College        YES
104    51..60    NY      F         College        NO
105    31..40    LA      M         High school    NO
106    41..50    NY      F         College        YES
107    41..50    NY      F         Graduate       YES
108    20..30    LA      M         College        YES
109    20..30    NY      F         High school    NO
110    20..30    NY      F         College        YES

2.1 Assuming that the class attribute is Repeat Customer, apply a classification algorithm to this dataset.
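The question leaves the choice of algorithm open. One common option is an ID3-style decision tree, which picks the attribute with the highest information gain at each node. The sketch below covers only the first step, the gain of each attribute at the root. The choice of ID3, the names (entropy, information_gain, gains), and treating "college" in record 110 as the same value as "College" are all assumptions.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Age", "City", "Gender", "Education"]

# Records 101..110; the last field is the class label (Repeat Customer).
# Assumption: "college" in record 110 is the same value as "College".
records = [
    ("20..30", "NY", "F", "College", "YES"),      # 101
    ("20..30", "SF", "M", "Graduate", "YES"),     # 102
    ("31..40", "NY", "F", "College", "YES"),      # 103
    ("51..60", "NY", "F", "College", "NO"),       # 104
    ("31..40", "LA", "M", "High school", "NO"),   # 105
    ("41..50", "NY", "F", "College", "YES"),      # 106
    ("41..50", "NY", "F", "Graduate", "YES"),     # 107
    ("20..30", "LA", "M", "College", "YES"),      # 108
    ("20..30", "NY", "F", "High school", "NO"),   # 109
    ("20..30", "NY", "F", "College", "YES"),      # 110
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, index):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([row[-1] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[index], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g)
                    for g in groups.values())
    return base - remainder


gains = {name: information_gain(records, i)
         for i, name in enumerate(ATTRIBUTES)}
best_attribute = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Split first on:", best_attribute)
```

To grow the full tree, the same computation would be repeated on each partition produced by the chosen attribute until every partition contains a single class or no attributes remain.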

3. On describing discovered knowledge using clustering

Consider the following set of two-dimensional records:

RID    Dimension 1    Dimension 2
1      8              4
2      5              4
3      2              4
4      2              6
5      2              8
6      8              6

3.1 Use the K-means algorithm to cluster this dataset. You can use a value of 3 for K and can assume that the records with RIDs 1, 3, and 5 are used for the initial cluster centroids (means).
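A minimal Python sketch of the K-means loop for 3.1 follows. It assumes Euclidean distance (compared via squared distance, which yields the same nearest centroid) and breaks ties in favour of the centroid listed first; the question specifies neither, and both choices affect the result here because RIDs 2 and 4 start out equidistant from two centroids. The names (k_means, points) are illustrative.

```python
# The six two-dimensional records, keyed by RID.
points = {1: (8, 4), 2: (5, 4), 3: (2, 4), 4: (2, 6), 5: (2, 8), 6: (8, 6)}


def squared_distance(p, q):
    """Squared Euclidean distance; preserves the nearest-centroid ordering."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_rids, max_iterations=100):
    """Return (clusters as lists of RIDs, final centroids)."""
    centroids = [points[rid] for rid in initial_rids]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid; min() keeps the first one on
        # ties (assumption: ties go to the earlier centroid in the list).
        new_assignment = {
            rid: min(range(len(centroids)),
                     key=lambda i: squared_distance(p, centroids[i]))
            for rid, p in points.items()
        }
        if new_assignment == assignment:
            break  # no record changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [points[rid] for rid, c in assignment.items() if c == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    clusters = [sorted(rid for rid, c in assignment.items() if c == i)
                for i in range(len(centroids))]
    return clusters, centroids


clusters, centroids = k_means(points, initial_rids=[1, 3, 5])
print(clusters)
print(centroids)
```

Under these assumptions the records settle into the clusters {1, 2, 6}, {3, 4}, and {5}. A different tie-breaking rule in the first pass could lead to a different grouping, so it is worth stating the rule used when writing up the answer.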

3.2 What is the difference between describing discovered knowledge using clustering and describing it using classification?
