1. A one-page (single or double spaced, does not matter) Word or PDF file that contains your recommendations and a brief outline of the approach you took for a) and b) below.
2. A ZIP file of your RapidMiner repository that contains your data and process.
Whiskey Analytics
In Chapter 6 of the book "Data Science for Business" by Provost and Fawcett, there is a reference (page 144) to NYU colleague Foster Provost's desire to find whiskeys that are similar to Bunnahabhain (he really likes this drink!!). We will use a data science approach to help Professor Provost's friend, Professor Johnson. The relevant data and data dictionary are posted below; they were originally curated by François-Joseph Lapointe and Pierre Legendre (1994) of the University of Montréal. You will, of course, use machine learning to address the issues at hand:
a) Clustering: Your goal is to suggest a few interesting whiskies to Professor Johnson, whose favorite is Dalwhinnie. Try both hierarchical and k-means clustering, and then choose one of the two methods to find meaningful clusters of whiskeys that can help business decision makers gain insights from the whiskey dataset. Based on the cluster into which Professor Johnson's favorite whiskey falls, suggest 4-5 other whiskies to him.
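A minimal sketch of the two clustering methods named in a), in plain Python so it runs without RapidMiner. It assumes each whisky is already encoded as a 0/1 feature vector. The six whiskies `A` to `F`, their six features, and the choice of `A` as the favorite are hypothetical toy data, not values from scotch1.xlsx. Seeding k-means with the first k points and using single linkage for the hierarchical version are assumptions; other choices can produce different clusters.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kmeans(points, k, iters=100):
    """Plain k-means; assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def agglomerative(points, k):
    """Bottom-up hierarchical clustering with single linkage, cut at k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels

def cluster_mates(names, labels, favorite):
    """Other whiskies in the favorite's cluster (the recommendation list)."""
    target = labels[names.index(favorite)]
    return [n for n, lab in zip(names, labels) if lab == target and n != favorite]

# Hypothetical toy data: 6 whiskies x 6 binary flavor features.
names = ["A", "B", "C", "D", "E", "F"]
points = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 1],
]

km = kmeans(points, 2)
hc = agglomerative(points, 2)
print(km, hc)
print(cluster_mates(names, km, "A"))  # → ['B', 'C']
```

On the real data, the toy vectors would be replaced by the 68 binary features and the favorite by Dalwhinnie; comparing the clusters both methods produce is what informs the choice between them.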
b) Association rules: Professor Johnson and Professor Provost were overheard having a heated argument about whiskey makers' preferences and their understanding of the market. Provost claimed that there is a higher-than-random chance that drinkers who like a dry palate and a dry finish also like a whiskey that is dry on the nose, "and that's why any distiller worth his name in salt would make em' that way." Provost claimed that his Scottish grandmother told him so. You have been hired by Bapna as a well-trained data scientist to verify this claim from actual compositions of whiskey (hint: this time, use association rule mining). Please also suggest a few interesting patterns of association that you can discern from the data with respect to the traits/characteristics of Scotch whiskies.
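Provost's claim can be read as the rule {palate dry, finish dry} => {nose dry}. A sketch of how support, confidence, and lift would be computed for that rule, assuming each whisky is treated as a "transaction" of attribute:value items. The function name and the eight transactions are hypothetical, not taken from the dataset. A lift above 1 means the consequent appears together with the antecedent more often than its overall frequency would predict, which is one way to make "higher than random chance" precise.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent => consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # rows containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # rows containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # rows containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares confidence with the consequent's baseline frequency.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Hypothetical whiskies as sets of attribute:value items.
transactions = [
    {"palate:dry", "finish:dry", "nose:dry"},
    {"palate:dry", "finish:dry", "nose:dry"},
    {"palate:dry", "finish:dry", "nose:sweet"},
    {"palate:sweet", "finish:full", "nose:dry"},
    {"palate:sweet", "finish:full", "nose:fresh"},
    {"palate:fruity", "finish:long", "nose:sea"},
    {"palate:dry", "finish:light", "nose:peat"},
    {"palate:sweet", "finish:dry", "nose:fruit"},
]

support, confidence, lift = rule_metrics(
    transactions, {"palate:dry", "finish:dry"}, {"nose:dry"})
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
# → support=0.250 confidence=0.667 lift=1.778
```

On the real data, a lift clearly above 1 with reasonable support would back Provost's claim; a lift near 1 would not.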
c) BONUS: See if you can replicate the table below from the book. You only have to worry about the Distance column, not the labels that go with it (see page 146 of the attached book pages).
It is important to note that these category values are not mutually exclusive (e.g., Aberlour's palate is described as medium, full, soft, round, and smooth). In general, any of the values can co-occur (though some of them, like Color being both light and smoky, never do), but because they can co-occur, each value of each variable was coded as a separate feature by Lapointe and Legendre. Consequently, there are 68 binary features for each whiskey.
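Because values can co-occur, each attribute-value pair becomes its own 0/1 column. A minimal sketch of that encoding, assuming descriptions are stored as a mapping from attribute to a set of values; the function names are hypothetical, and the tiny vocabulary here stands in for the full 68 features. Aberlour's palate values come from the text above (its other attributes are omitted); the second whisky is hypothetical.

```python
def build_vocabulary(descriptions):
    """Sorted list of every (attribute, value) pair seen; one binary column each."""
    pairs = {(attr, value)
             for desc in descriptions
             for attr, values in desc.items()
             for value in values}
    return sorted(pairs)

def encode(description, vocabulary):
    """0/1 vector: 1 where the whisky has that attribute value, else 0."""
    return [1 if value in description.get(attr, set()) else 0
            for attr, value in vocabulary]

# Aberlour's palate as described above; other attributes omitted for brevity.
aberlour = {"palate": {"medium", "full", "soft", "round", "smooth"}}
# Hypothetical second whisky so the vocabulary draws on more than one source.
other = {"palate": {"dry", "smooth"}, "color": {"light"}}

vocabulary = build_vocabulary([aberlour, other])
print(vocabulary)
print(encode(aberlour, vocabulary))  # → [0, 0, 1, 1, 1, 1, 1]
```

Co-occurring values such as Aberlour's five palate terms simply set several columns of the same attribute to 1, which a single categorical column could not express.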
Foster likes Bunnahabhain, so we can use Lapointe and Legendre's representation of whiskeys with Euclidean distance to find similar ones for him. For reference, here is their description of Bunnahabhain:
• Color: gold
• Nose: fresh and sea
• Body: firm, medium, and light
• Palate: sweet, fruity, and clean
• Finish: full
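A sketch of the distance computation, assuming each whisky is represented by the set of attribute:value features it has (equivalent to a 0/1 vector over the full vocabulary). Bunnahabhain's features come from the list above; `whisky_x` is hypothetical. One property is relevant to the bonus in c): between 0/1 vectors, each mismatched feature contributes exactly 1 to the squared distance, so the raw Euclidean distance is the square root of the mismatch count and is at least 1 whenever two whiskies differ. Values below 1, like those in the book's table, therefore imply some rescaling that this excerpt does not spell out.

```python
import math

def to_vector(features, vocabulary):
    """0/1 vector over a fixed, ordered vocabulary of attribute:value features."""
    return [1 if f in features else 0 for f in vocabulary]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Bunnahabhain, from Lapointe and Legendre's description above.
bunnahabhain = {
    "color:gold", "nose:fresh", "nose:sea",
    "body:firm", "body:medium", "body:light",
    "palate:sweet", "palate:fruity", "palate:clean", "finish:full",
}
# Hypothetical comparison whisky (not from the dataset).
whisky_x = {
    "color:gold", "nose:fresh", "body:firm", "body:light", "body:smooth",
    "palate:sweet", "palate:grass", "finish:full",
}

vocabulary = sorted(bunnahabhain | whisky_x)
d = euclidean(to_vector(bunnahabhain, vocabulary), to_vector(whisky_x, vocabulary))
mismatches = len(bunnahabhain ^ whisky_x)  # features present in exactly one whisky
print(mismatches, d, math.sqrt(mismatches))  # d equals sqrt(mismatches)
```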
Here is Bunnahabhain's description and the five single-malt Scotches most similar to Bunnahabhain, by increasing distance:
Whiskey          Distance   Descriptors
Bunnahabhain     -          gold; firm,med,light; sweet,fruit,clean; fresh,sea; full
Glenglassaugh    0.643      gold; firm,light,smooth; sweet,grass; fresh,grass
Tullibardine     0.647      gold; firm,med,smooth; sweet,fruit,full,grass,clean; sweet; big,arome,sweet
Ardbeg           0.667      sherry; firm,med,full,light; sweet; dry,peat,sea; salt
Bruichladdich    0.667      pale; firm,light,smooth; dry,sweet,smoke,clean; light; full
Glenmorangie     0.667      p.gold; med,oily,light; sweet,grass,spice; sweet,spicy,grass,sea,fresh; full,long
Using this list we could find a Scotch similar to Bunnahabhain. At any particular shop we might have to go down the list a bit to find one they stock, but since the Scotches are ordered by similarity we can easily find the most similar Scotch (and also have a vague idea as to how similar the closest available Scotch is as compared to the alternatives that are not available).
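The shop scenario amounts to sorting every other whisky by distance and walking down the list until one is in stock. A sketch under the same set-of-features assumption; the function names, catalog, feature names, and stock list are hypothetical placeholders, and ties are broken alphabetically by name (an arbitrary choice).

```python
import math

def distance(a, b):
    """Euclidean distance between two whiskies stored as feature sets.

    For 0/1 vectors this equals sqrt(number of features present in only one).
    """
    return math.sqrt(len(a ^ b))

def ranked(target, catalog):
    """(distance, name) pairs for every other whisky, nearest first."""
    base = catalog[target]
    return sorted((distance(base, feats), name)
                  for name, feats in catalog.items() if name != target)

def closest_in_stock(target, catalog, in_stock):
    """Walk down the similarity list and return the first stocked whisky."""
    for d, name in ranked(target, catalog):
        if name in in_stock:
            return name, d
    return None  # nothing in the catalog is available

# Hypothetical catalog: names and features are placeholders.
catalog = {
    "Target": {"f1", "f2", "f3", "f4"},
    "P": {"f1", "f2", "f3"},
    "Q": {"f1", "f2", "f5"},
    "R": {"f1", "f2", "f3", "f4", "f5"},
    "S": {"f6", "f7"},
}
print(ranked("Target", catalog))
print(closest_in_stock("Target", catalog, in_stock={"Q", "S"}))
```

Returning the distance alongside the name gives the "vague idea" of how much similarity is lost compared with the unavailable alternatives higher up the list.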
This is an example of the direct application of similarity to solve a problem. Once we understand this fundamental notion, we have a powerful conceptual tool for approach
Attachment: scotch1.xlsx