Assignment:
Problem 1
Choose the data technology (Q, H, U, or S) that is most appropriate for each of the following business questions/scenarios and briefly explain your reasoning.
Q - Database Querying
H - Statistical Hypothesis Testing
U - Unsupervised Data Mining
S - Supervised Data Mining
A) I want to know which of my current customers have spent the most money on my products over the last six months.
B) I need to get data on all my online customers who were emailed a special offer last month, including their registration data, all their past purchases, and whether they purchased the product from the special offer within 15 days of receiving the email.
C) I would like to segment my customers into groups based on their demographics and prior purchase activity. I am not focusing on predicting anything but would like to generate ideas.
D) I have a budget to target 10,000 existing customers with a special offer. I would like to identify those customers most likely to respond to the special offer.
E) I want to know what characteristics differentiate my most profitable customers.
F) A new model to find customers to target with online advertising yields a response rate of 0.5%, while the old targeting model yielded a 0.3% response rate. I want to know if the response rate of the new model is measurably better than that of the old model.
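As an illustration of what "measurably better" in scenario F can mean in practice, the following minimal sketch compares two response rates with a one-sided two-proportion z-test. The scenario does not state how many customers were targeted, so the sample sizes below are purely hypothetical, and the function name is an illustrative choice rather than part of the assignment.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """One-sided z-test of H1: rate_a > rate_b, using a pooled standard error."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled response rate under the null hypothesis that both rates are equal
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * (1 - math.erf(z / math.sqrt(2)))
    return z, p_value

# Hypothetical sample sizes (not given in the scenario): the same 0.5% vs 0.3%
# rates observed on 1,000 customers per model and on 100,000 customers per model.
z_small, p_small = two_proportion_z_test(5, 1_000, 3, 1_000)
z_large, p_large = two_proportion_z_test(500, 100_000, 300, 100_000)

print(f"n=1,000 per model:   z={z_small:.2f}, p={p_small:.3f}")
print(f"n=100,000 per model: z={z_large:.2f}, p={p_large:.2e}")
```

Under these assumed sample sizes, the same observed rates lead to different conclusions, which is why the raw difference alone does not settle the question.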
Problem 2
Label each case as describing either data mining (DM) or the use of the results of data mining (Use), and briefly explain your reasoning.
A) Choosing customers who are most likely to respond to an online ad.
B) Discovering rules that indicate when an account has been defrauded.
C) Finding patterns indicating which customer behaviors are more likely to lead to a response to an online ad.
D) Estimating the probability of default for a credit application.
Problem 3
Plumbing Inc. has been selling nothing but plumbing supplies for the last 20 years. Now, the owner has decided that next year is the right time to diversify by selling gardening tools as well.
A previous consultant to the owner had success using customer data to build predictive models to guide direct mail campaigns for special plumbing offers. The owner now thinks that data mining could help them identify a subset of customers who would be good prospects for their new set of products.
Explain how the six steps of the CRISP-DM process could be applied to solve this as a supervised learning problem. You should also indicate what the target of your analysis would be. Focus on turning the general questions asked at each stage into ones that are specific to this problem, providing general answers where possible.