Question 1:
Mobile-phone manufacturers would like us to believe that a phone's quality is closely reflected in its price, on the assumption that better-quality phones are more expensive. Suppose we want to test this assertion by developing a model that predicts phone price from the overall quality rating score. The scatterplot of price versus overall quality score for 21 makes and models of mobile phones currently on offer, together with the equation of a fitted regression line, is shown below.
(a) Which is the independent and which is the dependent variable for this example?
(b) What is the value of the slope and what does it measure in this example?
(c) What is the value of the intercept and what does it measure in this example?
(d) What is the value of the coefficient of determination and what does it measure in this example? Interpret its value.
(e) Based on your analysis so far, are more expensive phones necessarily the better quality ones?
(f) Use the linear regression model described above to predict the price of a mobile phone with an overall quality score of 80. Is this prediction likely to be accurate? Explain briefly.
Question 2:
Suppose you are investigating selling prices of properties. You have obtained summary measures for a random sample of 500 properties. The mean selling price was $520,000 while the median was $410,000. The highest and lowest selling prices were $850,000 and $250,000, respectively.
(a) Is the distribution of property prices symmetric, right-skewed or left-skewed? How many properties in the sample sold for less than $410,000? Explain how you know.
(b) Based on the given information, can you determine how many properties sold for more than $520,000? Why or why not? Explain.
Question 3:
A real estate company surveyed 50 of its sold properties. The following table gives the selling prices (in thousand dollars) of these properties.
61.4 | 27.3 | 26.4 | 37.4 | 30.4 | 47.5 | 63.9 | 46.8 | 67.9 | 19.1
81.6 | 47.9 | 73.4 | 54.6 | 65.1 | 53.3 | 71.6 | 58.6 | 57.3 | 87.8
71.1 | 74.1 | 48.9 | 60.2 | 54.8 | 60.5 | 32.5 | 61.7 | 55.1 | 48.2
56.8 | 60.1 | 52.9 | 60.5 | 55.6 | 38.1 | 76.4 | 46.8 | 19.9 | 27.3
77.4 | 58.1 | 32.1 | 54.9 | 32.7 | 40.1 | 52.7 | 32.5 | 35.3 | 39.1
a. Use Excel to obtain a histogram of this data. It should be formatted as described in the Excel Booklet which you received in your practical in Week 1. The bins start with 10 as the lowest class beginning point and use a class width of 10.
b. Briefly describe the shape (symmetry, modality and outliers) of the data based on the histogram obtained from part (a).
c. Use Excel to obtain the Descriptive Statistics, Quartile 1 and Quartile 3 for the data.
d. Which measures of location and dispersion should you use for this data? What are their values? Give a brief explanation for your decision.
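As a cross-check on the Excel output for parts (a) and (c), the class counts and summary measures can be sketched in Python. This is a minimal sketch, not the required Excel workflow: the eight classes [10, 20) through [80, 90) are an assumption based on the stated starting point and width, and `method="inclusive"` is assumed to match Excel's QUARTILE.INC, which may not be the function the Excel Booklet uses.

```python
import statistics

# Selling prices (in thousand dollars) copied from the Question 3 table.
prices = [
    61.4, 27.3, 26.4, 37.4, 30.4, 47.5, 63.9, 46.8, 67.9, 19.1,
    81.6, 47.9, 73.4, 54.6, 65.1, 53.3, 71.6, 58.6, 57.3, 87.8,
    71.1, 74.1, 48.9, 60.2, 54.8, 60.5, 32.5, 61.7, 55.1, 48.2,
    56.8, 60.1, 52.9, 60.5, 55.6, 38.1, 76.4, 46.8, 19.9, 27.3,
    77.4, 58.1, 32.1, 54.9, 32.7, 40.1, 52.7, 32.5, 35.3, 39.1,
]

# Assumed classes: lowest class starts at 10, class width 10, eight classes.
lowest, width, n_classes = 10, 10, 8
counts = [0] * n_classes
for price in prices:
    counts[int((price - lowest) // width)] += 1

mean = statistics.mean(prices)
median = statistics.median(prices)
stdev = statistics.stdev(prices)  # sample standard deviation (divides by n - 1)
# Inclusive interpolation; assumed to correspond to Excel's QUARTILE.INC.
q1, _, q3 = statistics.quantiles(prices, n=4, method="inclusive")

for i, count in enumerate(counts):
    start = lowest + i * width
    print(f"{start} to under {start + width}: {count}")
print(f"mean={mean:.2f} median={median:.2f} stdev={stdev:.2f}")
print(f"Q1={q1:.2f} Q3={q3:.2f} IQR={q3 - q1:.2f}")
```

Comparing the mean with the median, and the standard deviation with the interquartile range, is one way to support the choice asked for in part (d).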
Question 4:
A T-shirt company is interested in knowing the average retail price charged for one product sold in stores across the country. The company cannot justify a national census to generate this information. Based on the company information system's list of all retailers who carry the product, a researcher for the company contacts 36 of these retailers and ascertains the retail prices for the product. The population standard deviation is known to be $1.13. The price data (in dollars) are listed in the following table.
22.3 | 21.6 | 21.2 | 20.1 | 19.9 | 22.3
21.1 | 23.1 | 20.7 | 18.7 | 21.0 | 21.2
19.8 | 21.7 | 21.8 | 20.9 | 20.8 | 22.0
23.0 | 21.8 | 22.2 | 20.5 | 21.7 | 21.4
22.9 | 23.2 | 21.5 | 21.0 | 18.2 | 21.9
20.2 | 21.9 | 22.6 | 22.4 | 21.7 | 21.6
a. Use Excel to calculate the sample average retail price.
b. Set up a 95% confidence interval of the population average retail price charged for this T-shirt item.
c. Does the population of retail prices have to be normally distributed here? Explain briefly.
d. What is the probability that the mean retail price, in a sample of size 36, is greater than $20.80? Assume the population average retail price is $21.40.
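The calculations in parts (a), (b) and (d) can be sketched in Python as a check on the Excel results. This is a sketch under the question's own assumptions: the population standard deviation of $1.13 is treated as known, so a z-based interval is used, and the sampling distribution of the mean is taken to be normal with standard error sigma / sqrt(n). The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean

# Retail prices (in dollars) copied from the Question 4 table.
prices = [
    22.3, 21.6, 21.2, 20.1, 19.9, 22.3,
    21.1, 23.1, 20.7, 18.7, 21.0, 21.2,
    19.8, 21.7, 21.8, 20.9, 20.8, 22.0,
    23.0, 21.8, 22.2, 20.5, 21.7, 21.4,
    22.9, 23.2, 21.5, 21.0, 18.2, 21.9,
    20.2, 21.9, 22.6, 22.4, 21.7, 21.6,
]

sigma = 1.13                 # known population standard deviation (given)
n = len(prices)
x_bar = mean(prices)         # part (a): sample average retail price
std_error = sigma / sqrt(n)  # standard error of the sample mean

# Part (b): 95% z-interval, since sigma is known.
z_crit = NormalDist().inv_cdf(0.975)
margin = z_crit * std_error
ci_lower, ci_upper = x_bar - margin, x_bar + margin

# Part (d): P(sample mean > 20.80) when the population mean is 21.40.
mu = 21.40
p_greater = 1 - NormalDist(mu, std_error).cdf(20.80)

print(f"sample mean = {x_bar:.4f}")
print(f"95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"P(mean > 20.80) = {p_greater:.4f}")
```

Using the normal distribution rather than Student's t follows from sigma being given; if only the sample standard deviation were available, a t-based interval would be the usual alternative.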
Question 5: Do gender and age influence movie preferences?
Data for this question has been adapted from the following paper:
Fischoff, S, Antonio, J & Lewis, BA (1998). Favorite Films and Film Genres As A Function of Race, Age, and Gender. Journal of Media Psychology, vol. 3 (1).
Movie genre preferences by gender of 560 randomly selected survey respondents have been summarized in the following contingency table:
Favourite Genre | Male | Female | Total
Action          |   61 |     45 |   106
Drama           |   91 |    111 |   202
Comedy          |   32 |     37 |    69
Romance         |   28 |     53 |    81
Sci-Fi          |   32 |     16 |    48
Fantasy         |   20 |     34 |    54
Total           |  264 |    296 |   560
Table 2: Movie genre preferences by gender.
Results of the same survey have also been classified according to the respondents' age, producing the following contingency table:
Favourite Genre | Gen Y (13-25) | Gen X (26-49) | Baby Boomers (50+) | Total
Action          |            54 |            38 |                 14 |   106
Drama           |            63 |            89 |                 50 |   202
Comedy          |            26 |            33 |                 10 |    69
Romance         |            35 |            32 |                 14 |    81
Sci-Fi          |            18 |            22 |                  8 |    48
Fantasy         |            24 |            20 |                 10 |    54
Total           |           220 |           234 |                106 |   560
Table 3: Movie genre preferences by age.
(a) Use Excel to obtain a 100% stacked column chart for the data from each table.
EXCEL Instructions:
Refer to Topic 6 in the Excel Booklet for instructions on how to obtain a 100% stacked column chart. Please make sure that the title of your chart ends with your network ID (e.g. Movie Genre preferences vs gender bloggsj001).
(b) From Table 2, what is the probability that a respondent chose Sci-Fi as their favourite movie genre?
(c) From Table 3, what is the probability that a respondent chose Sci-Fi as their favourite movie genre, if:
(i) He or she was Gen Y?
(ii) He or she was Gen X?
(iii) He or she was a Baby Boomer?
(d) Repeat calculations from (b) and (c) for Action instead of Sci-Fi.
(e) From Table 2, what is the probability that a respondent chose Sci-Fi as their favourite movie genre, if:
(i) They were male?
(ii) They were female?
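The marginal and conditional probabilities in parts (b) to (e) all follow the same pattern: a cell or row count divided by the relevant total. The sketch below shows one way to compute them exactly from Tables 2 and 3. The helper names `marginal` and `conditional` are hypothetical, introduced here for illustration only.

```python
from fractions import Fraction

# Counts copied from Table 2 (by gender) and Table 3 (by age).
by_gender = {
    "Action":  {"Male": 61, "Female": 45},
    "Drama":   {"Male": 91, "Female": 111},
    "Comedy":  {"Male": 32, "Female": 37},
    "Romance": {"Male": 28, "Female": 53},
    "Sci-Fi":  {"Male": 32, "Female": 16},
    "Fantasy": {"Male": 20, "Female": 34},
}
by_age = {
    "Action":  {"Gen Y": 54, "Gen X": 38, "Baby Boomers": 14},
    "Drama":   {"Gen Y": 63, "Gen X": 89, "Baby Boomers": 50},
    "Comedy":  {"Gen Y": 26, "Gen X": 33, "Baby Boomers": 10},
    "Romance": {"Gen Y": 35, "Gen X": 32, "Baby Boomers": 14},
    "Sci-Fi":  {"Gen Y": 18, "Gen X": 22, "Baby Boomers": 8},
    "Fantasy": {"Gen Y": 24, "Gen X": 20, "Baby Boomers": 10},
}

def marginal(table, genre):
    """P(genre): row total divided by the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return Fraction(sum(table[genre].values()), grand_total)

def conditional(table, genre, group):
    """P(genre | group): cell count divided by the group's column total."""
    group_total = sum(row[group] for row in table.values())
    return Fraction(table[genre][group], group_total)

for genre in ("Sci-Fi", "Action"):
    print(f"P({genre}) = {float(marginal(by_gender, genre)):.4f}")
    for group in ("Gen Y", "Gen X", "Baby Boomers"):
        print(f"P({genre} | {group}) = {float(conditional(by_age, genre, group)):.4f}")

for group in ("Male", "Female"):
    print(f"P(Sci-Fi | {group}) = {float(conditional(by_gender, 'Sci-Fi', group)):.4f}")
```

If the conditional probabilities for a genre differ noticeably from its marginal probability, that suggests the genre preference is not independent of the grouping variable in this sample.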