Q1. The data file airfares.txt on the book web site gives the one-way airfare (in US dollars) and distance (in miles) from city A to 17 other cities in the US. Interest centers on modeling airfare as a function of distance. The first model fit to the data was
Fare = β0 + β1Distance + e (3.7)
(a) Based on the output for model (3.7) a business analyst concluded the following:
The regression coefficient of the predictor variable, Distance, is highly statistically significant and the model explains 99.4% of the variability in the Y-variable, Fare. Thus model (3.7) is a highly effective model for both understanding the effects of Distance on Fare and for predicting future values of Fare given the value of the predictor variable, Distance.
Provide a detailed critique of this conclusion.
(b) Does the ordinary straight line regression model (3.7) seem to fit the data well? If not, carefully describe how the model can be improved.
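As an illustration of why a high R-squared alone does not settle part (b), the following minimal sketch fits a straight line to hypothetical, noise-free data that follow a mildly curved relationship. The numbers are invented for the example and are not the contents of airfares.txt; the function names are likewise only illustrative.

```python
def fit_line(x, y):
    """Return the least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y, b0, b1):
    """Observed minus fitted values for the straight-line fit."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def r_squared(y, resid):
    """Proportion of the variability in y explained by the fit."""
    y_bar = sum(y) / len(y)
    rss = sum(e ** 2 for e in resid)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - rss / sst


# Hypothetical data (NOT airfares.txt): 17 distances with a slightly
# curved, noise-free fare so that the residual pattern is easy to see.
distance = [100.0 * k for k in range(1, 18)]
fare = [50.0 + 0.2 * d + 0.00005 * d ** 2 for d in distance]

b0, b1 = fit_line(distance, fare)
resid = residuals(distance, fare, b0, b1)
r2 = r_squared(fare, resid)

print(f"R^2 = {r2:.4f}")
for d, e in zip(distance, resid):
    print(f"{d:7.0f} {e:8.2f}")
```

In this made-up example the residuals are positive at both ends of the Distance range and negative in the middle even though R-squared is very close to 1, which is the kind of systematic pattern a residual plot is meant to reveal.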
Q3. The price of advertising (and hence revenue from advertising) is different from one consumer magazine to another. Publishers of consumer magazines argue that magazines that reach more readers create more value for the advertiser. Thus, circulation is an important factor that affects revenue from advertising. In this exercise, we are going to investigate the effect of circulation on gross advertising revenue. The data are for the top 70 US magazines ranked in terms of total gross advertising revenue in 2006. In particular we will develop regression models to predict gross advertising revenue per advertising page in 2006 (in thousands of dollars) from circulation (in millions). The data were obtained from https://adage.com and are given in the file AdRevenue.csv which is available on the book web site. Prepare your answers to parts A, B and C in the form of a report.
Part A
(a) Develop a simple linear regression model based on least squares that predicts advertising revenue per page from circulation (i.e., feel free to transform either the predictor or the response variable or both variables). Ensure that you provide justification for your choice of model.
(b) Find a 95% prediction interval for the advertising revenue per page for magazines with the following circulations:
(i) 0.5 million
(ii) 20 million
(c) Describe any weaknesses in your model.
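One possible route for Part A is a model on the log scale for both variables, with the prediction interval computed on the log scale and then back-transformed. The sketch below assumes that choice; it is not the only defensible model. The data values are hypothetical (they are not from AdRevenue.csv), and the t critical value is passed in by hand because it depends on the sample size.

```python
import math


def prediction_interval(x, y, x_new, t_crit):
    """Prediction interval for a new response at x_new under
    y = b0 + b1 * x + e with constant error variance.
    t_crit is the t critical value on n - 2 degrees of freedom."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    fit = b0 + b1 * x_new
    half = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return fit - half, fit, fit + half


# Hypothetical values (NOT AdRevenue.csv): circulation in millions and
# advertising revenue per page in thousands of dollars.
circulation = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0, 20.0]
ad_revenue = [40.0, 62.0, 70.0, 95.0, 110.0, 150.0, 170.0, 230.0, 260.0, 400.0]

log_x = [math.log(c) for c in circulation]
log_y = [math.log(r) for r in ad_revenue]
T_CRIT = 2.306  # 97.5th percentile of t on n - 2 = 8 degrees of freedom


def log_log_interval(circ_new):
    """95% prediction interval on the original scale, obtained by
    exponentiating the endpoints of the log-scale interval."""
    lo, fit, hi = prediction_interval(log_x, log_y, math.log(circ_new), T_CRIT)
    return math.exp(lo), math.exp(fit), math.exp(hi)


for c in (0.5, 20.0):
    lo, fit, hi = log_log_interval(c)
    print(f"circulation {c:5.1f}: {lo:8.1f} to {hi:8.1f} (centre {fit:8.1f})")
```

Exponentiating the endpoints keeps the interval positive and makes it asymmetric about its centre on the original scale. With the real data, T_CRIT would need to be replaced by the critical value for the actual residual degrees of freedom.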
Part B
(a) Develop a polynomial regression model based on least squares that directly predicts the effect on advertising revenue per page of an increase in circulation of 1 million people (i.e., do not transform either the predictor or the response variable). Ensure that you provide detailed justification for your choice of model. [Hint: Consider polynomial models of order up to 3.]
(b) Find a 95% prediction interval for the advertising page cost for magazines with the following circulations:
(i) 0.5 million
(ii) 20 million
(c) Describe any weaknesses in your model.
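For Part B, the hint suggests comparing polynomial models of order 1, 2 and 3. The sketch below fits each order by solving the normal equations directly, which is adequate for a small illustration; statistical software typically uses more numerically stable decompositions. The data are hypothetical (not AdRevenue.csv) and the helper names are made up for the example.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def fit_polynomial(x, y, order):
    """Least-squares coefficients [b0, b1, ..., b_order] obtained
    from the normal equations (X'X) b = X'y."""
    p = order + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(p)] for i in range(p)]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, x_new):
    """Evaluate the fitted polynomial at x_new."""
    return sum(b * x_new ** k for k, b in enumerate(coefs))


def residual_ss(x, y, coefs):
    """Residual sum of squares of the fitted polynomial."""
    return sum((yi - predict(coefs, xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical values (NOT AdRevenue.csv): circulation in millions and
# advertising revenue per page in thousands of dollars.
circulation = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0, 20.0]
ad_revenue = [40.0, 62.0, 70.0, 95.0, 110.0, 150.0, 170.0, 230.0, 260.0, 400.0]

rss_by_order = {}
for order in (1, 2, 3):
    coefs = fit_polynomial(circulation, ad_revenue, order)
    rss_by_order[order] = residual_ss(circulation, ad_revenue, coefs)
    print(order, round(rss_by_order[order], 2))
```

The residual sum of squares cannot increase as the order goes up, so a smaller value by itself does not justify the higher-order model; the choice also needs residual plots and tests of the added terms.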
Part C
(a) Compare the model in Part A with that in Part B. Decide which provides a better model. Give reasons to justify your choice.
(b) Compare the prediction intervals in Part A with those in Part B. In each case, decide which interval you would recommend. Give reasons to justify each choice.
Q4. Tryfos (1998, p. 57) considers a real example involving the management at a Canadian port on the Great Lakes who wish to estimate the relationship between the volume of a ship's cargo and the time required to load and unload this cargo. It is envisaged that this relationship will be used for planning purposes as well as for making comparisons with the productivity of other ports. Records of the tonnage loaded and unloaded as well as the time spent in port by 31 liquid-carrying vessels that used the port over the most recent summer are available. The data are available on the book website in the file glakes.txt. The first model fit to the data was
Time = β0 + β1Tonnage + e (3.8)
On the following pages is some output from fitting model (3.8) as well as some plots of Tonnage and Time (Figures 3.42 and 3.43).
(a) Does the straight line regression model (3.8) seem to fit the data well? If not, list any weaknesses apparent in model (3.8).
(b) Suppose that model (3.8) was used to calculate a prediction interval for Time when Tonnage = 10,000. Would the interval be too short, too long or about right (i.e., valid)? Give a reason to support your answer.
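For reference when answering part (b), the usual prediction interval for a new response at x* under a straight-line model with constant error variance can be written as below. The notation (x* for the new predictor value, S for the residual standard error, SXX for the sum of squared deviations of the predictor) is assumed here and may differ from the book's.

```latex
\hat{y}^{*} \;\pm\; t_{\alpha/2,\,n-2}\; S \sqrt{1 + \frac{1}{n} + \frac{(x^{*} - \bar{x})^{2}}{SXX}},
\qquad
S = \sqrt{\frac{RSS}{n-2}},
\qquad
SXX = \sum_{i=1}^{n} (x_i - \bar{x})^{2}
```

Because S is a single estimate pooled over all observations, the interval is only valid at Tonnage = 10,000 if the error variance there matches the variance elsewhere in the data.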
Q5. An analyst for the auto industry has asked for your help in modeling data on the prices of new cars. Interest centers on modeling suggested retail price as a function of the cost to the dealer for 234 new cars. The data set, which is available on the book website in the file cars04.csv, is a subset of the data from https://www.amstat.org/publications/jse/datasets/04cars.txt
(Accessed March 12, 2007)
The first model fit to the data was
Suggested Retail Price = β0 + β1 Dealer Cost + e (3.10)
(a) Based on the output for model (3.10) the analyst concluded the following:
Since the model explains just more than 99.8% of the variability in Suggested Retail Price and the coefficient of Dealer Cost has a t-value greater than 412, model (3.10) is a highly effective model for producing prediction intervals for Suggested Retail Price.
Provide a detailed critique of this conclusion.
(b) Carefully describe all the shortcomings evident in model (3.10). For each shortcoming, describe the steps needed to overcome the shortcoming.
The second model fitted to the data was
log(Suggested Retail Price) = β0 + β1log(Dealer Cost) + e (3.11)
Output from model (3.11) and plots (Figure 3.47) appear on the following pages.
(c) Is model (3.11) an improvement over model (3.10) in terms of predicting Suggested Retail Price? If so, please describe all the ways in which it is an improvement.
(d) Interpret the estimated coefficient of log(Dealer Cost) in model (3.11).
(e) List any weaknesses apparent in model (3.11).
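For part (d), it may help to recall how a slope is read when both variables are logged: a given percentage change in the predictor corresponds to a multiplicative change in the response. The short sketch below works through that arithmetic for a few hypothetical slope values; these are not estimates from cars04.csv, and the function name is made up for the example.

```python
def pct_change_in_response(b1, pct_change_in_predictor):
    """Exact percentage change in Y implied by
    log(Y) = b0 + b1 * log(X) when X changes by the given
    percentage and the error term is held fixed."""
    factor = (1.0 + pct_change_in_predictor / 100.0) ** b1
    return (factor - 1.0) * 100.0


# Hypothetical slope values, NOT estimates from cars04.csv.
for b1 in (0.5, 1.0, 2.0):
    change = pct_change_in_response(b1, 1.0)
    print(f"b1 = {b1}: a 1% increase in X changes Y by about {change:.4f}%")
```

For small percentage changes the result is close to b1 times the percentage change in the predictor, which is why the slope in such a model is often read as an approximate percentage effect.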
Q8. Chu (1996) discusses the development of a regression model to predict the price of diamond rings from the size of their diamond stones (in terms of their weight in carats). Data on both variables were obtained from a full page advertisement placed in the Straits Times newspaper by a Singapore-based retailer of diamond jewelry. Only rings made with 20 carat gold and mounted with a single diamond stone were included in the data set. There were 48 such rings of varying designs. (Information on the designs was available but not used in the modeling.)
The weights of the diamond stones ranged from 0.12 to 0.35 carats (a one carat diamond stone weighs 0.2 gram) and were priced between $223 and $1086. The data are available on the course web site in the file diamonds.txt.
Part 1
(a) Develop a simple linear regression model based on least squares that directly predicts Price from Size (that is, do not transform either the predictor or the response variable). Ensure that you provide justification for your choice of model.
(b) Describe any weaknesses in your model.
Part 2
(a) Develop a simple linear regression model that predicts Price from Size (i.e., feel free to transform either the predictor or the response variable or both variables). Ensure that you provide detailed justification for your choice of model.
(b) Describe any weaknesses in your model.
Part 3
Compare the model in Part 1 with that in Part 2. Decide which provides a better model. Give reasons to justify your choice.