Assignment:
Consider the file "advertising.xls" (attached), which gives data on magazine titles, the cost of a full-color page advertisement (page), audience (subscribers), male percentage of subscribers, and household income. The objective of this project is to determine, using regression analysis techniques, whether any relationships exist among the variables. After analyzing the data set, write a report on your findings. The analysis should go beyond a basic regression: for example, you may use confidence interval estimates and one- or two-sample tests on the data to improve the quality of your report.
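The confidence intervals and two-sample tests mentioned above can be sketched as follows. This is a minimal, standard-library-only Python sketch, not a prescribed method: it computes a t-based confidence interval for a mean and Welch's two-sample t statistic (Welch's version is an assumption chosen here because it does not require equal variances). The function names are hypothetical, and the critical value `t_crit` must be looked up separately (in a t table or with a spreadsheet function such as Excel's T.INV.2T), because Python's standard library has no t distribution. The samples would be columns of advertising.xls, for instance page costs for two groups of magazines.

```python
import math
import statistics


def mean_confidence_interval(sample, t_crit):
    """Two-sided CI for a population mean: x_bar +/- t_crit * s / sqrt(n).

    t_crit is the t critical value for n - 1 degrees of freedom; it is passed
    in because the standard library has no t distribution.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def welch_t_statistic(sample_a, sample_b):
    """Welch's two-sample t statistic and its Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a) / n_a  # squared standard error, group A
    var_b = statistics.variance(sample_b) / n_b  # squared standard error, group B
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df
```

Compare |t| with the critical value for the returned (generally non-integer) degrees of freedom, or convert it to a p-value with statistical software.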
a) State your statistical objective for this data set.
b) Perform exploratory data analysis on this data set, for example numerical summary measures and box-and-whisker plots.
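The numbers behind a box-and-whisker plot can be computed with a short sketch like the one below, applied to one column of the data set at a time. It is a minimal illustration using Python's `statistics` module; the function name is hypothetical. It uses `method="inclusive"` for the quartiles, which should agree with Excel's QUARTILE.INC, but other quartile conventions give slightly different values. Points beyond 1.5 × IQR from the quartiles are flagged as potential outliers, the usual whisker rule.

```python
import statistics


def five_number_summary(values):
    """Five-number summary, mean, standard deviation, and 1.5 * IQR outliers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr  # whisker limits for a box-and-whisker plot
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "mean": statistics.mean(data), "stdev": statistics.stdev(data),
        "outliers": outliers,
    }
```

Comparing the mean with the median and the distances from the quartiles to the median gives a quick read on skewness for each variable.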
c) Construct scatter diagrams for pairs of variables and describe any relationships you see. Do the pairs appear to have some association (linear or nonlinear)?
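Alongside the scatter diagrams, the sample correlation coefficient r summarizes the strength of each pairwise linear association. The sketch below is an illustration, not part of the assignment: the function names and the column keys (for example "page" and "subscribers") are hypothetical. Note that r measures only linear association, so a clearly curved pattern in a scatter diagram can still produce a small r.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(columns):
    """r for every pair of named columns, e.g. {"page": [...], "subscribers": [...]}."""
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }
```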
d) Does a linear model appear to hold for any pair of variables? You may want to run hypothesis tests to substantiate your answer. Why or why not?
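One standard way to test whether a linear model holds for a pair of variables is the t test for the slope, H0: β1 = 0, in the simple regression Y = β0 + β1X + ε. The sketch below is a hypothetical helper that fits the least-squares line and returns the t statistic, to be compared with the t critical value for n − 2 degrees of freedom (looked up separately). A significant slope alone does not confirm linearity; a residual plot should also be checked for curvature.

```python
import math


def slope_t_test(x, y):
    """Least-squares fit y = b0 + b1*x and the t statistic for H0: beta1 = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx           # slope
    b0 = my - b1 * mx        # intercept
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)   # error sum of squares
    sst = sum((b - my) ** 2 for b in y)    # total sum of squares
    s_yx = math.sqrt(sse / (n - 2))        # standard error of the estimate
    se_b1 = s_yx / math.sqrt(sxx)          # standard error of the slope
    return {"b0": b0, "b1": b1, "r2": 1 - sse / sst,
            "t": b1 / se_b1, "df": n - 2, "residuals": residuals}
```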
e) Apply the best-subsets approach to model building to determine whether any variable should be excluded from this model.
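The best-subsets approach fits a regression for every combination of candidate predictors and compares the fits. A minimal sketch is below, assuming adjusted R² and Mallows' Cp as the comparison criteria; a common guideline is to prefer models whose Cp is close to or below k + 1, where k is the number of predictors in the subset. The helper names are hypothetical, the least-squares fit solves the normal equations by Gauss-Jordan elimination to stay within the standard library, and perfectly collinear predictors would make that system singular and raise an error.

```python
from itertools import combinations


def _solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sse_of_fit(columns, y):
    """Error sum of squares for the OLS fit of y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def best_subsets(predictors, y):
    """Adjusted R^2 and Mallows' Cp for every non-empty subset of named predictors."""
    n = len(y)
    my = sum(y) / n
    sst = sum((yi - my) ** 2 for yi in y)
    names = list(predictors)
    full_k = len(names)
    # MSE of the model containing all candidate predictors, used by Cp
    mse_full = sse_of_fit([predictors[v] for v in names], y) / (n - full_k - 1)
    results = []
    for size in range(1, full_k + 1):
        for subset in combinations(names, size):
            sse = sse_of_fit([predictors[v] for v in subset], y)
            adj_r2 = 1 - (sse / (n - size - 1)) / (sst / (n - 1))
            cp = sse / mse_full - (n - 2 * (size + 1))
            results.append({"vars": subset, "adj_r2": adj_r2, "cp": cp})
    return sorted(results, key=lambda r: r["adj_r2"], reverse=True)
```

The predictors would be passed as a dictionary of named columns, with the response (for example, page cost) as `y`; by construction the full model always has Cp = k + 1.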
f) Treat the male percentage of subscribers as categorical data: for example, code a magazine as "male magazine" if the percentage is more than 66%, "gender free" if it is between 33% and 66%, and "female magazine" if it is less than 33%. Then introduce dummy variables for these categories. Does this give a more meaningful (better) output for this model, given that some households use a male name to subscribe to any magazine? Can you introduce any other dummy variables to improve your analysis? A new dummy variable can be created from within the data or from external data.
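A categorical variable with three levels needs only two dummy variables; the omitted level becomes the baseline, and each dummy's coefficient measures the difference from that baseline. The sketch below illustrates the coding with hypothetical function names. It assumes the male percentage is stored on a 0 to 100 scale and that boundary values of exactly 33 or 66 fall in "gender free", since the assignment does not say which side they belong to.

```python
def gender_category(male_pct):
    """Map male percentage of subscribers (0-100 scale, assumed) to a category.

    Boundary values of exactly 33 or 66 go to "gender free" (an assumption).
    """
    if male_pct > 66:
        return "male magazine"
    if male_pct < 33:
        return "female magazine"
    return "gender free"


def dummy_code(male_pcts, baseline="gender free"):
    """Return k - 1 = 2 dummy columns, leaving `baseline` as the reference category."""
    categories = ["male magazine", "gender free", "female magazine"]
    dummies = {c: [] for c in categories if c != baseline}
    for pct in male_pcts:
        cat = gender_category(pct)
        for c in dummies:
            dummies[c].append(1 if cat == c else 0)  # 1 if this magazine is in category c
    return dummies
```

Including all three dummies together with an intercept would make the columns perfectly collinear (the dummy trap), which is why one level is left out.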
g) Once you determine which variables to use, perform a multiple regression analysis on this subset of variables, including a check for collinearity.
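A common collinearity check is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. The sketch below is one hypothetical standard-library implementation; it needs at least two predictors and fails on perfectly collinear columns, whose normal equations are singular. Rules of thumb vary: VIF values above 5 (some texts use 10) are often taken as a sign of problematic collinearity.

```python
def _solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """Coefficient of determination for the OLS fit of y on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = _solve(xtx, xty)
    my = sum(y) / len(y)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


def vif(predictors):
    """Variance inflation factor for each named predictor column."""
    names = list(predictors)
    out = {}
    for name in names:
        # regress this predictor on all the remaining predictors
        others = [predictors[v] for v in names if v != name]
        out[name] = 1 / (1 - r_squared(others, predictors[name]))
    return out
```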
h) Summarize and comment on your results.