Assignment:
Q1: The CEO has asked you to do some basic statistical analysis of the theatres, looking at the relationship between basic theatre-specific and demographic factors and theatre revenue. Most of the data is fairly easy (though time-consuming and expensive) to get, but some members of your team are concerned about the difficulty of quickly getting accurate and meaningful theatre revenue data for the current year. Respond to each of the concerns raised below:
a) Some of the theatres have really old infrastructure and aren't even hooked up to our data warehouse, so getting their data will take FOREVER. Plus, with the amount of work those theatres do by hand, I'm sure there will be tons of mistakes in their data, which will just screw up our figures. For now, let's not worry about those theatres and just collect data from the ones hooked up to our network. We can get the data more quickly that way, and it's probably better data to boot.
b) Most of our theatres play first-run movies, but about 25% of our theatres show second-run movies, and those theatres might really screw up our data. I'd suggest limiting the data analysis to our first-run theatres.
c) Some movies don't make it to all the theatres, you know. It doesn't make much sense to compare the revenues of two theatres when they might be playing very different movies. Instead, we should look at revenue from a well-defined subset of movies that were shown at most or all of our theatres (e.g., only the top 20 grossing films of the year).
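Concerns (a) and (b) both propose dropping a subgroup of theatres. If the dropped theatres differ systematically from the rest, the results describe only the retained group, and cleaner data from that group cannot correct for it. The short simulation below illustrates the mechanism with invented numbers: the 200/70 split and the revenue means are hypothetical assumptions, not figures from the chain's data.

```python
import random
import statistics

# SYNTHETIC data for illustration only; these are not the chain's figures.
# Hypothetical assumption: the 70 "legacy" (non-networked) theatres are older
# and earn less on average than the 200 networked ones.
def simulate_chain(n_networked=200, n_legacy=70, seed=1):
    rng = random.Random(seed)
    networked = [rng.gauss(7_000_000, 1_000_000) for _ in range(n_networked)]
    legacy = [rng.gauss(5_000_000, 1_000_000) for _ in range(n_legacy)]
    return networked, legacy

networked, legacy = simulate_chain()
chain_mean = statistics.mean(networked + legacy)   # the quantity we want
networked_mean = statistics.mean(networked)        # what the shortcut measures
bias = networked_mean - chain_mean

print(f"mean over all theatres:     {chain_mean:,.0f}")
print(f"mean over networked only:   {networked_mean:,.0f}")
print(f"bias from dropping legacy:  {bias:,.0f}")
```

Because the chain-wide mean is a weighted average of the two groups, the networked-only mean overstates it whenever networked theatres earn more on average, no matter how accurate their records are.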
Q2: Ultimately you decide upon a simple random sample of 90 theatres. Use this data to:
a) create a graph that shows the distribution of the different types of theatre setting (mall, strip mall, free standing)
b) create a histogram of theatre revenue
c) determine if theatre revenue is approximately normally distributed
d) generate summary statistics for theatre revenue
e) generate summary statistics for theatre revenue broken out by location (mall vs. strip mall vs. free standing)
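The Q2 steps can be sketched with NumPy and SciPy as below. The function names are hypothetical, and the inputs are assumed to be a revenue array and a parallel array of setting labels taken from the sample. For the graphs themselves, the counts from `setting_counts` can be passed to matplotlib's `plt.bar` and the raw revenue values to `plt.hist`. The normality check uses the Shapiro-Wilk test, one reasonable choice among several; a Q-Q plot (`scipy.stats.probplot`) is a useful visual complement.

```python
from collections import Counter

import numpy as np
from scipy import stats


def setting_counts(settings):
    """Q2a: frequency of each theatre setting, ready for a bar chart."""
    return Counter(settings)


def revenue_histogram(revenue, bins=10):
    """Q2b: bin counts and bin edges for a revenue histogram."""
    counts, edges = np.histogram(np.asarray(revenue, dtype=float), bins=bins)
    return counts, edges


def normality_check(revenue):
    """Q2c: Shapiro-Wilk test plus skewness and excess kurtosis.

    A small p-value is evidence against normality; with n = 90, also look at
    the histogram and a Q-Q plot rather than relying on the p-value alone.
    """
    r = np.asarray(revenue, dtype=float)
    w_stat, p_value = stats.shapiro(r)
    return {
        "W": w_stat,
        "p_value": p_value,
        "skew": stats.skew(r),
        "excess_kurtosis": stats.kurtosis(r),  # 0 for a normal distribution
    }


def summary(revenue):
    """Q2d: basic summary statistics (sample SD uses n - 1)."""
    r = np.asarray(revenue, dtype=float)
    return {
        "n": r.size,
        "mean": r.mean(),
        "sd": r.std(ddof=1),
        "min": r.min(),
        "q1": np.percentile(r, 25),
        "median": np.median(r),
        "q3": np.percentile(r, 75),
        "max": r.max(),
    }


def summary_by_setting(revenue, settings):
    """Q2e: summary statistics for each setting (mall, strip mall, free standing)."""
    r = np.asarray(revenue, dtype=float)
    s = np.asarray(settings)
    return {level: summary(r[s == level]) for level in sorted(set(s.tolist()))}
```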
Q3: a) Last year, your chain averaged $6,500,000 in revenue per theatre. Test whether revenues are up this year, using $6,500,000 as the mean under the null hypothesis.
b) Recalling that your simple random sample is of 90 of the 270 theatres in your chain, calculate a 95% confidence interval for the total revenue of all 270 theatres in the chain.
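A minimal sketch of both Q3 calculations, assuming the sampled revenues are in a plain array. Because the question asks whether revenues are *up*, the test is one-sided: H0: μ = 6,500,000 against H1: μ > 6,500,000 (the `alternative` keyword of `scipy.stats.ttest_1samp` needs SciPy 1.6 or later). For the total, the interval is N·ȳ ± t(0.975, n−1)·N·(s/√n)·√(1 − n/N); the finite population correction √(1 − n/N) matters here because the sample is a third of the chain. Some textbooks write the correction slightly differently, so check the convention your course uses. Function names are hypothetical.

```python
import math

import numpy as np
from scipy import stats

NULL_MEAN = 6_500_000   # last year's average revenue per theatre (Q3a)
N_CHAIN = 270           # number of theatres in the chain (Q3b)


def revenue_up_test(revenue, null_mean=NULL_MEAN):
    """One-sided one-sample t-test of H0: mu = null_mean vs H1: mu > null_mean."""
    result = stats.ttest_1samp(revenue, popmean=null_mean, alternative="greater")
    return result.statistic, result.pvalue


def total_revenue_ci(revenue, population_size=N_CHAIN, confidence=0.95):
    """CI for the population total under SRS without replacement, with FPC."""
    r = np.asarray(revenue, dtype=float)
    n = r.size
    mean = r.mean()
    s = r.std(ddof=1)
    fpc = 1 - n / population_size          # finite population correction
    se_total = population_size * s / math.sqrt(n) * math.sqrt(fpc)
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    total = population_size * mean         # point estimate N * ybar
    return total - t_crit * se_total, total, total + t_crit * se_total
```

With the real sample, `revenue_up_test(revenue)` returns the t statistic and one-sided p-value, and `total_revenue_ci(revenue)` returns (lower, estimate, upper) for all 270 theatres.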
Q4: Your company is thinking of building a number of new theatres in the Carolinas and is currently trying to decide on building locations.
a) Conduct a 1-way ANOVA (and follow-up tests as needed) to see if there is a significant relationship between location (mall/strip mall/free standing) and revenue
b) What advice might you give regarding location choice?
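The Q4a analysis can be sketched as a one-way ANOVA (`scipy.stats.f_oneway`) followed, if the F test is significant, by pairwise comparisons. The follow-up shown here is pooled-variance t-tests with a Bonferroni adjustment, a simple and conservative choice picked for illustration; Tukey's HSD (available as `scipy.stats.tukey_hsd` in recent SciPy versions) is a common alternative. Function names and input layout are assumptions.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def anova_by_setting(revenue, settings):
    """One-way ANOVA of revenue across theatre settings.

    Returns the F statistic, its p-value, and the revenue array for each group.
    """
    r = np.asarray(revenue, dtype=float)
    s = np.asarray(settings)
    groups = {level: r[s == level] for level in sorted(set(s.tolist()))}
    f_stat, p_value = stats.f_oneway(*groups.values())
    return f_stat, p_value, groups


def bonferroni_pairwise(groups, alpha=0.05):
    """Pooled-variance t-tests for every pair of groups, Bonferroni-adjusted.

    Each raw p-value is multiplied by the number of comparisons (capped at 1).
    """
    pairs = list(combinations(sorted(groups), 2))
    results = {}
    for a, b in pairs:
        t_stat, p_raw = stats.ttest_ind(groups[a], groups[b])  # equal_var=True by default
        p_adj = min(1.0, p_raw * len(pairs))
        results[(a, b)] = {"t": t_stat, "p_adj": p_adj, "significant": p_adj < alpha}
    return results
```

A significant F only says that at least one setting differs; the pairwise results show which ones, which is what the location advice in (b) should rest on.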
Q5: a) Estimate a multiple regression model using revenue as your dependent variable. For independent variables, include population, average income, screens, and dummy variables for mall and strip mall. Interpret your results!
b) Based on your results in (a), evaluate the following statements:
i) Because you only have a third of the theatres in your dataset, to interpret those estimated coefficients correctly you need to first triple them.
ii) We need more midnight showings of the hot new releases to bring up our revenue figures.
iii) Because the estimated coefficient on the number of screens is huge, we should just start building theatres with 40 and 50 screens to really start raking in the cash!
c) After presenting your initial results, your boss suggests that (i) population and average income shouldn't matter; total economic activity should, and (ii) age should have a non-linear effect: it takes some time before people learn that a theatre is there, so early on revenues should increase with age, but after a while theatres get run down and revenues start to fall with age. To consider these two effects, create two new variables: an interaction term between population and income (to account for total economic activity) and age squared (to capture the hypothesized non-linearity). Add these two variables to the model estimated in part (a) and estimate the new model.
d) How does this model compare to the one from (a)? Do your new results vindicate or run contrary to your boss's suggestions? Why or why not?
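The Q5 models can be sketched as ordinary least squares in plain NumPy, plus a partial F-test for comparing the nested models in (d). Everything here is a hypothetical layout: the data are assumed to be a dict `d` of arrays keyed `revenue`, `population`, `income`, `screens`, `age`, and `setting`, with free-standing theatres as the omitted baseline for the dummies. One assumption departs from the wording of (c): the sketch also keeps a linear `age` term, because age squared alone has its turning point at age 0 and cannot produce a rise-then-fall pattern; drop it if you want the model exactly as worded. The product of population and income can be very large, so expressing both in thousands keeps the numbers manageable. The `ols` formula interface in statsmodels fits the same models with less code.

```python
import numpy as np
from scipy import stats


def ols(y, columns):
    """Ordinary least squares with an intercept; columns maps name -> 1-D array."""
    y = np.asarray(y, dtype=float)
    names = ["const", *columns]
    X = np.column_stack(
        [np.ones(y.size)] + [np.asarray(v, dtype=float) for v in columns.values()]
    )
    n, k = X.shape
    # pinv(X) @ pinv(X).T equals inv(X.T @ X) for full-rank X, but avoids
    # squaring the condition number (relevant once population x income is added).
    X_pinv = np.linalg.pinv(X)
    beta = X_pinv @ y
    resid = y - X @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    df_resid = n - k
    sigma2 = sse / df_resid                      # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * X_pinv @ X_pinv.T))
    t_stats = beta / se
    r2 = 1.0 - sse / sst
    return {
        "names": names, "beta": beta, "se": se, "t": t_stats,
        "p": 2 * stats.t.sf(np.abs(t_stats), df=df_resid),   # two-sided
        "r2": r2, "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / df_resid,
        "sse": sse, "df_resid": df_resid,
    }


def base_predictors(d):
    """Q5a predictors; free-standing theatres are the omitted baseline category."""
    setting = np.asarray(d["setting"])
    return {
        "population": d["population"],
        "income": d["income"],
        "screens": d["screens"],
        "mall": (setting == "mall").astype(float),
        "strip_mall": (setting == "strip mall").astype(float),
    }


def model_5a(d):
    return ols(d["revenue"], base_predictors(d))


def model_5c(d):
    """Q5c: adds population x income, plus age and age squared (see assumption above)."""
    cols = base_predictors(d)
    pop = np.asarray(d["population"], dtype=float)
    age = np.asarray(d["age"], dtype=float)
    cols["pop_x_income"] = pop * np.asarray(d["income"], dtype=float)
    cols["age"] = age
    cols["age_sq"] = age ** 2
    return ols(d["revenue"], cols)


def nested_f_test(restricted, full):
    """Q5d: partial F-test of whether the extra terms in `full` improve the fit."""
    q = restricted["df_resid"] - full["df_resid"]   # number of added terms
    f = ((restricted["sse"] - full["sse"]) / q) / (full["sse"] / full["df_resid"])
    return f, stats.f.sf(f, q, full["df_resid"])
```

Coefficients are per-theatre effects, so their size does not depend on what fraction of the chain is sampled; sampling a third of the chain affects the standard errors, not the scale of the estimates.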
Q6: Shortly after reading this report, the CEO confides in you that he is thinking of expanding abroad by purchasing a small chain of theatres in Quebec, Canada. Because this chain is privately held, he does not have access to things like its annual financial reports. However, data of the sort used here (demographics, theatre size, etc.) are publicly available, and he has given those to you. Data on all of the theatres owned by this firm can be found in the Target Company tab. He is wondering whether the results you showed him might be useful in estimating the annual revenue of the Canadian theatre chain.
a) Using the data in the Target Company tab and your results from Q5c, estimate the revenue of the target firm.
b) Was what you did in part (a) a good idea? Why or why not?
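Mechanically, Q6a means applying the fitted Q5c coefficients to each target theatre and summing the predictions. The sketch below does that in NumPy and adds one check relevant to part (b): it flags predictors whose target values fall outside the range seen in the sample, since predictions there are extrapolations. Function names are hypothetical, and the target's predictor columns must be built with exactly the same transformations and units used to fit the model. The range check is necessary but not sufficient: a model fit on one chain's market may not transfer to another even inside the observed ranges (for instance, if incomes in the target data are recorded in Canadian dollars).

```python
import numpy as np


def predict(beta, names, columns):
    """Apply fitted OLS coefficients to new observations.

    beta and names come from the fitted model (names[0] is the intercept);
    columns maps every other name to that predictor's values for the new
    theatres, built with the same transformations and units as the fit.
    """
    n_rows = len(np.asarray(columns[names[1]]))
    X = np.column_stack(
        [np.ones(n_rows)] + [np.asarray(columns[name], dtype=float) for name in names[1:]]
    )
    return X @ np.asarray(beta, dtype=float)


def out_of_range(train_columns, target_columns):
    """Predictors whose target values fall outside the range seen in training."""
    flagged = []
    for name, train in train_columns.items():
        train = np.asarray(train, dtype=float)
        target = np.asarray(target_columns[name], dtype=float)
        if target.min() < train.min() or target.max() > train.max():
            flagged.append(name)
    return flagged
```

The estimated annual revenue of the target chain is then `predict(...).sum()`, and any names returned by `out_of_range` mark where that estimate leans on extrapolation.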