Assignment:
The purpose of this assignment is to improve your understanding of how to choose between, and use, methods for assessing the relationship between variables and making comparisons across different segments. It will also develop your critical thinking and analytical skills.
Specifically, we want you to access the Excel database provided on UTSOnline that describes the evaluation of more than 500 restaurants listed on a website/app that consumers can search and consult to decide where to go for breakfast, lunch or dinner in a capital city similar in geography to Sydney. Restaurants are categorised along a number of dimensions, and this data is fed into the app and website search engine to help consumers narrow their search.
You are asked to treat this database as a sample of the restaurants in Sydney - that is, the database is a listing describing a subset of restaurants in the capital city, but not all restaurants are included for various reasons (e.g., new restaurants being opened on a daily basis).
We want you to use this database to investigate a series of questions, applying the various statistical techniques you have been exposed to in Business Statistics, and to answer several managerial questions.
In answering each question, please consider the following:
• Explain which variables you used as inputs into your answers.
• Clearly show the inputs to the calculations you used to construct your answer, and justify why you chose these inputs or state any assumptions you made (e.g., degrees of freedom; assumption).
• As you have access to a computer, you are expected to use critical table values that are accurate (i.e., not rounded, and not obtained from an offline source such as a table listed in a textbook or a table you may use in exam settings). Your selection of table should be based on what you know about the population parameters, the level of significance and the degrees of freedom, all of which you should state to justify your selection.
• Present all inputs and answers to two decimal places, but round only at the end of your calculations to avoid rounding error. Probabilities should be presented as percentages.
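As an illustration of the point about accurate critical values, the sketch below obtains unrounded critical values from software rather than from a printed table. It assumes Python with SciPy is available (the assignment itself only refers to Excel), and the significance level and degrees of freedom shown are placeholder inputs, not values taken from the database.

```python
from scipy import stats

alpha = 0.05   # assumed level of significance (placeholder)
df = 49        # assumed degrees of freedom, e.g. n - 1 for a sample of 50

# Two-tailed t critical value: used when the population SD is unknown.
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Two-tailed z critical value: used when the population SD is known.
z_crit = stats.norm.ppf(1 - alpha / 2)

# Upper-tail chi-square critical value, e.g. for a test of independence
# with (rows - 1) * (columns - 1) = 4 degrees of freedom.
chi2_crit = stats.chi2.ppf(1 - alpha, 4)

# Round only when presenting the values.
print(f"t = {t_crit:.4f}, z = {z_crit:.4f}, chi-square = {chi2_crit:.4f}")
```

In Excel, the comparable functions are T.INV.2T, NORM.S.INV and CHISQ.INV.RT.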
Question 1) Anna faces a pricing dilemma
Anna is examining prices and wondering where she sits relative to competitors based on when she is open. She only opens for lunch. She does not open for breakfast or dinner.
She thinks that some restaurants charge different prices based on how often they are open during the day (e.g., some open for lunch and dinner), signalling (e.g., a higher price may signal a certain level of quality) and a host of reasons relating to costs etc.
Given a restaurant could open upon three occasions during the day, there are up to seven combinations (2^3 - 1 = 7) of whether a restaurant serves breakfast, lunch, and/or dinner.
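The count of seven can be checked by listing the combinations directly. The short sketch below does this in Python; the meal labels are illustrative only and are not the column names used in the database.

```python
from itertools import product

meals = ("breakfast", "lunch", "dinner")  # illustrative labels

# Each restaurant is either open (True) or closed (False) for each meal,
# giving 2**3 = 8 patterns; dropping "closed for all three" leaves 7.
combinations = [
    tuple(meal for meal, is_open in zip(meals, flags) if is_open)
    for flags in product((False, True), repeat=len(meals))
    if any(flags)
]

for combo in combinations:
    print(" + ".join(combo))
```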
In this question you are asked to calculate and further consider suitable summary statistics for the price of each of these combinations of opening times. To start, calculate the mean price, standard deviation in prices, and number of restaurants (i.e., frequency) for each of the seven combinations.
To help Anna further determine prices being charged by her competitors, you are asked to construct a confidence interval describing the average price of restaurants in each of the seven categories.
Your task is to present these in a clear summary table. That is, construct a table with the seven opening combinations listed in rows and the following columns describing each: average, standard deviation, frequency count, standard error, margin of error, confidence interval lower bound and confidence interval upper bound. Feel free to label the columns using suitable abbreviations (e.g., SD for standard deviation; SE for standard error; n for frequency; ME for margin of error). Add a final row to the table which summarises the average price of those open for lunch regardless of whether they are open for breakfast and/or dinner. Each entry, apart from sample size, should be to two decimal places.
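One way to build each row of such a table is sketched below. This is a minimal illustration, not a prescribed method: it assumes a 95% confidence level (the level is not stated above), uses the t distribution because the population standard deviation is unknown, and works on made-up prices rather than the database. The function name `summarise` and the group labels are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats


def summarise(prices, confidence=0.95):
    """Return the summary statistics for one opening combination."""
    n = len(prices)                 # needs at least two restaurants
    avg = mean(prices)
    sd = stdev(prices)              # sample SD (n - 1 denominator)
    se = sd / sqrt(n)               # standard error of the mean
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, n - 1)
    me = t_crit * se                # margin of error
    return {"mean": avg, "SD": sd, "n": n, "SE": se, "ME": me,
            "lower": avg - me, "upper": avg + me}


# Made-up prices for two hypothetical opening combinations.
groups = {
    "lunch only": [20, 25, 30, 35, 40],
    "lunch + dinner": [28, 35, 41, 44, 52, 58],
}
table = {label: summarise(prices) for label, prices in groups.items()}

# Round only when presenting; n is shown as a whole number.
for label, row in table.items():
    cells = ", ".join(
        f"{key}={value}" if key == "n" else f"{key}={value:.2f}"
        for key, value in row.items()
    )
    print(f"{label}: {cells}")
```

Note that the margin of error depends on both SD and n, so a combination with few restaurants or highly variable prices will produce a wider interval.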
Without doing any formal significance testing (i.e., simply examine your output from above) answer the following:
i) Is there any combination that appears to be significantly more expensive (on average) than any other? Write a short explanation of why this may be the case.
ii) Is there any combination that is significantly less expensive (on average) than any other? Write a short explanation of why this may be the case.
iii) Is there any combination out of the eight possible that does not exist in the sample of restaurants considered? Write a short explanation of why this may be the case.
iv) Explain why some confidence intervals appear to be much wider than others.
v) If Anna was to set prices in line with the average charged by her competitors who have similar opening times, what range should her prices be within? How does this range in prices compare to the confidence interval describing the average price of those open for lunch regardless of whether they are known to open for breakfast and/or dinner?
Question 2) Parking Costs and Proximity to City
Brett, an analyst working for a company that looks after a suite of parking stations, asserts that the cost of parking increases significantly for customers the closer the restaurant they patronise is to the city.
Select a suitable graphical method and numerical measure to examine whether this is the case, using the data you have been provided.
Examining your graphical results, can you see whether, and why, there is a further consideration in the data that should be explored? In particular, repeat your analysis above by segmenting the data based on restaurants located in the City and Metro regions combined (i.e., those restaurants that are 10km or less in distance from the CBD) as compared to those located in the outer suburbs (i.e., at a location that is more than 10km from the CBD).
To be clear, you are asked to provide three figures and three numerical measures to answer this question in its entirety.
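The three numerical measures could be produced along the lines of the sketch below. It assumes the Pearson correlation coefficient is the measure chosen (paired with a scatter plot for each figure), which is only one possible choice. The (distance, parking cost) pairs are made up for illustration, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def segment_correlations(rows, cutoff_km=10.0):
    """Correlation for all rows, for those within the cutoff, and beyond it."""
    segments = {
        "all": rows,
        "inner": [(d, c) for d, c in rows if d <= cutoff_km],
        "outer": [(d, c) for d, c in rows if d > cutoff_km],
    }
    result = {}
    for label, segment in segments.items():
        distances, costs = zip(*segment)
        result[label] = pearson_r(distances, costs)
    return result


# Made-up (distance from CBD in km, parking cost) pairs.
rows = [(1, 30), (4, 24), (8, 15), (10, 12), (12, 5), (20, 6), (30, 4)]
result = segment_correlations(rows)
for label, r in result.items():
    print(f"{label}: r = {r:.2f}")
```

A correlation computed on the combined data can differ noticeably from the correlations within each segment, which is why the segmented figures are requested.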
Write a short summary that describes these results to Brett and suggest possible reasons why the results vary depending on how the data is segmented. Is there an additional way of measuring the distance variable that you could propose that could be useful to explain the variation in parking costs not captured presently?
Question 3) Christine's Italian restaurant (or cafe; or pizzeria)
Christine has recently opened a restaurant. She is interested in listing her restaurant with various online vendors, but anticipates that the listing will receive a better rating in terms of perceived healthiness based on a halo effect. That is, she feels that her rating will be biased simply because it will be associated with other restaurants that share the same categorisation as her.
Some people have described her restaurant as a cafe. Some have referred to it as a pizzeria, whilst others suggest it's best to describe her restaurant as "Italian".
She worries that if people see her categorised as a pizzeria they will infer it as being less healthy relative to having a listing categorising her as "Italian" or as "Coffee/Cafe".
In turn, Christine is now interested in examining the evaluation of restaurants with respect to their perceived levels of healthiness and these three cuisine types.
A) Can you please formally test whether the perceived level of healthiness is independent of whether the cuisine is classified as 'Pizza', 'Italian' or 'Coffee/Cafe'? In doing so, you should present a table of observed joint frequencies and another table of the frequencies expected under the assumption of independence, along with the relevant information you need to formally test for independence.
B) Present a table that summarises the conditional probabilities of a restaurant receiving each perceived healthiness rating, given the type of cuisine the restaurant offers. Your table should be based on data referring to restaurants identified as offering Italian, Coffee/Cafe or Pizza only.
C) Present another table that summarises the conditional probabilities of a restaurant offering each type of cuisine, given the perceived healthiness rating the restaurant receives. Your table should be based on data referring to restaurants identified as offering Italian, Coffee/Cafe or Pizza only.
D) With the above information, write a short recommendation to Christine based on her current menu offering and her desire to appeal to a segment looking for a healthier type of cuisine out of the three potential categorisations she could be listed as.
E) Now expand your analysis to consider all restaurants in the database. Given a restaurant is rated as being "very healthy", what is the probability that this restaurant is one that offers pizza? Given a restaurant is rated as being "very unhealthy", what is the probability that this restaurant is one that offers pizza? With this type of thinking, expand the table presented in part 3B to focus on the entire database and consider all 10 cuisines. Is there a cuisine categorisation that stands out for its perceived level of healthiness? Is there a cuisine (or perhaps more than one) that stands out for being likely to be rated as less healthy?
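The calculations behind parts A to C can be sketched as follows. This is an illustration under stated assumptions: the observed counts are made up, the healthiness rating is collapsed to two hypothetical levels for brevity (the database may use more), a 0.05 level of significance is assumed, and SciPy is used only to obtain the chi-square critical value and p-value.

```python
from scipy import stats

# Made-up observed counts: rows = cuisine, columns = healthiness rating.
cuisines = ["Pizza", "Italian", "Coffee/Cafe"]
ratings = ["unhealthy", "healthy"]       # hypothetical two-level rating
observed = [
    [20, 10],
    [10, 20],
    [15, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence = row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(cuisines) - 1) * (len(ratings) - 1)
alpha = 0.05                             # assumed level of significance
chi2_crit = stats.chi2.ppf(1 - alpha, dof)
p_value = stats.chi2.sf(chi2_stat, dof)

# Part B: P(rating | cuisine), each count divided by its row total.
p_rating_given_cuisine = [
    [o / r for o in row] for row, r in zip(observed, row_totals)
]
# Part C: P(cuisine | rating), each count divided by its column total.
p_cuisine_given_rating = [
    [o / c for o, c in zip(row, col_totals)] for row in observed
]

print(f"chi-square = {chi2_stat:.2f}, df = {dof}, "
      f"critical = {chi2_crit:.2f}, p = {p_value:.2%}")
```

The two conditional tables answer different questions: the first sums to 100% across each cuisine (row), the second sums to 100% within each rating (column). Part E uses the second kind of probability on the full database.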
Question 4) Outdoors Add Value?
David is a real estate vendor who focuses on the rental of restaurant spaces. He believes that restaurants that have outdoor seating generally receive higher customer ratings than those that do not. Consistent with the background information provided by the online vendor who has provided the standardized ratings, David examines the data and thinks a rating above 50 is a good benchmark.
Formally test the hypothesis that the average rating of a restaurant without outdoor seating is 'good' (i.e., significantly more than 50).
Formally test the hypothesis that the average rating of a restaurant with outdoor seating is 'good' (i.e., significantly more than 50).
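Each of these two tests can be framed as an upper-tailed one-sample t test with H0: mean rating <= 50 and H1: mean rating > 50. The sketch below shows the mechanics under stated assumptions: the ratings are made up, a 0.05 level of significance is assumed, and the t distribution is used because the population standard deviation is unknown. The function name `upper_tail_t_test` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats


def upper_tail_t_test(sample, benchmark=50.0):
    """One-sample t test of H0: mu <= benchmark against H1: mu > benchmark."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)          # standard error of the mean
    t_stat = (mean(sample) - benchmark) / se
    df = n - 1
    p_value = stats.t.sf(t_stat, df)      # upper-tail probability
    return t_stat, df, p_value


# Made-up ratings for one group of restaurants.
ratings_sample = [52, 55, 58, 61, 64]
t_stat, df, p_value = upper_tail_t_test(ratings_sample)

alpha = 0.05                              # assumed level of significance
t_crit = stats.t.ppf(1 - alpha, df)       # one-tailed critical value

decision = "reject H0" if t_stat > t_crit else "do not reject H0"
print(f"t = {t_stat:.2f}, df = {df}, critical = {t_crit:.2f}, "
      f"p = {p_value:.2%}: {decision}")
```

The same function would be run once on the ratings of restaurants without outdoor seating and once on those with outdoor seating.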
Construct a confidence interval for the average rating of restaurants without outdoor seating and a second confidence interval for the average rating of restaurants with outdoor seating. Present these in a graphical format to further elaborate on whether David's assertion that one type of restaurant receives significantly higher ratings than the other is correct.
Write a short summary describing the various sets of results for David.
Attachment- Data Dictionary.rar