Stats301 applied statistical methods for business assignment


STAT S301 APPLIED STATISTICAL METHODS FOR BUSINESS ASSIGNMENT: Convenience Shopping, Indiana University, Bloomington.

Question Set I

I. Get to know your scientific question

1) Identify the variable of interest.
2) Identify the population(s) and sample(s).
3) Identify the parameter(s) and statistic(s).
4) What is the scientific question? Is this Descriptive Statistics or Inferential Statistics?

II. Get to know your data

1) Identify the types of your data: nominal data, ordinal data or quantitative data.
2) Identify the types of your data: time series data or cross-sectional data.
3) Identify the source of your data: primary data or secondary data. Do you think the data is reliable? Are there possible issues with your data?

III. Calculate descriptive statistics in Excel

1) Calculate the statistics for your variable of interest, such as the sample mean (x̄), median, mode, variance (s²), and standard deviation (s).

2) Identify two different groups based on the qualitative data. Calculate the above statistics for each group to compare.
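As a cross-check on the Excel results, the statistics in III(1) and III(2) can be sketched in Python with the standard `statistics` module. This is a minimal sketch, not the required Excel method: it assumes the variable of interest and the qualitative labels have been exported as two equal-length lists, and the function names are hypothetical. The sample variance and standard deviation divide by n - 1, as Excel's VAR.S and STDEV.S do.

```python
import statistics


def describe(values):
    """Sample summary statistics, analogous to Excel's AVERAGE, MEDIAN,
    MODE.MULT, VAR.S and STDEV.S (sample versions, divisor n - 1)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # every most-frequent value
        "variance": statistics.variance(values),  # s^2, divides by n - 1
        "std_dev": statistics.stdev(values),  # s
    }


def describe_by_group(values, labels):
    """Split `values` by the matching qualitative label and describe each group.

    Each group needs at least two observations for the variance to exist."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: describe(group) for label, group in sorted(groups.items())}
```

For example, `describe_by_group(sales, day_week)` would return one summary per group, where `sales` and `day_week` are hypothetical lists holding the Sales Dollars column and a two-level qualitative column.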

IV. Display your data with charts and graphs in Excel

1) Construct displays that best describe your qualitative variable (e.g., bar chart, pie chart), and describe its distribution.

2) Construct displays that best describe your variable of interest, and describe its distribution. (Use frequency distribution tables, histograms, and/or the empirical rule to discuss normality, symmetry, and skewness.)

3) Construct displays that best describe the relationship (association) between two quantitative variables, with the variable of interest as the dependent variable, y, and another quantitative variable as the independent variable, x. Describe the relationship.
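For IV(2), the empirical rule check and a frequency table can be sketched as follows. This is an illustrative sketch assuming equal-width bins; the function names are hypothetical, and the bin-boundary convention used here (lower edge included, maximum placed in the last bin) may differ from the one Excel's Histogram tool uses, so counts for values that fall exactly on a boundary can differ.

```python
import statistics


def empirical_rule_shares(values):
    """Share of observations within 1, 2 and 3 sample standard deviations of
    the mean; roughly 0.68, 0.95 and 0.997 are expected for bell-shaped data."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    n = len(values)
    return {k: sum(1 for v in values if abs(v - mean) <= k * s) / n for k in (1, 2, 3)}


def frequency_table(values, num_bins):
    """Equal-width bins as (lower, upper, count); each bin includes its lower
    edge, and the maximum value is placed in the last bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]
```

Shares far from 0.68, 0.95 and 0.997, or a lopsided frequency table, suggest the data are not bell-shaped.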

V. Distributions

1) Consider the distribution of your quantitative data in IV(2). Would it be appropriate to use the binomial or the normal distribution to model your data? Why or why not? Hint: The binomial distribution models discrete success/failure data, while the normal distribution is for bell-shaped continuous data.

Question Set II

I. Construct a confidence interval for a population mean

1) Do you need to make assumptions in order to perform the procedure of constructing a confidence interval? If so, what assumptions need to be made? If not, why?

2) Construct a confidence interval for the average sales.

o Should you use a z-interval or a t-interval? Why?
o Compute the necessary sample statistics for constructing a confidence interval.
o Find the margin of error of the confidence interval at confidence levels of 92% and 95%, respectively.
o Calculate these two confidence intervals.

3) Someone believes that average sales are 2,421 dollars. Does the sample support this claim? Explain whether you reach different conclusions using the two confidence intervals above. (Your discussion must address accuracy and precision.)
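The t-interval in I(2) can be sketched in Python as below. This is a minimal sketch: the critical value is passed in rather than computed, on the assumption that it is taken from Excel, for example =T.INV.2T(0.08, n - 1) for 92% and =T.INV.2T(0.05, n - 1) for 95%. Excel's CONFIDENCE.T(alpha, s, n) returns the same margin of error directly.

```python
import math
import statistics


def t_interval(values, t_crit):
    """Two-sided t-interval for a population mean: x̄ ± t_crit · s / √n.

    `t_crit` is supplied by the caller, e.g. from Excel's
    =T.INV.2T(1 - confidence, n - 1)."""
    n = len(values)
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    margin = t_crit * std_error
    return mean, margin, (mean - margin, mean + margin)
```

Calling it once per confidence level and checking whether lower ≤ 2421 ≤ upper for each interval mirrors question I(3); the 95% interval is wider (more confident, less precise) than the 92% one.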

II. Conduct a hypothesis test for a population mean

1) Do you need to make assumptions in order to perform the procedure of conducting a hypothesis test? If so, what assumptions need to be made? If not, why?

2) Using α = 0.07, perform a hypothesis test to determine whether average sales are higher than 2,350 dollars.

o Write down the hypotheses.
o Calculate the test statistic, critical value, and p-value.
o State your decision and draw a conclusion in the context of the problem.
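The right-tailed one-sample t test in II(2) can be cross-checked with the sketch below. Excel's T.DIST.RT and T.INV would normally supply the p-value and critical value; to stay within the Python standard library, this sketch swaps in its own Student's t right-tail probability, computed from the regularized incomplete beta function with the continued-fraction method described in Numerical Recipes, and finds the critical value by bisection. All function names are hypothetical, and the sketch assumes α < 0.5.

```python
import math
import statistics


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function
    (modified Lentz method, as in Numerical Recipes' betacf)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_right_tail(t, df):
    """P(T > t) for Student's t with df degrees of freedom (cf. Excel's T.DIST.RT)."""
    half = 0.5 * _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return half if t >= 0 else 1.0 - half


def t_right_critical(alpha, df):
    """t with P(T > t) = alpha, by bisection (cf. Excel's T.INV(1 - alpha, df))."""
    lo, hi = 0.0, 1.0e4
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if t_right_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def right_tailed_t_test(values, mu0, alpha):
    """H0: mu = mu0 versus H1: mu > mu0, with sigma unknown."""
    n = len(values)
    df = n - 1
    t_stat = (statistics.mean(values) - mu0) / (statistics.stdev(values) / math.sqrt(n))
    critical = t_right_critical(alpha, df)
    p_value = t_right_tail(t_stat, df)
    return {"t": t_stat, "df": df, "critical": critical,
            "p_value": p_value, "reject_h0": t_stat > critical}
```

With the sales data this would be called as `right_tailed_t_test(sales, 2350, 0.07)`, where `sales` is a hypothetical list of the Sales Dollars values. The numbers should agree with Excel's T.DIST.RT and T.INV to several decimal places.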

III. Compare two population means

1) Do you need to make assumptions in order to perform the procedure of conducting a hypothesis test or constructing a confidence interval? If so, what assumptions need to be made? If not, why?

2) Using α = 0.04, perform a hypothesis test to determine whether the mean Sales Dollars of the two groups identified by your qualitative variable are different. We cannot assume equal variances. List the results of all key steps before you reach your conclusion, such as the hypotheses, test statistic, critical value(s), and/or p-value. (Use the Data Analysis Toolpak in Excel.)

3) Find the 90% confidence interval for the difference in average sales between the two populations defined by the qualitative variable.

4) Interpret the above confidence interval.
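Because equal variances cannot be assumed, III(2) and III(3) use the Welch (unequal-variance) procedure, which is what Excel's "t-Test: Two-Sample Assuming Unequal Variances" tool reports. The sketch below, with hypothetical function names, computes the statistic, the Welch-Satterthwaite degrees of freedom, and the interval. The critical value is passed in, assuming it comes from Excel, e.g. =T.INV.2T(0.04, df) for the test and =T.INV.2T(0.10, df) for the 90% interval. Excel may round or truncate a non-integer df, so the last decimals can differ slightly.

```python
import math
import statistics


def welch_summary(sample1, sample2):
    """Welch (unequal-variance) two-sample t statistic for H0: mu1 - mu2 = 0,
    with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    std_error = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return {"diff": diff, "se": std_error, "t": diff / std_error, "df": df}


def welch_interval(sample1, sample2, t_crit):
    """(x̄1 - x̄2) ± t_crit · sqrt(s1²/n1 + s2²/n2)."""
    s = welch_summary(sample1, sample2)
    return s["diff"] - t_crit * s["se"], s["diff"] + t_crit * s["se"]
```

For the two-sided test, H0 is rejected when |t| exceeds the critical value; an interval that excludes 0 points to the same conclusion at the matching confidence level.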

Question Set III

I. Building a Simple Linear Regression Model: Preprocess.

1) Identify all quantitative variables from the dataset.

2) Construct a scatter plot to show the relationship between Sales Dollars (Y) and each independent variable. Calculate the sample correlation coefficient for each pair, and describe the association.

3) Which pair has the strongest linear association?

4) Write down the general formula for the simple linear regression model between Y and X. (Write the formula using the general parameter notation β0 and β1. What should be capitalized or lowercase? What should be added, if anything?)
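The correlations in I(2) and the choice in I(3) can be sketched as follows. The sketch computes Pearson's r, as Excel's CORREL does, and picks the candidate with the largest absolute r. The function names and the dictionary of candidate columns are hypothetical placeholders for the quantitative variables identified in I(1).

```python
import math


def pearson_r(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx · Syy) (cf. Excel's CORREL)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strongest_linear_association(y, candidates):
    """Given {name: x_values}, return (name, r) for the x with the largest |r|."""
    r_values = {name: pearson_r(x, y) for name, x in candidates.items()}
    best = max(r_values, key=lambda name: abs(r_values[name]))
    return best, r_values[best]
```

Ranking by |r| rather than r treats a strong negative association as just as strong as a positive one.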

II. Describe the linear relationship between Sales Dollars (Y) and the variable you identified in I(3) above, used as x.

1) Calculate the slope and y-intercept of the least squares regression line using Excel. Write down the linear equation.

2) Interpret the regression slope.

3) What percentage of the total variation in y can be explained by this independent variable x?
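The least-squares quantities in II(1) and II(3) follow from the usual formulas b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, with R² = 1 - SSE/SST. The sketch below, with a hypothetical function name, reproduces what Excel's SLOPE, INTERCEPT and RSQ return.

```python
def least_squares_line(x, y):
    """Slope b1 = Sxy / Sxx, intercept b0 = ȳ - b1·x̄, and R² = 1 - SSE/SST
    (cf. Excel's SLOPE, INTERCEPT and RSQ)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    sst = sum((b - mean_y) ** 2 for b in y)  # total variation in y
    return b0, b1, 1 - sse / sst
```

Multiplying the returned R² by 100 gives the percentage of total variation in y explained by x.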

III. Use the regression model to predict Sales (Y).

1) What are the predicted sales with 3250 ______? (Fill in the blank with the units and name of the independent variable you chose.)

2) Calculate the 93% confidence interval for the average Sales Dollars (Y) with 3250 ______, and interpret it. (Fill in the blank with the units and name of the independent variable you chose.)

3) Calculate the 93% prediction interval for a SINGLE sales value (Y) with 3250 ______, and interpret it. (Fill in the blank with the units and name of the independent variable you chose.)
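The interval estimates in III(2) and III(3) use the standard simple-regression formulas. The prediction interval adds 1 under the square root because a single new observation varies around the mean response. This is a sketch with a hypothetical function name. It assumes the critical value comes from Excel, e.g. =T.INV.2T(0.07, n - 2) for 93%, because the Regression tool's summary output does not list these intervals directly.

```python
import math


def mean_and_prediction_intervals(x, y, x0, t_crit):
    """At x = x0, with ŷ0 = b0 + b1·x0 and s_e = sqrt(SSE / (n - 2)):
    mean response:      ŷ0 ± t_crit · s_e · sqrt(1/n + (x0 - x̄)²/Sxx)
    single new value:   ŷ0 ± t_crit · s_e · sqrt(1 + 1/n + (x0 - x̄)²/Sxx)"""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    s_e = math.sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - 2))
    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - mean_x) ** 2 / sxx  # grows as x0 moves away from x̄
    ci_half = t_crit * s_e * math.sqrt(h)
    pi_half = t_crit * s_e * math.sqrt(1 + h)
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)
```

For these questions, x0 would be 3250 in the units of the chosen independent variable; the prediction interval is always wider than the confidence interval at the same x0.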

IV. Is there a linear relationship between Y and X?

1) Test the significance of the slope of the regression equation. Use α = 0.09.

o Write down the hypotheses.
o What is the p-value?
o Describe your decision.

2) Develop a 90% confidence interval for the population slope. Does this confidence interval include 0?

3) State your conclusion. (Hint: You may need to rerun the regression analysis with a different confidence level: Data → Data Analysis → Regression → Confidence Level.)
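The slope test in IV(1) and the interval in IV(2) both rest on the standard error of the slope, SE(b1) = s_e / √Sxx. The sketch below uses a hypothetical function name and takes the critical value as input, assuming it comes from Excel, e.g. =T.INV.2T(0.10, n - 2) for the 90% interval. It should match the "Standard Error", "t Stat" and "Lower/Upper" columns of the Regression tool output for the x variable.

```python
import math


def slope_inference(x, y, t_crit):
    """t test of H0: beta1 = 0 and the interval b1 ± t_crit·SE(b1), with
    SE(b1) = s_e / sqrt(Sxx) and s_e = sqrt(SSE / (n - 2))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return {"b1": b1, "se_b1": se_b1, "t": b1 / se_b1, "df": n - 2,
            "interval": (b1 - t_crit * se_b1, b1 + t_crit * se_b1)}
```

An interval that excludes 0 agrees with rejecting H0: β1 = 0 at the matching significance level.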

V. Check the assumptions for regression analysis. Make the necessary plots in Excel to justify your answers, and include them.

1) Is the relationship between the dependent and independent variables linear? Which plot should you check?

2) Do the residuals exhibit some pattern across values for the independent variable? Which plot should you check?

3) Is the variation of the dependent variable the same across all values of the independent variable? Which plot should you check?

4) Do the residuals follow the normal probability distribution? Which plot should you check?

5) Conclusion: Are the results from the regression analysis reliable?
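The residual checks in V(1) to V(4) start from the residuals e = y - ŷ. The sketch below, with a hypothetical function name, returns them together with (normal score, ordered residual) pairs for a normal probability (Q-Q) plot. The plotting positions (i - 0.5)/n are one common choice, assumed here; Excel's built-in normal probability plot may be constructed differently.

```python
import statistics


def residual_diagnostics(x, y):
    """Residuals e_i = y_i - ŷ_i of the least-squares line, plus (normal score,
    ordered residual) pairs for a normal probability (Q-Q) plot."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
          / sum((a - mean_x) ** 2 for a in x))
    b0 = mean_y - b1 * mean_x
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    std_normal = statistics.NormalDist()
    # Plotting positions (i - 0.5) / n are one common choice among several.
    scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return residuals, list(zip(scores, sorted(residuals)))
```

Plotting the residuals against x addresses linearity, patterns and constant variance. If the Q-Q pairs fall close to a straight line, normality of the residuals is plausible.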

Question Set IV

I. Model 1: Develop a multiple regression model to predict the Sales (Y) using all the other variables of interest as listed above. (Round all numerical answers to two decimal places as needed.)

1) Identify the qualitative variable(s) in the list of variables of interest, if any, and create a dummy variable in Excel. (Note: use the Excel function =IF() and use alphabetical order to assign the values 0 and 1.)

2) Perform a multiple regression with the Data Analysis Toolpak in Excel, and write down the regression equation for Model 1. (In Excel, enter the confidence level given in question I(5). Note: Excel requires the independent variables to be in adjacent columns.)

3) Explain the variation of the dependent variable after accounting for the effects of the other independent variables:

o What percentage of total variation in the Sales (Y) can be explained by Model 1?
o What is the value of the adjusted multiple coefficient of determination, R²_A?

4) Is the overall regression model significant using α = 0.07? State the hypotheses and your conclusion.

5) Which independent variables are significant predictors using α = 0.005 (confidence level 99.5%)? Which are not significant? (After accounting for the effects of the other independent variables.)
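Two pieces of Model 1 can be cross-checked by hand: the alphabetical 0/1 dummy coding from I(1), and the adjusted R² and overall F statistic from I(3) and I(4). The sketch below uses hypothetical function names. In Excel, the dummy would be a formula such as =IF(B2="yes",1,0), where the cell reference is only an example. The adjusted R² and F follow from R², the number of observations n, and the number of predictors k. The p-value for F ("Significance F" in the Toolpak output) corresponds to Excel's F.DIST.RT(F, k, n - k - 1).

```python
def dummy_code(labels):
    """0/1 dummy for a two-level qualitative variable: the alphabetically first
    level gets 0 and the second gets 1 (cf. =IF(cell="yes",1,0) in Excel)."""
    levels = sorted(set(labels))
    if len(levels) != 2:
        raise ValueError(f"expected exactly two levels, got {levels}")
    return [levels.index(label) for label in labels], levels


def overall_fit(r_squared, n, k):
    """Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) and the overall
    F = (R²/k) / ((1 - R²)/(n - k - 1)) for k predictors and n observations."""
    adjusted = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
    f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
    return adjusted, f_stat
```

Note that sorting is case-sensitive, so labels such as "Yes" and "yes" should be made consistent before coding.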

II. Develop a second multiple regression model (Model 2) using ONE step of the "backward elimination method". (Remember: variables should be removed one at a time, and the regression analysis (i.e., coefficients, R², p-values, etc.) must be recalculated at each step.) (Round all numerical answers to two decimal places as needed.)

1) Which variable should you remove from Model 1? Why?

2) Perform a multiple regression with the Data Analysis Toolpak in Excel, and write down the regression equation for Model 2. (In Excel, enter the confidence level given in question II(5). Note: Excel requires the independent variables to be in adjacent columns.)

3) Explain the variation of the dependent variable:

o What percentage of total variation in the Sales (Y) can be explained by Model 2? How does this compare with the percentage you obtained with Model 1?

o What is the value of the adjusted multiple coefficient of determination, R²_A? How does this compare with the one you obtained with Model 1?

4) Is the overall regression model (Model 2) significant using α = 0.04?

5) Are all the independent variables in Model 2 significant predictors using α = 0.01 (confidence level 99%) after accounting for the effects of the other independent variables?

6) Prediction:

o Is Model 2 better than Model 1?

o Predict the sales (Y) with DayWeek = yes, Volume (Gallons) = 2931, Washes = 76, and Price (cents) = 145.7, using "the best" model (Model 1 or Model 2). NOTE: you may or may not need to use all of the given values.

7) Interpret the regression coefficients.

o Interpret the coefficient of Washes.
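The single backward-elimination step in II(1) and the prediction in II(6) can be sketched as follows. This sketch assumes the removal rule drops the predictor with the largest p-value, provided that p-value exceeds the chosen α (taken here to be the level used in I(5)). The function names are hypothetical, and the coefficients must come from your own Excel output. Under the alphabetical dummy coding, DayWeek = yes would enter the equation as 1. Predictors that are not in the chosen model are simply ignored, which is why not every given value may be needed.

```python
def variable_to_remove(p_values, alpha):
    """One backward-elimination step: return the predictor with the largest
    p-value if that p-value exceeds alpha, otherwise None (keep all).
    `p_values` maps predictor name to p-value and excludes the intercept."""
    worst = max(p_values, key=p_values.get)
    return worst if p_values[worst] > alpha else None


def predict(intercept, coefficients, inputs):
    """ŷ = b0 + Σ b_j·x_j over the predictors in the chosen model;
    extra entries in `inputs` are ignored."""
    return intercept + sum(b * inputs[name] for name, b in coefficients.items())
```

After removing a variable, the regression must be rerun before looking at the remaining p-values, because every coefficient and p-value can change.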

III. Check the assumptions for regression analysis for the model you have chosen. Make the necessary plots in Excel to justify your answers.

1) Is the relationship between the dependent and independent variables linear?
2) Do the residuals exhibit some patterns across values of the independent variables?
3) Are the variations of the dependent variable the same across all values of the independent variables?
4) Do the residuals follow the normal probability distribution?
5) Conclusion: Are the results from the regression analysis reliable?

Format your assignment according to the following formatting requirements:

o The answer should be typed, using Times New Roman font (size 12), double spaced, with one-inch margins on all sides.

o The response should also include a cover page containing the title of the assignment, the student's name, the course title, and the date. The cover page does not count toward the required page length.

o Also include a reference page. Citations and references must follow APA format. The reference page does not count toward the required page length.

Attachment: Convenience-Shopping.rar
