PROJECT
For this Course Project you will collect data, perform preliminary data analysis, build and analyze a model, and use the results of your analysis to make predictions, draw conclusions, and support decisions.
The Project will be conducted in three phases:
Phase I: Collect data and describe your data set. Please include: a description of what the data is, how it was collected (if known), the type of each variable (categorical/continuous), the unit of analysis, and the business scenario.
Phase II: Perform preliminary analysis of your data, using descriptive statistics. Please include: central tendencies, variability, normality, and a visual representation for each variable, as well as correlations between variables and your preliminary thoughts on which variables will be included in the regression model (i.e., which are independent variables and which is the dependent variable; these can and probably will change!). If you would like to include a regression analysis for me to look at, I will give you feedback.
The first two phases will be graded based on a satisfactory submission. If the first submission is on time and satisfactory, then full credit will be awarded. In the case of an unsatisfactory submission, five points may be deducted for each required re-do.
Phase III: Build a multiple regression model from your data, and prepare a business report that includes all of your previous work, and that presents a recommendation to a decision-maker based on your model and analysis.
PROJECT DETAILS
Phase I, Data Collection
You may collect your data from (almost) any source(s). The objective is to include a numerical response (dependent) variable that can be predicted from some number of other (independent) variables. These data do not have to come from the same source, but should be compatible as data sets. Data should be cross-sectional (no time-series data).
The minimum requirement is 50 observations (50 countries, 50 companies, 50 counties, whatever) with ten independent variables and one dependent variable (11 variables overall). The dependent variable must be numerical. Numerical independent variables are better, but up to 3 may be categorical (max 3 categories) or binary. These data do not have to come from the same source, but should be compatible as data sets (i.e., if your response is a monthly result over a ten-year period, your other data should cover the same time period and increments). It is best not to have the variables observed at 50 points in time, unless the points in time are quite close to each other. It would be better to have one or a few points in time with lots of observations at each.
[Beware the tautology: do not collect temperature and humidity to "predict" the heat index!] Ensure that your data set will allow you to draw relevant conclusions about something that matters. The data may be from any field (preferably business-related) and should be collected so that you can establish relationships among your data to support some sort of a conclusion or recommendation. Please explain your planned business scenario, i.e., who would need to predict this DV and what would they use the results for?
The submission will be in the form of an Excel file submitted in Canvas with a summary of what it is and where it came from.
Phase II, Preliminary Data Analysis
Apply descriptive statistics to your data set. This can include graphical depictions as well as some basic calculated statistics. Since you will be building a multivariate model, the correlations between your independent variables should be included. You should, at this point, be able to make some preliminary observations about your data. These observations (and any others you come up with later) should make their way into your business report, but will generally appear in appendices unless you determine them to be critical to the decision you are recommending. This submission should be a Word document with your Excel file also attached.
For each variable separately:
- Variable Name
- Description (what is it?)
- Units
- Central Tendency (mean, median, mode - use the appropriate one!)
- Variability (range, standard deviation)
- Normal distribution? (continuous variables only)
- Outliers? What did you decide to do with the outliers?
- Correlation with your dependent variable
- Concerning correlations with other independent variables (0.7 or higher)
- Visual representation of variable
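The per-variable checklist above can be cross-checked outside Excel. The following is a minimal Python sketch, not a required method: the variable `unit_price` and its values are hypothetical, and the 1.5 x IQR fence used to flag outliers is an assumed rule of thumb, not one prescribed by this assignment. Quartile conventions differ between tools, so the fences may not match other software exactly.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for one numeric variable."""
    ordered = sorted(values)
    # Quartiles from the standard library (default "exclusive" method).
    q1, _median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    # Assumed outlier rule: anything beyond 1.5 * IQR from the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        "range": ordered[-1] - ordered[0],
        "std_dev": statistics.stdev(ordered),  # sample standard deviation
        "outliers": [v for v in ordered if v < low or v > high],
    }

# Hypothetical variable with one extreme value.
unit_price = [10, 12, 12, 13, 14, 15, 16, 18, 20, 95]
summary = describe(unit_price)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up example the mean (22.5) sits well above the median (14.5) because of the single value 95, which the fence flags as an outlier. A gap like that hints at skew and is one reason to choose the measure of central tendency deliberately.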
Overview of Data
- After running all descriptive information, do you have any thoughts on which variables may be better predictors of your dependent variable, or thoughts overall on how things look?
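One way to form those preliminary thoughts is to rank the independent variables by their correlation with the dependent variable and to flag pairs of independent variables whose correlation reaches the 0.7 level mentioned in the checklist. The sketch below is illustrative only: the variable names and values are hypothetical, and the helper names `pearson_r` and `screen` are made up for this example.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def screen(data, dv, threshold=0.7):
    """Rank IVs by |r| with the DV and flag IV pairs with |r| >= threshold."""
    ivs = [name for name in data if name != dv]
    ranked = sorted(
        ((name, pearson_r(data[name], data[dv])) for name in ivs),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    flagged = []
    for a, b in combinations(ivs, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return ranked, flagged

# Hypothetical data set: sales is the DV, the rest are IVs.
data = {
    "sales": [10, 20, 30, 40, 50],
    "ad_spend": [1, 2, 3, 4, 5],
    "store_size": [2, 5, 1, 4, 3],
    "staff": [4, 10, 2, 8, 6],
}
ranked, flagged = screen(data, dv="sales")
print("IVs ranked by |r| with DV:", ranked)
print("IV pairs at or above 0.7:", flagged)
```

A flagged pair does not have to be dropped automatically, but it is a candidate for the variable elimination you will explain later in the report.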
Phase III: Model Construction and Business Report
You will build a multiple regression model from your data using the techniques we have learned in the course. You should decide here how you intend to use your model to conduct analysis, make predictions, and support decisions.
You will wrap all of your work up in a business report. Remember that the target is an executive who you will ask for a decision based on your recommendation. Perform analysis with your model, interpret your model, include your calculations and the original data (in appendices) but present the bottom line to the decision-maker up front. The report will be submitted in paper copy at the beginning of class. The clear plastic binder is highly discouraged.
While many organizations suggest a format for a business report, there are as many that do not, so the presentation is up to you. However, the following page may be used as a guideline.
Business Report Format
Cover Sheet
Title. Indicate who the report is for, and what the report is about. (Use this to establish the "setting" for your instructor to grade your submission.)
Your name and position. (Again to establish context for the grader.)
Executive Summary. A single paragraph that an executive can read and immediately know what decision you are recommending and why.
Main Body
A 2-3 page report that tells the executive what decision should be made, and why the decision should be what it is. This should reference (and may include) the model you are using to support the decision-making process, and may also describe how confident the executive should be when making this decision. (In extreme cases the report can go up to 5 pages. Business reports not intended for senior executives may be longer, based on the organization's needs.)
BLUF! (Bottom Line Up Front!) The decision should be clear after the first few sentences, and definitely by the end of the first paragraph.
Include only the information that is critical to the executive's decision-making process.
Refer to all supporting data and analyses that are included in appendices. Appendices should appear in order of importance, and should be referenced in that order.
Appendices
(No page limit, whatever is appropriate to describe the following)
A. Model and Interpretation
Show the final model (Y=....) you developed to support the decision, and interpret it, to include discussing the effects of the ranges of your input variables. This is where you discuss the meaning of the relationship between the predictors and the outcome (i.e., when X increases, what happens to Y?). There does not need to be "stats language" here. It can be very helpful to plug in values to demonstrate how the model works.
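Plugging in values can be as simple as evaluating the fitted equation for a baseline case and then changing one input. The sketch below assumes a hypothetical two-predictor model; the intercept, coefficients, and variable names are invented for illustration and do not come from any real data set.

```python
def predict(intercept, coefficients, inputs):
    """Evaluate Y = intercept + sum(coefficient * input) for one observation."""
    return intercept + sum(
        coefficients[name] * inputs[name] for name in coefficients
    )

# Hypothetical fitted model: Y = 50 + 2.5 * ad_spend + 0.5 * store_size
intercept = 50.0
coefficients = {"ad_spend": 2.5, "store_size": 0.5}

baseline = predict(intercept, coefficients, {"ad_spend": 10, "store_size": 100})
more_ads = predict(intercept, coefficients, {"ad_spend": 11, "store_size": 100})

print("Baseline prediction:", baseline)
print("Prediction with one more unit of ad_spend:", more_ads)
print("Change in Y:", more_ads - baseline)
```

Holding store_size fixed, raising ad_spend by one unit changes the prediction by exactly its coefficient (2.5 here), which is the plain-language interpretation this appendix asks for. Keep plugged-in values within the range of your observed data.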
B. Model Statistical Analysis
Discuss the strength of the model in terms of how it supports the decision-making process. Include the relevant Excel output that supports the quality of the model.
- Correlation and multiple regression analyses were conducted to examine the relationship between Y and X(s)....
- Discuss normality, missing data problems (if any), outliers (if any), and correlations between Y and X(s) - strength, direction, and r^2.
- Explain the MR output - r, r^2, F, p. Explain significant beta weights - t, p, relationship.
- Include final tables here to refer to when discussing results.
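To see where the r^2 and F values in the regression output come from, the quantities can be reproduced by hand. The following is a rough sketch, assuming ordinary least squares solved through the normal equations in plain Python; the function names and the six observations are hypothetical, and it does not compute p-values (Excel reports the overall one as "Significance F").

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] minimizing squared error (normal equations)."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def fit_summary(rows, y):
    """Fit the model and compute r^2, adjusted r^2, and the F statistic."""
    coefs = fit_ols(rows, y)
    n, k = len(y), len(rows[0])  # observations, predictors
    fitted = [coefs[0] + sum(b * v for b, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    r2 = 1 - ss_resid / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return {"coefficients": coefs, "r2": r2, "adj_r2": adj_r2, "f_stat": f_stat}

# Hypothetical data: two predictors per observation, six observations.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [9.1, 7.9, 19.2, 17.8, 29.1, 26.9]
result = fit_summary(rows, y)
print(result)
```

Six observations are used only to keep the example short; the project itself requires at least 50. A model that fits perfectly would make the F calculation divide by zero, which should not occur with real data.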
C. Model Development
Explain the process you used to turn the data into a model. Explain predictors that you started with and did not include in your final model with rationale. Discuss how you checked for assumptions. Discuss variable elimination and transformation, as well as any other clever modeling techniques you used. You do not have to include every step of your process, but you should show critical analyses that led to important modeling decisions.
D. Data Analysis
Show your descriptive and graphical analysis of the data, to include all the observations that might contribute to the modeling process.
E. Data
Briefly describe the data set and include the sources. Very small data sets may be included in full. For larger data sets (hundreds of observations or more), do not waste your company's paper.