Question: Your submission must be your original work. No more than a combined total of 30% of the submission and no more than a 10% match to any one individual source can be directly quoted or closely paraphrased from sources, even if cited correctly.
You must use the rubric to direct the creation of your submission because it provides detailed criteria that will be used to evaluate your work. Each requirement below may be evaluated by more than one rubric aspect. The rubric aspect titles may contain hyperlinks to relevant portions of the course.
Part 1: Data Analysis
Prepare the data provided in the attached file, "Raw Data and Linear Regression." Remove any potential errors or outliers, duplicate records, or data that are not necessary to address the problem or scenario.
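The cleaning step can be scripted in Python using only the standard library, as sketched below. The column names (`id`, `officers`, `notes`) and the toy rows are hypothetical stand-ins for the attached file. The outlier rule, which drops values beyond 1.5 * IQR from the quartiles, is a common convention assumed here; the assignment does not prescribe it. Exact duplicates are removed before unneeded columns are dropped, so records that differ only in a dropped field are not merged by mistake.

```python
import statistics

def clean_records(rows, keep_columns, numeric_column):
    """Clean a list of dict rows (hypothetical schema).

    Steps: drop exact duplicate records, keep only the columns needed for
    the analysis, drop rows whose numeric field is missing or unparseable,
    then drop outliers outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR (assumed rule).
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(sorted(row.items()))  # whole-record fingerprint
        if key in seen:
            continue  # exact duplicate of a record already kept
        seen.add(key)
        record = {col: row.get(col) for col in keep_columns}
        try:
            record[numeric_column] = float(record[numeric_column])
        except (TypeError, ValueError):
            continue  # missing value; impute instead if you can justify it
        kept.append(record)

    values = [r[numeric_column] for r in kept]
    q1, _, q3 = statistics.quantiles(values, n=4)
    fence = 1.5 * (q3 - q1)
    return [r for r in kept if q1 - fence <= r[numeric_column] <= q3 + fence]


# Hypothetical raw rows: id 9 has a blank count, id 8 is an extreme value,
# and the final row is an exact copy of id 1.
raw = [{"id": i, "officers": v, "notes": ""}
       for i, v in enumerate(["2", "3", "4", "3", "2", "3", "4", "40", ""], start=1)]
raw.append(dict(raw[0]))

clean = clean_records(raw, ["id", "officers"], "officers")
print([r["id"] for r in clean])  # prints [1, 2, 3, 4, 5, 6, 7]
```

Each dropped row in a run like this corresponds to one justification that item 1 asks for: a duplicate, a blank field, or an outlier.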
1. Explain why you removed each column or row from the raw data file or why you imputed data in the empty fields as you prepared the data for analysis. Include a clean data set with your submission.
2. Create data sheets using the cleaned data. Provide the following tables with accurate counts, along with vertical or horizontal bar graphs that represent the requested aggregated data. Be sure all tables are appropriately labeled.
- Table: date and number of events
- Bar graph: date and number of events
- Table: number of incident occurrences by event type
- Bar graph: number of incident occurrences by event type
- Table: sectors and total number of events
- Bar graph: sectors and total number of events
3. Describe the fit of the linear regression line to the data, using the linear regression model that is provided in the attachment. Provide graphical representations or tables as evidence to support your description.
4. Describe the impact of the outliers on the data, using the linear regression model that is provided in the attachment. Provide graphical representations or tables as evidence to support your description.
5. Provide a residual plot and explain how to improve the linear regression model based on your interpretation of the plot.
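All three tables and bar graphs in item 2 use the same aggregation: a count of cleaned records grouped by one column. A single helper can therefore produce each of them. The sketch below is Python with the standard library. The field names and rows are hypothetical, and the plain-text bars are only a quick check, not a substitute for labeled charts in the submission.

```python
from collections import Counter

def count_by(records, column):
    """Tally records per distinct value of `column`, sorted by that value."""
    return sorted(Counter(r[column] for r in records).items())

def text_bar_chart(pairs, width=30):
    """Render (label, count) pairs as a horizontal bar chart in plain text."""
    largest = max(count for _, count in pairs)
    lines = []
    for label, count in pairs:
        bar = "#" * max(1, round(width * count / largest))  # scale bars to the largest count
        lines.append(f"{label:<12}{count:>5}  {bar}")
    return "\n".join(lines)


# Hypothetical cleaned events; field names are placeholders.
events = [
    {"date": "2024-01-01", "event_type": "Traffic stop", "sector": "North"},
    {"date": "2024-01-01", "event_type": "Alarm", "sector": "South"},
    {"date": "2024-01-02", "event_type": "Traffic stop", "sector": "North"},
    {"date": "2024-01-03", "event_type": "Traffic stop", "sector": "East"},
]

for column in ("date", "event_type", "sector"):
    print(f"Number of events by {column}")
    print(text_bar_chart(count_by(events, column)))
    print()
```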
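Items 3 and 4 ask how well the line fits the data and how outliers move it. The provided model is not reproduced here, so the sketch below assumes a simple one-predictor least-squares line. It reports the slope, the intercept, and the coefficient of determination R^2 = 1 - SS_res / SS_tot. Refitting with and without a flagged point is one direct way to show an outlier's influence. The x and y values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: a clean linear trend, then the same data plus one outlier.
xs, ys = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
with_outlier = fit_line(xs + [6], ys + [40])
without_outlier = fit_line(xs, ys)

for name, (slope, intercept, r2) in (("with outlier", with_outlier),
                                      ("without outlier", without_outlier)):
    print(f"{name:>16}: slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```

In this toy case, the single extra point triples the slope (from 2 to 6) and drops R^2 from 1.000 to about 0.628. A side-by-side table of this kind is the sort of evidence item 4 asks for.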
Part 2: Simulation and Recommendation
Run a Monte Carlo simulation based on a normally distributed random variable with the same mean and standard deviation as the variable "Number of officers at the scene" in the clean data set.
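Here is a minimal Python sketch of that simulation. It assumes the officer counts have already been pulled from the clean data set into a list. The values below are hypothetical, and the prompt leaves open whether to use the sample or the population standard deviation. Simulated values are continuous and can fall below zero, so round or clip them if the analysis needs whole, non-negative officer counts.

```python
import random
import statistics

def simulate_officers(observed, trials=10_000, seed=None):
    """Monte Carlo draws from a normal distribution matched to the observed data.

    Uses the sample standard deviation (n - 1); switch to statistics.pstdev
    if the population form is intended.
    """
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    return [rng.gauss(mu, sigma) for _ in range(trials)]


# Hypothetical officer counts standing in for the clean data set.
observed = [2, 3, 4, 3, 2, 3, 4]
draws = simulate_officers(observed, seed=42)
print(f"simulated mean={statistics.mean(draws):.2f}, sd={statistics.stdev(draws):.2f}")
```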
1. Determine if the police department currently qualifies for the funding. Provide your simulation results as evidence to support your findings.
2. Calculate the probability that the department will or will not qualify for the funding in the future. Provide evidence to support your findings.
3. Describe the precautions or behaviors that should be exercised when working with and communicating about the sensitive data in this scenario.
4. Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.
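For items 1 and 2 of Part 2, note that the funding rule is defined in the scenario materials, not in this section. The sketch below therefore treats it as a hypothetical threshold on the number of officers at the scene. It estimates the share of simulated draws that meet the threshold and checks that estimate against the closed-form normal tail probability from `statistics.NormalDist`. The probability of not qualifying is the complement. Adapt the comparison if the actual criterion is different, for example a ceiling rather than a floor, or a rule applied to an average.

```python
import random
from statistics import NormalDist

def probability_of_qualifying(mu, sigma, threshold, trials=100_000, seed=None):
    """Estimate P(X >= threshold) for X ~ Normal(mu, sigma) by simulation.

    Returns (monte_carlo_estimate, closed_form_value) so the two can be compared.
    `threshold` is a hypothetical stand-in for the funding criterion.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.gauss(mu, sigma) >= threshold)
    closed_form = 1 - NormalDist(mu, sigma).cdf(threshold)
    return hits / trials, closed_form


# Hypothetical inputs: mean 3 officers, sd 0.82, qualifying at 4 or more officers.
estimate, exact = probability_of_qualifying(3.0, 0.82, 4.0, seed=7)
print(f"P(qualify) ~ {estimate:.3f} (closed form {exact:.3f}); "
      f"P(not qualify) ~ {1 - estimate:.3f}")
```

Close agreement between the simulated and closed-form values indicates that the number of trials is large enough for the estimate to be reported as evidence.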