Assignment: Probability
Overview
You will create an Excel .xlsx file with two tabs that perform data setup and analysis that mimic techniques used in the current module. The left tab will be called "Data" and contain the dataset described below (and nothing else). The right tab will be called "Analysis" and will contain the elements described in the Analysis section below. Be sure to name your file as described in the Turning In Your Assignment section below.
This is an individual assignment. No coordination, cooperation, or communication with your classmates is permitted. The penalty for detected unauthorized collaboration is a zero for all participants. Review my policy on cheating and the honor code which is also linked from the syllabus.
(The note above is not included because I expect students to set out to cheat on assignments. I worry more about inadvertently crossing the line where collaboration is encouraged in discussions.)
Data Set
This assignment uses the Area Resource File data extract available from the Course Resource Page (linked from the course's Canvas home page). This page explains the source of the data file, offers some hints on how to save it for reuse in multiple assignments, and includes descriptions of the columns in the data. You may wish to print the descriptions or otherwise save them so they will be available in this and subsequent assignments.
Data Modification and Workbook Preparation
Workbook Tabs
Your workbook will have two required tabs and one optional tab in the order (from left to right) listed below:
1. Data: This should start with the original data from the ARF_Extract.xlsx file available from the Course Download Page. The first entry on the page should be in Cell A1 just as the data came to you. You will modify this data as itemized in the Data Modifications section below.
2. Analysis: You will place the results of your analysis on this page. See the Analysis section below for content and layout information.
3. Comments (Optional): If there is anything you need (or want) me to be aware of when I am reviewing your assignment add a third tab called Comments and use a text box (from the Insert menu) to add your comments. Do not use the comments box on the upload/submission page as I do not have these available when I am grading assignments. Most students will not need to use this tab but, again, if you do want me to be aware of something at the time I am reviewing the assignment use this technique. Do not add this tab if you have nothing extra to tell me about the assignment.
Data Modifications
You will create four new columns and delete some unneeded columns from your data.
Column Deletions
If you started with a clean sample data download you should have columns A-AL in use on the data sheet. Delete the ranges of columns listed below from the spreadsheet. (Note that deleting these columns has no impact on your calculations for this assignment. It just reduces the spreadsheet size and makes uploading and downloading the assignment easier. In future assignments you will want to pay close attention to which columns you will use.)
You should start deleting columns from the right side of the spreadsheet. If you delete starting from the left you will end up renaming existing columns and the ranges listed below will not be valid after the first deletion.
1. Columns O (Surgeries) - AL (Med Home Value)
2. Columns K (Pop/Sq Mile) - L (Gen Hosp)
3. Columns D (Farming) - I (Persistent Poverty)
Column Additions
Add the following columns in the order specified from left to right to the right of the last remaining column after the column deletions (Hosp Beds).
1. "Admits / 1k" (you will not use the quotes). Fill this column with the annual hospital admissions per 1,000 in population for each county.
2. "Beds / 1k". Fill this column with the number of hospital beds per 1,000 in population for each county.
3. "Admit Sz". Use a nested IF function to create a categorical label based on the county's rank in admissions / 1,000 population according to the table below.
Cutoff Value Display Value†
< 80 1 Low Admits
< 150 2 Avg Admits
< 220 3 Med Admits
< 290 4 High Admits
>= 290 5 VH Admits
4. "Bed Sz". Use a nested IF function to create a categorical label based on the county's rank in beds / 1,000 population according to the table below.
Cutoff Value Display Value†
< 4 1 Low Beds
< 9 2 Avg Beds
< 14 3 Med Beds
< 19 4 High Beds
>= 19 5 VH Beds
† Include the numeric digits in the display values so that the first value's entry in the Excel formula will be "1 Low ". When Excel creates pivot tables it will sort the rows and columns by their labels so we are using the numbers to force the sort to correspond to the order we want the labels in. (Can you figure out what order the rows would be in if we omitted the digits?)
Analysis
Create each of the elements described below from top to bottom on the Analysis sheet. Don't cramp them. Put a bold title matching the description of each element below (the text in bold brown font at the beginning of each numbered point) left justified just above where you start each point's content.
1. Contingency Table of Bed Count and Admissions Ranks: Create a contingency table similar to that in Figure 5.6 with the Admission Rank numbers as row titles and the Bed Count Rank numbers as the column labels. Use this layout for this and the next part of the assignment. Show the Count of counties in each cell.
1. Hint: You will have to add one of the numeric values to the table and convert it to a Count as shown in the video.
2. Select the count cells as well as the total row and column cells in the table and use Excel's Format Cells to show the thousands separators.
2. Joint Probabilities for Bed Count and Admissions Ranks: Create a joint probabilities table similar to that in the lower half of Figure 5.7 for the contingency table data. Show all probabilities to 4 digits to the right of the decimal place using the Format Cells capability.
3. Binomial Low Bed Density Report Probabilities: The Centers for Disease Control monitor reports of certain highly infectious diseases and is prepared to offer advice on containment to county health officials. Because counties in the lower tier (#1) of hospital beds / 1k population tend not have the health resources available counties with more beds the CDC maintains a special team to support these counties in the event of a reportable condition. Assume that any county has an equal probability to have a reportable condition (which means to ignore any possible correlation between population size and density and the number of outbreaks). Build a table of probabilities similar to that in either Figure 5.15 or Figure 5.17 showing for eight county reports drawn from the list of counties the probabilities that zero, one, two, ..., through all eight are from counties in the lowest tier of beds / 1k population. (You will have nine rows of data plus column headers). Include a cumulative probability column.
1. In a separate line below the table identify the probability that more than three of the reports will be from counties with the lowest bed density.
4. Poisson Distribution of Low Bed Density Report Submissions: Assume that the CDC receives 100 total reports per day and, as above, that any county has an equal likelihood of having a reportable condition. Construct a table similar to that in Figure 5.20 (or 5.21, though that is a lot more work). Then report the probability that more than fifteen reports from a county in the lowest tier of bed density will arrive in a single day.
Attachment:- Data_Sheet_for_Homework.rar