Calculate the sample variance using the appropriate column


Question 1 -

The following table lists some variables that might be of interest in your next data analysis. For each variable, complete its row of the table, indicating whether it is categorical (and if so, nominal or ordinal) or numerical (and if so, discrete or continuous).

| Variable | Categorical: nominal | Categorical: ordinal | Numerical: discrete | Numerical: continuous |
|---|---|---|---|---|
| Example: Eye Color | X | | | |
| 1a. Sex | | | | |
| 1b. Number of runs scored in a baseball game | | | | |
| 1c. Profession | | | | |
| 1d. Temperature, measured in Fahrenheit | | | | |
| 1e. Confidence in one's ability to do statistics, as measured by "yes/no" to the statement: "I will do well" | | | | |
| 1f. Number of siblings | | | | |
| 1g. Distance an individual can run in five minutes | | | | |
| 1h. Ethnicity | | | | |
| 1i. Number of MDs who also have a PhD | | | | |
| 1j. Lack of coordination, as measured by the time it takes an individual to complete a certain puzzle | | | | |

Question 2 -

Here is a hypothetical situation. In 2015, a program aimed at reducing infant mortality was implemented in two regions, Pepi and Quepi. The following table (this is hypothetical, sorry) shows the numbers of births and infant deaths in each region in 2014, before the program, and in 2016, after it.

| Year | Pepi: Births | Pepi: Infant Deaths | Quepi: Births | Quepi: Infant Deaths |
|---|---|---|---|---|
| 2014 | 100,000 | 300 | 1,000,000 | 5,000 |
| 2016 | 100,000 | 60 | 1,000,000 | 4,000 |

2a. In which region is there more convincing evidence that the reduction in mortality was caused by the program?

2b. If the program can be continued in one region ONLY, which would you choose? In developing your answer, you may assume that the reductions shown were in fact caused by the program.
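One way to put the two regions on a common scale is to convert the counts to infant deaths per 1,000 births. The sketch below does that and also computes the absolute number of deaths averted and the relative drop. The per-1,000 scale and the variable names are choices made for illustration and are not part of the assignment.

```python
# Births and infant deaths from the (hypothetical) table: region -> year -> (births, deaths).
data = {
    "Pepi":  {2014: (100_000, 300), 2016: (100_000, 60)},
    "Quepi": {2014: (1_000_000, 5_000), 2016: (1_000_000, 4_000)},
}

def rate_per_1000(births, deaths):
    """Infant deaths per 1,000 births."""
    return 1000 * deaths / births

summary = {}
for region, years in data.items():
    before = rate_per_1000(*years[2014])
    after = rate_per_1000(*years[2016])
    deaths_averted = years[2014][1] - years[2016][1]  # absolute reduction in deaths
    relative_drop = (before - after) / before          # fraction of the 2014 rate removed
    summary[region] = (before, after, deaths_averted, relative_drop)
    print(f"{region}: {before:.1f} -> {after:.1f} per 1,000 births; "
          f"{deaths_averted} fewer deaths ({relative_drop:.0%} relative drop)")
```

The relative drop and the absolute number of deaths averted need not favor the same region. That is the kind of trade-off question 2b asks you to weigh.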

Question 3 -

The following are data on some famous statisticians. Yes! Florence Nightingale, among her other talents, was a statistician!

| Statistician | Gender (1 = female, 2 = male) | Year of Birth | Year of Death |
|---|---|---|---|
| Sir Francis Galton | 2 | 1822 | 1911 |
| Karl Pearson | 2 | 1857 | 1936 |
| William Sealy Gosset | 2 | 1876 | 1937 |
| Ronald Aylmer Fisher | 2 | 1890 | 1962 |
| Harald Cramer | 2 | 1893 | 1985 |
| Prasanta Mahalanobis | 2 | 1893 | 1972 |
| Jerzy Neyman | 2 | 1894 | 1981 |
| Egon S. Pearson | 2 | 1895 | 1980 |
| Gertrude Cox | 1 | 1900 | 1978 |
| Samuel S Wilks | 2 | 1906 | 1964 |
| Florence Nightingale | 1 | 1909 | 1995 |
| David John Tukey | 2 | 1915 | 2000 |

3a. By any means you like (by hand is just fine), create a stem-and-leaf summary of the data on the variable YEAR OF BIRTH. Display it here. Then use this visual summary to answer questions #3b - #3e below.

3b. Are there any outliers (i.e., extreme values) in this distribution? Explain.

3c. How would you describe the shape of this distribution? Explain.

3d. What is/are the most frequently occurring score(s) in this distribution? How many times does it/do they occur?

3e. Can we use this stem-and-leaf to obtain the original set of values for this variable? Explain.
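Here is one way to build the stem-and-leaf display for 3a. It assumes the stem is the first three digits of each year and the leaf is the last digit; other stem choices are possible. Empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict

birth_years = [1822, 1857, 1876, 1890, 1893, 1893, 1894, 1895,
               1900, 1906, 1909, 1915]

def stem_and_leaf(values):
    """Map each stem (all but the last digit) to its sorted leaves (last digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lo, hi = min(leaves), max(leaves)
    # Include every stem in the range, even empty ones, so gaps are visible.
    return {stem: leaves.get(stem, []) for stem in range(lo, hi + 1)}

plot = stem_and_leaf(birth_years)
for stem, lv in plot.items():
    print(f"{stem} | {' '.join(map(str, lv))}")
```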

Question 4 -

4a. When a distribution is skewed to the right:

i) TRUE or FALSE: The median is greater than the mean.

ii) TRUE or FALSE: The distribution is unimodal.

iii) TRUE or FALSE: The majority of observations are less than the mean.

4b. The shape of a frequency distribution can be described using:

i) TRUE or FALSE: A box and whisker plot.

ii) TRUE or FALSE: A table of frequencies.

iii) TRUE or FALSE: A histogram.

4c. For the sample 3, 1, 7, 2 and 2:

i) TRUE or FALSE: The sample mean is 3.

ii) TRUE or FALSE: The sample median is 7.

iii) TRUE or FALSE: The range is 1.

iv) TRUE or FALSE: The sample variance is 5.5.
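The quantities in 4c can be checked with Python's standard-library `statistics` module. Note that `statistics.variance` computes the sample variance, which divides by n - 1.

```python
from statistics import mean, median, variance

sample = [3, 1, 7, 2, 2]

print("mean:", mean(sample))                 # sum of values / n
print("median:", median(sample))             # middle value after sorting
print("range:", max(sample) - min(sample))   # largest minus smallest
print("sample variance:", variance(sample))  # sum of squared deviations / (n - 1)
```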

Question 5 -

The following table shows the numbers of geriatric admissions, each week from May through September, to a certain facility in each of two years, 2012 and 2013.

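| Week | # Admissions 2012 | # Admissions 2013 | Week | # Admissions 2012 | # Admissions 2013 |
|---|---|---|---|---|---|
| 1 | 24 | 20 | 12 | 11 | 25 |
| 2 | 22 | 17 | 13 | 6 | 22 |
| 3 | 21 | 21 | 14 | 10 | 26 |
| 4 | 22 | 17 | 15 | 13 | 12 |
| 5 | 24 | 22 | 16 | 19 | 33 |
| 6 | 15 | 23 | 17 | 13 | 19 |
| 7 | 23 | 20 | 18 | 17 | 21 |
| 8 | 21 | 16 | 19 | 10 | 28 |
| 9 | 18 | 24 | 20 | 16 | 19 |
| 10 | 21 | 21 | 21 | 24 | 13 |
| 11 | 17 | 20 | 22 | 15 | 29 |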

5a. By any means you like (by hand is just fine), summarize these data graphically. Display it here. Then use this visual summary to answer question #5b.

5b. Why do you think these two years were different? Note - There is no single correct answer here. I will accept any well-reasoned interpretation. I'm looking for you to think about what you see!
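Below is a quick text-based graphical summary for 5a. It assumes the weeks are numbered 1 through 22 in table order and prints a side-by-side bar for each week. As one possible way to look for a pattern, it also compares the average of weeks 1-11 with the average of weeks 12-22. That split point is an arbitrary choice for illustration.

```python
# Weekly geriatric admissions, weeks 1-22 (May through September), from the table.
admissions = {
    2012: [24, 22, 21, 22, 24, 15, 23, 21, 18, 21, 17,
           11, 6, 10, 13, 19, 13, 17, 10, 16, 24, 15],
    2013: [20, 17, 21, 17, 22, 23, 20, 16, 24, 21, 20,
           25, 22, 26, 12, 33, 19, 21, 28, 19, 13, 29],
}

# Side-by-side text bar chart: one row per week, one '#' per admission.
for week, (a12, a13) in enumerate(zip(admissions[2012], admissions[2013]), start=1):
    print(f"wk {week:2d}  2012 {'#' * a12:<25} 2013 {'#' * a13}")

# Compare the first half of the season (weeks 1-11) with the second half (weeks 12-22).
for year, counts in admissions.items():
    early, late = counts[:11], counts[11:]
    print(year, "weeks 1-11 mean:", round(sum(early) / 11, 1),
          "| weeks 12-22 mean:", round(sum(late) / 11, 1))
```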

Question 6 -

6a. You read that the median income of U.S. households in 2010 was $49,455. In 1-2 sentences at most, explain in plain language what "the median income" is.

6b. The Census Bureau website gives several choices for "average income" in its historical income data. In 2010, the median income of American households was $49,455. The mean household income was $67,530. The median income of families was $60,395, and the mean family income was $78,361. The Census Bureau says, "Households consist of all people who occupy a housing unit. The term 'family' refers to a group of two or more people related by birth, marriage, or adoption who reside together." In at most 5 sentences, explain carefully why mean incomes are higher than median incomes and why family incomes are higher than household incomes.

6c. A January 2012 magazine article reported that the average income of readers of the business magazine Forbes was $217,000. In your opinion, is the median income of these readers greater or less than $217,000? In at most 1-2 sentences, explain your reasoning.

6d. The distribution of individual incomes in the United States is strongly skewed to the right. In 2008, the mean income and the median income of the top 1% of Americans were, in some order, $558,726 and $1,137,680. Which of these numbers is the mean and which is the median? In at most 1-2 sentences, explain your reasoning.

6e. By any means you like (by hand is fine), determine which of the following two data sets is more spread out. Show your work, and in at most 1-2 sentences explain your reasoning.

Data set "A": 4  0  1  4  3  6

Data set "B": 5  3  1  3  4  2
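Both data sets have the same mean (3), so their spreads can be compared directly. The sketch below computes the range, the sample variance (n - 1 denominator) and the sample standard deviation for each set with Python's `statistics` module. It is one possible way to "show your work"; a hand calculation of the same quantities is equally valid.

```python
from statistics import mean, stdev, variance

data_sets = {
    "A": [4, 0, 1, 4, 3, 6],
    "B": [5, 3, 1, 3, 4, 2],
}

spread = {}
for name, values in data_sets.items():
    spread[name] = {
        "mean": mean(values),
        "range": max(values) - min(values),
        "variance": variance(values),  # sample variance, divides by n - 1
        "sd": stdev(values),           # square root of the sample variance
    }
    print(name, spread[name])
```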

Question 7 -

A box plot is a graph of the five-number summary. The central box spans the quartiles (Q1 to Q3), and the line inside the box marks the median. The length of the box is a measure of spread. The lines extending out from the box give an indication of extremes, if any. Side-by-side box plots are useful for comparing two distributions. As an example, consider the following table, which lists the average monthly temperature (Fahrenheit) in Springfield, Massachusetts, and San Francisco, California.

| Month | Springfield Ave Temp (F) | San Francisco Ave Temp (F) |
|---|---|---|
| January | 32 | 49 |
| February | 36 | 52 |
| March | 45 | 53 |
| April | 56 | 55 |
| May | 65 | 58 |
| June | 73 | 61 |
| July | 78 | 62 |
| August | 77 | 63 |
| September | 70 | 64 |
| October | 58 | 61 |
| November | 45 | 55 |
| December | 36 | 49 |

7a. Obtain the five-number summary of the average monthly temperatures separately for each city, Springfield and San Francisco. Use these values to complete the following table.

| | Springfield | San Francisco |
|---|---|---|
| Minimum | | |
| Q1 | | |
| Q2 = median | | |
| Q3 | | |
| Maximum | | |

7b. By any means you like (by hand is fine), produce a side-by-side box and whisker plot of the two distributions of average monthly temperatures. You will use this visual to answer question #7c.

7c. i) Are the two cities similar in their typical (median) average temperature?

ii) Are the two cities similar in terms of temperature spread? Explain.

iii) Which city requires owning a larger wardrobe of clothes?
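Quartiles have several competing conventions. The sketch below uses the "median of each half" rule often taught for hand calculation, so Q1 and Q3 are the medians of the lower and upper halves of the sorted data. Software (for example, Python's `statistics.quantiles` or spreadsheet functions) may give slightly different Q1 and Q3 values.

```python
from statistics import median

# Average monthly temperatures (F), January through December, from the table.
temps = {
    "Springfield":   [32, 36, 45, 56, 65, 73, 78, 77, 70, 58, 45, 36],
    "San Francisco": [49, 52, 53, 55, 58, 61, 62, 63, 64, 61, 55, 49],
}

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with Q1/Q3 as medians of the lower/upper halves."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # for odd n, the middle value is excluded from both halves
    upper = xs[half + n % 2:]
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

for city, values in temps.items():
    print(city, five_number_summary(values))
```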

Question 8 -

This last exercise gives you practice with the fundamental calculations of the sample mean, the sample variance and the sample standard deviation. It also gives you practice producing and interpreting a histogram.

The table below gives data on X = blood glucose level (mmol/L) for a simple random sample of n = 40 first-year medical students. The students are indexed by the subscript "i", which ranges from i = 1 to i = 40.

8a. First calculate the sample mean. To do this, obtain the sum of the individual blood glucose values and divide this by the sample size.

i) $\sum_{i=1}^{40} x_i =$

ii) $n =$

iii) Sample mean $= \bar{x} = \sum_{i=1}^{40} x_i / n =$ (fill in) / (fill in) $=$

8b. Next, calculate the square of each individual blood glucose level and enter it in the 3rd column of the table. All done? Now obtain the sum of these squared values and enter this total at the bottom of the 3rd column.

8c. Next, calculate the deviation of each individual blood glucose level about the sample mean (4th column of the table), and then the square of each deviation (5th column). All done? Now obtain the sum of the squared deviations and enter this total at the bottom of the 5th column.

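| i | $x_i$ | $x_i^2$ | $(x_i - \bar{x})$ | $(x_i - \bar{x})^2$ |
|---|---|---|---|---|
| 1 | 4.7 | | | |
| 2 | 4.2 | | | |
| 3 | 3.9 | | | |
| 4 | 3.4 | | | |
| 5 | 3.6 | | | |
| 6 | 4.1 | | | |
| 7 | 4.8 | | | |
| 8 | 4.0 | | | |
| 9 | 3.8 | | | |
| 10 | 4.4 | | | |
| 11 | 3.3 | | | |
| 12 | 3.8 | | | |
| 13 | 2.2 | | | |
| 14 | 5.0 | | | |
| 15 | 3.3 | | | |
| 16 | 4.1 | | | |
| 17 | 4.7 | | | |
| 18 | 3.7 | | | |
| 19 | 3.6 | | | |
| 20 | 3.8 | | | |
| 21 | 4.1 | | | |
| 22 | 3.6 | | | |
| 23 | 4.6 | | | |
| 24 | 4.4 | | | |
| 25 | 3.6 | | | |
| 26 | 2.9 | | | |
| 27 | 3.4 | | | |
| 28 | 4.9 | | | |
| 29 | 4.0 | | | |
| 30 | 3.7 | | | |
| 31 | 4.5 | | | |
| 32 | 4.9 | | | |
| 33 | 4.4 | | | |
| 34 | 4.7 | | | |
| 35 | 3.3 | | | |
| 36 | 4.3 | | | |
| 37 | 5.1 | | | |
| 38 | 3.4 | | | |
| 39 | 4.0 | | | |
| 40 | 6.0 | | | |
| Total of column | | | | |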

8d. Calculate the sample variance from the appropriate column totals in TWO ways, showing your work. Tip: you should get the same answer both ways. This illustrates a shortcut for calculating by hand, and it should clear up any confusion you may have had on encountering more than one formula for this calculation.

i) $s^2 = \dfrac{\sum_{i=1}^{40} (x_i - \bar{x})^2}{n-1}$

ii) $s^2 = \dfrac{\left[\sum_{i=1}^{40} x_i^2\right] - n\bar{x}^2}{n-1}$
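Why the two formulas in 8d agree: expand the square and use the fact that $\sum_{i=1}^{n} x_i = n\bar{x}$. The numerators are then equal, so dividing both by $n-1$ gives the same $s^2$.

```latex
\sum_{i=1}^{n}(x_i-\bar{x})^2
  = \sum_{i=1}^{n}x_i^2 - 2\bar{x}\sum_{i=1}^{n}x_i + n\bar{x}^2
  = \sum_{i=1}^{n}x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
  = \sum_{i=1}^{n}x_i^2 - n\bar{x}^2
```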

8e. Finally, calculate the sample standard deviation.
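The column totals for 8a-8c and both variance formulas from 8d can be checked with a short script. The sketch below takes the 40 values in table order and mirrors the hand steps rather than calling a library variance function, so each intermediate total matches a table column. The variable names are illustrative.

```python
import math

# Blood glucose (mmol/L) for students i = 1..40, in table order.
x = [4.7, 4.2, 3.9, 3.4, 3.6, 4.1, 4.8, 4.0, 3.8, 4.4,
     3.3, 3.8, 2.2, 5.0, 3.3, 4.1, 4.7, 3.7, 3.6, 3.8,
     4.1, 3.6, 4.6, 4.4, 3.6, 2.9, 3.4, 4.9, 4.0, 3.7,
     4.5, 4.9, 4.4, 4.7, 3.3, 4.3, 5.1, 3.4, 4.0, 6.0]

n = len(x)
sum_x = sum(x)                                # 8a: total of the x_i column
xbar = sum_x / n                              # 8a: sample mean
sum_x2 = sum(xi ** 2 for xi in x)             # 8b: total of the x_i^2 column
sum_dev2 = sum((xi - xbar) ** 2 for xi in x)  # 8c: total of the squared-deviation column

s2_definition = sum_dev2 / (n - 1)                # 8d (i)
s2_shortcut = (sum_x2 - n * xbar ** 2) / (n - 1)  # 8d (ii)
s = math.sqrt(s2_definition)                      # 8e: sample standard deviation

print(f"n = {n}, sum = {sum_x:.1f}, mean = {xbar:.3f}")
print(f"sum of squares = {sum_x2:.2f}, sum of squared deviations = {sum_dev2:.3f}")
print(f"s^2 (definition) = {s2_definition:.4f}, s^2 (shortcut) = {s2_shortcut:.4f}")
print(f"s = {s:.4f}")
```

The deviations in the 4th column always sum to zero (up to rounding), which is a handy check on the hand calculation.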

8f. By any means you like (by hand is fine), produce a histogram of these data.

8g. Calculate the mean ±1 standard deviation and the mean ±2 standard deviations. Indicate these points on your histogram.
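A text histogram, together with the mean ±1 and ±2 standard deviation cut points, can be sketched as follows. The bin width of 0.5 mmol/L starting at 2.0 is an assumed choice; other bin widths give differently shaped pictures. Values are converted to integer tenths before binning so that values sitting exactly on a bin edge (such as 4.0) are not misplaced by floating-point error.

```python
from statistics import mean, stdev

x = [4.7, 4.2, 3.9, 3.4, 3.6, 4.1, 4.8, 4.0, 3.8, 4.4,
     3.3, 3.8, 2.2, 5.0, 3.3, 4.1, 4.7, 3.7, 3.6, 3.8,
     4.1, 3.6, 4.6, 4.4, 3.6, 2.9, 3.4, 4.9, 4.0, 3.7,
     4.5, 4.9, 4.4, 4.7, 3.3, 4.3, 5.1, 3.4, 4.0, 6.0]
xbar, s = mean(x), stdev(x)

BIN_START, BIN_WIDTH = 20, 5  # in tenths of mmol/L: bins [2.0, 2.5), [2.5, 3.0), ...
counts = {}
for v in x:
    tenths = round(v * 10)
    k = (tenths - BIN_START) // BIN_WIDTH  # index of the bin containing v
    counts[k] = counts.get(k, 0) + 1

# Print every bin up to the last occupied one, including empty bins.
for k in range(max(counts) + 1):
    lo = (BIN_START + k * BIN_WIDTH) / 10
    print(f"[{lo:.1f}, {lo + 0.5:.1f})  {'*' * counts.get(k, 0)}")

for m in (1, 2):
    print(f"mean ± {m} SD: {xbar - m * s:.2f} to {xbar + m * s:.2f}")
```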

8h. What term best describes the shape of the distribution of blood glucose in this sample: symmetrical, skewed to the right, or skewed to the left?
