Calculate the sample variance using the appropriate column, Basic Statistics

Question 1 -

The following table lists some variables that might be of interest in your next data analysis. For each variable, complete the associated table indicating whether it is categorical (and if so, is it nominal or ordinal) or numerical (and if so, is it discrete or continuous).

	Variable	Categorical		Numerical	
		Nominal	Ordinal	Discrete	Continuous
Example	Eye Color	X			
1a	Sex
1b	Number of runs scored in a baseball game
1c	Profession
1d	Temperature, measured in Fahrenheit
1e	Confidence in one's ability to do statistics, as measured by "yes/no" to the statement: "I will do well"
1f	Number of siblings
1g	Distance an individual can run in five minutes
1h	Ethnicity
1i	Number of MDs who also have a PhD
1j	Lack of coordination, as measured by the time it takes an individual to complete a certain puzzle

Question 2 -

Consider a hypothetical situation. In 2015, a program aimed at reducing infant mortality was implemented in two regions, Pepi and Quepi. The following (hypothetical) table shows the numbers of births and infant deaths in each region in each of two years: 2014 and 2016.

	Pepi		Quepi
	Births	Infant Deaths	Births	Infant Deaths
2014	100,000	300	1,000,000	5,000
2016	100,000	60	1,000,000	4,000

2a. In which region is there more convincing evidence that the reduction in mortality was caused by the program?

2b. If the program can be continued in one region ONLY, which would you choose? In developing your answer, you may assume that the reductions shown were in fact caused by the program.
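
One way to put the two regions on a common footing for Question 2 is to convert the counts into rates. The sketch below is only an illustration: the helper names `rate_per_1000` and `summarize` are made up for this example, and comparing raw death counts across years is reasonable here only because each region has the same number of births in 2014 and 2016.

```python
# Counts copied from the table above: (births, infant deaths) per region and year.
data = {
    "Pepi":  {2014: (100_000, 300),    2016: (100_000, 60)},
    "Quepi": {2014: (1_000_000, 5000), 2016: (1_000_000, 4000)},
}

def rate_per_1000(births, deaths):
    """Infant deaths per 1,000 births."""
    return 1000 * deaths / births

def summarize(region):
    """Return (rate before, rate after, relative drop in the rate, fewer deaths)."""
    (b0, d0), (b1, d1) = data[region][2014], data[region][2016]
    before, after = rate_per_1000(b0, d0), rate_per_1000(b1, d1)
    # The raw difference d0 - d1 is meaningful because births are equal in both years.
    return before, after, (before - after) / before, d0 - d1

for region in data:
    before, after, rel_drop, fewer = summarize(region)
    print(f"{region}: {before:.1f} -> {after:.1f} per 1,000 births "
          f"({rel_drop:.0%} relative drop, {fewer} fewer deaths)")
```

The relative drop in the rate and the absolute number of deaths avoided answer different questions, which is the distinction 2a and 2b are probing.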

Question 3 -

The following are some data on some famous statisticians. Yes! Florence Nightingale, among her other talents, was a statistician!

Statistician	Gender	Year of Birth	Year of Death
Sir Francis Galton	2	1822	1911
Karl Pearson	2	1857	1936
William Sealy Gosset	2	1876	1937
Ronald Aylmer Fisher	2	1890	1962
Harald Cramer	2	1893	1985
Prasanta Mahalanobis	2	1893	1972
Jerzy Neyman	2	1894	1981
Egon S. Pearson	2	1895	1980
Gertrude Cox	1	1900	1978
Samuel S Wilks	2	1906	1964
Florence Nightingale	1	1909	1995
David John Tukey	2	1915	2000

3a. By any means you like (by hand is just fine), create a stem-and-leaf summary of the data on the variable YEAR OF BIRTH. Display it here. Then use this visual summary to answer questions #3b - #3e below.

3b. Are there any outliers (i.e., extreme values) in this distribution? Explain.

3c. How would you describe the shape of this distribution? Explain.

3d. What is/are the most frequently occurring score(s) in this distribution? How many times does it/do they occur?

3e. Can we use this stem-and-leaf to obtain the original set of values for this variable? Explain.
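
If you would rather build the stem-and-leaf display for 3a by machine, the following is a minimal sketch. It assumes one particular choice of stem, namely the first three digits of the year (the decade), with the last digit as the leaf; other choices of stem are equally acceptable. The function names are hypothetical.

```python
from collections import defaultdict

# Years of birth copied from the table above.
birth_years = [1822, 1857, 1876, 1890, 1893, 1893, 1894, 1895,
               1900, 1906, 1909, 1915]

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10).

    Stems with no observations are kept, so gaps in the data stay visible.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lo, hi = min(leaves), max(leaves)
    return {stem: leaves.get(stem, []) for stem in range(lo, hi + 1)}

def render(plot):
    """Format the plot as text, one 'stem | leaves' row per line."""
    return "\n".join(f"{stem} | {''.join(map(str, row))}"
                     for stem, row in plot.items())

print(render(stem_and_leaf(birth_years)))
```

Keeping the empty stems in the output matters for 3b and 3c, because gaps between rows are part of what the display is meant to show.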

Question 4 -

4a. When a distribution is skewed to the right

i) TRUE or FALSE: The median is greater than the mean.

ii) TRUE or FALSE: The distribution is uni-modal

iii) TRUE or FALSE: The majority of observations are less than the mean.

4b. The shape of a frequency distribution can be described using:

i) TRUE or FALSE: A box and whisker plot.

ii) TRUE or FALSE: A table of frequencies

iii) TRUE or FALSE: A histogram

4c. For the sample 3, 1, 7, 2 and 2:

i) TRUE or FALSE: The sample mean is 3

ii) TRUE or FALSE: The sample median is 7

iii) TRUE or FALSE: The range is 1

iv) TRUE or FALSE: The sample variance is 5.5
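
The four statements in 4c can be checked against a quick calculation. This is a small sketch using Python's standard `statistics` module; note that `statistics.variance` divides by n - 1, which is the sample variance this assignment refers to.

```python
import statistics

sample = [3, 1, 7, 2, 2]

summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "range": max(sample) - min(sample),
    # statistics.variance uses the n - 1 denominator (sample variance).
    "variance": statistics.variance(sample),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```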

Question 5 -

The following table shows the numbers of geriatric admissions, each week from May through September, to a certain facility in each of two years, 2012 and 2013.

Week	# Admissions 2012	# Admissions 2013	Week	# Admissions 2012	# Admissions 2013
1	24	20	12	11	25
2	22	17	13	6	22
3	21	21	14	10	26
4	22	17	15	13	12
5	24	22	16	19	33
6	15	23	17	13	19
7	23	20	18	17	21
8	21	16	19	10	28
9	18	24	20	16	19
10	21	21	21	24	13
11	17	20	22	15	29

5a. By any means you like (by hand is just fine), summarize these data graphically. Display it here. Then use this visual summary to answer question #5b.

5b. Why do you think these two years were different? Note - There is no single correct answer here. I will accept any well-reasoned interpretation. I'm looking for you to think about what you see!

Question 6 -

6a. You read that the median income of U.S. households in 2010 was $49,455. In 1-2 sentences at most, explain in plain language what "the median income" is.

6b. The Census Bureau website gives several choices for "average income" in its historical income data. In 2010, the median income of American households was $49,455. The mean household income was $67,530. The median income of families was $60,395, and the mean family income was $78,361. The Census Bureau says, "Households consist of all people who occupy a housing unit. The term 'family' refers to a group of two or more people related by birth, marriage, or adoption who reside together". In at most 5 sentences, explain carefully why mean incomes are higher than median incomes and why family incomes are higher than household incomes.

6c. A January 2012 magazine article reported that the average income for readers of the business magazine Forbes was $217,000. In your opinion, is the median wealth of these readers greater or less than $217,000? In at most 1-2 sentences, explain your reasoning.

6d. The distribution of individual incomes in the United States is strongly skewed to the right. In 2008, the mean and median incomes of the top 1% of Americans were $558,726 and $1,137,680. Which of these numbers is the mean and which is the median? In at most 1-2 sentences, explain your reasoning.

6e. By any means you like (by hand is fine), determine which of the following two data sets is more spread out. Show your work. In at most 1-2 sentences, explain your reasoning.

Data set "A": 4 0 1 4 3 6

Data set "B": 5 3 1 3 4 2
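
For 6e, "more spread out" can be measured in more than one way. The sketch below computes three common measures (range, sample variance, sample standard deviation) for both data sets; which measure to report is your choice, and the helper name `spread` is just for illustration.

```python
import statistics

data_sets = {
    "A": [4, 0, 1, 4, 3, 6],
    "B": [5, 3, 1, 3, 4, 2],
}

def spread(values):
    """Return (range, sample variance, sample standard deviation)."""
    return (max(values) - min(values),
            statistics.variance(values),   # n - 1 denominator
            statistics.stdev(values))

for name, values in data_sets.items():
    rng, var, sd = spread(values)
    print(f"Data set {name}: range={rng}, variance={var:.2f}, sd={sd:.2f}")
```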

Question 7 -

A box plot is the graph of a five number summary. The central box spans the quartiles. The line in the box marks the median. The size of the box is a measure of spread. The lines extending out from the box give an indication of extremes, if any. Side-by-side box plots are useful for comparing two distributions. As an example, consider the following table. It lists the average monthly temperature (Fahrenheit) of Springfield, Massachusetts and San Francisco, California.

Month	Ave Temp (F) Springfield	Month	Ave Temp (F) San Francisco
January	32	January	49
February	36	February	52
March	45	March	53
April	56	April	55
May	65	May	58
June	73	June	61
July	78	July	62
August	77	August	63
September	70	September	64
October	58	October	61
November	45	November	55
December	36	December	49

7a. Obtain the five number summary for the average monthly temperatures, separately for each data set, Springfield versus San Francisco. Use these values to complete the following table.

	Springfield	San Francisco
Minimum
Q1
Q2 = median
Q3
Maximum
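
The five number summary in 7a can be sketched in code as follows. An important assumption: quartiles are computed here with the "median of halves" rule (Q1 and Q3 are the medians of the lower and upper halves of the sorted data). Textbooks and software use several quartile conventions, so your hand or software values for Q1 and Q3 may differ slightly. The function name is hypothetical.

```python
import statistics

# Average monthly temperatures (F), January through December, from the table above.
springfield = [32, 36, 45, 56, 65, 73, 78, 77, 70, 58, 45, 36]
san_francisco = [49, 52, 53, 55, 58, 61, 62, 63, 64, 61, 55, 49]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when n is odd, the overall median is left out of both halves.
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower, upper = xs[:half], xs[len(xs) - half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

for city, temps in [("Springfield", springfield), ("San Francisco", san_francisco)]:
    print(city, five_number_summary(temps))
```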

7b. By any means you like (by hand is fine), produce a side-by-side box and whisker plot of the two distributions of average monthly temperatures. You will use this visual to answer question #7c.

7c. i) Are the 2 cities similar in their typical (median) average temp?

ii) Are the 2 cities similar in terms of temperature spread? Explain

iii) Which city requires owning a larger wardrobe of clothes?

Question 8 -

This last exercise gives you practice with the basic calculations of the sample mean, the sample variance, and the sample standard deviation. It also gives you practice producing and interpreting a histogram.

The table below gives data on X = blood glucose levels (mmol/L) obtained from a simple random sample of n = 40 first-year medical students. The students are indexed using a subscript "i" that ranges from i = 1 to i = 40.

8a. First calculate the sample mean. To do this, obtain the sum of the individual blood glucose values and divide this by the sample size.

i) $\sum_{i=1}^{40} x_i$ =

ii) $n$ =

iii) Sample mean = $\bar{x} = \frac{1}{n}\sum_{i=1}^{40} x_i$ = (fill in)/(fill in) =

8b. Next, calculate the square of each individual blood glucose level, completing the 3rd column of the table as you go. Then obtain the sum of these squared values and enter this total at the bottom of the 3rd column.

8c. Next, calculate the squared deviation of each individual blood glucose level about the sample mean, completing the 4th and 5th columns of the table as you go. Then obtain the sum of these squared deviations and enter this total at the bottom of the 5th column.

i	$x_i$	$x_i^2$	$(x_i - \bar{x})$	$(x_i - \bar{x})^2$
1	4.7
2	4.2
3	3.9
4	3.4
5	3.6
6	4.1
7	4.8
8	4.0
9	3.8
10	4.4
11	3.3
12	3.8
13	2.2
14	5.0
15	3.3
16	4.1
17	4.7
18	3.7
19	3.6
20	3.8
21	4.1
22	3.6
23	4.6
24	4.4
25	3.6
26	2.9
27	3.4
28	4.9
29	4.0
30	3.7
31	4.5
32	4.9
33	4.4
34	4.7
35	3.3
36	4.3
37	5.1
38	3.4
39	4.0
40	6.0
Total of column
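
To check the hand-computed column totals from 8a through 8c, something like the following sketch can be used. The function name `column_totals` and the dictionary keys are made up for this example. The total of the deviations column is included because it should come out as zero (up to rounding), which is a handy check on the arithmetic.

```python
# Blood glucose values (mmol/L) for i = 1..40, copied from the table above.
glucose = [
    4.7, 4.2, 3.9, 3.4, 3.6, 4.1, 4.8, 4.0, 3.8, 4.4,
    3.3, 3.8, 2.2, 5.0, 3.3, 4.1, 4.7, 3.7, 3.6, 3.8,
    4.1, 3.6, 4.6, 4.4, 3.6, 2.9, 3.4, 4.9, 4.0, 3.7,
    4.5, 4.9, 4.4, 4.7, 3.3, 4.3, 5.1, 3.4, 4.0, 6.0,
]

def column_totals(xs):
    """Return n, the sample mean, and the total of each column in the table."""
    n = len(xs)
    total_x = sum(xs)
    mean = total_x / n
    return {
        "n": n,
        "mean": mean,
        "sum_x": total_x,
        "sum_x_squared": sum(x ** 2 for x in xs),
        "sum_dev": sum(x - mean for x in xs),  # should be ~0, up to rounding
        "sum_dev_squared": sum((x - mean) ** 2 for x in xs),
    }

totals = column_totals(glucose)
for name, value in totals.items():
    print(f"{name}: {value:.4f}")
```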

8d. Calculate the sample variance using the appropriate column totals in TWO ways. Show your work. Tip: You should get the same answer both ways. This illustrates a shortcut for calculations done by hand, and it should clear up any confusion you may have had on seeing more than one formula for the sample variance.

i) $s^2 = \frac{\sum_{i=1}^{40} (x_i - \bar{x})^2}{n-1}$

ii) $s^2 = \frac{\left[\sum_{i=1}^{40} x_i^2\right] - n\bar{x}^2}{n-1}$
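
The reason the two formulas in 8d agree can be seen by expanding the squared deviations. The short derivation below is a sketch for general n; it uses only the definition of the sample mean.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 \;-\; 2\bar{x}\sum_{i=1}^{n} x_i \;+\; n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 \;-\; 2\bar{x}\,(n\bar{x}) \;+\; n\bar{x}^2
     && \text{since } \sum_{i=1}^{n} x_i = n\bar{x} \\
  &= \sum_{i=1}^{n} x_i^2 \;-\; n\bar{x}^2 .
\end{align*}
```

Dividing both sides by n - 1 gives formula (i) on the left and formula (ii) on the right. When working by hand, keep several decimal places in the mean, because formula (ii) subtracts two large numbers and is sensitive to rounding.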

8e. Finally, calculate the sample standard deviation.

8f. By any means you like (by hand is fine), produce a histogram of these data.

8g. Calculate the mean ±1 standard deviation and the mean ±2 standard deviations. Indicate these points on your histogram.
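
The endpoints asked for in 8g can be sketched as below, using the standard `statistics` module. Counting how many observations fall inside each interval is not required by the question; it is added here as an optional check that may help when you judge the shape of the distribution. The helper names `interval` and `count_within` are hypothetical.

```python
import statistics

# Blood glucose values (mmol/L) for i = 1..40, copied from the table above.
glucose = [
    4.7, 4.2, 3.9, 3.4, 3.6, 4.1, 4.8, 4.0, 3.8, 4.4,
    3.3, 3.8, 2.2, 5.0, 3.3, 4.1, 4.7, 3.7, 3.6, 3.8,
    4.1, 3.6, 4.6, 4.4, 3.6, 2.9, 3.4, 4.9, 4.0, 3.7,
    4.5, 4.9, 4.4, 4.7, 3.3, 4.3, 5.1, 3.4, 4.0, 6.0,
]

mean = statistics.mean(glucose)
sd = statistics.stdev(glucose)  # sample standard deviation (n - 1 denominator)

def interval(k):
    """Return the endpoints (mean - k*sd, mean + k*sd)."""
    return mean - k * sd, mean + k * sd

def count_within(k):
    """Count observations lying inside mean +/- k standard deviations."""
    lo, hi = interval(k)
    return sum(lo <= x <= hi for x in glucose)

for k in (1, 2):
    lo, hi = interval(k)
    print(f"mean +/- {k} sd: ({lo:.2f}, {hi:.2f}); "
          f"{count_within(k)} of {len(glucose)} values inside")
```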

8h. What term best describes the shape of the distribution of blood glucose in this sample: symmetrical, skewed to the right, or skewed to the left?

Reference No:- TGS02477753
