Q1. A researcher is interested in knowing the average height of the men in a village. To the researcher, the population of interest is the _________ in the village, the relevant population data are the _________ in the village, and the population parameter of interest is the _________.
Q2. There are 600 men in the village, and the sum of their heights is 3,522.0 feet. Their average height is _________ feet.
Q3. Instead of measuring the heights of all the village men, the researcher measured the heights of 12 village men and calculated the average to estimate the average height of all the village men. The sample for his estimation is _________, the relevant sample data are the _________ and the sample statistic is the _________.
Q4. If the sum of the heights of the 12 village men is 68.4 feet, their average height is _________ feet.
Consider the following set of sample data.
18 26 30 42 50 52 52 60 78 84
Q5. For the given data, the mean is _________ , the median is _________ , and the mode is _________.
Q6. Suppose the value 60 in the data is mistakenly recorded as 72. For the sample with this error, the mean is _________, the median is _________, and the mode is _________. The mean _________, the median _________, and the mode _________.
Q7. Suppose the value 60 in the original sample is inadvertently removed from the sample. For the sample with this value removed, the mean is _________ , the median is _________ , and the mode is _________. The mean _________, the median _________, and the mode _________ .
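The comparison in Q5 through Q7 can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names and the `summarize` helper are illustrative and not part of the exercise.

```python
from statistics import mean, median, multimode

# Sample data from the exercise.
original = [18, 26, 30, 42, 50, 52, 52, 60, 78, 84]
# Q6 scenario: the value 60 is misrecorded as 72.
misrecorded = [72 if v == 60 else v for v in original]
# Q7 scenario: the value 60 is dropped from the sample.
removed = [v for v in original if v != 60]


def summarize(data):
    """Return (mean, median, list of modes) for a list of numbers."""
    return mean(data), median(data), multimode(data)


for label, data in [
    ("original", original),
    ("60 recorded as 72", misrecorded),
    ("60 removed", removed),
]:
    m, med, modes = summarize(data)
    print(f"{label}: mean={m}, median={med}, mode(s)={modes}")
```

Comparing the three printed lines shows which measures of location move when a single value is changed or dropped.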
The New York Stock Exchange (NYSE) is the largest stock exchange in the world in terms of the market capitalizations of its listed companies, and it's the third largest by number of company listings. Trading at the NYSE is done on a physical trading floor.
Consider a hypothetical company XYZ listed on the NYSE. A stockbroker made the following purchases of XYZ shares:
Table 1
| Time of Trade | Purchase Price per Share | Number of Shares |
|---|---|---|
| 9:40 AM | $104.78 | 500 |
| 10:34 AM | $94.72 | 3,000 |
| 11:15 AM | $111.17 | 4,000 |
| 12:37 PM | $91.52 | 5,500 |
Q8. The broker bought _________ shares for a total price of _________.
Q9. The weighted mean price per share for these purchases is _________.
The stockbroker made the following sales of the XYZ shares he had purchased earlier:
Table 2
| Time of Trade | Selling Price per Share | Number of Shares |
|---|---|---|
| 10:17 AM | $103.53 | 1,500 |
| 10:53 AM | $94.47 | 2,000 |
| 1:20 PM | $113.17 | 2,500 |
| 3:37 PM | $89.02 | 7,000 |
Q10. The broker sold _________ shares for a total price of _________.
Q11. The weighted mean price per share for these sales is _________.
Q12. The broker _________ from the above trades.
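A weighted mean price weights each trade's price by the number of shares traded, which is the same as dividing the total dollar amount by the total number of shares. The following sketch applies that to the purchase and sale tables; it uses `Decimal` to keep the dollar amounts exact, and the names `purchases`, `sales`, and `totals` are illustrative.

```python
from decimal import Decimal

# (price per share, number of shares), copied from the two trade tables.
purchases = [("104.78", 500), ("94.72", 3000), ("111.17", 4000), ("91.52", 5500)]
sales = [("103.53", 1500), ("94.47", 2000), ("113.17", 2500), ("89.02", 7000)]


def totals(trades):
    """Return (total shares, total dollar amount, weighted mean price)."""
    shares = sum(n for _, n in trades)
    amount = sum(Decimal(price) * n for price, n in trades)
    return shares, amount, amount / shares


bought, cost, mean_buy = totals(purchases)
sold, proceeds, mean_sell = totals(sales)
net = proceeds - cost  # a negative value means a net loss

print(f"bought {bought} shares for ${cost}, weighted mean ${mean_buy:.2f}")
print(f"sold {sold} shares for ${proceeds}, weighted mean ${mean_sell:.2f}")
print(f"net result: ${net}")
```

Because the same number of shares is bought and sold, the sign of `net` matches the sign of the difference between the two weighted mean prices.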
Consider the following table showing the percentage annual returns (growth rates) and the corresponding annual growth factors for a retirement fund over the past 5 years.
| Year | Return (%) | Growth Factor |
|---|---|---|
| 1 | 22.3 | 1.223 |
| 2 | 37.8 | 1.378 |
| 3 | -15.7 | 0.843 |
| 4 | 12.2 | 1.122 |
| 5 | 17.1 | 1.171 |
Q13. The geometric mean of the 5 growth factors is x̄_g = _________ . The arithmetic mean of the growth factors _________ .
Q14. The geometric mean tells us that the annual percentage returns grew at an average annual rate of _________ . The arithmetic mean of the annual percentage returns is _________ .
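The geometric mean of n growth factors is the nth root of their product, and subtracting 1 gives the average annual growth rate. A minimal sketch with the five growth factors from the table follows; it relies on `statistics.geometric_mean` (available from Python 3.8), and the variable names are illustrative.

```python
from statistics import geometric_mean, mean

# Annual returns (%) and growth factors from the retirement fund table.
returns_pct = [22.3, 37.8, -15.7, 12.2, 17.1]
growth_factors = [1.223, 1.378, 0.843, 1.122, 1.171]

gm = geometric_mean(growth_factors)   # fifth root of the product of the factors
avg_rate_pct = (gm - 1) * 100         # average annual growth rate implied by gm
am_factors = mean(growth_factors)     # arithmetic mean of the growth factors
am_returns = mean(returns_pct)        # arithmetic mean of the percentage returns

print(f"geometric mean of growth factors: {gm:.4f}")
print(f"average annual growth rate: {avg_rate_pct:.2f}%")
print(f"arithmetic mean of growth factors: {am_factors:.4f}")
print(f"arithmetic mean of returns: {am_returns:.2f}%")
```

The geometric mean is never larger than the arithmetic mean of the same positive values, so the arithmetic mean of the returns overstates the compound growth rate whenever the returns vary from year to year.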
A growth chart is a plot of the percentiles of growth measurements, such as weight and height, for a population of infants or children. It is used by pediatricians to assess a child's growth over time.
The Centers for Disease Control and Prevention (CDC) is a U.S. agency devoted to the protection and promotion of public health. Through one of its units, the National Center for Health Statistics (NCHS), the CDC has developed growth charts for clinical use by health professionals. The most recent charts were published in 2000.
The 2000 CDC growth charts were developed using a reference population of infants. A pediatrician looks up one of the charts and finds that the 5th percentile for weights of baby girls at 12-1/2 months is 19.3 pounds. This means that
Q15. _________ of the 12-1/2-month-old baby girls in the reference population weigh 19.3 pounds or less, and
Q16. _________ of these baby girls weigh 19.3 pounds or more.
The 2000 CDC growth charts use a reference population of both breast-fed and formula-fed infants. It has been observed that breast-fed babies tend to gain weight more rapidly than formula-fed babies in the first 2 to 3 months of their lives, but they tend to weigh less than formula-fed babies from 6 to 12 months.
Q17. Sarah is a healthy baby who was exclusively breast-fed for her first 12 months. Which of the following is most likely a description of her weights (at 3, 6, 9, and 12 months of age) as percentiles of the CDC growth chart reference population?
- 80th percentile at 3 months; 80th percentile at 6 months; 80th percentile at 9 months; 90th percentile at 12 months
- 40th percentile at 3 months; 40th percentile at 6 months; 40th percentile at 9 months; 40th percentile at 12 months
- 70th percentile at 3 months; 40th percentile at 6 months; 30th percentile at 9 months; 25th percentile at 12 months
- 20th percentile at 3 months; 50th percentile at 6 months; 80th percentile at 9 months; 90th percentile at 12 months
The following are weights (in pounds) for a sample of 11 baby girls at the age of 10 months:
17.2 18.4 18.8 20.3 20.6 21.1 22 23.5 24.6 25.4 26.8
Q18. The 80th percentile for these sample data is _________ pounds.
Q19. In the sample data, _________ out of 11 observations are less than or equal to the 80th percentile, and _________ out of 11 observations are greater than or equal to this value.
Q20. For any data set, the first quartile is the _________ percentile, the second quartile is the _________ percentile and the _________, and the third quartile is the _________ percentile.
Q21. In the prior sample data, the first quartile is _________ pounds, and the third quartile is _________. Therefore, the interquartile range is _________.
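Textbooks differ on how to locate a percentile in a small sample. The sketch below assumes one common rule: sort the data, compute the index i = (p/100)n, round up to the next position if i is not an integer, and average the values in positions i and i + 1 if it is. If your course uses a different rule (for example, one based on n + 1 with interpolation), the results may differ. The function name `percentile` is illustrative.

```python
def percentile(data, p):
    """p-th percentile (p an integer from 1 to 99) using the index i = (p/100) * n."""
    values = sorted(data)
    n = len(values)
    # Integer arithmetic avoids floating-point trouble when testing whether i is whole.
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i is not an integer: round up to position whole + 1 (zero-based index whole).
        return values[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (values[whole - 1] + values[whole]) / 2


# Weights (in pounds) of the 11 baby girls at 10 months.
weights = [17.2, 18.4, 18.8, 20.3, 20.6, 21.1, 22, 23.5, 24.6, 25.4, 26.8]

p80 = percentile(weights, 80)
at_or_below = sum(w <= p80 for w in weights)
at_or_above = sum(w >= p80 for w in weights)
q1, q3 = percentile(weights, 25), percentile(weights, 75)

print(f"80th percentile: {p80} ({at_or_below} at or below, {at_or_above} at or above)")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1:.1f}")
```

Under this rule the percentile of an 11-value sample is always one of the observed values, which is why the same observation is counted on both sides in Q19.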
Consider a data set containing the following values:
92 84 85 93 95 89 86 91
Q22. The smallest value is _________, and the largest value is _________. Therefore, the range is _________.
Q23. The first quartile is _________. The third quartile is _________. Therefore, the interquartile range (IQR) is _________.
Q24. The mean of the preceding values is 89.375. The deviations from the mean have been calculated:
2.625 -5.375 -4.375 3.625 5.625 -0.375 -3.375 1.625
Q25. If this is sample data, the sample variance is _________, the sample standard deviation is _________, and the coefficient of variation is _________.
Q26. If this is population data, the population variance is _________, the population standard deviation is _________, and the coefficient of variation is _________.
Q27. Suppose the smallest value of 84 is misrecorded as 8. The range would _________, the IQR would _________ , and the variance and standard deviation would _________.
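The only difference between the sample and population formulas is the divisor: n - 1 for a sample and n for a population. The following sketch computes both sets of measures for the eight values above using Python's `statistics` module; the coefficient of variation is taken here as the standard deviation divided by the mean, expressed as a percentage, and the variable names are illustrative.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

values = [92, 84, 85, 93, 95, 89, 86, 91]
x_bar = mean(values)
data_range = max(values) - min(values)

sample_var = variance(values)        # sum of squared deviations / (n - 1)
sample_sd = stdev(values)
population_var = pvariance(values)   # sum of squared deviations / n
population_sd = pstdev(values)

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_sample = sample_sd / x_bar * 100
cv_population = population_sd / x_bar * 100

print(f"mean = {x_bar}, range = {data_range}")
print(f"sample:     variance = {sample_var:.4f}, sd = {sample_sd:.4f}, CV = {cv_sample:.2f}%")
print(f"population: variance = {population_var:.4f}, sd = {population_sd:.4f}, CV = {cv_population:.2f}%")
```

Replacing 84 with 8 in `values` and rerunning the script is a quick way to explore the misrecording scenario in Q27.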
Use the tool to obtain the mean and median of the retirement ages. (Hint: Select the Variable sliding panel for the variable Retirement Age, and click the Statistics button.)
Q28. The mean is _________, and the median is _________ . The mean is _________ than the median.
Q29. When the distribution is symmetric, the mean is _________ the median. When the distribution is skewed to the right, the mean is usually _________ the median. When the distribution is skewed to the left, the mean is usually _________ the median.
Q30. The presence of extremely large or small values in the data affects the mean _________ than the median. Therefore the ________ is the preferred measure of location when the distribution is skewed.
The Bureau of Labor Statistics (BLS) is the main fact-finding agency of the U.S. government in the fields of labor economics and statistics. Data from the Current Population Survey (CPS), conducted by the BLS and the Census Bureau, have been used to indicate a downward trend in retirement age. [Source: Gendell, M. (October 2001). Retirement age declines again in the 1990s. Monthly Labor Review, 124(10), 12-21.]
The following DataView tool displays a hypothetical data set consisting of annual income (as measured in thousands of dollars) and age of retirement for 100 retirees.
Use the tool to view the histogram of the incomes of the retirees, and answer the questions that follow. (Hint: Click either one of the Variable sliding panels in the bottom left-hand corner of the tool screen. Click the downward-pointing arrow next to Select Variable, and select the variable Income. Click the Histogram button in the middle of the left-hand side of the screen to view a histogram of its distribution.)
Q31. The distribution of the incomes is ________ . The skewness value calculated for this distribution is ________.
Q32. Use the tool to obtain the mean and median of the retirees' incomes. (Hint: On the Variable sliding panel for the variable Income, click the Statistics button to view computed statistics for the variable.)
The mean is ________, and the median is ________. The mean is ________ than the median.
Use the tool to view the histogram of the retirement ages of the retirees. (Hint: Click a Variable sliding panel in the bottom left-hand corner of the tool screen. Click the downward-pointing arrow next to Select Variable, and select the variable Retirement Age. Again, click the Histogram button.)
Q33. The distribution of the retirement ages is ________. The skewness value calculated for this distribution is ________.
The Bureau of Transportation Statistics (BTS) collects, analyzes, and disseminates information on U.S. transportation systems, including data on airline on-time performance.
Consider departure-time data for all flights of an unspecified major airline out of New York City's John F. Kennedy Airport from December 1 through 7, 2007. These data, obtained from BTS, can be viewed in the following DataView tool.
For each flight in the sample, data were collected on two variables: its date and its departure delay (in minutes), which is computed as the difference between the actual and scheduled departure times. A negative value for the departure delay means that the flight departed early.
Q34. Use the DataView tool to obtain the mean and standard deviation of the departure delays. The mean departure delay is ________ minutes. The standard deviation of the departure delays is ________ minutes. (Hint: Click one of the Variable sliding panels on the left side of the tool screen, and select the variable named Departure Delay. Then click on the Statistics button. You will see a screen showing different statistics calculated for the variable.)
Q35. Observation 51 in the data set shows a flight that was scheduled to leave at 4:20 PM but was delayed. The z-score for its departure delay is ________ , which means that the departure delay is ________ standard deviations away from the mean. The departure delay for this observation can be considered an outlier, because it is ________ than 3 standard deviations away from the mean.
Hint: To obtain the departure delay for the flight corresponding to observation 51, click the Data Set panel and the Observations button. Scroll down until you reach observation 51.
Q36. For any set of data, Chebyshev's theorem tells you that at least ________ of the data values must lie within 1.58 standard deviations of the mean.
Use the DataView tool to determine the proportion of data values within 1.58 standard deviations of the mean.
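Chebyshev's theorem states that at least 1 - 1/k² of the values in any data set lie within k standard deviations of the mean, for any k greater than 1. The sketch below evaluates that bound and compares it with the observed proportion in a small list of delays. The list `hypothetical_delays` is invented for illustration only, because the BTS data set is not reproduced here; both function names are also illustrative.

```python
from statistics import mean, stdev


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


def proportion_within(data, k):
    """Observed proportion of values within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)


# Hypothetical departure delays in minutes (NOT the BTS data).
hypothetical_delays = [-5, -3, 0, 2, 4, 7, 10, 15, 45, 120]

bound = chebyshev_lower_bound(1.58)
observed = proportion_within(hypothetical_delays, 1.58)
print(f"Chebyshev bound for k = 1.58: at least {bound:.1%}")
print(f"observed proportion in the hypothetical data: {observed:.1%}")
```

The observed proportion can exceed the bound by a wide margin; the theorem only guarantees a minimum, whatever the shape of the distribution.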
Q37. When a data set has a symmetrical mound-shaped or bell-shaped distribution, the Empirical Rule tells you that ________ of the data values will be within 1 standard deviation of the mean and ________ will be within 2 standard deviations of the mean.
Q38. Use the DataView tool to examine the shape of the distribution of departure delays. The distribution of the departure delays is ________. Therefore, the Empirical Rule ________ hold for this distribution.
The return on a stock is the percentage profit or loss from investing in it. It is measured by the percentage change in the stock price. For example, a one-day return is the difference between prices today and yesterday, expressed as a percentage of yesterday's price.
It has been observed that stock returns exhibit extreme, outlying values. Consider a data set consisting of closing stock prices and returns for Google Inc. for the 100 trading days from August 8 through December 28, 2006.
You can view this data set in the following DataView tool. Use the tool to help you answer the questions that follow.
Q39. Which of the following forms the five-number summary of Google's one-day returns for the above time period?
- -4.01%, -0.68%, 0.00%, 0.83%, 7.89%
- -1.38%, -0.68%, 0.00%, 0.83%, 2.09%
- -4.01%, 0.00%, 0.21%, 7.89%, 100
- -1.38%, -0.93%, -0.57%, -0.32%, 0.00%
- -0.57%, -0.32%, 0.00%, 0.30%, 0.62%
Use the Box Plots feature in the tool to obtain a box plot of the one-day returns.
Q40. The box plot shows ________ data values identified as outliers. Low-valued outliers correspond to extreme ________, while high-valued outliers correspond to extreme ________.
Q41. These data values are considered outliers because they are either ________ the lower limit of ________ or upper limit of ________ for the box plot.
Q42. The smallest data value that isn't an outlier is ________ .
Q43. The largest data value that isn't an outlier is ________ .
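Box plot limits are conventionally placed 1.5 interquartile ranges below the first quartile and above the third quartile, and values beyond those limits are flagged as outliers. The sketch below assumes that convention. The quartiles and the list `hypothetical_returns` are invented for illustration and are not the Google returns; in practice you would read the quartiles from the tool.

```python
def box_plot_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 * IQR convention."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def split_outliers(data, q1, q3):
    """Return (smallest non-outlier, largest non-outlier, list of outliers)."""
    lower, upper = box_plot_limits(q1, q3)
    inside = [x for x in data if lower <= x <= upper]
    outliers = [x for x in data if x < lower or x > upper]
    return min(inside), max(inside), outliers


# Hypothetical one-day returns (%) and hypothetical quartiles, for illustration only.
hypothetical_returns = [-5.2, -2.0, -1.0, 0.0, 0.5, 1.0, 3.5, 6.1]
q1, q3 = -1.0, 1.0

lower, upper = box_plot_limits(q1, q3)
low_end, high_end, outliers = split_outliers(hypothetical_returns, q1, q3)
print(f"limits: {lower} to {upper}")
print(f"whiskers end at {low_end} and {high_end}; outliers: {outliers}")
```

The whiskers end at the most extreme data values that are still inside the limits, which is what Q42 and Q43 ask for.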
Consider a data set consisting of observations for three variables: x, y, and z. Their sample means, variances, and standard deviations are shown in Table 1.
Table 1
| Statistic | x | y | z |
|---|---|---|---|
| Sample Mean | x̄ = 5 | ȳ = 4 | z̄ = 5 |
| Sample Variance | Sx^2 = 1 | Sy^2 = 3 | Sz^2 = 13 |
| Sample Standard Deviation | Sx = 1 | Sy = 1.732 | Sz = 3.606 |
Table 2 shows the observations for x and y and their corresponding deviations from the sample means.
Table 2
| xi | yi | xi - x̄ | yi - ȳ |
|---|---|---|---|
| 6 | 5 | 1 | 1 |
| 5 | 2 | 0 | -2 |
| 4 | 5 | -1 | 1 |
Q44. The sample covariance between x and y is ________.
Q45. The sample correlation coefficient between x and y is ________.
Table 3 shows the observations for y and z and their corresponding deviations from the sample means.
Table 3
| yi | zi | yi - ȳ | zi - z̄ |
|---|---|---|---|
| 5 | 1 | 1 | -4 |
| 2 | 6 | -2 | 1 |
| 5 | 8 | 1 | 3 |
Q46. The sample covariance between y and z is ________.
Q47. The sample correlation coefficient between y and z is ________.
Table 4 shows the observations for x and z and their corresponding deviations from the sample means.
Table 4
| xi | zi | xi - x̄ | zi - z̄ |
|---|---|---|---|
| 6 | 1 | 1 | -4 |
| 5 | 6 | 0 | 1 |
| 4 | 8 | -1 | 3 |
Q48. The sample correlation coefficient between x and z is ________.
Q49. The sample covariance between x and z is ________.
Q50. Select the best conclusion based on your calculations of the preceding correlation coefficients.
- The calculations show a negative relationship between x and y, a positive relationship between y and z, and a negative relationship between x and z.
- The calculations show no linear relationship between x and y, a negative linear relationship between y and z, and a strong negative linear relationship between x and z.
- The calculations show a negative linear relationship between x and y, a positive linear relationship between y and z, and no relationship between x and z.
- The calculations show a positive linear relationship between x and y, a negative linear relationship between y and z, and a negative linear relationship between x and z.
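The sample covariance is the sum of the products of paired deviations divided by n - 1, and the sample correlation coefficient divides that covariance by the product of the two sample standard deviations. A minimal sketch for the three variables in the tables above follows; the function names `sample_cov` and `sample_corr` are illustrative.

```python
from math import sqrt

# Observations for the three variables, as listed in the tables.
x = [6, 5, 4]
y = [5, 2, 5]
z = [1, 6, 8]


def sample_cov(a, b):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)


def sample_corr(a, b):
    """Sample correlation: covariance divided by the product of the standard deviations."""
    # sample_cov(a, a) is the sample variance of a.
    return sample_cov(a, b) / sqrt(sample_cov(a, a) * sample_cov(b, b))


for name, (a, b) in {"x,y": (x, y), "y,z": (y, z), "x,z": (x, z)}.items():
    print(f"{name}: covariance = {sample_cov(a, b)}, correlation = {sample_corr(a, b):.4f}")
```

A correlation of zero rules out a linear relationship only; with three observations per variable, any of these values should be read with caution.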
The correlation coefficients have been computed for five sets of sample data values and are shown in ascending order in the following table:
-0.92 -0.76 0.00 0.66 0.82
The following are scatter diagrams (including the trendline) for each set of data values. (Scales for the x- and y-axes are the same on all five plots.)
Study the plots, and then match each plot with its correlation coefficient. Click on each of the five boxes within the plots, and then enter the value of the correlation coefficient.
Hint: To match up the correlation coefficients to their corresponding plots, remember to consider the direction of the relationship (which of the preceding plots show a negative relationship and which show a positive relationship?) and how closely the points fit the line (correlations whose absolute values are near 1 indicate that the points fit the line very closely).