The games of the 28th Olympiad were held in Athens, Greece, in the summer of 2004. The data file contains information about 131 of the participating nations, including the following variables.
Medals (Won at Athens Olympics)
Gross Domestic Product ($ Billion)
Population (Million)
Area (Million Sq. Km.)
Infant Deaths Per 1,000 Births
Inflation Rate
Fertility Rate
GDP Growth Rate
Telephones (Million)
The final medal count is perhaps somewhat misleading as a measure of national athletic excellence. For example, the United States won the most medals in absolute terms (Exhibit 1), but isn't even among the top ten in terms of medals per capita (Exhibit 2).
Country          Gold  Silver  Bronze  Total
United States      35      39      29    103
Russia             27      27      38     92
China              32      17      14     63
Australia          17      16      16     49
Germany            14      16      18     48
Japan              16       9      12     37
France             11       9      13     33
Italy              10      11      11     32
South Korea         9      12       9     30
United Kingdom      9       9      12     30
Exhibit 1: Top Medal Counts, 2004 Athens Olympics
Country    Medals  Population  Medals per Million Citizens
Bahamas         2     297,477  6.72
Australia      49  19,731,984  2.48
Cuba           27  11,263,429  2.40
Estonia         3   1,408,556  2.13
Slovenia        4   1,935,677  2.07
Jamaica         5   2,695,867  1.85
Latvia          4   2,348,784  1.70
Hungary        17  10,045,407  1.69
Bulgaria       12   7,537,929  1.59
Greece         16  10,665,989  1.50
Exhibit 2: Top Medal Counts per Capita, 2004 Athens Olympics
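The per-capita figures in Exhibit 2 are medals divided by population expressed in millions. A minimal Python sketch of that arithmetic, using three rows copied from Exhibit 2 (the function name medals_per_million is illustrative and not part of the assignment):

```python
def medals_per_million(medals: int, population: int) -> float:
    """Medals won per one million citizens."""
    return medals / (population / 1_000_000)


# Rows copied from Exhibit 2: country -> (medals, population)
exhibit_2_sample = {
    "Bahamas": (2, 297_477),
    "Australia": (49, 19_731_984),
    "Greece": (16, 10_665_989),
}

# Round to two decimals, as in the exhibit
rates = {
    country: round(medals_per_million(m, p), 2)
    for country, (m, p) in exhibit_2_sample.items()
}

# Print from highest to lowest rate
for country, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country:<10} {rate:.2f}")
```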
Some interesting countries, such as Russia and Jamaica, are excluded from the data set because information was not available for all variables.
1. Add a dummy variable to the data set, representing whether the nation was the host of the Olympics. (Hint: The 2004 Athens Olympic Games were hosted by Greece.) Perform a correlation analysis on all of the variables in the data set, and show the results here in descending order of importance. Explain your results.
2. Show scatter diagrams of the two most important predictors, showing their relationship to "Medals". Use labels to indicate interesting outliers, show a linear trend line, and write something intelligent about your graphs.
3. Create a multiple regression model, based on your analysis above. Try to find the model that has the highest adjusted R-square value for predicting the number of Olympic medals won by a country.
4. Using your model from above, perform a hypothesis test to see whether the true effect of "Area" is less than one medal for every 1,000,000 square kilometers at the 5% level of significance, all other factors taken into account. Find the p-value of your test and explain its meaning.
5. Using your model from above, and assuming the same test, alpha, and variance as in Part 4, what would be the risk of a Type II error if the true effect of "Area" were in fact known to be 0.5 medals per 1,000,000 square kilometers?
6. Using your model from Part 3 above, calculate the residual error in "Medals" for each country. Show the residual errors for the ten countries that most "overperformed" in the Athens Olympics (i.e., won more medals than your model predicted they would) and the ten countries that most "underperformed". Explain your results.
7. Discuss the residual errors in this model. Use charts as appropriate.
8. Do these data provide evidence of a "home field advantage" in the Olympics? In other words, can we conclude that Greece won medals above and beyond what would otherwise have been expected because it was the host country?
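As a starting point for Part 1, the host dummy and the correlation ranking might be sketched as follows. This is only an illustration under stated assumptions: the ten rows from Exhibit 2 stand in for the real data file (which covers 131 nations and more variables), and the names pearson, data, and ranked are hypothetical. "Importance" is taken here to mean the absolute value of the correlation with Medals.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Stand-in data: (country, medals, population) copied from Exhibit 2.
# The real analysis would load every variable from the data file instead.
rows = [
    ("Bahamas", 2, 297_477), ("Australia", 49, 19_731_984),
    ("Cuba", 27, 11_263_429), ("Estonia", 3, 1_408_556),
    ("Slovenia", 4, 1_935_677), ("Jamaica", 5, 2_695_867),
    ("Latvia", 4, 2_348_784), ("Hungary", 17, 10_045_407),
    ("Bulgaria", 12, 7_537_929), ("Greece", 16, 10_665_989),
]

HOST = "Greece"  # host nation of the 2004 Games

data = {
    "Medals": [m for _, m, _ in rows],
    "Population": [p for _, _, p in rows],
    # Dummy variable: 1 for the host nation, 0 for everyone else
    "Host": [1 if country == HOST else 0 for country, _, _ in rows],
}

# Correlation of each predictor with Medals, ranked by absolute value
correlations = {
    name: pearson(values, data["Medals"])
    for name, values in data.items()
    if name != "Medals"
}
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, r in ranked:
    print(f"{name:<12} {r:+.3f}")
```

With only one host nation in the sample, the dummy's correlation with Medals rests on a single observation, which is worth keeping in mind when interpreting it.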
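For Parts 3 and 6, a least-squares fit, its adjusted R-square, and the ranked residuals could be computed along these lines. The sketch assumes NumPy is available and again uses the ten Exhibit 2 rows as stand-in data with Population as the only predictor; a real model would use the full data file and whichever predictors the correlation analysis suggests. All variable names are illustrative.

```python
import numpy as np

# Stand-in data: (country, medals, population) copied from Exhibit 2
rows = [
    ("Bahamas", 2, 297_477), ("Australia", 49, 19_731_984),
    ("Cuba", 27, 11_263_429), ("Estonia", 3, 1_408_556),
    ("Slovenia", 4, 1_935_677), ("Jamaica", 5, 2_695_867),
    ("Latvia", 4, 2_348_784), ("Hungary", 17, 10_045_407),
    ("Bulgaria", 12, 7_537_929), ("Greece", 16, 10_665_989),
]
countries = [r[0] for r in rows]
y = np.array([r[1] for r in rows], dtype=float)
pop_millions = np.array([r[2] for r in rows], dtype=float) / 1e6

# Design matrix: intercept column plus one predictor
X = np.column_stack([np.ones(len(y)), pop_millions])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ coef
residuals = y - fitted  # positive = won more medals than predicted

# R-square and adjusted R-square (k = number of predictors, excluding intercept)
n, k = len(y), X.shape[1] - 1
ss_res = float(np.sum(residuals ** 2))
ss_tot = float(np.sum((y - y.mean()) ** 2))
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Rank countries by residual; the assignment asks for ten each, three shown here
order = np.argsort(residuals)
underperformers = [(countries[i], round(float(residuals[i]), 2)) for i in order[:3]]
overperformers = [(countries[i], round(float(residuals[i]), 2)) for i in order[::-1][:3]]

print(f"adjusted R-square: {adj_r2:.3f}")
print("overperformers:", overperformers)
print("underperformers:", underperformers)
```

Adjusted R-square penalizes each added predictor, so comparing it across candidate models is one way to approach the search described in Part 3.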
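The one-sided test in Part 4 and the Type II error calculation in Part 5 can be sketched with the Python standard library. Two assumptions are made here and should be checked against your own output: the normal distribution is used as an approximation to the t distribution (reasonable only when the residual degrees of freedom are large), and the hypotheses are taken to be H0: effect >= 1 versus H1: effect < 1. The coefficient and standard error in the usage lines are placeholders, not results from the data.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, used as a large-sample stand-in for t


def lower_tail_test(b_hat, se, beta_0=1.0, alpha=0.05):
    """Test H0: beta >= beta_0 against H1: beta < beta_0.

    Returns the test statistic, the p-value, and whether H0 is rejected.
    """
    z = (b_hat - beta_0) / se
    p_value = Z.cdf(z)  # probability of a statistic this low or lower under H0
    return z, p_value, p_value < alpha


def type_ii_risk(se, beta_0=1.0, beta_true=0.5, alpha=0.05):
    """Probability of failing to reject H0 when the true effect is beta_true."""
    # H0 is rejected when the estimate falls below this critical value
    critical = beta_0 + Z.inv_cdf(alpha) * se
    # Under the alternative, the estimate is approximately N(beta_true, se^2)
    return 1.0 - NormalDist(beta_true, se).cdf(critical)


# Placeholder inputs: substitute the "Area" coefficient and its standard
# error from your own regression output.
z, p, reject = lower_tail_test(b_hat=0.6, se=0.2)
risk = type_ii_risk(se=0.2)
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")
print(f"Type II risk if the true effect is 0.5: {risk:.3f}")
```

The Type II risk depends only on the standard error, alpha, and the assumed true effect, not on the estimated coefficient, which is why Part 5 can reuse the variance from Part 4.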