1) Read the attached BBC news article (Circumcision ‘reduces HIV risk’) and answer the questions that follow.
(a) In experimental terms, what is the outcome (Y) of interest?
(b) What was the pre-measurement and when was it performed?
(c) What was the planned post-measurement and when was it to be performed?
(d) In experimental terms, what is the treatment (X) of interest?
(e) What is the proposed causal mechanism for the link the researchers found?
(f) Why did the researchers not simply perform an observational study? That is, why not simply compare the outcome rate between men who were already treated and untreated before the study began?
(g) Was the treatment assigned randomly? How does this affect our inferences from the experiment?
(h) It is difficult to ensure such experiments are ‘double-blind’. Aside from any selection or ethical issues, why is this?
(i) What was the outcome for the treated group?
(j) What was the outcome for the control group?
(k) The experiment ran into an ethical concern and was therefore stopped. Briefly explain that concern and how it relates to the Tuskegee syphilis experiment.
(l) Compared with the location of the experiment, this treatment is very rare in Western Europe (neonatal rate is 0-5%), yet the outcome rate there is also much lower. In the US, this treatment is much more common (neonatal rate is 60%-70%), yet the outcome rate is on a par with Europe.
(i) How does this fact affect our judgement of the internal validity of the experiment?
(ii) The Centers for Disease Control (CDC) is contemplating mandating the treatment for all male babies born in the US. What type of validity (internal or external) should have most bearing on their decision? Should we expect a causal effect of the same magnitude when the treatment is applied in the US? Briefly explain with reference to the current observable outcome rates from the US and Europe.
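As background for part (g), the following is a minimal sketch of what random assignment does, assuming made-up participants rather than the study's data. The names `randomly_assign` and `risk_score` are hypothetical and chosen only for illustration. The point is that a coin-flip style assignment tends to balance pre-treatment traits across the two groups, including traits nobody measured.

```python
import random
from statistics import mean


def randomly_assign(participant_ids, seed=0):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical pre-treatment trait for 2000 simulated participants
# (an assumption for illustration, not data from the article).
rng = random.Random(1)
risk_score = {pid: rng.gauss(0, 1) for pid in range(2000)}

treated, control = randomly_assign(risk_score.keys(), seed=2)

# With random assignment, the two group means of the trait should be close.
balance_gap = mean(risk_score[p] for p in treated) - mean(
    risk_score[p] for p in control
)
print(len(treated), len(control), round(balance_gap, 3))
```

Because neither the researchers nor the participants choose who lands in which group, a gap in the outcome between groups is hard to attribute to pre-existing differences.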
2) For the following relationships, the conclusion is possibly spurious. For each relationship (i) identify a plausible ‘lurker’ Z and (ii) very briefly describe how it affects the relationship in question.
(a) There is a high correlation between the number of police officers patrolling the streets (X) and number of crimes (Y).
Police departments should reduce the number of police officers patrolling the streets in order to reduce the number of crimes.
(b) Those without a passport (X) in the United States are more likely to have Type 2 diabetes (Y) than those with a passport.
The U.S. government should make acquiring a passport a requirement for all U.S. citizens in order to reduce rates of Type 2 diabetes.
(c) People with Medicare insurance (X) are much more likely to die (Y) than people with other types of insurance.
To prevent deaths, we should get rid of Medicare.
(d) Construction workers (X) as a group are much less likely to suffer from breast cancer during their lifetime (Y) than individuals who are not construction workers.
Thus, we conclude that working construction is good for reducing the risk of breast cancer and that we should increase the number of construction jobs.
(e) Skipping breakfast (X) is highly correlated with a high body-mass index (BMI) (Y).
The U.S. government should encourage everyone to eat breakfast regularly in order to combat obesity.
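The idea of a lurking variable in the relationships above can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated, Z drives both X and Y with a coefficient of 1, and X has no effect on Y at all. The helper `pearson_r` is written out by hand so the example needs only the standard library.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(42)
n = 5000

# Simulated lurker Z, plus X and Y that each depend on Z and on independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

# X and Y are correlated even though neither one causes the other.
r_raw = pearson_r(x, y)

# Removing Z's contribution (known exactly here, since we built the data)
# leaves only the independent noise, so the correlation should vanish.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_adjusted = pearson_r(x_resid, y_resid)

print(round(r_raw, 2), round(r_adjusted, 2))
```

In real data the contribution of Z is not known in advance, so it would have to be estimated or controlled for by design. The simulation only shows why an uncontrolled Z can produce a correlation that a policy acting on X would not change.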
3) On the NBC show ‘Biggest Loser’, overweight contestants attempt to lose weight for a cash prize. They are put on various training and eating regimes over the course of many weeks. Despite the efforts made by the nutritionists and trainers, three years after the season in which they featured ended, around 50% of the contestants are heavier than they were when they began the show. Can you conclude from this fact that the show makes no difference in the lives of its contestants? In your answer, be sure to mention the relevant ‘counterfactual’ and ‘control’ group.
4) In this problem you will be asked to analyze real data on Boston voters and their homes. This dataset contains a sample of Boston homeowners who are currently registered to vote in Boston. To access the data, log in to the course website and download the file called Bostonvoters2014.dta.
This dataset contains nine columns of information. The first six columns contain data on voters from the Boston voter file: the voter's occupation, age, gender and party registration are indicated by Occpt, Age, Gender, and Party, respectively. The variables Voted 2012 and Voted 2013 are indicators for whether the individual voted in 2012 and 2013, respectively. The final three columns contain information from the Boston Tax Assessor about each voter's home, including the 2013 assessed value in thousands of U.S. dollars (Home_value), the home value (in U.S. dollars) divided by the number of square feet of living space in the home (Value_persqft), and the neighborhood in which the home is located (Neighborhood).
(a) How many individuals are included in the dataset (i.e., how large is our sample?)
(b) What is the mean home value in the sample? (Include correct units)
(c) What is the median home value in the sample? (Include correct units)
(d) Comment on the skewness of the home value variable. Do prices reflect a leftward skew, a rightward skew, or is there relatively little skew in the data? Justify your answer.
(e) Generate a histogram of home values with 6 bins. According to this graphic, what is the modal category of home values in the dataset? You may approximate the range of values included in this category based on the histogram you generated.
(f) Generate a histogram of home values with 16 bins. According to this graphic, what is the modal category of home values in the dataset? You may approximate the range of values included in this category based on the histogram you generated. Comment on how your answer here compares to the modal category in part (e).
(g) Now consider the Value_persqft variable. What is the sample variance of Value_persqft? Be sure to include the correct units in your response.
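The calculations in parts (b) through (g) can be sketched on a toy example. This is a minimal illustration, not a solution: the ten values in `toy_home_values` are invented, the helper `histogram_counts` is hypothetical, and equal-width bins between the minimum and maximum are an assumption about how the histogram is drawn. Running the same steps on the real Home_value and Value_persqft columns requires loading Bostonvoters2014.dta first, which is not shown here.

```python
from statistics import mean, median, variance

# Invented home values in thousands of U.S. dollars (NOT the course data).
toy_home_values = [250, 300, 310, 320, 350, 400, 420, 450, 600, 1500]


def histogram_counts(values, bins):
    """Count values in `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it in.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


toy_mean = mean(toy_home_values)
toy_median = median(toy_home_values)

# A mean well above the median is a common sign of rightward skew.
mean_above_median = toy_mean > toy_median

# statistics.variance is the sample variance (divides by n - 1);
# its units are the square of the variable's units.
toy_variance = variance(toy_home_values)

# Modal category: the bin with the highest count, for 6 and for 16 bins.
edges6, counts6 = histogram_counts(toy_home_values, 6)
edges16, counts16 = histogram_counts(toy_home_values, 16)
modal6 = counts6.index(max(counts6))
modal16 = counts16.index(max(counts16))

print(toy_mean, toy_median, mean_above_median)
print(counts6, (edges6[modal6], edges6[modal6 + 1]))
print(counts16, (edges16[modal16], edges16[modal16 + 1]))
```

With more bins the modal category covers a narrower range of values, which is why the answers to parts (e) and (f) can differ even though the underlying data are the same.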