1. Book questions 9.10, 9.12, 9.16, 9.20, 9.42, 9.44
2. Book questions (remember Chapter 16 is downloadable) 16.45, 16.46, 16.48, 16.50, 16.52
For the rest of the class we will start dealing with some real data! This homework will focus on some survey data from Pew on online dating from 2013. In order to do the assignment, you will need to recode some of the data into new variables.
• Be very careful of missing values!!!
• You will need to set "Don't Know" and "Refused" to a missing value in almost all cases before analyzing the data.
• The data are formatted, so you need to use the underlying values (1, 2, 3) instead of the format labels when coding.
– use the recode command or create a new variable
– use "tab variable, missing" to check the number of missing values.
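As a rough illustration of the recoding step, the sketch below recodes date1a into the on_date variable used in the hint for question 3. The numeric codes (1 = yes, 2 = no, 8 = don't know, 9 = refused) are assumptions for illustration only; check the codebook or the nolabel tabulation for the actual values before recoding.

```stata
* Sketch only: the codes 1, 2, 8 and 9 are assumed, not taken from the codebook.
* Show the labels, then the underlying numeric values behind them.
tab date1a, missing
tab date1a, missing nolabel

* Build a 0/1 variable and send the assumed DK/refused codes to missing.
recode date1a (1 = 1) (2 = 0) (8 9 = .), generate(on_date)

* Confirm how many observations ended up missing.
tab on_date, missing
```

Using generate() leaves the original variable untouched, so the recode can be compared against the raw data if something looks off.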
Your boss at an online graduate education firm is considering advertising on an online dating site. There are concerns that the online dating population may already be either too poorly educated for your graduate program or too highly educated to need a master's degree. Your boss asks you to investigate, so you download a survey on online dating from Pew.
3. Using the nonparametric Kolmogorov-Smirnov test (ksmirnov), examine whether the distribution of education level (educ2) is the same for people who use online dating services (date1a) as for those who do not. (Remember to recode the "don't know" and "refused" responses in both variables first.) Plot the distribution of education level for people who use online dating services versus those who don't. Examine the plots and any warnings, and decide whether you think the test is valid.
hint: kdensity on_educ if on_date == 0, plot(kdensity on_educ if on_date == 1)
4. Using a t-test, test the hypothesis that the two groups (those who use online dating and those who do not) have the same mean education level. Also plot the data in a side-by-side boxplot and properly label the axes. Does the t-test meet its assumptions, and is this test appropriate given the data?
5. Using a nonparametric test, test the hypothesis that the two medians are equal. Do the results of this test agree with the t-test? Is this nonparametric test appropriate given the data?
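For questions 3 to 5, the Stata commands involved have roughly the shape shown below. This is only a sketch: it assumes the recoded variables are named on_educ and on_date (0/1) as in the hint for question 3, and the axis title is a placeholder. Which variant of each test is appropriate is part of what the questions ask you to judge.

```stata
* Sketch only: assumes on_educ (recoded educ2) and on_date (recoded date1a) exist.
* The // comments work in a do-file, not when typed at the command line.
ksmirnov on_educ, by(on_date)        // two-sample Kolmogorov-Smirnov test
ttest on_educ, by(on_date)           // two-sample t-test, equal variances assumed
ttest on_educ, by(on_date) unequal   // same test without assuming equal variances
graph box on_educ, over(on_date) ytitle("Education level")
ranksum on_educ, by(on_date)         // Wilcoxon rank-sum (Mann-Whitney) test
median on_educ, by(on_date)          // nonparametric test of equal medians
```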
Your boss decides that she does want to advertise on an online dating site. She has seen many commercials for Match.com but also sees that it is the most expensive site on which to buy ads. Your company can advertise either on Match.com or on multiple other sites. She asks you to see whether advertising solely on Match.com would make sense based on the proportion of people using the site.
6. Test the hypothesis that the proportion of Match.com users is the same as that of all other sites combined.
Evaluate the assumptions for the test. Are they met?
In order to do this test, you will need to:
• Create a new variable to see if Match.com is listed in any row of the following 4 columns (date1bm1, date1bm2, date1bm3, date1bm4). Use the value of 1 for a column containing Match.com, 0 otherwise, and missing = .
• You will need to make sure that missing values are retained and that "don't know" and "refused" are set to missing.
• The statistical test is easy, but doing the data manipulation for this problem will take you time!!!
Here are some tips:
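Say I want to make a dummy variable for the Midwest region. I would use the following code:
* start with every observation missing
gen midwest=.
* set 0 only where cregion is observed, so missing values stay missing
replace midwest=0 if cregion != "Midwest" & !missing(cregion)
replace midwest=1 if cregion == "Midwest"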
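The Midwest pattern can be extended to several columns with a loop. The sketch below is one possible approach for the four date1bm columns, not the required solution. The three numeric codes are hypothetical placeholders, and it assumes that the first answer is always stored in date1bm1 and that "don't know" or "refused" would appear there; verify both against the data.

```stata
* Sketch only: the three codes below are hypothetical placeholders.
* Look up the real ones with: label list
local match_code 1
local dk_code 8
local ref_code 9

* Start at 0 for everyone who answered the question, missing otherwise.
gen match = 0 if !missing(date1bm1)

* Flip to 1 if Match.com appears in any of the four columns.
foreach v of varlist date1bm1 date1bm2 date1bm3 date1bm4 {
    replace match = 1 if `v' == `match_code'
}

* Don't know / refused in the first column become missing.
replace match = . if inlist(date1bm1, `dk_code', `ref_code')

* Check the result, including the number of missing values.
tab match, missing
```

Run the block as a whole from a do-file, since local macros do not persist between separately executed lines.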
Attachment: pew_may_2013_online_dating.zip