What is the new probability of the patient having lupus?


1. Please note the following definitions. The prevalence of a disease is the proportion of individuals in a population or subpopulation who have the disease. The result of a diagnostic test for a disease is called positive if the test classifies the subject as having the disease and negative if it does not. Note that the test can be wrong: it can classify a subject as having the disease when the subject does not, and vice versa. The sensitivity of a diagnostic test is the probability that the test result will come up positive (for the presence of the disease) if the person does indeed have the disease. (This can be written P(T+|D+) and is called the "true positive" rate.) The specificity of a diagnostic test is the probability that the test will give a negative result if the subject does not have the disease. (This can be written P(T-|D-) and is called the "true negative" rate.) Answer the following questions:

(a) A doctor is concerned that her patient has systemic lupus erythematosus (SLE or lupus). Since the patient has several symptoms, the doctor knows from the literature that there is a 13% chance that the patient has lupus. So the doctor orders a test called an ANA test. This test has a sensitivity of 99% and a specificity of 80%. If the patient has a positive result from this test, what is the new probability of the patient having lupus?

(b) Now, with the ANA test positive, you know from the question above the present probability of the patient having lupus. However, the doctor is not satisfied with this probability and would like to be more certain. So, the doctor orders another test (called the Anti-dsDNA test), which has a sensitivity of 73% and a specificity of 98%. If this test also comes back positive, what is the probability that the patient has lupus?

(c) The above sequence of tests is fairly common. The first test of this nature is often called the screening test, and a second test like the one above is used to, hopefully, confirm the diagnosis. Please discuss what properties might make a good screening test and what might make a good confirmatory test, and why.

(d) Suppose there is a family practice physician who decides to test all his patients for lupus. For males in the general population, the prevalence of lupus is about 1 in 25,000. If a male has a positive ANA test for lupus, what is his probability of actually having lupus?
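The calculations in question 1 all apply Bayes' theorem to a positive test result. The following is a minimal Python sketch of that step. The function name posterior_given_positive is an illustrative assumption, not part of the assignment, and the numbers used are the ones stated in part (a).

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(D+ | T+) by Bayes' theorem."""
    true_pos = sensitivity * prior                    # P(T+|D+) * P(D+)
    false_pos = (1.0 - specificity) * (1.0 - prior)   # P(T+|D-) * P(D-)
    return true_pos / (true_pos + false_pos)


# Part (a): 13% pre-test probability, ANA sensitivity 99%, specificity 80%.
after_ana = posterior_given_positive(0.13, 0.99, 0.80)
print(round(after_ana, 3))
```

For a second test, as in part (b), the posterior from the first test can be passed in as the prior. Doing so assumes the two test results are conditionally independent given the patient's true disease status.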

2. Let's calculate the posterior distribution of the average height (in inches) of a group of people. Suppose that we have two different datasets. The first dataset is:

D1=(50, 47, 65, 74, 59, 64), and the second is: D2=(30, 25, 35, 45, 23, 33). Consider the following three priors for this data and then answer the question below.

Prior 1: Assume the heights Xi of people from a particular population follow a normal distribution with mean μ and precision τ. Also, assume that the parameter τ is known and that μ has a normal prior distribution with mean μ0 and precision τ0. Assume that it is believed that the "average" height is 66 inches (about 168 cm). Also, assume that it is generally thought that 95% of the Xi's are between 54 and 78 inches. The standard deviation is then about one fourth of that range, so the standard deviation of the Xi's given μ is about 6 inches, and the prior belief is that the fixed parameter τ is about 1/36. For μ0, it is believed that the true value of μ is between 63 inches and 69 inches (with about 95% probability). That range gives a prior standard deviation of about 1.5 inches, so the value of τ0 is about 4/9.
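With τ known, Prior 1 is conjugate: the posterior for μ is normal with precision τ0 + nτ and mean (τ0 μ0 + τ ∑Xi) / (τ0 + nτ). The sketch below applies that standard result to D1. The function name and the 1.96 normal quantile used for the interval are illustrative choices, not something the assignment prescribes.

```python
import math


def normal_known_precision_posterior(data, mu0, tau0, tau):
    """Posterior of mu for x_i ~ N(mu, 1/tau), mu ~ N(mu0, 1/tau0), tau known."""
    n = len(data)
    post_precision = tau0 + n * tau
    post_mean = (tau0 * mu0 + tau * sum(data)) / post_precision
    post_sd = 1.0 / math.sqrt(post_precision)
    # Central 95% credible interval of the normal posterior.
    interval = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)
    return post_mean, post_sd, interval


D1 = (50, 47, 65, 74, 59, 64)
mean, sd, ci = normal_known_precision_posterior(D1, mu0=66, tau0=4 / 9, tau=1 / 36)
print(mean, sd, ci)
```

Because τ0 is large relative to nτ here, the posterior mean stays close to the prior mean of 66 even when the data disagree, which is worth keeping in mind for part (b).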

Prior 2: Assume the heights, Xi, are normally distributed with mean μ and precision τ. Also, assume that τ has a gamma prior distribution with parameters (shape and rate) α and β, and that μ, given τ, has a normal distribution with mean μ0 and precision θτ. As with Prior 1, we let μ0 equal 66 inches. Also, from the reasoning in Prior 1, we will assume that the prior mean of τ is 1/36. Since the mean of this gamma distribution is α/β, one possibility is to have α=1 and β=36. We can use R or S-Plus to check the distribution of the standard deviation of Xi given μ, which we figured should be about 6. The following R/S-Plus command can be used to check this:

[Figure: R/S-Plus command and its output (992_Normal distribution.png); not reproduced here.]

We see that there is very good support in the neighborhood of 6 for the prior distribution. To finish this prior, we need a value for the parameter θ. It could be argued that the values Xi given μ should be much more spread out than the values of μ around its mean, so we will keep θ greater than one. Here, let us take θ to be about 4, which leads to the following summary statistics for the prior distribution of the standard deviation of μ given μ0:

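[Figure: summary statistics for the prior standard deviation of μ given μ0 (185_Normal distribution1.png); not reproduced here.]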
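The original R/S-Plus commands for Prior 2 are not available in this text. As a stand-in, the prior check can be sketched by simulation in Python: draw τ from the gamma prior and summarize the implied standard deviations 1/√τ (for Xi given μ) and 1/√(θτ) (for μ given μ0). This is an assumed reconstruction of the idea, not the command the assignment showed. Quartiles are reported rather than the mean because these distributions have a long right tail.

```python
import random
import statistics

rng = random.Random(0)
alpha, beta, theta = 1.0, 36.0, 4.0   # shape, rate, and theta from Prior 2

# random.gammavariate takes (shape, scale), so the rate beta enters as 1/beta.
taus = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(100_000)]

sd_x = [tau ** -0.5 for tau in taus]              # sd of X_i given mu
sd_mu = [(theta * tau) ** -0.5 for tau in taus]   # sd of mu given mu0

for name, draws in (("sd of X", sd_x), ("sd of mu", sd_mu)):
    quartiles = statistics.quantiles(draws, n=4)  # lower quartile, median, upper quartile
    print(name, [round(v, 2) for v in quartiles])
```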

Prior 3: This prior uses the same basic model as Prior 2, but with μ0 = 66, θ = 0.1, α = 0.001, and β = 0.001.

(a) For each of the three priors and for the two data sets (so there are 6 combinations in all), calculate the posterior mean, standard deviation, a 95% credible region, and a density plot for average height. (Note: the standard precision of the t-distribution is not 1/variance.) You may calculate these values by finding the exact values using analytical methods (algebra) or by performing simulations. (You can also do it both ways if you wish.)

(b) Please comment on the differences in the results obtained by using the different priors. To be more specific, try to imagine what kind of personal beliefs each of the three priors represents.

Each of the priors assumes that the a priori average height is 66 inches, and we are imagining that the person then sees two different types of data. Is there a person who might be described as "not believing" the data when they see one set of data? Such a person might revise their estimate of the average height only slightly and end up with a 95% credible region for the average that does not even contain any of the observed values. Also, note how their prior beliefs are reflected in the size of the posterior credible regions after seeing either of the two data sets. So, to answer this question, describe the beliefs of the three different people who would hold each of the three priors and state how they react to seeing each of the two data sets. (Perhaps they were surprised by the data, perhaps they were suspicious of the data, or perhaps they saw the data as "confirming" their belief, etc.) More specifically, describe when you might or might not consider using the different priors. When do you think they are "valid" or "invalid" in terms of representing your beliefs?

3. In this question, we will consider developing priors and using these priors to calculate posterior distributions. The main subject of this investigation is the birth weight of babies. There has been a lot of work looking at birth weight as a health indicator. For simplicity, assume that birth weights are approximately normally distributed with mean μ and precision τ. The purpose of this question is therefore to learn about the values of μ and τ. Do the following:

(a) (Priors) Write down two sets of priors. For the first prior, choose a very non-informative prior. For the second prior, consider a somewhat "honest vague" prior. In considering this second prior, please use what you know about birth weights. That is, consider the birth weights that you know about, perhaps your own and those of other members of your family. (If you don't know any, some weights that I can remember are 9, 8.5, 6.5, and 7, all in pounds.) Also, you might want to know that 1 pound = 16 ounces = 453.6 grams.

When developing your priors, please explain your reasoning for the priors that you come up with. Note 1: there are many different "correct" answers, but you need to be clear in your explanation. Note 2: For these priors, I do not want you to do "research" on the internet, etc. I want you to give what your priors are. Even if a prior is wide, that is okay. The point of the exercise is to quantify what you believe before you see the data and what you learn from the data.

(b) Assume that for a sample of 30 birth weights (in grams), we have X̄ = 2750.63 g and ∑(Xi − X̄)² = 421721 g². Give the marginal posterior distribution of the parameters of the model for birth weight given these sampled values. Use the two different priors that you specified above.

When specifying these distributions, calculate the posterior mean, standard deviation, and a 95% credible region for these parameters under each of the two priors. (Note: the standard precision of the t-distribution is not 1/variance.)

(c) Generate a histogram of values for the parameters of your model from the marginal posterior distribution. (That is, generate samples from the joint posterior distribution of the parameter and then provide the histogram for each of the parameters separately.)

(d) Generate 400 new sampled values of the predictive distribution. Give a histogram of these values and simple summary statistics of these 400 sampled values (such as mean and standard deviation.) (Hint: First generate a parameter value from your posterior, then generate a new sampled value from your probability model with the new parameter value which was generated. When providing your answer, you should be providing at least a sketch of the computer program used if not the actual R code.)
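The two-step hint in part (d) can be sketched in Python as follows. The sketch assumes a conjugate normal-gamma prior (μ given τ is normal with mean mu0 and precision kappa0·τ, and τ is gamma with shape alpha0 and rate beta0), which is one possible modeling choice and not the only valid one. The hyperparameter values below are hypothetical placeholders chosen only so the code runs; they should be replaced by the priors you justified in part (a). All function names are illustrative.

```python
import random
import statistics


def normal_gamma_posterior(n, xbar, ss, mu0, kappa0, alpha0, beta0):
    """Conjugate update for x_i ~ N(mu, 1/tau) with
    mu | tau ~ N(mu0, 1/(kappa0 * tau)) and tau ~ Gamma(shape=alpha0, rate=beta0).
    ss is the sum of squared deviations from the sample mean."""
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


def sample_predictive(post, size, rng):
    mu_n, kappa_n, alpha_n, beta_n = post
    draws = []
    for _ in range(size):
        tau = rng.gammavariate(alpha_n, 1.0 / beta_n)   # (shape, scale = 1/rate)
        mu = rng.gauss(mu_n, (kappa_n * tau) ** -0.5)   # mu given tau
        draws.append(rng.gauss(mu, tau ** -0.5))        # new birth weight
    return draws


# HYPOTHETICAL vague prior, for illustration only; substitute your own.
post = normal_gamma_posterior(n=30, xbar=2750.63, ss=421721.0,
                              mu0=3000.0, kappa0=0.001, alpha0=0.001, beta0=0.001)
new_weights = sample_predictive(post, size=400, rng=random.Random(1))
print(statistics.mean(new_weights), statistics.stdev(new_weights))
```

The list new_weights can then be passed to any plotting routine to produce the requested histogram.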

4. Preamble: You have been hired to predict the results of an election. The election is for town selectperson. There are three selectpersons who make up the town council and who run the town. There is a division between the purple party and the brown party. If one side has a majority, it will control the issues in the town. Each selectperson is elected in one of three town districts, and both parties have candidates running in each district. If a candidate wins a majority of votes in a district, that candidate will be the selectperson for that district. Also, assume that there will be 5001 citizens voting in each district.

(a) Prior distributions. Create two different analyses: one based on a "non-informative" prior and the other on an informative prior. Justify your answer. For the informative prior, consider the following information. In the past, the two parties have been quite balanced, with each party getting between 40% and 60% of the votes. Use this information to create an informative prior.

(b) Posterior distribution for the voting percentage for each district. In each district, a simple random sample of citizens is asked whom they plan to vote for. In district one, 85 said purple and 65 said brown; in district two, 70 said purple and 80 said brown; and in district three, 50 said purple and 100 said brown. Calculate the posterior distribution of the percentage who will vote for purple in each district. In reporting your results, name the posterior distribution, give the values of its parameters, and report appropriate statistics for the distribution (including the posterior mean, standard deviation, and some kind of 95% interval). (Note: do this for each of the two priors specified in the first part.)
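If the prior for the purple vote share in a district is taken to be a beta distribution, which is an assumption since the assignment leaves the choice of prior family to you, the posterior is Beta(a + purple, b + brown). The sketch below reports its parameters, exact mean and standard deviation, and a simulated equal-tailed 95% interval. The function name and the uniform Beta(1, 1) prior in the example are illustrative.

```python
import math
import random


def beta_posterior_summary(purple, brown, a, b, draws=20_000, seed=0):
    """Posterior Beta(a + purple, b + brown) for the purple vote share."""
    a_post, b_post = a + purple, b + brown
    total = a_post + b_post
    mean = a_post / total
    sd = math.sqrt(a_post * b_post / (total ** 2 * (total + 1)))
    # 95% equal-tailed interval estimated from simulated posterior draws.
    rng = random.Random(seed)
    sims = sorted(rng.betavariate(a_post, b_post) for _ in range(draws))
    interval = (sims[int(0.025 * draws)], sims[int(0.975 * draws)])
    return (a_post, b_post), mean, sd, interval


# District one under a uniform Beta(1, 1) prior.
print(beta_posterior_summary(purple=85, brown=65, a=1, b=1))
```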

(c) Given the above information, provide the probability that the purple party will have a majority in the town council. Also, provide the probability that purple will win each of the districts. Do this by simulation. That is, simulate 10,000 elections. For each simulated election, generate a probability, θi, which is the probability of a citizen voting for the purple party in district i. Then, generate the number of citizens voting for purple in each district. From there, one can elect either the purple candidate or the brown candidate in each district for each simulated election. Finally, one can see which party has a majority in the town council in the simulated election. (Note: do this for each of the two priors specified in the first part.)
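The simulation steps in part (c) can be sketched in Python as below. The sketch assumes a Beta(a, b) prior in every district, so each θi is drawn from Beta(a + purple, b + brown), and it treats all 5001 votes as independent draws with probability θi. The names POLLS, draw_binomial, and simulate_elections are illustrative, and a and b stand for whichever prior parameters you chose in part (a).

```python
import random

VOTERS = 5001
POLLS = [(85, 65), (70, 80), (50, 100)]   # (purple, brown) per district, from part (b)


def draw_binomial(rng, n, p):
    # Random.binomialvariate exists from Python 3.12; otherwise sum Bernoulli draws.
    if hasattr(rng, "binomialvariate"):
        return rng.binomialvariate(n, p)
    return sum(rng.random() < p for _ in range(n))


def simulate_elections(a, b, n_elections=10_000, seed=0):
    """Return ([P(purple wins district i)], P(purple holds a council majority))."""
    rng = random.Random(seed)
    district_wins = [0] * len(POLLS)
    majorities = 0
    for _ in range(n_elections):
        seats = 0
        for i, (purple, brown) in enumerate(POLLS):
            theta = rng.betavariate(a + purple, b + brown)   # posterior draw of theta_i
            votes = draw_binomial(rng, VOTERS, theta)        # purple votes cast
            if votes > VOTERS // 2:                          # at least 2501 of 5001
                district_wins[i] += 1
                seats += 1
        if seats >= 2:                                       # two of three seats
            majorities += 1
    return [w / n_elections for w in district_wins], majorities / n_elections
```

Calling simulate_elections(1, 1) runs the 10,000 elections under a uniform prior; on Python versions before 3.12 the Bernoulli fallback is slow, so a smaller n_elections is useful while testing.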
