Homework Assignment: Stata Exercises
Question 1 - Consider the house price model we discussed in class. The dependent variable was the price of the house in dollars. However, real estate economists have found that for many data sets, a more appropriate model uses ln(price) as the dependent variable.
a) Using the data in the file utown.dta, estimate the following model using ln(price) as the dependent variable.
ln(PRICE) = β1 + β2 UTOWN + β3 SQFT + β4 (SQFT×UTOWN) + β5 AGE + β6 POOL + β7 FPLACE + e
b) Find an expression for the marginal effect of SQFT on ln(PRICE).
c) Find an expression for the marginal effect of UTOWN on ln(PRICE).
d) Interpret the estimates of β5, β6 and β7.
e) Compute the marginal effect of SQFT on ln(PRICE) for a home near the university. [Hint: use the expression from part b.]
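In a model with an interaction term such as SQFT×UTOWN, the marginal effect of SQFT is the partial derivative of ln(PRICE) with respect to SQFT, so it depends on the value of UTOWN. A minimal Python sketch of that idea follows. The function name and the coefficient values are hypothetical placeholders chosen for illustration, not estimates from utown.dta.

```python
def marginal_effect_sqft(b3, b4, utown):
    """Partial derivative of ln(PRICE) with respect to SQFT in the model
    of part a: b3 + b4 * UTOWN."""
    return b3 + b4 * utown


# Placeholder coefficients (made up, NOT estimates from utown.dta).
b3_demo, b4_demo = 0.003, 0.001

away = marginal_effect_sqft(b3_demo, b4_demo, utown=0)  # UTOWN = 0
near = marginal_effect_sqft(b3_demo, b4_demo, utown=1)  # UTOWN = 1
print("away from university:", away)
print("near university:", near)
```

The two results differ by exactly the interaction coefficient, which is why part e needs the expression from part b rather than β3 alone.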
Question 2 - Data on the weekly sales of a major brand of canned tuna by a supermarket chain in a large Midwestern U.S. city during a mid-1990s calendar year are contained in the file tuna.dta. There are 52 observations on the following variables:
SAL1 = unit sales of brand No.1 canned tuna
APR1 = price in dollars per can of brand No. 1 canned tuna
APR2, APR3 = price in dollars per can of brands Nos. 2 and 3 of canned tuna
DISP = an indicator variable that takes the value one if there is a store display for brand No.1 during the week but no newspaper ad; zero otherwise
DISPAD = an indicator variable that takes the value one if there is a store display and a newspaper ad during the week; zero otherwise
a) The prices APR1, APR2 and APR3 are expressed in dollars. Multiply the observations on each of these variables by 100 to express them in terms of cents. Call the new variables APR100, APR200, and APR300. Estimate the following log-linear model:
ln(SAL1) = β1 + β2 APR100 + β3 APR200 + β4 APR300 + β5 DISP + β6 DISPAD + e
b) Interpret the estimates of β2, β3 and β4.
c) Are the signs and relative magnitudes of the estimates of β5 and β6 consistent with economic logic? Interpret these estimates.
d) Are your estimates of β5 and β6 statistically different from zero at the 5% level of significance? Discuss the relevance of these two variables for the supermarket chain's executives.
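When interpreting a log-linear model, a coefficient b on a regressor implies an approximate percentage change of 100·b in the dependent variable per one-unit change, while the exact change is 100·(exp(b) − 1). The gap matters most for indicator variables and large coefficients. The sketch below illustrates the two calculations in Python. The function names and the value 0.4 are hypothetical and are not results from tuna.dta.

```python
import math


def approx_pct_change(b, delta=1.0):
    """Approximate % change in y for a change `delta` in x when ln(y) = ... + b*x."""
    return 100.0 * b * delta


def exact_pct_change(b, delta=1.0):
    """Exact % change in y for a change `delta` in x when ln(y) = ... + b*x."""
    return 100.0 * (math.exp(b * delta) - 1.0)


# Placeholder coefficient (made up, NOT an estimate from tuna.dta).
b_demo = 0.4
print("approximate:", approx_pct_change(b_demo))
print("exact:", round(exact_pct_change(b_demo), 2))
```

For small coefficients, such as those typically attached to a price measured in cents, the two numbers are close. For a large indicator-variable coefficient the exact formula is the safer one to report.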
Question 3 - Mortgage lenders are interested in determining borrower and loan factors that may lead to delinquency or foreclosure. The file lasvegas.dta contains 1000 observations on mortgages for single-family homes in Las Vegas, Nevada, during 2008. The variable of interest is DELINQUENT, an indicator variable equal to 1 if the borrower missed at least three payments (90 or more days late), and zero otherwise. Explanatory variables are as follows:
LVR = the ratio of the loan amount to the value of the property
REF = 1 if the purpose of the loan was a "refinance" and 0 if the loan was for a purchase
INSUR = 1 if mortgage carries mortgage insurance, zero otherwise
RATE = initial interest rate of the mortgage
AMOUNT = dollar value of mortgage (in units of $100,000)
CREDIT = credit score
TERM = number of years between disbursement of the loan and the date it is expected to be fully repaid
ARM = 1 if mortgage has an adjustable rate and 0 if mortgage has a fixed rate
a) Estimate the linear probability (regression) model explaining DELINQUENT as a function of the remaining variables.
b) What is your estimated equation?
c) Are the signs of the estimated coefficients reasonable? Why?
d) Interpret the coefficients on INSUR and CREDIT.
e) If CREDIT increases by 50 points, what is the estimated effect on the probability of a delinquent loan?
f) Compute the predicted value of DELINQUENT for all 1000 observations. How many were less than zero? How many were greater than 1? Explain why such predictions are problematic.
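The check in part f amounts to counting fitted values that fall outside the unit interval, which a linear probability model does not rule out. A minimal Python sketch of the counting step is shown below. The function name and the list of fitted values are made up for illustration and are not predictions from lasvegas.dta.

```python
def count_out_of_range(predictions):
    """Return (number of predictions below 0, number of predictions above 1)."""
    below = sum(1 for p in predictions if p < 0)
    above = sum(1 for p in predictions if p > 1)
    return below, above


# Made-up fitted values, NOT predictions from lasvegas.dta.
demo_predictions = [-0.05, 0.10, 0.45, 0.98, 1.02, 1.20]
print(count_out_of_range(demo_predictions))  # prints (1, 2)
```

In Stata, the analogous steps would presumably be `predict` after `regress`, followed by `count if` with the relevant inequality.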
Question 4 - It has been conjectured that workplace smoking bans induce smokers to quit by reducing their opportunities to smoke. In this exercise, you will estimate the effect of workplace smoking bans on smoking, using data on a sample of 10,000 U.S. indoor workers from 1991-1993, smoking.dta. The data set contains information on whether individuals were or were not subject to a workplace smoking ban, whether the individuals smoked, and other individual characteristics. A detailed description is given in the file Smoking_Description.pdf.
a) Use a linear probability model to determine the difference in the probability of smoking between workers affected by a workplace smoking ban and workers not affected by a workplace smoking ban. Is the difference statistically significant?
b) Estimate a linear probability model with smoker as the dependent variable and the following independent variables: smkban, female, age, age2, hsdrop, hsgrad, colsome, colgrad, black and hispanic.
c) What is your estimated equation?
d) Interpret your estimates on smkban, female and black.
e) Mr. A is a White, non-Hispanic, 20-year-old high school dropout who is not subject to a workplace smoking ban. Using your estimated equation from part c, calculate the probability that Mr. A smokes.
f) Repeat the analysis for Ms. B, a 40-year-old Black female college graduate who is not subject to a workplace smoking ban.
g) What proportion of the variation in smoker is explained by your model in part b? Besides the determinants of smoking specified in the model, what other possible determinants could be used as independent variables in the regression? Provide at least two (2) of them.
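The calculations in parts e and f plug a person's characteristics into the estimated equation: the predicted probability is the intercept plus the sum of each coefficient times the corresponding variable value. The Python sketch below illustrates the mechanics. The helper names, the `_cons` key, and every coefficient value are hypothetical placeholders, not estimates from smoking.dta, and the sketch assumes age2 denotes age squared.

```python
def make_profile(age, **indicators):
    """Build a regressor profile; age2 is assumed to be age squared.
    Indicator variables that are not passed default to 0 in predict_lpm."""
    profile = {"age": float(age), "age2": float(age) ** 2}
    profile.update({name: float(value) for name, value in indicators.items()})
    return profile


def predict_lpm(coefs, profile):
    """Intercept plus the sum of coefficient * value over all regressors."""
    return coefs["_cons"] + sum(
        b * profile.get(name, 0.0)
        for name, b in coefs.items()
        if name != "_cons"
    )


# Placeholder coefficients (made up, NOT estimates from smoking.dta).
demo_coefs = {
    "_cons": 0.10, "smkban": -0.05, "female": -0.03,
    "age": 0.01, "age2": -0.0001,
    "hsdrop": 0.30, "hsgrad": 0.20, "colsome": 0.15, "colgrad": 0.05,
    "black": -0.02, "hispanic": -0.10,
}

# A 20-year-old high school dropout with all other indicators equal to 0.
person = make_profile(20, hsdrop=1)
print(round(predict_lpm(demo_coefs, person), 4))
```

Leaving an indicator out of the profile sets it to zero, which matches how the omitted categories (for example, no smoking ban) enter the equation.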
Attachment: Assignment Files.rar