STATISTICS
A statistical post-mortem of subprime mortgages
Subprime mortgages were prominent business news in 2007 and 2008 during the meltdown in the financial markets. A subprime mortgage is a home loan made to a risky borrower. Banks and hedge funds plunged into the subprime housing market because these loans earn higher interest payments, so long as the borrower keeps paying. Defaults on subprime mortgages brought down several financial institutions (Lehman Brothers, Bear Stearns and AIG) and led to federal bailouts totaling hundreds of billions of dollars.
For this analysis, a banking regulator would like to verify how lenders are using credit scores to determine the rate of interest (APR) paid by subprime borrowers. A credit score (FICO) around 500 indicates a subprime borrower, one who might not repay the loan. A score around 800 indicates a low-risk borrower. The regulator wants to isolate the effect of credit score from other variables that might affect the interest rate. For example, the loan-to-value ratio (LTV) captures the exposure of the lender to default. As an illustration, if LTV = 0.80 (80%), then the mortgage covers 80% of the value of the property. The higher the LTV, the more risk the lender faces if the borrower defaults. The data also include the stated income of the borrower and the value of the home, both in thousands of dollars. In addition, the data identify the borrower as African American, Hispanic or White, since there is concern that minority groups may have been discriminated against. The data set contains 372 mortgages obtained from a credit bureau. These loans are a sample of mortgages within the geographic territory of this regulator.
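The LTV illustration above can be checked with a line of arithmetic. The sketch below is only an illustration: the function name `loan_to_value` and the dollar figures are assumptions made for the example, not values taken from the data set.

```python
def loan_to_value(loan_amount, home_value):
    """Return the loan-to-value ratio as a fraction (0.80 means 80%)."""
    if home_value <= 0:
        raise ValueError("home_value must be positive")
    return loan_amount / home_value

# Hypothetical example: a mortgage of 160 on a home worth 200
# (both in thousands of dollars, matching the units used in the data).
ltv = loan_to_value(160, 200)
print(f"LTV = {ltv:.2f}")
```

A higher ratio means the loan covers more of the property's value, which is why the lender's exposure grows with LTV.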
(If Excel is needed, please just copy and paste the output into this Word document. Thank you.)
Question 1.
(a) How many variables are in the data set? What is the scale of measurement for each variable?
(b) Provide a point estimate of the population mean APR. What is the sampling distribution of this point estimate?
(c) Can you improve on your estimate in (b) with one that gives a best-case/worst-case range? Provide that estimate and interpret it in a complete sentence that someone who has not taken a course in statistics could understand.
(d) At the 1% level of significance, is there any evidence to suggest that the population mean APR is greater than 10%? Explain clearly what "1% level of significance" means.
(e) At the 5% level of significance, is there evidence that Whites received lower mean interest rates than African Americans and Hispanics? (That is, combine African Americans and Hispanics into one group.)
(f) Briefly (a paragraph of 4 - 5 sentences) summarize your findings from (a) - (e).
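As a rough guide to the calculations behind parts (b) through (e), here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the APR values and group labels are randomly generated stand-ins for the 372 loans, the function names are hypothetical, and the normal distribution is used in place of the t distribution (a reasonable approximation at this sample size, but not identical).

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Hypothetical stand-in data; the real 372 loans are not reproduced here.
random.seed(0)
apr = [random.gauss(10.5, 1.5) for _ in range(372)]
group = [random.choice(["White", "African American", "Hispanic"]) for _ in range(372)]

def mean_ci(x, level=0.95):
    """Point estimate, standard error, and normal-approximation CI for the mean."""
    xbar = mean(x)
    se = stdev(x) / math.sqrt(len(x))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return xbar, se, (xbar - z * se, xbar + z * se)

def upper_tail_test(x, mu0):
    """H0: mu <= mu0 vs H1: mu > mu0. Returns (z statistic, p-value)."""
    xbar, se, _ = mean_ci(x)
    z = (xbar - mu0) / se
    return z, 1 - NormalDist().cdf(z)

def lower_mean_test(a, b):
    """H0: mean(a) >= mean(b) vs H1: mean(a) < mean(b), unequal variances."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    return z, NormalDist().cdf(z)

# (b), (c): point estimate and interval estimate of the mean APR
xbar, se, ci = mean_ci(apr)
print(f"mean APR = {xbar:.3f}, SE = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")

# (d): is the mean APR greater than 10%? Reject H0 at the 1% level if p < 0.01.
z, p = upper_tail_test(apr, 10.0)
print(f"z = {z:.3f}, one-sided p-value = {p:.4f}")

# (e): Whites vs. African Americans and Hispanics combined
white = [a for a, g in zip(apr, group) if g == "White"]
minority = [a for a, g in zip(apr, group) if g != "White"]
z, p = lower_mean_test(white, minority)
print(f"z = {z:.3f}, one-sided p-value = {p:.4f}")
```

With the real data, the same steps apply after replacing the generated lists with the APR column and the race column. Because the data here are simulated, the printed numbers say nothing about the actual loans.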
Question 2.
a.) What type of relationship (i.e. direct, inverse, or no relationship) should you expect to see between APR and each of the rest of the variables? This should be based on your understanding of how interest rates are determined (using your knowledge of economics, and general knowledge).
b.) Perform a MULTIPLE regression with APR as the dependent variable, and ALL the other independent variables in your data.
c.) Clearly interpret EACH of the coefficients you obtained in (b).
d.) How well does your regression fit the data?
e.) At the 5% level, conduct the following regression tests. In each case, clearly state the null and alternative hypotheses and the rest of the steps.
a. Test whether the regression as a whole is significant.
b. Test whether the coefficient on LTV is significantly below 1.
c. Test whether Stated Income is positively related to APR.
d. Test whether African Americans obtain a higher APR than Whites.
e. Test whether Hispanics obtain a different APR than Whites.
f.) In one or two paragraphs, clearly summarize your findings from (a) - (e). Specifically, explain how credit scores, LTV, and income explain the APR that borrowers receive. Additionally, based on your analysis, is there any evidence of racial discrimination in mortgage rates? Explain why or why not.
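The regression in Question 2 could be set up along the following lines. This is a sketch under stated assumptions, not the required method: the data are simulated with a made-up generating rule, the column order and the helper names `invert` and `ols` are hypothetical, and race is coded as two dummy variables with White as the baseline group. It solves the normal equations directly in plain Python; in practice Excel's regression tool or a statistics package would report the same quantities.

```python
import math
import random

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (check for collinear columns)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """Least-squares fit. Returns coefficients, standard errors, and R-squared."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sse = sum(e * e for e in resid)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # residual variance estimate
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se, 1 - sse / sst

# Hypothetical stand-in data; the generating rule below is invented.
random.seed(1)
names = ["Intercept", "FICO", "LTV", "Income", "HomeValue", "AfricanAmerican", "Hispanic"]
X, apr = [], []
for _ in range(372):
    fico = random.uniform(480, 820)
    ltv = random.uniform(0.3, 1.0)
    income = random.uniform(20, 300)   # thousands of dollars
    home = random.uniform(80, 900)     # thousands of dollars
    race = random.choice(["White", "African American", "Hispanic"])
    # White is the omitted baseline, so each dummy compares a group to Whites.
    X.append([1.0, fico, ltv, income, home,
              float(race == "African American"), float(race == "Hispanic")])
    apr.append(14 - 0.01 * fico + 1.5 * ltv + random.gauss(0, 1))

beta, se, r2 = ols(X, apr)
for name, b, s in zip(names, beta, se):
    print(f"{name:16s} coef = {b:9.4f}  SE = {s:8.4f}  t = {b / s:8.3f}")
print(f"R-squared = {r2:.4f}")

# Test statistic for H0: beta_LTV >= 1 vs H1: beta_LTV < 1
i = names.index("LTV")
print(f"t for LTV below 1: {(beta[i] - 1) / se[i]:.3f}")
```

Each dummy coefficient estimates the difference in mean APR between that group and Whites, holding FICO, LTV, income, and home value fixed, which is the comparison tests (d) and (e) call for. The t statistics would be compared against critical values with n - k degrees of freedom, where k counts the intercept and all six slopes.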