Question 1:
a) Define the term multicollinearity.
b) Explain why it is important to guard against multicollinearity.
Question 2:
a) Databases with a large number of fields often contain missing values. A common method of handling missing values is simply to omit the records or fields with missing values from the analysis. Describe why this might be dangerous.
b) Data analysts have turned to techniques that replace a missing value with a value substituted according to some criterion. Briefly give three possible choices of replacement value for missing data.
Question 3: Variables tend to have ranges that differ greatly from each other. Data miners must normalize the numerical variables to standardize the scale of effect each variable has on the results.
Name the two methods of normalization and distinguish between them.
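For illustration only, the sketch below shows two widely used rescaling approaches, min-max normalization and z-score standardization, which are assumed here to be the two methods the question has in mind. The function names and the sample `incomes` list are hypothetical, and the z-score uses the population standard deviation as an arbitrary choice.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1.

    Assumes the values are not all identical (otherwise hi - lo is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on their mean and scale by the standard deviation.

    Assumption: the population standard deviation is used; a sample
    standard deviation would be an equally reasonable choice.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical variable with a wide range.
incomes = [20_000, 35_000, 50_000, 95_000]

scaled = min_max(incomes)        # every value lies in [0, 1]
standardized = z_score(incomes)  # mean 0, standard deviation 1
```

Min-max output depends on the observed extremes, while z-scores express each value as a number of standard deviations from the mean.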
Question 4: The usual measure used to assess estimation and prediction models is the mean square error (MSE). Write down the expression for the MSE.
a) Briefly describe the term "measures of variability".
b) Give four examples of typical measures of variability.
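As a rough illustration, the sketch below computes four commonly cited measures of spread (range, variance, standard deviation, and interquartile range) on a small made-up data set. The `data` list is hypothetical, the population forms of variance and standard deviation are an arbitrary choice, and the interquartile range depends on the quantile method used.

```python
from statistics import pstdev, pvariance, quantiles

# Hypothetical sample; its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: distance between the largest and smallest values.
value_range = max(data) - min(data)

# Population variance: mean squared deviation from the mean.
variance = pvariance(data)

# Population standard deviation: square root of the variance.
std_dev = pstdev(data)

# Interquartile range: third quartile minus first quartile.
# statistics.quantiles uses the "exclusive" method by default;
# other methods can give slightly different quartiles.
q1, _median, q3 = quantiles(data, n=4)
iqr = q3 - q1
```

For this data set the range is 7, the population variance is 4, and the population standard deviation is 2.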