The data to be used in this problem are contained in the S-Plus script file crude.asc. Running this script will create the data frame CRUDE containing a numerical matrix with 3325 rows and 12 columns, and the numeric vector COma of length 3325.
In both data structures, each row corresponds to a date, the first one being 4/18/1989 and the last one being 8/12/2002. Each entry of the vector COma contains the average of the crude oil spot price over the period of 5 days starting on (and including) the date indexing the entry. Each row of the matrix CRUDE gives the prices of the 12 crude oil futures contracts as traded the day before. Form a data frame TRGCRUDE and a vector TRGCOma with the first 2500 rows of CRUDE and COma respectively, and a data frame TSTCRUDE and a vector TSTCOma with the last 825 rows of CRUDE and COma respectively. The goal is to predict the average spot price over the next five days from the prices of the crude oil futures contracts traded the day before, by fitting a regression model to the training data contained in the data sets TRGCRUDE and TRGCOma, and using the model to compute predictions for the values of the response in the testing sample TSTCOma.
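The chronological split described above can be sketched as follows. This is only an illustration in Python rather than S-Plus: the helper split_train_test and the placeholder lists crude and coma are hypothetical stand-ins with the same dimensions as CRUDE and COma, not the actual data created by crude.asc.

```python
def split_train_test(rows, n_train):
    """Return (first n_train rows, remaining rows), preserving time order."""
    if not 0 <= n_train <= len(rows):
        raise ValueError("n_train must lie between 0 and len(rows)")
    return rows[:n_train], rows[n_train:]


# Placeholder data (assumption): same shape as the objects in the problem,
# 3325 dates and 12 futures prices per date. The values are meaningless.
n_rows, n_cols = 3325, 12
crude = [[float(i + j) for j in range(n_cols)] for i in range(n_rows)]
coma = [float(i) for i in range(n_rows)]

# First 2500 rows for training, last 825 rows for testing.
trg_crude, tst_crude = split_train_test(crude, 2500)
trg_coma, tst_coma = split_train_test(coma, 2500)
```

The split is not shuffled: the rows are ordered by date, so the testing sample consists of the most recent 825 dates.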
Warning. For all the predictions considered in this problem, the figure of merit should be the square root of the mean squared error. It is very important that you explain your work in detail. In particular, explain your choices for the order of the models, the kernel functions, and the bandwidths you use.
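For concreteness, the figure of merit can be computed as in the following minimal sketch. The function name rmse is an illustrative choice, and the code is written in Python rather than S-Plus.

```python
import math


def rmse(predicted, observed):
    """Square root of the mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy usage: the prediction errors are 3 and -4, so the mean squared error is 12.5.
example = rmse([3.0, -4.0], [0.0, 0.0])
```

In the questions below, predicted would hold the model predictions on the testing sample and observed the values in TSTCOma.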
1. Fit a least squares linear regression model for COma against the 12 explanatory variables given by the prices of the futures contracts the day before using the data in TRGCRUDE and TRGCOma, use this model to predict the values of COma in TSTCOma from the corresponding values of the explanatory variables, and compute the figure of merit.
2. Fit a projection pursuit regression model for COma against the 12 explanatory variables given by the prices of the futures contracts the day before using the data in TRGCRUDE and TRGCOma, use this model to predict the values of COma in TSTCOma from the corresponding values of the explanatory variables, and compute the figure of merit.
3. Perform the PCA of the data in TRGCRUDE, plot the first four loadings, give the proportions of the variance they explain, and compute the first two principal components.
4. Fit a one dimensional kernel regression model for COma against the first principal component using the data in TRGCRUDE and TRGCOma, use this model to predict the values of COma in TSTCOma from the corresponding values of the explanatory variables, and compute the figure of merit.
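As a reminder of what a one dimensional kernel regression computes, here is a minimal sketch of a Nadaraya-Watson estimator with a Gaussian kernel, written in Python rather than S-Plus. The kernel and the bandwidth used here are assumptions made for illustration only; the exercise asks you to choose and justify your own. The names gaussian_kernel and kernel_regression are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_regression(x_train, y_train, x_new, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each point of x_new."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    predictions = []
    for x0 in x_new:
        # Weight each training response by the kernel distance of its x to x0.
        weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in x_train]
        total = sum(weights)
        if total == 0.0:
            # x0 is too far from every training point for this bandwidth.
            predictions.append(float("nan"))
        else:
            weighted_sum = sum(w * y for w, y in zip(weights, y_train))
            predictions.append(weighted_sum / total)
    return predictions


# Toy usage: a point midway between two training points gets the average response.
toy_prediction = kernel_regression([0.0, 2.0], [0.0, 2.0], [1.0], bandwidth=0.5)
```

In this question, x_train would be the first principal component of the training data and x_new the testing data projected on the same loading. A smaller bandwidth follows the training data more closely, while a larger one produces a smoother curve.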
5. Fit a two dimensional kernel regression model for COma against the first two principal components using the data in TRGCRUDE and TRGCOma, use this model to predict the values of COma in TSTCOma from the corresponding values of the explanatory variables, and compute the figure of merit.
6. Compare the numerical results obtained with the various methods, and explain why they could have been expected.