Note: For this assignment, all coding has to be done in R. The results of your experiments (including plots), and your comments for all problems should take the form of a report. See the guidelines at the end of this document for reporting and submission instructions.
For the portion of this assignment requiring SCA data, use the full SCA data set found in the file "hemo.csv". Your code should be written to load the hemo.csv file. Use the 15% criterion to determine the responders and non-responders, and ignore the last column, labeled "Class".
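As a rough illustration of the labeling step, the sketch below applies a 15% cutoff to a toy data frame. The column name pct_change, the toy values, and the direction of the comparison (at least 15 means responder) are all assumptions made for illustration; this section does not say which quantity the 15% criterion applies to, so substitute the definition used in the course.

```r
# Toy stand-in for hemo.csv; the real file would be read with
# hemo <- read.csv("hemo.csv")
hemo <- data.frame(
  pct_change = c(4.2, 15.0, 22.7, 9.9),  # hypothetical column name and values
  Class      = c("N", "R", "R", "N")     # provided label, to be ignored
)

# Drop the provided "Class" column, as the assignment instructs.
hemo$Class <- NULL

# Assumption: a patient is a responder when the value is at least 15.
threshold <- 15
hemo$responder <- as.integer(hemo$pct_change >= threshold)
```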
For the portion of this assignment requiring Leukemia data, use the reduced microarray data set found in golubtrain.csv and golubtest.csv. These files should be combined in R, because new training and testing divisions will be made across the patients in both data sets.
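One possible way to combine the two files is sketched below on toy data frames. It assumes that patients are stored as rows and that both files share the same columns; the column names gene1, gene2, and label are hypothetical. If the real files store patients as columns instead, they would need to be transposed or joined column-wise.

```r
# Toy stand-ins; the real data would come from
# golub_train <- read.csv("golubtrain.csv")
# golub_test  <- read.csv("golubtest.csv")
golub_train <- data.frame(gene1 = c(0.1, 0.4, 0.3),
                          gene2 = c(1.2, 0.8, 1.0),
                          label = c(0, 1, 0))
golub_test  <- data.frame(gene1 = c(0.5, 0.2),
                          gene2 = c(0.9, 1.1),
                          label = c(1, 0))

# Assumption: patients are rows and both files have identical columns.
stopifnot(identical(names(golub_train), names(golub_test)))

# Stack the two sets so new train/test divisions can be drawn from all patients.
golub_all <- rbind(golub_train, golub_test)
```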
For the purpose of classification experiments, you will NOT use the leave-one-out methodology. Instead, divide the data set randomly into a training set and a testing set. For Leukemia data, this has already been done for you. You may re-randomize it if you prefer. You may also use a validation set if you so choose for part 1. You will need to specify how Keras splits your training data into training and validation data, as discussed later in the Keras part of the assignment.
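A random division into training and testing sets can be sketched in base R as follows. The data frame is toy data, and the 70% training fraction and the seed are arbitrary choices for illustration, not values given by the assignment.

```r
set.seed(42)  # fixed seed so the split is reproducible

# Toy data: 20 patients, 3 predictors, binary label.
n <- 20
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                  y = rep(c(0, 1), length.out = n))

train_frac <- 0.7  # assumed fraction, not from the assignment
train_idx <- sample(seq_len(n), size = floor(train_frac * n))

# Rows drawn by sample() form the training set; the rest form the testing set.
train_set <- dat[train_idx, ]
test_set  <- dat[-train_idx, ]
```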
Single-layer networks:
1. Use R's built-in neural network functions for classification of the SCA patients. For this problem, use the highest ranked parameter. For each type of network below, report the training error value at the end of the training session for all training runs that you make. Plot the error function of the best ANN run. Report its training and testing accuracy.
1. Use a Delta-rule network with sigmoidal activation functions.
2. Use a two-stage Back-propagation network with sigmoidal activation functions.
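To make the delta rule concrete, here is a minimal hand-written sketch of a single sigmoid unit trained by batch gradient descent on the sum-of-squares error. It is written in base R for illustration only and is not a substitute for the neural network functions the problem asks for. The function names, learning rate, epoch count, and toy data are all assumptions.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Train one sigmoid unit with the delta rule.
# x: numeric matrix (patients by predictors), y: 0/1 vector.
train_delta <- function(x, y, lr = 0.5, epochs = 200) {
  x <- cbind(1, x)                        # prepend a bias input
  w <- rep(0, ncol(x))
  err <- numeric(epochs)
  for (e in seq_len(epochs)) {
    out <- sigmoid(as.vector(x %*% w))
    err[e] <- 0.5 * sum((y - out)^2)      # error before this epoch's update
    delta <- (y - out) * out * (1 - out)  # error times sigmoid derivative
    w <- w + lr * as.vector(t(x) %*% delta) / nrow(x)
  }
  list(weights = w, error = err)
}

# Classify as 1 when the unit's output is at least 0.5.
predict_delta <- function(fit, x) {
  as.integer(sigmoid(as.vector(cbind(1, x) %*% fit$weights)) >= 0.5)
}

# Toy, linearly separable data with a single predictor.
x <- matrix(c(-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2), ncol = 1)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)

fit <- train_delta(x, y)
train_acc <- mean(predict_delta(fit, x) == y)

# Error curve over epochs, the kind of plot the problem asks for.
plot(fit$error, type = "l", xlab = "Epoch", ylab = "Sum-of-squares error")
```

The last element of fit$error is the training error at the end of the run, and the same predict_delta call on held-out rows would give a testing accuracy.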
Keras Multilayer Networks: Background research, Code Implementation & Comparison to Single-layer:
3. Use the keras library in R for the coding implementation portions of these problems, with the tensorflow library as a backend. Please give detailed but concise answers to any questions asked. Links for the sources you used should be included at the end of the report. Do not cite Wikipedia, Stack Overflow, or Quora answers as sources (but you can use them to help you understand the concepts).
a. Build the model: Using the 80 highest ranked parameters selected from the Golub dataset, build a network using Keras with an appropriate model for the data we are classifying and type of classification: one input layer with 64 neurons with the appropriate number of inputs (specified using the input_shape parameter), and 3 hidden layers with 64, 32, and 16 neurons, respectively (maintain the order listed when adding these layers to your network!). Choose a proper activation function for your layers. Add a final output layer, and choose the activation function you think is best for classifying the data given (Golub set, binary classification).
c. Compile the model: Choose an appropriate loss function and an appropriate optimization function for the network you have designed.
e. Training and testing: Re-divide the Golub training and testing data so that the first 45 patients are in the training set (validation will be sampled from here), and the last 27 patients are in the testing set. Return the class labels, the final training accuracy, and the testing accuracy after ten epochs of training your final model. State these accuracies in your report. During your model fitting, make sure you specify a value for validation_split that is appropriate for our small set of patients. Choose at least two metrics in your code, and briefly explain what these metrics mean.
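The build, compile, and fit steps of problem 3 could be laid out roughly as in the sketch below. The layer sizes, the 80 inputs, the 45/27 positional split, and the ten epochs come from the text above. Everything else is an illustrative assumption rather than a prescribed answer: the relu and sigmoid activations, the binary_crossentropy loss, the adam optimizer, the two metrics, the validation_split value of 0.2, and the names build_model, fit_and_score, x, and y. The functions are only defined here, so the keras and tensorflow packages are needed once they are called.

```r
# Layer sizes from the assignment text: a 64-unit first layer that receives
# the 80 inputs, followed by hidden layers of 64, 32, and 16 units.
n_inputs <- 80
layer_units <- c(64, 64, 32, 16)

# Positional split: first 45 patients for training, last 27 for testing.
n_patients <- 45 + 27
train_rows <- 1:45
test_rows <- 46:n_patients

# Build and compile the network. Activations, loss, optimizer, and metrics
# are illustrative choices, not ones prescribed by the assignment.
build_model <- function(units = layer_units, inputs = n_inputs) {
  model <- keras::keras_model_sequential()
  model <- keras::layer_dense(model, units = units[1], activation = "relu",
                              input_shape = c(inputs))
  for (u in units[-1]) {
    model <- keras::layer_dense(model, units = u, activation = "relu")
  }
  model <- keras::layer_dense(model, units = 1, activation = "sigmoid")
  keras::compile(model, loss = "binary_crossentropy", optimizer = "adam",
                 metrics = c("accuracy", "mse"))
  model
}

# Fit for ten epochs, then evaluate on the training and testing rows.
# x: numeric matrix (patients by 80 predictors), y: 0/1 vector (hypothetical).
fit_and_score <- function(x, y, validation_split = 0.2) {
  model <- build_model()
  keras::fit(model, x[train_rows, , drop = FALSE], y[train_rows],
             epochs = 10, validation_split = validation_split, verbose = 0)
  list(
    train = keras::evaluate(model, x[train_rows, , drop = FALSE],
                            y[train_rows], verbose = 0),
    test = keras::evaluate(model, x[test_rows, , drop = FALSE],
                           y[test_rows], verbose = 0)
  )
}
```

Passing a different units vector to build_model is one way to try other layer orderings without rewriting the function.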
4. Use the same predictors and setup created in the last problem, but this time, use 16 neurons in your first middle hidden layer, 32 neurons in the second middle hidden layer, and 64 neurons in the third middle hidden layer. Generate accuracies and compare them to what you saw in problem 3.
5. Play around with the size of the training and testing sets, the number of layers, the number of neurons in each layer, and the choice of final layer activation function in your keras network, and report the accuracies of 10 runs of your best-performing network for the Golub data. Repeat this for the hemo data. Make sure your cv accuracy is less than your training accuracy for your neural network runs; be skeptical of extremely high training accuracy!
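A small helper for collecting the 10 runs and flagging any run whose validation accuracy is not below its training accuracy might look like the sketch below. The names summarise_runs and run_fn are hypothetical, and dummy_run returns fixed placeholder numbers purely to show the output shape; they are not results from any data set.

```r
# Summarise repeated runs. run_fn is a hypothetical function that trains one
# network and returns c(train = ..., val = ...) accuracies.
summarise_runs <- function(run_fn, n_runs = 10) {
  runs <- t(sapply(seq_len(n_runs), function(i) run_fn()))
  data.frame(
    run = seq_len(n_runs),
    train = runs[, "train"],
    val = runs[, "val"],
    suspicious = runs[, "val"] >= runs[, "train"]  # val not below train
  )
}

# Placeholder run function with fixed numbers, only to show the output shape.
dummy_run <- function() c(train = 0.90, val = 0.80)

res <- summarise_runs(dummy_run)
mean_acc <- colMeans(res[, c("train", "val")])
```

In real use, run_fn would wrap one full train-and-evaluate cycle of the network being reported.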
Attachment:- data.zip