The spreadsheet contains data on 250 individuals: 90 normal individuals from San Diego (the controls), and 160 individuals from Korea and China, all of whom were diagnosed with hepatocellular carcinoma (HCC).
Serum samples were taken from the controls, and from the cases at the time of HCC diagnosis. Levels of a panel of 12 tumor-associated antigens (TAAs) were measured by immunoassay in all individuals.
The levels are given in the columns with headings Ab14, HCC1, IMP1, KOC, MDM2, NPM1, P16, P53, P90, RaIA, and Survivin. (These are the designations of the 12 TAAs, all of which were thought to be potentially predictive of cancer.)
The underlying question is whether we can effectively discriminate between the cases and controls on the basis of the levels of these TAAs. This is sometimes termed a classification problem in the statistics and biostatistics literature: we wish to classify individuals as normal or cancer patients on the basis of their TAA levels.
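To make "classifying" concrete, here is a minimal Python sketch. It assumes a simple rule that labels anyone whose level of a single TAA exceeds a chosen cutoff as a cancer patient. It then computes the rule's sensitivity (the fraction of cases flagged) and specificity (the fraction of controls not flagged). The function names and the cutoff are hypothetical illustrations. They are not part of the assignment or of Statdisk.

```python
# Sketch: a single-TAA cutoff rule and its sensitivity/specificity.
# The cutoff and the function names are hypothetical illustrations.

def classify(levels, cutoff):
    """Label each level 'cancer' if it exceeds the cutoff, else 'normal'."""
    return ["cancer" if x > cutoff else "normal" for x in levels]


def sensitivity_specificity(case_levels, control_levels, cutoff):
    """Return (sensitivity, specificity) of the rule 'level > cutoff => cancer'.

    sensitivity = fraction of cases labeled 'cancer' (true positive rate)
    specificity = fraction of controls labeled 'normal' (true negative rate)
    """
    case_labels = classify(case_levels, cutoff)
    control_labels = classify(control_levels, cutoff)
    sensitivity = case_labels.count("cancer") / len(case_levels)
    specificity = control_labels.count("normal") / len(control_levels)
    return sensitivity, specificity
```

Raising the cutoff trades sensitivity for specificity. A single rule therefore never tells the whole story. Sweeping the cutoff across its range is what produces an ROC curve.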
We will examine these data in Statdisk. Use the MHA610_Week 3_Assignment_Data.CSV file to load the data into Statdisk.
Start Statdisk, choose File > Open, and select the .csv file (unless you changed the name, it ought to be MHA610_assignment_3_data.csv).
Check the box indicating that the data contain column titles (headers), select Comma separated as the delimiter, and click Finish. The dataset will then be imported into Statdisk.
NOTE: You may want to read through the remainder of the assignment before proceeding with this step. Doing so may save you some work later!
Note that Statdisk operates on columns of data, and that each column of TAA levels contains both cases and controls. You will need to separate the cases from the controls for further analyses. You can do this either by copying within Statdisk, or by returning to the original Excel workbook, copying the data there, exporting the result as a .csv file, and importing that file into Statdisk. (Don't say you weren't warned!)
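If you want to check the split outside Statdisk, here is a minimal Python sketch that uses only the standard library. It assumes the .csv file has a column identifying each individual's group. The heading `Group` in the usage note below is hypothetical, because the assignment does not name that column. The function name `split_by_group` is also illustrative.

```python
import csv
import io


def split_by_group(csv_text, group_col, value_col):
    """Return {group label: [float levels]} for one TAA column.

    group_col is the (hypothetical) heading of the column that marks
    cases vs. controls; value_col is a TAA heading such as "P53".
    Blank cells are skipped rather than treated as zero.
    """
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = row[value_col].strip()
        if cell == "":  # missing measurement: leave it out
            continue
        groups.setdefault(row[group_col].strip(), []).append(float(cell))
    return groups
```

For example, read the file with `open("MHA610_Week 3_Assignment_Data.CSV", encoding="utf-8-sig", newline="")` and pass its text with `"Group"` and `"P53"`. The result would have one list of P53 levels per group label, provided that hypothetical `Group` column exists. The `utf-8-sig` encoding drops the byte-order mark that Excel's UTF-8 CSV export writes at the start of the file.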
Explain whether you would characterize any or all of the TAA levels as approximately normally distributed, separately for the controls and for the cases.
Provide plots and statistics in support of your conclusions.
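Histograms and normal quantile plots, which Statdisk can produce, are the usual visual checks for normality. As a rough numerical cross-check, the sketch below uses Python's `statistics` module to compute two summary values. The first is the sample skewness. The second is the correlation between the sorted levels and normal scores at Blom's positions, which is the quantity a Ryan-Joiner-type test is built on. The function names are illustrative. No critical values are included, so treat the numbers as descriptive and not as a formal test.

```python
from statistics import NormalDist, fmean


def skewness(xs):
    """Sample skewness g1 = m3 / m2**1.5; near 0 for symmetric data."""
    m = fmean(xs)
    m2 = fmean((x - m) ** 2 for x in xs)
    m3 = fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5  # undefined (division by zero) if all values equal


def normal_scores(n):
    """Expected standard-normal quantiles at Blom's positions (i - 3/8)/(n + 1/4)."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def normal_quantile_correlation(xs):
    """Correlation between sorted data and normal scores.

    Values close to 1 mean the normal quantile plot is nearly a straight
    line; strongly skewed TAA levels give noticeably lower values.
    """
    return pearson(sorted(xs), normal_scores(len(xs)))
```

Right-skewed levels give a positive skewness and a lower correlation. Concentration data often look like this, so it may be worth checking whether a log transform brings a TAA closer to normality before you draw conclusions.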
Explain whether any of the TAAs are useful for discriminating between the cases and controls.
Provide plots and statistics in support of your conclusions.
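One common way to quantify how well a single TAA discriminates is the area under the ROC curve (AUC). It is the probability that a randomly chosen case has a higher level than a randomly chosen control. A value of 0.5 means no discrimination and 1.0 means perfect separation. The sketch below computes the AUC by direct pair counting. It then derives from it the Mann-Whitney U test, a rank-based test that does not require normality. The p-value uses a normal approximation without tie correction, which is an assumption worth stating if you report it. The function names are illustrative.

```python
from statistics import NormalDist


def auc(case_levels, control_levels):
    """Empirical ROC area: P(case level > control level), ties counted as 1/2.

    Assumes higher levels point toward cancer; a TAA that is lower in
    cases gives an AUC below 0.5.
    """
    wins = 0.0
    for c in case_levels:
        for k in control_levels:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_levels) * len(control_levels))


def mann_whitney_p(case_levels, control_levels):
    """Two-sided Mann-Whitney p-value via the normal approximation.

    U is the number of (case, control) pairs with the case higher
    (ties 1/2), i.e. AUC * n1 * n2. No tie or continuity correction.
    """
    n1, n2 = len(case_levels), len(control_levels)
    u = auc(case_levels, control_levels) * n1 * n2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

A small p-value only says the two distributions differ. The AUC says how useful that difference is for classifying individuals, so the two are worth reporting together.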
All writing assignments should be 250 to 500 words, written in APA format and supported by scholarly sources.
BONUS. In the above, we pooled all cases together. Summarize whether you think this is legitimate, or whether the levels of any of the TAAs appear to differ significantly between the cases from China and the cases from Korea. Provide evidence in support of your conclusion.
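For the China versus Korea comparison, a minimal sketch of Welch's two-sample t statistic follows, written with Python's `statistics` module. Welch's test does not assume equal variances in the two groups. The function name is illustrative. The sketch stops at the statistic and its Welch-Satterthwaite degrees of freedom, so the p-value would still come from Statdisk or a t table. If a TAA's levels are far from normal, a rank-based test such as Mann-Whitney is an alternative. Because roughly a dozen TAAs are being compared, a multiple-comparison adjustment such as Bonferroni (dividing alpha by the number of tests) may also be worth considering.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: levels of one TAA for the two groups (e.g. China vs. Korea cases).
    Uses sample variances (n - 1 denominator); needs at least 2 values per group.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```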