Topic: Bioinformatics case-control study using Perl
Given the following sample results of a case-control study for a disease, write a Perl script named calcAlleleOdds.pl that examines each position (assume all positions are SNP positions) and reports the allele with the maximum odds ratio. Although the chi-square test is the usual test for association studies, we will skip it in this assignment.
You need to calculate the odds ratio for each allele and then report the allele with the maximum odds ratio. The odds ratio calculation is provided in the Supplement slides and was discussed in the lecture/recording of Wednesday's class.
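As a minimal sketch of this step, the following assumes the standard cross-product definition of the odds ratio for one allele at one position; the Supplement slides are not reproduced here and may lay out the 2x2 table differently. The helper names `count_allele` and `odds_ratio` and the counts in the usage line are hypothetical.

```perl
use strict;
use warnings;

# Count how many sequences carry $allele at 0-based index $pos.
sub count_allele {
    my ($seqs, $pos, $allele) = @_;
    return scalar grep { substr($_, $pos, 1) eq $allele } @$seqs;
}

# Odds ratio for one allele from the 2x2 table (assumed layout):
#   cases with allele    | cases without allele
#   controls with allele | controls without allele
# This version assumes that no cell is zero.
sub odds_ratio {
    my ($case_with, $case_without, $ctrl_with, $ctrl_without) = @_;
    return ($case_with * $ctrl_without) / ($case_without * $ctrl_with);
}

# Hypothetical counts: allele in 6 of 8 cases and 3 of 8 controls.
print odds_ratio(6, 2, 3, 5), "\n";   # prints 5
```

Note that `substr` is 0-based, so the 1-based position required in the output is the index plus one.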
You may run into a known problem when calculating the odds ratio: one or more cells of the 2x2 table used to calculate it may contain zero. The solution to this problem is also provided in the Supplement slides.
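One common remedy for zero cells is the Haldane-Anscombe correction, which adds 0.5 to every cell of the table. The sketch below assumes that correction and applies it only when at least one cell is zero; if the Supplement slides prescribe a different rule, use that instead. The function name `odds_ratio_corrected` is hypothetical.

```perl
use strict;
use warnings;

# Odds ratio with a continuity correction applied only when a cell is zero.
# Assumption: Haldane-Anscombe correction (add 0.5 to all four cells);
# replace with the rule from the Supplement slides if it differs.
sub odds_ratio_corrected {
    my ($case_with, $case_without, $ctrl_with, $ctrl_without) = @_;
    my @cells = ($case_with, $case_without, $ctrl_with, $ctrl_without);
    if (grep { $_ == 0 } @cells) {
        @cells = map { $_ + 0.5 } @cells;
    }
    my ($cw, $cwo, $tw, $two) = @cells;
    return ($cw * $two) / ($cwo * $tw);
}

# Position 3 of the sample data: allele C is in 8 of 8 cases and 2 of 8 controls.
printf "%.2f\n", odds_ratio_corrected(8, 0, 2, 6);   # prints 44.20
```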
If an allele with an odds ratio > 1.5 is found at a position, report that the position is associated with the disease.
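The selection and threshold check for a single position could look like the sketch below. The 1.5 cutoff and the strict "greater than" comparison come from the assignment text. The function name `best_allele`, the alphabetical tie-break, and the odds ratios in the usage line are assumptions for illustration.

```perl
use strict;
use warnings;

# Given a hash ref of allele => odds ratio for one position, return the
# allele with the largest odds ratio, that ratio, and a Yes/No flag.
# Ties are broken alphabetically (an assumption, not stated in the assignment).
sub best_allele {
    my ($or_by_allele, $threshold) = @_;
    $threshold = 1.5 unless defined $threshold;
    my ($best) = sort {
        $or_by_allele->{$b} <=> $or_by_allele->{$a} or $a cmp $b
    } keys %$or_by_allele;
    my $best_or = $or_by_allele->{$best};
    return ($best, $best_or, $best_or > $threshold ? 'Yes' : 'No');
}

# Hypothetical odds ratios for one position.
my ($allele, $ratio, $assoc) = best_allele({ A => 0.8, G => 2.4 });
print join("\t", $allele, $ratio, $assoc), "\n";
```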
You may copy and paste the data into a local text file so that the Perl script can read it.
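A minimal sketch of reading such a file follows, assuming the text file keeps the "Cases:" and "Controls:" header lines exactly as they appear in the data below. The function name `read_groups` and the file name `samples.txt` are hypothetical; the demo reads from an in-memory string so that it runs without any file on disk.

```perl
use strict;
use warnings;

# Read sequences from an open filehandle into two lists, switching group
# whenever a "Cases:" or "Controls:" header line is seen.
sub read_groups {
    my ($fh) = @_;
    my %groups = (cases => [], controls => []);
    my $current;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim whitespace and line endings
        next if $line eq '';
        if    ($line =~ /^Cases:$/i)    { $current = 'cases' }
        elsif ($line =~ /^Controls:$/i) { $current = 'controls' }
        elsif (defined $current)        { push @{ $groups{$current} }, uc $line }
    }
    return \%groups;
}

# Demo on an in-memory string; for a real file use something like
#   open my $fh, '<', 'samples.txt' or die "Cannot open samples.txt: $!";
my $text = "Cases:\nTACGCA\nTACGCC\nControls:\nTAGGCA\n";
open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
my $groups = read_groups($fh);
close $fh;
print scalar(@{ $groups->{cases} }), " cases, ",
      scalar(@{ $groups->{controls} }), " controls\n";   # prints "2 cases, 1 controls"
```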
You will need to submit the Perl script that you wrote for the assignment. The output of the Perl script should be a tab-separated file named alleles.tsv that provides the following information:
| Position (1-based index) | Allele | Odds ratio | Associated (Yes or No) |
| --- | --- | --- | --- |
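Writing the report could be sketched as follows. The file name alleles.tsv and the four columns come from the assignment. The header line, the three-decimal formatting of the odds ratio, and the function name `write_tsv` are assumptions. The two example rows are illustrative only; their odds ratios presume a 0.5 zero-cell correction, which may not match the Supplement slides.

```perl
use strict;
use warnings;

# Write one row per position to a tab-separated file with a header line.
# Each row is an array ref: [position, allele, odds_ratio, associated].
sub write_tsv {
    my ($path, $rows) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} join("\t", 'Position', 'Allele', 'Odds ratio', 'Associated'), "\n";
    for my $row (@$rows) {
        my ($pos, $allele, $ratio, $assoc) = @$row;
        print {$out} join("\t", $pos, $allele, sprintf('%.3f', $ratio), $assoc), "\n";
    }
    close $out or die "Cannot close $path: $!";
}

# Illustrative rows only (positions are 1-based).
write_tsv('alleles.tsv', [ [1, 'T', 1, 'No'], [3, 'C', 44.2, 'Yes'] ]);
```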
Cases:
TACGCAGTCGACAGGTATAGCCTACATGTACTCGACATGTACTCGGT
TACGCCGTCGACATGTATAGTCTACATGTACTCGACATGTACTCGGT
TACGCAGTCGACAGGTATAGTCTACATGTACTCGACATGTACTCGGT
TACGCAGTCGACAGGTATAGCCTACATGTACTCAACATGTACTCGGT
TACGCCGTCGACATGTATAGCCTACATGTACTCGACATGTACTCGGT
TACGCCGTCGACATGTATAGCCTACATGTACTCGACATGTACTCGGT
TACGCCGTCGACAGGTATAGCCTACATGTACTCGACATGTACTCGGT
TACGCAGTCGACAGGTATAGCCTACATGTACTCGACATGTACTCTGT
Controls:
TAGGCAGTCGATAGGTATAACCTACATGTCCTCGACAGGTACTCGGT
TAGGCGGTCGATATGTATAATCTACATGTCCTCGACAGGTACTCGGT
TAGGCAGTCGATAGGTATAATCTACATGTCCTCGACAGGTACTCGGT
TAGGCAGTCGATAGGTATAACCTACATGTCCTCAACAGGTACTCGGT
TAGGCGGTCGATATGTATAACCTACATGTCCTCGACAGGTACTCGGT
TAGGCGGTCGATATGTATAACCTACATGTCCTCGACAGGTACTCGCT
TACGCGGTCGATAGGTATAACCTACATGTCCTCGACAGGTACTCGCA
TACGCAGTCGACAGGTATAACCTACATGTACTCGACATGTACTCTCA