Assignment
Instructions: Read lectures notes and sample codes I used in class carefully before starting your homework. You need to submit a .sas file on ANGEL
(Lessons > Homework > dropbox for Homework2).
1. (Column Input and Assignment Statement). Below is a small dataset of grades for 6 students.
Name
|
Gender
|
Homework
|
Midterm
|
Final
|
Beck
|
Male
|
90
|
88
|
75
|
Christal
|
Female
|
100
|
85
|
80
|
Gavetti
|
Female
|
80
|
77
|
71
|
Easton
|
Male
|
90
|
82
|
76
|
Mattiace
|
Male
|
60
|
67
|
53
|
Chen
|
Female
|
85
|
83
|
68
|
Table 1: Grades
(a) Use column input to read the dataset above called Grades and store it in the folder MyLibrary1
(b) Define Two new variables Score1 and Score2, where
Score1 = 40% Homework +20% Midterm + 40% Final, and Score2 = 40% Midterm + 60% Final.
(c) Calculate the final score. The final score, represented by a new variable fs, is defined to be the maximum of the following three variables:
Final, Score1 and Score2.
(d) Curve the final score fs. The curved new final score, represented by newfs, is defined as
newfs = 0.7 × fs + 30
(e) Round the new final score newfs to the nearest multiple of 5. (Hint: Use round function,
round(argument1, argument2), read SAS documentation for more details.)
(f) Print out the whole dataset, including all the new variables you created except the variable
Gender. (Usevarstatement).
2. (Formmated Input). Below is a dataset of all the games between Argentina and Germany in the FIFA World Cup. The descriptions of the variables are:
• game: the number of the game
• year: the year of the World Cup
• date: the date of the game
• stage: the stage of the game (a character variable)
• Arg: the score of Argentina
• Ger: the score of Germany
• Attendence: the number of audience
game
|
year
|
date
|
stage
|
Arg
|
Ger
|
Attendance
|
1
|
1958
|
08Jun1958
|
1st Round, Group1
|
1
|
3
|
31,156
|
2
|
1966
|
16Jul1966
|
1st Round, Group2
|
0
|
0
|
46,587
|
3
|
1986
|
29Jun1986
|
Final Game
|
3
|
2
|
114,600
|
4
|
1990
|
08Jul1990
|
Final Game
|
0
|
1
|
73,603
|
5
|
2006
|
30Jun2006
|
1/4 Finals
|
1
|
1
|
72,000
|
6
|
2010
|
03Jul2010
|
1/4 Finals
|
0
|
4
|
64,100
|
7
|
2014
|
13Jul2014
|
Final Game
|
0
|
1
|
74,738
|
Table 2: Germany vs Argentina in World Cup
(a) Use formatted input to read the dataset above called games. When reading the dataset (1) use an absolute pointer control to read year and date, and use a relative pointer controls to read stage and Attendance; (2) Use informat dateW. for variable 'date' and use informat commaW. for variable 'Attendance', where W is an integer which specifies the number of columns that variable will occupy.
(b) Set the page size to 50, line size to 80, center the output, and suppress the date and time when the output was created.
(c) Print the data set games. (1) Remember formatted input does not imply formatted output. Use format statement to format your output; (2) Add a title for this dataset.
3. (IF-THEN-ELSE Statement and Create a List Report). We have the following small dataset called HealthReport
Subj
|
Fname
|
Lname
|
Gender
|
Height
|
Weight
|
Date
|
Cal
|
1024
|
Alice
|
Smith
|
Female
|
65
|
125
|
12/01/95
|
2,036
|
1167
|
Maryann
|
White
|
Female
|
68
|
140
|
12/02/95
|
1,800
|
1168
|
Thomas
|
Jones
|
Male
|
|
190
|
01/05/96
|
2,302
|
1201
|
Benedictine
|
Arnold
|
Male
|
68
|
190
|
11/30/95
|
2,432
|
1302
|
Felicia
|
Ho
|
Female
|
63
|
115
|
01/01/96
|
1,972
|
1344
|
James
|
Hunter
|
Male
|
70
|
188
|
12/15/95
|
1,869
|
Table 3: Health Examination Report The columns, from left to right, represent the following variables:
Variable Name
|
Descriptions
|
Variable Type
|
Subj
|
Subject number
|
Numeric
|
Fname
|
First name
|
Character
|
Lname
|
Last name
|
Character
|
Gender
|
Gender
|
Character
|
Height
|
Height (lbs.)
|
Numeric
|
Weight
|
Weight (inches)
|
Numeric
|
Date
|
Date of measurement
|
Numeric
|
Cal
|
Calorie
|
Numeric
|
Table 4: Descriptions of variables
(a) Read the dataset above into SAS (pay attention to length and format of variables), and change the SAS system settings as follows:
linesize = 64, no current date and time, no page number
(b) Use assignment statement to create a new numeric variable BMI. BMI (Body mass index) is a person's weight divided by the square of his/her height. In other words, it is defined to be
BMI = Weight / Height2
where the weight unit is kilogram and the height unit is meter. So first you need to convert pound into kilogram and convert inch into meter (1 kilogram = 2.205 pounds, 1 inch = 0.0254 meter), and then calculate BMI.
(c) Use IF-TEHN-ELSE statement to create a new character variable Result. The value of variable
Result is determined by the value of BMI according to the following table.
Note: the length of the variable Result can be longer than 8. You should specify an appropriate length for variable Result with length statement. The length statement should be placed before the input statement.
Value of BMI Value of Result
Missing 'Missing'
≤ 25 'Underweight'
> 25 and ≤ 30 'Normal'
> 30 'Overweight'
(d) Use SORT procedure to sort the dataset Health Report first by the variable Gender in an ascending order, and then by the variable Lname in a descending order. The sorted dataset should be saved in another SAS dataset named Srtd Health Report.
(e) Print the the sorted dataset Srtd Health Report. When printing the dataset,
1. Use variable Lname and Fname as the identification variables (Use ID statement).
2. Split the dataset into two groups by variable Gender. Calculate the sum of Cal for each group.
3. Use LABEL statement to assign labels to all variables to make your output more readable. Particularly, use the descriptions in Table 2 as the labels. You can place the LABEL statement either in the DATA step or in the PRINT procedure. Use labels rather than the variable names when printing the dataset.
4. Add a TITLE ('Health Examination Report Sorted by Last Name') and FOOTNOTE for this dataset. The content of the FOOTNOTE is: ' BMI = Weight(kg)/(Height(m))^2'.
5. Use the correct FORMAT for variables Date and Cal.
(f) Use Where statement to print the first and last names of female students who are underweight.