The goal this week is to gain an understanding of our data set - what kind of data we are looking at, some descriptive measures, and a
look at how the data is distributed (shape).
1. Measurement issues. Data, even numerically coded variables, can be one of 4 levels - nominal, ordinal, interval, or ratio. It is important to identify which level a variable is, as this impacts the kind of analysis we can do with the data. For example, descriptive statistics such as means can only be calculated on interval or ratio level data.
a. Please list, under each label, the variables in our data set that belong in each group.
Nominal Ordinal Interval Ratio
b. For each variable that you did not call ratio, why did you make that decision?
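As a rough illustration of why the level matters, the sketch below maps each measurement level to the summary statistics usually considered meaningful for it. The variable names in the example are hypothetical and are not taken from the assignment's data set, and the lists of statistics are a common textbook convention rather than anything the assignment specifies.

```python
# Statistics commonly treated as meaningful at each measurement level.
# Each level also allows everything permitted at the levels before it.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
NEW_AT_LEVEL = {
    "nominal": ["count", "mode"],
    "ordinal": ["median", "percentile"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["ratio of two values", "coefficient of variation"],
}


def allowed_statistics(level):
    """Return every statistic permitted at the given level."""
    idx = LEVELS.index(level)
    allowed = []
    for name in LEVELS[: idx + 1]:
        allowed.extend(NEW_AT_LEVEL[name])
    return allowed


# Hypothetical variables, for illustration only.
example = {"department_code": "nominal", "temperature_f": "interval"}
for variable, level in example.items():
    print(variable, "->", allowed_statistics(level))
```

A numerically coded label (such as a department code) still sits at the nominal level, so a mean of those codes would not be meaningful even though it can be computed.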
2. The first step in analyzing data sets is to find some summary descriptive statistics for key variables.
For salary, compa, age, performance rating, and service, find the mean, standard deviation, and range for 3 groups: overall sample, Females, and Males.
You can use either the Data Analysis Descriptive Statistics tool or the Fx =average and =stdev functions.
(The range must be found using the difference between the =max and =min functions with Fx.)
Note: Place data to the right. If you use Descriptive Statistics, place that to the right as well.
Some of the values are completed for you - please finish the table.
| | | Salary | Compa | Age | Perf. Rat. | Service |
| Overall | Mean | | | 35.7 | 85.9 | 9.0 |
| | Standard Deviation | | | 8.2513 | 11.4147 | 5.7177 |
| | Range | | | 30 | 45 | 21 |
| Female | Mean | | | 32.5 | 84.2 | 7.9 |
| | Standard Deviation | | | 6.9 | 13.6 | 4.9 |
| | Range | | | 26.0 | 45.0 | 18.0 |
| Male | Mean | | | 38.9 | 87.6 | 10.0 |
| | Standard Deviation | | | 8.4 | 8.7 | 6.4 |
| | Range | | | 28.0 | 30.0 | 21.0 |
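For readers who want to check the spreadsheet results outside Excel, the following is a minimal sketch of the same three measures per group. The records are made-up values, not the assignment's data, and the `describe` helper is a name introduced here for illustration. `statistics.stdev` computes the sample standard deviation (dividing by n - 1), as Excel's =STDEV does.

```python
import statistics

# Hypothetical records: (gender, salary). Not the assignment's data.
records = [
    ("F", 34.0), ("F", 41.0), ("F", 23.0),
    ("M", 58.0), ("M", 27.0), ("M", 47.0),
]


def describe(values):
    """Mean, sample standard deviation, and range (max minus min)."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "range": max(values) - min(values),
    }


# Build the three groups the table asks for.
groups = {
    "Overall": [s for _, s in records],
    "Female": [s for g, s in records if g == "F"],
    "Male": [s for g, s in records if g == "M"],
}

summary = {name: describe(vals) for name, vals in groups.items()}
for name, stats in summary.items():
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

The same `describe` call can be repeated for compa, age, performance rating, and service by swapping in the corresponding column.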
3. What is the probability for a: Probability
a. Randomly selected person being a male in grade E?
b. Randomly selected male being in grade E?
Note: part b is the same as asking, given a male, what is the probability of being in grade E?
c. Why are the results different?
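The distinction between parts a and b is the difference between a joint and a conditional probability. The sketch below shows the two calculations on a small made-up list of (gender, grade) pairs; the data are hypothetical and only the choice of denominator is the point.

```python
from fractions import Fraction

# Hypothetical (gender, grade) pairs; not the assignment's data.
people = [
    ("M", "E"), ("M", "E"), ("M", "A"), ("M", "C"),
    ("F", "E"), ("F", "A"), ("F", "B"), ("F", "C"),
]

males = [p for p in people if p[0] == "M"]
males_in_e = [p for p in males if p[1] == "E"]

# (a) Joint probability: male AND grade E, out of everyone.
p_joint = Fraction(len(males_in_e), len(people))

# (b) Conditional probability: grade E, out of males only.
p_conditional = Fraction(len(males_in_e), len(males))

print("P(male and E) =", p_joint)
print("P(E | male)   =", p_conditional)
```

Both calculations share the same numerator (males in grade E). Part a divides by the whole sample, while part b divides by the number of males only.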
4. A key issue in comparing data sets is to see if they are distributed/shaped the same. We can do this by looking at some measures of where
selected values fall within each data set - that is, how many values are above and below a comparable value.
For each group (overall, females, and males) find: Overall Female Male
A The value that cuts off the top 1/3 salary value in each group "=large" function
i The z score for this value within each group? Excel's standardize function
ii The normal curve probability of exceeding this score: 1-normsdist function
iii What is the empirical probability of being at or exceeding this salary value?
B The value that cuts off the top 1/3 compa value in each group.
i The z score for this value within each group?
ii The normal curve probability of exceeding this score:
iii What is the empirical probability of being at or exceeding this compa value?
C How do you interpret the relationship between the data sets? What do they mean about our equal pay for equal work question?
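The steps in parts A and B can be sketched outside Excel as follows. This is an illustration on made-up values, not the assignment's data. It assumes the cutoff is the k-th largest value with k equal to one third of the group size (rounded), which is one reasonable reading of "cuts off the top 1/3". The function name `top_third_summary` is introduced here for illustration.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical salary values; not the assignment's data.
salaries = [22.0, 24.0, 27.0, 34.0, 41.0, 47.0, 48.0, 58.0, 64.0]


def top_third_summary(values):
    """Cutoff for the top third, its z score, and two exceedance probabilities."""
    n = len(values)
    k = round(n / 3)  # assumption: the k-th largest value marks the top 1/3
    cutoff = sorted(values, reverse=True)[k - 1]      # like =LARGE(range, k)
    z = (cutoff - mean(values)) / stdev(values)       # like =STANDARDIZE(x, mean, sd)
    p_normal = 1 - NormalDist().cdf(z)                # like 1-NORMSDIST(z)
    p_empirical = sum(v >= cutoff for v in values) / n  # share at or above cutoff
    return cutoff, z, p_normal, p_empirical


cutoff, z, p_normal, p_empirical = top_third_summary(salaries)
print("cutoff:", cutoff)
print("z score:", round(z, 3))
print("normal P(exceed):", round(p_normal, 3))
print("empirical P(at or above):", round(p_empirical, 3))
```

Running the same function on the female-only and male-only values gives the three columns the question asks for. A gap between the normal curve probability and the empirical probability suggests that the group's distribution departs from the normal shape.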
5. What conclusions can you make about the issue of male and female pay equality? Are all of the results consistent?
What is the difference between the salary and compa measures of pay?
Conclusions from looking at salary results:
Conclusions from looking at compa results:
Do both salary measures show the same results?
Can we make any conclusions about equal pay for equal work yet?