DIRECTIONS: PLEASE READ THE FOLLOWING STUDENT DISCUSSION AND RESPOND. CITE REFERENCES USING APA STYLE FORMAT.
Discussion
In this week's discussion we learned about two test theories that help evaluate whether a test is fair and unbiased. "Fairness is a fundamental validity issue and requires attention throughout all stages of test development and use" (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014, p. 49). Two theories that address fairness are Classical Test Theory (CTT) and Item Response Theory (IRT). CTT is also known as true score theory or the true score model.
CTT assumes that each testtaker has a "true score on a test that would be obtained but for the action of measurement error" (Cohen, Swerdlik, & Sturman, 2013, p. 123). The second theory used to evaluate test fairness is IRT. It should be noted that IRT is not a "single theory or method" (Cohen et al., 2013, p. 168). There are over 100 types of IRT models that differentiate specific approaches (Cohen et al., 2013). Item Response Theory is also referred to as latent-trait theory or the latent-trait model. IRT is a system of expectations "about measurement and the extent to which each test item measures the trait" (Cohen et al., 2013, p. I-15).
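The two ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CTT part treats an observed score as true score plus normally distributed random error, and the IRT part uses the two-parameter logistic (2PL) model, which is just one of the many IRT models and is not named in the course text. The function names and all numeric values are hypothetical.

```python
import math
import random


def simulate_observed_scores(true_score, error_sd, n_administrations, seed=0):
    """CTT sketch: each observed score is the true score plus random error."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_administrations)]


def prob_correct_2pl(theta, a, b):
    """2PL IRT sketch: chance of a correct answer for trait level theta.

    a is the item's discrimination and b is its difficulty (assumed names).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Averaging many hypothetical administrations cancels most of the error,
# so the mean observed score lands close to the true score of 50.
scores = simulate_observed_scores(true_score=50.0, error_sd=5.0, n_administrations=10_000)
mean_observed = sum(scores) / len(scores)
print(round(mean_observed, 1))

# When the trait level equals the item difficulty, the 2PL probability is 0.5.
print(prob_correct_2pl(theta=0.0, a=1.0, b=0.0))
```

The sketch shows the difference in focus: CTT reasons about the total score and its error, while IRT models the response to each individual item.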
When dealing with issues of test fairness, IRT is the preferred theory for several reasons. IRT can use information curves to adapt tests more precisely to the individuals being tested (Cohen et al., 2013). In this way, the most useful information can be gathered from a test without the inefficiency of a fixed, prescribed format.
IRT information curves help to remove a bias of CTT, in which item difficulty levels fluctuate depending on the people being tested (Cohen et al., 2013). Also, differential item functioning (DIF) analysis helps to increase fairness by reducing the influence of variables such as culture, age, gender, and language (Cohen et al., 2013). Rather than applying one overarching formula to vastly different populations, IRT allows test developers to tailor tests to the traits of the individuals being tested.
IRT-based tests also undergo a rigorous process of developing item banks, reviews, and tests before the finished test is implemented (Cohen et al., 2013, p. 284, Figure 8-7). In sum, the flexibility and soundness of IRT in shaping, implementing, and analyzing tests help to reduce biases that could be inherent in standardized classical tests. Therefore, IRT is the preferred theory for responding to questions about fairness in testing.
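To make the idea of an information curve concrete, here is a minimal sketch that assumes the 2PL model, where item information is a² · P · (1 − P). The model choice, the function names, and the item parameters are assumptions for illustration and do not come from the course text.

```python
import math


def prob_correct_2pl(theta, a, b):
    """Probability of a correct response under the (assumed) 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = prob_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical item: discrimination 1.5, difficulty 1.0.
thetas = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
curve = [(theta, item_information(theta, a=1.5, b=1.0)) for theta in thetas]

# The item is most informative for test takers whose trait level
# matches its difficulty.
peak_theta = max(curve, key=lambda pair: pair[1])[0]
for theta, info in curve:
    print(f"theta={theta:+.1f}  information={info:.3f}")
print("most informative at theta =", peak_theta)
```

Because each item is most informative near its own difficulty, a test can be adapted to a person by choosing items whose curves peak near that person's estimated trait level.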
Classical Test Theory (CTT) has been advantageous and broadly applicable because "most researchers are familiar with this basic approach" and "many data analysis and statistics-related software packages are built" from its perspective (Cohen et al., 2013, p. 281). Aside from its relative ubiquity in the realm of testing, CTT works with small sample sizes and "simple mathematical models" (Cohen et al., 2013, p. 282, Table 8-4). On the other hand, CTT-based tests can often be long, with each test item contributing unequally to the final test score.
Moreover, "item statistics and overall psychometric properties of the test are dependent on the samples which have been administered" (Cohen et al., 2013, p. 281). These disadvantages make it considerably harder to produce a fair and simple test that leaves little room for misinterpretation.
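The sample dependence described above can be illustrated with a small simulation. This is a sketch under assumptions: responses are generated from a 2PL model with fixed item parameters, and the two groups, their ability means, and the sample sizes are all hypothetical.

```python
import math
import random


def prob_correct_2pl(theta, a, b):
    """Probability of a correct response under the (assumed) 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def proportion_correct(mean_ability, a, b, n_people, rng):
    """CTT item difficulty: the share of a sample that answers correctly."""
    n_correct = 0
    for _ in range(n_people):
        theta = rng.gauss(mean_ability, 1.0)  # simulated trait level
        if rng.random() < prob_correct_2pl(theta, a, b):
            n_correct += 1
    return n_correct / n_people


rng = random.Random(42)
ITEM_A, ITEM_B = 1.0, 0.0  # the same item is given to both samples

p_low = proportion_correct(-1.0, ITEM_A, ITEM_B, 5000, rng)   # lower-ability sample
p_high = proportion_correct(1.0, ITEM_A, ITEM_B, 5000, rng)   # higher-ability sample

print("CTT difficulty in lower-ability sample: ", round(p_low, 2))
print("CTT difficulty in higher-ability sample:", round(p_high, 2))
```

The same item looks hard in one sample and easy in the other, even though its underlying parameters never change. That is the sense in which CTT item statistics depend on who was tested.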
Item Response Theory (IRT) benefits from being adaptable to the ability levels of test takers. Consequently, IRT-based tests can be "relatively short" and "still reliable" (Cohen et al., 2013, p. 281). Also, unlike CTT-based tests, IRT-based tests feature item statistics that are independent of the samples to which the test is administered. These advantages help to circumvent typical biases that might corrupt test results.
Furthermore, IRT helps tests to be more accurate through several stages of review and accountability. Conversely, IRT is less familiar to researchers than CTT, and IRT-based tests are less likely to be applied widely (Cohen et al., 2013). IRT-based tests can be considered complicated and can require large sample sizes (Cohen et al., 2013). Because there are also far fewer IRT-based software packages than CTT-based ones, IRT is less likely to be used generally across populations.
Although many tests can still be implemented using "paper-and-pencil methods," technological advances are helping to enhance test development (Cohen et al., 2013, pp. 283-284). For example, differential item functioning (DIF) analysis helps developers respond to test takers by shaping the tests to their abilities (Cohen et al., 2013).
This has technological implications because tests can differ depending on whether they are taken on paper or by computer. Although CTT has benefited from widespread familiarity and from having more software packages than IRT, programs such as computerized adaptive testing (CAT) cater to IRT information curves by selecting questions based on a test taker's previous responses (Cohen et al., 2013).
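A rough sketch of how a CAT program might pick its next question is shown here. It assumes the 2PL information function and a deliberately crude fixed-step update of the ability estimate (operational CAT systems use statistical estimation instead). The item bank, function names, and step size are hypothetical.

```python
import math


def item_information(theta, a, b):
    """2PL item information at trait level theta (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_next_item(theta, bank, administered):
    """Pick the unused item that is most informative at the current estimate."""
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *bank[name]))


def update_theta(theta, correct, step):
    """Simplified update: move the estimate up after a correct answer, down otherwise."""
    return theta + step if correct else theta - step


# Hypothetical item bank: name -> (discrimination a, difficulty b).
bank = {
    "item_1": (1.0, -2.0),
    "item_2": (1.0, -1.0),
    "item_3": (1.0, 0.0),
    "item_4": (1.0, 1.0),
    "item_5": (1.0, 2.0),
}

theta = 0.0        # starting ability estimate
administered = []

first = select_next_item(theta, bank, administered)
administered.append(first)

# Suppose the test taker answers the first item correctly.
theta = update_theta(theta, correct=True, step=1.0)
second = select_next_item(theta, bank, administered)

print(first, "->", second)
```

After a correct answer the estimate rises, so the next item chosen is a harder one. This is the basic mechanism by which an adaptive test can stay short while still targeting each person's level.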
Test developers are currently employing various apparatuses "ranging from Internet to handheld devices to computer-assisted telephone interviewing" (Cohen et al., 2013, p. 289). By using "CAT" to build more precise item banks for tests automatically, new implementations of technology in IRT-based tests help to ensure fairness while reducing error.
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Cohen, R. J., Swerdlik, M. E., & Sturman, E. D. (2013). Psychological testing and assessment: An introduction to tests and measurement (8th ed.). New York, NY: McGraw-Hill.
Dayane De Leon
MS in Psychology, ABA