CLASSICAL TEST THEORY’S AND ITEM RESPONSE THEORY’S PERCEPTIONS OF THE VALIDITY OF TEST SCORES
In testing, performance results from the confrontational interaction between the amount of the ability under measurement that a testee possesses and the amount of that trait demanded by the tasks developed to evoke the ability during testing. Interpreting such performance against prevailing values and standards gives achievement. Hence, test scores have meaning, that is, they are valid, to the extent that only the ability under measurement sustained the responses to each item in the test through which the scores were generated. Classical test theory (CTT) thrives on reliability, that is, how consistently a test yields the same score, whereas item response theory (IRT) sets out to ensure that the score generated through a test represents the truth. In other words, CTT is tilted towards satisfying the dictates of reliability, the repeatability of the same test score across time, forms, and parts of the same test, while IRT is tilted towards ensuring a valid test, that is, ensuring that only what the test was designed to measure underlies responses to its items. Reliability is thus closer to the heart of CTT than to that of IRT; hence CTT is often referred to as reliability theory. For CTT, validity is faintly defined because there is no attempt to ensure that what underlies responses to the items in a test is only that which the test was designed to measure, whereas for IRT the assumption of unidimensionality reflects both a recognition of this requirement and a step taken to ensure that items that fit any of its models measure one and only one ability.
H. Johnson Nenty (2016)
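
The abstract does not name a particular reliability index. As an illustration of what consistency "across parts of the same test" can mean in CTT, the following is a minimal sketch of Cronbach's alpha, a commonly used internal-consistency coefficient. The function name `cronbach_alpha` and the response matrix are hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a testee-by-item score matrix.

    scores: list of rows, one row per testee, each row a list of item scores.
    (Hypothetical helper for illustration; not from the source paper.)
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across testees.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total (summed) score across testees.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up dichotomous responses: 5 testees, 4 items of increasing difficulty.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

A high alpha indicates only that the items covary strongly. It does not by itself show that a single ability underlies the responses, which is the gap the abstract attributes to CTT and the reason IRT adds the unidimensionality assumption.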