Bayesian Computerized Adaptive Testing

Bernard P. Veldkamp, Mariagiulia Matteucci

Abstract


Computerized adaptive testing (CAT) comes with many advantages. Unfortunately, developing and maintaining an operational CAT is still quite expensive. This paper describes the various steps involved in developing an operational CAT and reviews the literature on the topic. Bayesian CAT is introduced as an alternative, and the use of empirical priors for estimating item and person parameters is proposed as a way to reduce the costs of CAT. Methods for eliciting empirical priors are presented, along with two small examples that illustrate the advantages of Bayesian CAT. Some implications of using empirical priors are discussed, limitations are noted, and suggestions for further research are formulated.

Keywords


Bayesian IRT modeling; Computerized Adaptive Testing; Prior elicitation; Item Response Theory; Item selection; Parameter estimation



Copyright 2016 Revista Ensaio: Avaliação e Políticas Públicas em Educação

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
