Abstract
This paper compares three methods of item calibration—concurrent calibration, separate calibration with linking, and fixed item parameter calibration—that are frequently used for linking item parameters to a base scale. Concurrent and separate calibrations were implemented using BILOG-MG. The Stocking and Lord (1983) characteristic curve method of parameter linking was used in conjunction with separate calibration. The fixed item parameter calibration (FIPC) method was implemented using both BILOG-MG and PARSCALE because the method is carried out differently by the two programs. Both programs use multiple EM cycles, but BILOG-MG does not update the prior ability distribution during FIPC calibration, whereas PARSCALE updates the prior ability distribution multiple times. The methods were compared using simulations based on actual testing program data, and results were evaluated in terms of recovery of the underlying ability distributions, the item characteristic curves, and the test characteristic curves. Factors manipulated in the simulations were sample size, ability distributions, and numbers of common (or fixed) items. The results for concurrent calibration and separate calibration with linking were comparable, and both methods showed good recovery results for all conditions. Between the two fixed item parameter calibration procedures, only the appropriate use of PARSCALE consistently provided item parameter linking results similar to those of the other two methods.
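The Stocking and Lord (1983) characteristic curve method mentioned above finds a slope A and intercept B that place target-scale item parameters on the base scale by minimizing the squared difference between the two groups' test characteristic curves over the common items. The sketch below illustrates the idea in Python with NumPy/SciPy; the item parameters and the true transformation (A = 1.2, B = 0.5) are invented for illustration, and the quadrature grid and optimizer are simplifying choices, not the settings used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

D = 1.7  # logistic scaling constant for the 3PL model

def tcc(theta, a, b, c):
    """Test characteristic curve: expected number-correct score at each theta."""
    p = c + (1 - c) / (1 + np.exp(-D * a * (theta[:, None] - b)))
    return p.sum(axis=1)

def stocking_lord(a_t, b_t, c_t, a_b, b_b, c_b):
    """Find (A, B) that put target-scale common items on the base scale by
    matching the two test characteristic curves over a theta grid."""
    theta = np.linspace(-4.0, 4.0, 31)
    base = tcc(theta, a_b, b_b, c_b)
    # Transformed target parameters: a/A, A*b + B (c is scale-invariant)
    loss = lambda x: np.sum((base - tcc(theta, a_t / x[0], x[0] * b_t + x[1], c_t)) ** 2)
    return minimize(loss, x0=[1.0, 0.0], method="Nelder-Mead").x

# Hypothetical common-item parameters on the base scale
rng = np.random.default_rng(0)
a_base = rng.uniform(0.8, 2.0, 10)
b_base = rng.uniform(-2.0, 2.0, 10)
c_base = np.full(10, 0.2)

# The same items re-expressed on a target scale differing by A = 1.2, B = 0.5
A_true, B_true = 1.2, 0.5
a_targ, b_targ = a_base * A_true, (b_base - B_true) / A_true

A_hat, B_hat = stocking_lord(a_targ, b_targ, c_base, a_base, b_base, c_base)
print(round(A_hat, 3), round(B_hat, 3))  # recovers approximately (1.2, 0.5)
```

Because the simulated target parameters differ from the base parameters by an exact linear transformation, the loss is zero at the true (A, B) and the optimizer recovers it; with real estimates the minimum is nonzero and the fit is approximate.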
References
Baldwin, S. G., Baldwin, P., & Nering, M. L. (2007). A comparison of IRT equating methods on recovering item parameters and growth in mixed-format tests. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Hanson, B. A., & Béguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26, 3–24.
Keller, R. R., Keller, L. A., & Baldwin, S. (2007). The effect of changing equating methods on monitoring growth in mixed-format tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43, 355–381.
Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York: Springer.
Linacre, J. M. (2003). WINSTEPS [Computer Program]. Chicago: MESA Press.
Loyd, B. H., & Hoover, H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 179–193.
Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139–160.
Paek, I., & Young, M. J. (2005). Investigation of student growth recovery in a fixed-item linking procedure with a fixed-person prior distribution for mixed-format test data. Applied Measurement in Education, 18, 199–215.
Skorupski, W. P., Jodoin, M. G., Keller, L. A., & Swaminathan, H. (2003). An evaluation of item response theory equating procedures for capturing growth with tests composed of dichotomously scored items. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
Thissen, D. (1991). Multilog user’s guide: Multiple categorical item analysis and test scoring using item response theory [Computer program]. Chicago: Scientific Software International.
Acknowledgments
This work was supported by the Sungshin Women’s University Research Grant of 2011.
Appendix
1. Concurrent Calibration for both base and target groups (BILOG-MG)
>COMMENT
>GLOBAL DFNAME = 'BLMcom200.dat', NPARM = 3, SAVE;
>SAVE PARM = 'BLMcom200.par';
>LENGTH NITEMS = 60;
>INPUT NTOT = 60, NID = 4, NGROUP = 2, NFNAME = 'c:\FIPC\simu\keynot.txt';
>ITEMS INUM = (1(1)60), INAMES = (OL01(1)OL10, CO01(1)CO40, NE01(1)NE10);
>TEST TNAME = Math;
>GROUP1 GNAME = 'BASE', LENGTH = 50, INUM = (1(1)50);
>GROUP2 GNAME = 'TARGET', LENGTH = 50, INUM = (21(1)60);
(4A1,1X,I1,1X,60A1)
>CALIB NQPT = 11, CYCLES = 3000, CRIT = 0.001, REF = 1, TPRIOR;
>SCORE NOPRINTS;
2. Separate Calibration for a target group (BILOG-MG)
>COMMENT
>GLOBAL DFNAME = 'new200.dat', NPARM = 3, SAVE;
>SAVE PARM = 'BLMnew200.par';
>LENGTH NITEMS = 50;
>INPUT NTOT = 50, NALT = 5, NID = 4;
>ITEMS INUM = (1(1)50), INAMES = (CO01(1)CO40, NE01(1)NE10);
>TEST TNAME = Simulation;
(4A1,T1,50A1)
>CALIB NQPT = 11, CYCLES = 3000, CRIT = 0.001, TPRIOR;
>SCORE NOPRINTS;
3. Fixed Item Parameter Calibration for a target group (BILOG-MG)
>COMMENT
>GLOBAL DFNAME = 'new200.dat', PRNAME = 'BLMOLD200.PRM', NPARM = 3, SAVE;
>SAVE PARM = 'BLMfix200.par';
>LENGTH NITEMS = 50;
>INPUT NTOT = 50, NALT = 5, NID = 4;
>ITEMS INUM = (1(1)50), INAMES = (CO01(1)CO40, NE01(1)NE10);
>TEST TNAME = Math, FIX = (1(0)40,0(0)10);
(4A1,T1,50A1)
>CALIB NQPT = 11, CYCLES = 3000, CRIT = 0.001, TPRIOR, NOADJUST;
>SCORE NOPRINTS;
4. Fixed Item Parameter Calibration for a target group (PARSCALE)
>COMMENT
>FILE DFNAME = 'new200.dat', IFNAME = 'PSLold200.prm', SAVE;
>SAVE PARM = 'fix200.par';
>INPUT NIDCH = 4, NTOTAL = 50, NTEST = 1, LENGTH = 50, NFMT = 1;
(4A1, T1, 50A1)
>TEST TNAME = Math, ITEM = (01(1)50), NBLOCK = 50,
INAMES = (
CO01, CO02, CO03, CO04, CO05, CO06, CO07, CO08, CO09, CO10,
CO11, CO12, CO13, CO14, CO15, CO16, CO17, CO18, CO19, CO20,
CO21, CO22, CO23, CO24, CO25, CO26, CO27, CO28, CO29, CO30,
CO31, CO32, CO33, CO34, CO35, CO36, CO37, CO38, CO39, CO40,
NE01, NE02, NE03, NE04, NE05, NE06, NE07, NE08, NE09, NE10);
>BLOCK1 BNAME = COMMON, NITEM = 1, NCAT = 2,
ORI = (0,1), MOD = (1,2), GPARM = 0.2, GUESS = (2, EST), REP = 40, SKIP;
>BLOCK2 BNAME = UNIQUE, NITEM = 1, NCAT = 2,
ORI = (0,1), MOD = (1,2), GPARM = 0.2, GUESS = (2, EST), REP = 10;
>CALIB PARTIAL, LOGISTIC, SCALE = 1.7, NQPT = 41, CYCLE = (3000,1,1,1,1),
FREE = (NOADJUST, NOADJUST), POSTERIOR, NEWTON = 0, CRIT = 0.001, ITEMFIT = 10, SPRIOR, GPRIOR;
>SCORE;
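The distinguishing feature of the PARSCALE run above is that, while the common-item parameters stay fixed, the prior ability distribution is re-estimated repeatedly across EM cycles. The sketch below illustrates that repeated prior update in Python; it is a simplified EM illustration of the idea, not PARSCALE's actual implementation, and every item, examinee, and grid value is simulated for the example.

```python
import numpy as np

D = 1.7  # logistic scaling constant, as in the 3PL runs above

def update_prior(responses, a, b, c, n_updates=20):
    """Re-estimate a discrete ability distribution on fixed quadrature points
    while the item parameters (a, b, c) are held fixed -- the repeated prior
    update that distinguishes PARSCALE's FIPC from BILOG-MG's NOADJUST run."""
    theta = np.linspace(-4.0, 4.0, 41)            # quadrature points
    w = np.exp(-0.5 * theta ** 2)
    w /= w.sum()                                  # start from a N(0, 1) prior
    p = c + (1 - c) / (1 + np.exp(-D * a * (theta[:, None] - b)))  # Q x n ICCs
    for _ in range(n_updates):
        # Likelihood of each response pattern at each quadrature point (N x Q)
        like = np.prod(np.where(responses[:, None, :] == 1, p, 1 - p), axis=2)
        post = like * w                            # apply the current prior
        post /= post.sum(axis=1, keepdims=True)    # posterior per examinee
        w = post.mean(axis=0)                      # EM update of prior weights
    return theta, w

# Simulated target group (mean shifted to 0.5) with hypothetical fixed items
rng = np.random.default_rng(1)
n_items, n_people = 20, 2000
a = rng.uniform(1.0, 2.0, n_items)
b = rng.uniform(-2.0, 2.0, n_items)
c = np.full(n_items, 0.2)
theta_true = rng.normal(0.5, 1.0, n_people)
prob = c + (1 - c) / (1 + np.exp(-D * a * (theta_true[:, None] - b)))
resp = (rng.random((n_people, n_items)) < prob).astype(int)

theta, w = update_prior(resp, a, b, c)
print(round(float(theta @ w), 2))  # estimated prior mean, close to the true 0.5
```

Holding the prior fixed instead (as BILOG-MG does during FIPC) would leave the estimated ability distribution anchored at N(0, 1), which is why the paper finds the two programs behave differently when the target group's ability distribution has shifted.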
Kang, T., Petersen, N.S. Linking item parameters to a base scale. Asia Pacific Educ. Rev. 13, 311–321 (2012). https://doi.org/10.1007/s12564-011-9197-2