
Linking item parameters to a base scale

Asia Pacific Education Review

Abstract

This paper compares three methods of item calibration—concurrent calibration, separate calibration with linking, and fixed item parameter calibration—that are frequently used for linking item parameters to a base scale. Concurrent and separate calibrations were implemented using BILOG-MG. The Stocking and Lord (1983) characteristic curve method of parameter linking was used in conjunction with separate calibration. The fixed item parameter calibration (FIPC) method was implemented using both BILOG-MG and PARSCALE because the method is carried out differently by the two programs. Both programs use multiple EM cycles, but BILOG-MG does not update the prior ability distribution during FIPC calibration, whereas PARSCALE updates the prior ability distribution multiple times. The methods were compared using simulations based on actual testing program data, and results were evaluated in terms of recovery of the underlying ability distributions, the item characteristic curves, and the test characteristic curves. Factors manipulated in the simulations were sample size, ability distributions, and numbers of common (or fixed) items. The results for concurrent calibration and separate calibration with linking were comparable, and both methods showed good recovery results for all conditions. Between the two fixed item parameter calibration procedures, only the appropriate use of PARSCALE consistently provided item parameter linking results similar to those of the other two methods.
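The Stocking and Lord (1983) criterion used here with separate calibration chooses slope and intercept constants (A, B) that minimize the squared distance between the base-scale test characteristic curve of the common items and the TCC computed from the rescaled target-scale estimates (a* = a/A, b* = Ab + B; c is scale-invariant under the 3PL model). The following is a minimal Python sketch of that idea; all item parameters, the theta grid, and the true constants are simulated for illustration and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def icc_3pl(theta, a, b, c, D=1.7):
    """3PL item characteristic curve."""
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))

def tcc(theta, a, b, c):
    """Test characteristic curve: sum of ICCs over the common items."""
    return icc_3pl(theta[:, None], a, b, c).sum(axis=1)

def stocking_lord_loss(AB, theta, a_new, b_new, c_new, a_base, b_base, c_base):
    A, B = AB
    # Rescale target-scale parameters onto the base scale:
    # a* = a/A, b* = A*b + B (the c parameter does not change).
    return np.mean((tcc(theta, a_base, b_base, c_base)
                    - tcc(theta, a_new / A, A * b_new + B, c_new)) ** 2)

rng = np.random.default_rng(0)
n = 20
a_base = rng.uniform(0.8, 2.0, n)
b_base = rng.normal(0.0, 1.0, n)
c_base = np.full(n, 0.2)

# Construct target-scale parameters implied by known linking constants,
# so the minimizer should recover A = 1.2, B = 0.5.
A_true, B_true = 1.2, 0.5
a_new = a_base * A_true
b_new = (b_base - B_true) / A_true

theta = np.linspace(-4.0, 4.0, 41)
res = minimize(stocking_lord_loss, x0=[1.0, 0.0], method="Nelder-Mead",
               args=(theta, a_new, b_new, c_base, a_base, b_base, c_base))
A_hat, B_hat = res.x
```

Because the target parameters were generated from known constants, the fitted `A_hat` and `B_hat` land on the true values; with real calibration output the criterion is minimized over estimated, noisy parameters instead.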


Figs. 1–6 (images not reproduced here)

References

  • Baldwin, S. G., Baldwin, P., & Nering, M. L. (2007). A comparison of IRT equating methods on recovering item parameters and growth in mixed-format tests. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

  • Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.

  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.

  • Hanson, B. A., & Béguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26, 3–24.

  • Keller, R. R., Keller, L. A., & Baldwin, S. (2007). The effect of changing equating methods on monitoring growth in mixed-format tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

  • Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43, 355–381.

  • Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York: Springer.

  • Linacre, J. M. (2003). WINSTEPS [Computer program]. Chicago: MESA Press.

  • Loyd, B. H., & Hoover, H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 179–193.

  • Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139–160.

  • Paek, I., & Young, M. J. (2005). Investigation of student growth recovery in a fixed-item linking procedure with a fixed-person prior distribution for mixed-format test data. Applied Measurement in Education, 18, 199–215.

  • Skorupski, W. P., Jodoin, M. G., Keller, L. A., & Swaminathan, H. (2003). An evaluation of item response theory equating procedures for capturing growth with tests composed of dichotomously scored items. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

  • Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.

  • Thissen, D. (1991). Multilog user's guide: Multiple categorical item analysis and test scoring using item response theory [Computer program]. Chicago: Scientific Software International.


Acknowledgments

This work was supported by the Sungshin Women’s University Research Grant of 2011.

Corresponding author

Correspondence to Taehoon Kang.

Appendix

1. Concurrent Calibration for both base and target groups (BILOG-MG)

>COMMENT

>GLOBAL DFNAME = 'BLMcom200.dat', NPARM = 3, SAVE;

>SAVE PARM = 'BLMcom200.par';

>LENGTH NITEMS = 60;

>INPUT NTOT = 60, NID = 4, NGROUP = 2, NFNAME = 'c:\FIPC\simu\keynot.txt';

>ITEMS INUM = (1(1)60), INAMES = (OL01(1)OL10, CO01(1)CO40, NE01(1)NE10);

>TEST TNAME = Math;

>GROUP1 GNAME = 'BASE', LENGTH = 50, INUM = (1(1)50);

>GROUP2 GNAME = 'TARGET', LENGTH = 50, INUM = (21(1)60);

(4A1,1X,I1,1X,60A1)

>CALIB NQPT = 11, CYCLES = 3000, CRIT = 0.001, REF = 1, TPRIOR;

>SCORE NOPRINTS;

2. Separate Calibration for a target group (BILOG-MG)

>COMMENT

>GLOBAL DFNAME = 'new200.dat', NPARM = 3, SAVE;

>SAVE PARM = 'BLMnew200.par';

>LENGTH NITEMS = 50;

>INPUT NTOT = 50, NALT = 5, NID = 4;

>ITEMS INUM = (1(1)50), INAMES = (CO01(1)CO40, NE01(1)NE10);

>TEST TNAME = Simulation;

(4A1,T1,50A1)

>CALIB NQPT = 11, CYCLES = 3000, CRIT = 0.001, TPRIOR;

>SCORE NOPRINTS;

3. Fixed Item Parameter Calibration for a target group (BILOG-MG)

>COMMENT

>GLOBAL DFNAME = 'new200.dat', PRNAME = 'BLMOLD200.PRM', NPARM = 3, SAVE;

>SAVE PARM = 'BLMfix200.par';

>LENGTH NITEMS = 50;

>INPUT NTOT = 50, NALT = 5, NID = 4;

>ITEMS INUM = (1(1)50), INAMES = (CO01(1)CO40, NE01(1)NE10);

>TEST TNAME = Math, FIX = (1(0)40,0(0)10);

(4A1,T1,50A1)

>CALIB NQPT = 11, CYCLES = 3000, CRIT = 0.001, TPRIOR, NOADJUST;

>SCORE NOPRINTS;

4. Fixed Item Parameter Calibration for a target group (PARSCALE)

>COMMENT

>FILE DFNAME = 'new200.dat', IFNAME = 'PSLold200.prm', SAVE;

>SAVE PARM = 'fix200.par';

>INPUT NIDCH = 4, NTOTAL = 50, NTEST = 1, LENGTH = 50, NFMT = 1;

(4A1, T1, 50A1)

>TEST TNAME = Math, ITEM = (01(1)50), NBLOCK = 50,

 INAMES = (

 CO01, CO02, CO03, CO04, CO05, CO06, CO07, CO08, CO09, CO10,

 CO11, CO12, CO13, CO14, CO15, CO16, CO17, CO18, CO19, CO20,

 CO21, CO22, CO23, CO24, CO25, CO26, CO27, CO28, CO29, CO30,

 CO31, CO32, CO33, CO34, CO35, CO36, CO37, CO38, CO39, CO40,

 NE01, NE02, NE03, NE04, NE05, NE06, NE07, NE08, NE09, NE10);

>BLOCK1 BNAME = COMMON, NITEM = 1, NCAT = 2,

 ORI = (0,1), MOD = (1,2), GPARM = 0.2, GUESS = (2, EST), REP = 40, SKIP;

>BLOCK2 BNAME = UNIQUE, NITEM = 1, NCAT = 2,

 ORI = (0,1), MOD = (1,2), GPARM = 0.2, GUESS = (2, EST), REP = 10;

>CALIB PARTIAL, LOGISTIC, SCALE = 1.7, NQPT = 41, CYCLE = (3000,1,1,1,1),

 FREE = (NOADJUST, NOADJUST), POSTERIOR, NEWTON = 0, CRIT = 0.001, ITEMFIT = 10, SPRIOR, GPRIOR;

>SCORE;
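The substantive difference between the two FIPC setups above is whether the prior ability distribution is re-estimated while the item parameters stay fixed (PARSCALE repeats this update across EM cycles; BILOG-MG, with NOADJUST, does not). The sketch below illustrates only that prior-updating step, as an EM re-weighting of quadrature nodes under the 3PL model with all item parameters held fixed. It is a hypothetical illustration on simulated data, not the actual algorithm of either program.

```python
import numpy as np

def icc_3pl(theta, a, b, c, D=1.7):
    """3PL item characteristic curve."""
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))

rng = np.random.default_rng(1)
n_items, n_people = 40, 500

# Fixed item parameters (simulated; these never change below).
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)
c = np.full(n_items, 0.2)

# Target group whose latent mean (0.8) differs from the base scale's 0.
theta_true = rng.normal(0.8, 1.0, n_people)
prob = icc_3pl(theta_true[:, None], a, b, c)
x = (rng.random(prob.shape) < prob).astype(float)  # 0/1 responses

# Quadrature nodes with a standard-normal starting prior.
q = np.linspace(-4.0, 4.0, 41)
w = np.exp(-0.5 * q**2)
w /= w.sum()

for _ in range(50):                           # EM cycles; items never move
    pq = icc_3pl(q[:, None], a, b, c)         # (nodes, items)
    # Log-likelihood of each person's response vector at each node.
    ll = x @ np.log(pq).T + (1.0 - x) @ np.log(1.0 - pq).T
    ll -= ll.max(axis=1, keepdims=True)       # guard against underflow
    post = np.exp(ll) * w                     # unnormalized posterior
    post /= post.sum(axis=1, keepdims=True)   # posterior weights per person
    w = post.mean(axis=0)                     # M-step: updated prior weights

mean_hat = (q * w).sum()                      # estimated latent mean
```

Freezing `w` at its starting standard-normal values instead of updating it inside the loop mimics the fixed-prior behavior; when the target group truly differs from the base group, only the updated prior lets the estimated distribution shift toward the group's actual mean, which is the recovery pattern the simulations examine.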


Kang, T., Petersen, N.S. Linking item parameters to a base scale. Asia Pacific Educ. Rev. 13, 311–321 (2012). https://doi.org/10.1007/s12564-011-9197-2

