Ackerman, T.A., Simpson, M.A., & de la Torre, J. (2000). A comparison of the dimensionality of TOEFL response data from different first language groups. Paper presented at the Annual Meeting of the National Council on Measurement in Education, New Orleans, Louisiana.
Alderman, D., & Holland, P. (1981). Item performance across native language groups on the TOEFL. TOEFL Research Report Series, 9, 1-106. Princeton, NJ: Educational Testing Service.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Aryadoust, V. (2012). Differential item functioning in while-listening performance tests: The case of the International English Language Testing System (IELTS) listening module. International Journal of Listening, 26(1), 40–60. https://doi.org/10.1080/10904018.2012.639649
Aryadoust, V., Goh, C., & Lee, O. K. (2011). An investigation of differential item functioning in the MELAB Listening Test. Language Assessment Quarterly, 8(4), 361–385. https://doi.org/10.1080/15434303.2011.628632
Barati, H., & Ahmadi, A. R. (2010). Gender-based DIF across the subject area: A study of the Iranian national university entrance exam. The Journal of Teaching Language Skills, 2(3), 1-26.
Barati, H., Ketabi, S., & Ahmadi, A. (2006). Differential item functioning in high stakes tests: The effect of field of study. Iranian Journal of Applied Linguistics, 9(2), 27–49.
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences. London, UK: Erlbaum.
Camilli, G., & Shepard, L. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.
Du, Y. (1995). When to adjust for differential item functioning. Rasch Measurement Transactions, 9(1), 414.
Ferne, T., & Rupp, A. A. (2007). A synthesis of 15 years of research on DIF in language testing: Methodological advances, challenges and recommendations. Language Assessment Quarterly, 4(2), 113–148. https://doi.org/10.1080/15434300701375923
Geranpayeh, A., & Kunnan, A. J. (2007). Differential item functioning in terms of age in the Certificate in Advanced English examination. Language Assessment Quarterly, 4(2), 190–222. https://doi.org/10.1080/15434300701375758
Ginther, A., & Stevens, J. (1998). Language background and ethnicity, and the internal construct validity of the Advanced Placement Spanish Language Examination. In A. J. Kunnan (Ed.), Validation in language assessment (pp. 169–194). Mahwah, NJ: Lawrence Erlbaum.
Gipps, C., & Stobart, G. (2009). Fairness in assessment. In C. Wyatt-Smith & J. Cumming (Eds.), Educational assessment in the 21st century: Connecting theory and practice (pp. 105–118). The Netherlands: Springer Science+Business Media.
Hale, G. A., Rock, D. A., & Jirele, T. (1989). Confirmatory factor analysis of the Test of English as a Foreign Language (TOEFL Research Report No. 32; ETS RR-89-42). Princeton, NJ: Educational Testing Service.
Kunnan, A. J. (1994). Modelling relationships among some test-taker characteristics and performance on EFL tests: An approach to construct validation. Language Testing, 11(3), 225–252. https://doi.org/10.1177/026553229401100301
Linacre, J. M. (1998a). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2, 266–283.
Linacre, J. M. (2002). What do infit and outfit, mean-square and standardized mean? Rasch Measurement Transactions, 16, 878.
Linacre, J. M. (2010). A user’s guide to WINSTEPS. Chicago, IL: Winsteps.com.
Linacre, J. M. (2012). A user’s guide to WINSTEPS. Chicago, IL: Winsteps.com.
Linacre, J.M. (2021). Winsteps® Rasch measurement computer program (Version 5.1). Winsteps.com.
Linacre, J. M., & Wright, B. D. (1994). Chi-square fit statistics. Rasch Measurement Transactions, 8, 350.
Mazor, K. M., Clauser, B. E., & Hambleton, R. K. (1994). Identification of non-uniform differential item functioning using a variation of the Mantel–Haenszel procedure. Educational and Psychological Measurement, 54(2), 284–291. https://doi.org/10.1177/0013164494054002003
McNamara, T., & Ryan, K. (2011). Fairness versus justice in language testing: The place of English literacy in the Australian Citizenship Test. Language Assessment Quarterly, 8, 161–178.
Oltman, P. K., Stricker, L. J., & Barrows, T. (1988). Native language, English proficiency, and the structure of the Test of English as a Foreign Language for several language groups (TOEFL Research Report No. 27; ETS RR-88-26). Princeton, NJ: Educational Testing Service.
Prieto Maranon, P., Barbero Garcia, M. I., & San Luis Costas, C. (1997). Identification of nonuniform differential item functioning: A comparison of Mantel–Haenszel and item response theory analysis procedures. Educational and Psychological Measurement, 57(4), 559–569. https://doi.org/10.1177/0013164497057004002
Roussos, L. A., & Stout, W. F. (2004). Differential item functioning analysis. In D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 107–116). Thousand Oaks, CA: Sage.
Roznowski, M., & Reith, J. (1999). Examining the measurement quality of tests containing differentially functioning items: Do biased items result in poor measurement? Educational and Psychological Measurement, 59(2), 248–269. https://doi.org/10.1177/00131649921969839
Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group differences and detects test bias/DIF as well as item bias/DIF. Psychometrika, 58(2), 159–194. https://doi.org/10.1007/BF02294572
Smith, R. M. (1996). Polytomous mean-square fit statistics. Rasch Measurement Transactions, 10(3), 516–517.
Swaminathan, H. (1994). Differential item functioning: A discussion. In D. Laveault, B. D. Zumbo, M. E. Gessaroli, & M. W. Boss (Eds.), Modern theories of measurement: Problems and issues (pp. 63–86). Ottawa, Ontario, Canada: University of Ottawa.
Swinton, S. S., & Powers, D. E. (1980). Factor analysis of the Test of English as a Foreign Language for several language groups (TOEFL Research Report No. 6; ETS RR-80-32). Princeton, NJ: Educational Testing Service.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Lawrence Erlbaum.
Wright, B. D. (1994b). Local dependency, correlations, and principal components. Rasch Measurement Transactions, 10(3), 509–511.
Wright, B. D. (1996). Reliability and separation. Rasch Measurement Transactions, 9, 472.
Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370.
Wright, B. D., & Stone, M. H. (1988). Identification of item bias using Rasch measurement (Research Memorandum No. 55). Chicago, IL: MESA Press.
Zeidner, M. (1986). Are English language aptitude tests biased towards culturally different minority groups? Some Israeli findings. Language Testing, 3(1), 80–98.
Zenisky, A., Hambleton, R., & Robin, F. (2003). Detection of differential item functioning in large scale state tests: A study evaluating a two-stage approach. Educational and Psychological Measurement, 63(1), 51–64. https://doi.org/10.1177/0013164402239316
Zhang, Y., Matthews-Lopez, J., & Dorans, N. (2003). Using DIF dissection to assess effects of item deletion due to DIF on the performance of SAT I: Reasoning sub-populations. Princeton, NJ: Educational Testing Service.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, Ontario, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense. Retrieved from the University of British Columbia Web site: http://educ.ubc.ca/faculty/zumbo/DIF/handbook.pdf
Zumbo, B. D. (2007). Three generations of DIF analysis: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223–233. https://doi.org/10.1080/15434300701375832