Measuring beliefs about parenting and what babies need matters because these beliefs, and how they change during the transition to parenthood, offer a window into parenting behaviours, wellbeing, and their impact on child outcomes (Galbally et al., 2018; Hughes et al., 2012; Smetana, 2017; Tikotzky & Sadeh, 2009). The Baby Care Questionnaire (BCQ; Winstanley & Gattis, 2013) is a valid and reliable psychometric assessment tool for measuring parenting beliefs about infant care across fundamental care domains. However, this parenting belief measure is an ordinal scale, which has limitations (Tennant & Conaghan, 2007). Technically, scores collected from ordinal measurement are inappropriate for parametric statistical testing because ordinal data cannot be added, subtracted, multiplied or divided, so the calculation of means and standard deviations is invalid (Linacre, 2004; Merbitz et al., 1989; Norquist et al., 2004). Rasch methodology (Rasch, 1960, 1961) offers a solution: it can enhance the psychometric properties of an ordinal scale and yield interval-level estimates, provided the strict assumptions of the unidimensional Rasch model are met (Hobart & Cano, 2009; Merkin et al., 2020; Tennant & Conaghan, 2007). This method can be used to develop ordinal-to-interval conversion algorithms that transform ordinal scores into interval-level scores, which provide a better way to monitor changes in parenting beliefs about infant care and allow parametric statistics to be used more confidently when evaluating the relationships between parenting beliefs and other key variables (Linacre, 2004).
The Baby Care Questionnaire
The BCQ (Winstanley & Gattis, 2013) measures parenting beliefs about infant care using two scales, structure and attunement, across different contexts such as sleeping, feeding, and soothing. The attunement scale measures belief in the value of reading and responding to infant cues and identifying infants’ needs and states (e.g., hunger versus satiety, distress versus calm). The structure scale measures belief in the value of regularity and routines in infant care (e.g., time schedules for breastfeeding and sleeping). Parents rate their agreement with each item of the two scales on a four-point Likert scale (1 = strongly disagree to 4 = strongly agree). Example items are “When babies cry in the night to check if someone is near, it is best to leave them” (negatively coded attunement item) and “It is important to introduce a sleeping schedule as early as possible” (positively coded structure item). Studies have demonstrated good psychometric properties of the BCQ (Gattis et al., 2022; Winstanley & Gattis, 2013). Exploratory and confirmatory factor analyses supported the structural validity of the BCQ’s two independent scales, attunement and structure (Winstanley & Gattis, 2013). Studies have also demonstrated that each scale achieved strong reliability and validity as a measure of parenting beliefs about infant care (Gattis et al., 2022; Mascheroni et al., 2022; Winstanley & Gattis, 2013; Winstanley et al., 2014). Specifically, research indicated that both structure and attunement belief scores captured by the BCQ were related to the frequency and duration of responsive and demanding parenting behaviours (Gattis et al., 2022). In addition, attunement and structure beliefs were related to parenting experience. For attunement, pregnant women who were experienced mothers (i.e., already had at least one child) had higher scale scores than pregnant women expecting their first child, whereas for structure, pregnant women expecting their first child had higher scale scores than experienced mothers (Mascheroni et al., 2022).
Even though both scales of the BCQ are well-validated measures of parenting beliefs about infant care, they are still ordinal measures and therefore have some limitations (Tennant & Conaghan, 2007). That is, the distances between adjacent ordinal response categories of an item, such as between 1 and 2 versus between 2 and 3, are not necessarily the same, meaning that an ordinal scale cannot reflect actual change as accurately as an interval scale (Masters, 1982; Truong et al., 2021). In addition, ordinal scores do not meet the arithmetic assumptions of parametric statistics (e.g., means and standard deviations). Studies have shown that using parametric statistics with ordinal scores raises concerns about whether correct inferences can be drawn, potentially impairing statistical power and the control of Type I and Type II errors (Zumbo & Zimmerman, 1993; Verhulst & Neale, 2021). The use of interval-level data can minimise these concerns by improving the reliability and internal validity of measurement (Jamieson, 2004; Merbitz et al., 1989). Rasch analysis (Rasch, 1960) is a method for resolving such issues and has been increasingly used to investigate the psychometric properties of measures and improve their accuracy to approximate an interval-level scale (Hobart & Cano, 2009; Lundgren-Nilsson & Tennant, 2011).
Rasch Analysis
Unlike other statistical methods, such as classical test theory and generalisability theory (Cronbach et al., 1963), Rasch methodology can accurately estimate the unique contributions of individual items to the overall latent variable (e.g., structure or attunement beliefs) based on sample parameters (Fox & Jones, 1998; Rasch, 1960, 1961). A Rasch model is unidimensional and assumes that the response to a specific scale item is a function of that item’s difficulty and the respondent’s ability (Rasch, 1960, 1961). Rasch analysis also assumes scale invariance, which is tested by investigating Differential Item Functioning (DIF) due to personal characteristics (e.g., mother’s age, baby’s sex). DIF analysis tests whether an item of the measure works equally well across sub-groups within the population (Hagquist & Andrich, 2004).
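To make the role of item difficulty and respondent ability concrete, the following is a minimal Python sketch of the response-category probabilities for a single four-category item. It assumes the partial credit parameterisation of the polytomous Rasch model (Masters, 1982), which may differ from the parameterisation used in the analyses reported here. The function name `pcm_category_probs` and the threshold values are hypothetical and chosen only for illustration.

```python
import math

def pcm_category_probs(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta      -- person location on the latent trait (logits)
    thresholds -- item threshold locations (logits), one per step between categories
    """
    # The exponent for category k is the sum of (theta - tau_j) over the first
    # k thresholds; the lowest category has an empty sum of 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical four-category item (three thresholds) and a person at 0.5 logits.
probs = pcm_category_probs(theta=0.5, thresholds=[-1.0, 0.0, 1.0])
```

With a single threshold the function reduces to the dichotomous Rasch model, in which the probability of endorsing an item depends only on the difference between person location and item difficulty.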
Importantly, when the Rasch model fit is satisfactory, the ordinal scores of a psychometric measure can be converted into interval-level scores (Barber et al., 2022; Linacre, 2004; Norquist et al., 2004; Truong et al., 2023). This is the key advantage of the Rasch model over classical test theory methods because the interval-transformed data will reflect changes on a latent trait more accurately, similar to other interval-level scales such as blood pressure or height (Hobart & Cano, 2009; Rasch, 1960; Tennant & Conaghan, 2007; Wilson, 2004; Wright & Stone, 1979). Additionally, the item-person threshold distribution plotted from the Rasch analysis is useful for detecting significant ceiling or floor effects (Medvedev & Krägeloh, 2022). Thus, Rasch analysis can be considered the most advanced statistical methodology to precisely evaluate the reliability and validity of an ordinal measure, as well as enhance its precision to approximate an interval-level scale.
Discussion
This study utilised Rasch methodology to evaluate and improve the psychometric properties of the structure and attunement scales of the Baby Care Questionnaire (BCQ), a well-established ordinal measure of parenting beliefs about infant care. The use of Rasch methodology allowed for the identification of locally dependent items, which were combined into testlets to enhance the precision of both scales of the BCQ. We demonstrated that these two BCQ scales, reorganised into testlets, met expectations of the Rasch model, which then allowed us to enhance the accuracy of each scale to a level comparable to interval-level measurement using the ordinal-to-interval conversion algorithms presented in Tables 4 and 5 for the structure and attunement scales, respectively. The resulting conversion algorithms can be used to transform the ordinal scores obtained from the BCQ into interval data, which can increase the precision of the measure without changing its original response format. This can be particularly useful in longitudinal studies that track changes in parenting beliefs over time, as well as in cross-cultural research that compares parenting beliefs across different populations.
These ordinal-to-interval conversions are important because individual items of each scale contribute differently to the total score of each scale, which should be accounted for (Stucki et al., 1996). Sandham et al. (2019) used a metaphor of squeezing Vitamin C from different fruits to explain how the Rasch model works. Each fruit represents an item on the scale, and each fruit contains a different level of Vitamin C, which represents the latent trait being measured (e.g., parenting beliefs about infant care). Just as different fruits contribute different amounts of Vitamin C to a smoothie, different items on a scale contribute different amounts of the latent trait to the overall score. Rasch analysis allows for the measurement of the latent construct to the extent that it is contained within each item while filtering out other constructs, which manifest as fit residuals in the model. This increases the precision of measurement and allows for comparisons of accuracy between original ordinal-level scores and their Rasch-converted interval equivalents (Norquist et al., 2004).
Literature reviews on Rasch methodology concluded that using Rasch interval-transformed scores can reduce measurement error associated with ordinal scores (Truong et al., 2023). Therefore, interval-transformed scores may reflect the levels of parenting beliefs about infant care more accurately, as well as avoiding the violation of arithmetic assumptions when conducting parametric statistical tests (Leung, 2011). In addition, achieving adequate reliability (PSI = 0.78) for the structure scale and excellent reliability (PSI = 0.92) for the attunement scale adds empirical evidence to support the robust reliability of these BCQ scales in measuring parenting beliefs about infant care. Interval-transformed data can also be appropriate for conducting statistical comparisons with other interval data, such as electrophysiological or neuroimaging data, contributing to improvement of the reliability and validity of research results. Therefore, the assumptions of parametric statistical tests can be met using the interval BCQ data (Leung, 2011). Transforming the ordinal scores of both structure and attunement scales of the BCQ into interval scores does not require users to be experts in statistics because we have developed conversion algorithms based on Rasch model estimates. To convert ordinal-level scores for each scale (i.e., structure scale in Table 5, attunement scale in Table 6), researchers can look up the corresponding interval-level scores on the right-hand side of each table.
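In practice, applying a conversion table amounts to a lookup from a raw scale total to its interval-level equivalent. The Python sketch below illustrates this with made-up numbers. The dictionary `HYPOTHETICAL_TABLE` and the function `to_interval` are assumptions for illustration only and do not reproduce the published conversion tables, whose values should be used for any real analysis.

```python
# Hypothetical excerpt of an ordinal-to-interval table: raw total -> interval score.
# These values are invented for illustration and are NOT the published BCQ values.
HYPOTHETICAL_TABLE = {
    10: 10.0,
    11: 12.4,
    12: 13.9,
    13: 15.0,
    14: 16.2,
}

def to_interval(raw_total, table):
    """Return the interval-level score for a raw total, refusing to guess."""
    if raw_total not in table:
        # Totals outside the table have no defined conversion in this sketch.
        raise ValueError(f"No conversion available for raw total {raw_total!r}")
    return table[raw_total]

raw_totals = [10, 12, 14]
interval_scores = [to_interval(r, HYPOTHETICAL_TABLE) for r in raw_totals]
```

In this invented excerpt, equal one-point steps in the raw total map onto unequal steps in the interval score, which is the kind of adjustment an ordinal-to-interval conversion is meant to capture. Raising an error for totals outside the table, rather than interpolating, is a design choice of the sketch.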
The findings from this study have significant implications for research in the field of child development and developmental psychology. Parenting beliefs about infant care can potentially influence parental behaviours, which can in turn impact child development. Therefore, having a reliable and valid measure of parenting beliefs is crucial for understanding the complex interactions between parenting beliefs and behaviours and child outcomes. After establishing scale invariance, we confirmed that NZ parents scored significantly higher on attunement beliefs than UK parents, while there were no significant differences on structure beliefs between countries. This finding may be due to demographic differences in the samples, as there were significantly more first-time mothers in the UK sample. Previous research indicates that experienced mothers report stronger beliefs in attunement compared to first-time parents (Mascheroni et al., 2022).
The main strength of this study was the application of robust Rasch methodology to evaluate and enhance the psychometric properties of the structure and attunement scales of the BCQ using a randomly selected subsample from two countries (UK and NZ). The sample size (n = 450) satisfied the optimal sample size for Rasch analyses to minimise Type I and Type II errors (Hagell & Westergren, 2016). In addition, the initial analyses detected DIF in several items by personal factors (i.e., mother age, infant age, and infant gender), and especially by country. However, this study found that the modified structure and attunement scales of the BCQ are invariant, working equally well across personal factors (i.e., mother age, infant age, and infant gender), as well as across NZ and UK mothers, which had not been investigated in previous studies. Scale invariance (no DIF) refers to the property of a measurement tool that ensures that the relationships between the items on the scale are consistent across different groups or populations. In other words, it ensures that the same construct is being measured in the same way across different groups, regardless of their culture, language, or background. When a measurement tool is invariant across groups, the scores obtained from different groups can be compared and interpreted meaningfully. Moreover, in our analysis, we observed that interval-transformed Rasch scores showed a smaller effect size compared to ordinal raw scores. This observation reflects the impact of Rasch transformation on the standard deviation component of effect size calculation, rather than indicating a direct reduction in measurement error. Rasch transformation, by converting ordinal data to an interval scale, affects the distribution and variability of scores, which in turn influences the effect size. This transformation enhances the interpretability and linearity of the scores but does not inherently imply a reduction in measurement error.
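The dependence of effect size on the standard deviation can be illustrated with a short Python sketch. It assumes Cohen's d with a pooled standard deviation, which may not be the effect size statistic used in this study. The function names `cohens_d` and `stretch` and the toy scores are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def stretch(x):
    """Made-up nonlinear, order-preserving transformation of the upper range."""
    return x if x <= 3 else 3 + 2 * (x - 3)

# Toy scores, invented for illustration.
raw_a, raw_b = [2, 4, 6], [1, 3, 5]
d_raw = cohens_d(raw_a, raw_b)

# A linear rescaling changes the mean difference and the SD by the same factor,
# so d is unchanged.
d_linear = cohens_d([2 * x + 1 for x in raw_a], [2 * x + 1 for x in raw_b])

# A nonlinear transformation alters the spread unevenly, so d can change.
d_nonlinear = cohens_d([stretch(x) for x in raw_a], [stretch(x) for x in raw_b])
```

Because d is unaffected by linear rescaling, any difference between effect sizes computed on raw and transformed scores arises from the nonlinear part of the transformation, which changes the mean difference and the standard deviation by different proportions.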
We noted significant differences between the original UK and NZ samples on both ordinal raw scores and interval-transformed Rasch scores for attunement, with the interval-transformed attunement scores showing a smaller effect size than the ordinal raw scores. In contrast, there were no significant differences between the original UK and NZ samples on either ordinal raw scores or interval-transformed Rasch scores for structure, with the interval-transformed structure scores showing a slightly lower effect size than the ordinal raw scores. These findings may indicate that structure is more similar across cultures (UK vs NZ) than attunement, where cultural differences may affect the accuracy of the ordinal assessment scale. However, invariance was established by Rasch modifications for both scales, which is reflected by differences in effect sizes between ordinal and Rasch scores for attunement but not for structure.
When we compare parenting beliefs about structure and attunement between different countries, we need to ensure that the measurement tool (e.g., BCQ) is scale-invariant, so that we can trust that any differences between countries are real and not due to measurement bias. Even if the mean scores between countries are significantly different, we cannot conclude that there is a real difference unless we can demonstrate that the measurement tool is invariant across countries, which we achieved in this study. For example, suppose we are comparing the parenting beliefs of parents from two different countries (UK and NZ) using a scale that has been shown to be invariant across these cultures. If the mean score for parents from the UK is significantly higher than that of parents from NZ, we can interpret this difference as reflecting a real difference in parenting beliefs between the participants from the two samples. As acknowledged earlier, there were more experienced mothers in the NZ sample, potentially accounting for the higher attunement scores in the NZ sample. Therefore, the country difference observed in this study may not reflect a real difference between NZ and UK mothers at large. However, if the scale is not invariant across cultures and DIF occurs, then any differences in mean scores may be due to measurement bias rather than actual differences in parenting beliefs between samples or countries. Scale invariance is essential when comparing parenting beliefs or any other constructs across different cultures or populations. Only when we can demonstrate that the measurement tool is invariant across cultures can we reliably compare mean scores between groups and draw meaningful conclusions about differences in parenting beliefs.
This study is not without limitations. Participants in this study came from convenience samples in two countries and may not be representative of mothers within those two countries. In addition, there are other dimensions that might be relevant to parenting beliefs, besides parent and infant age and infant gender. Therefore, replications of this study should be conducted in samples from other English-speaking countries (and possibly involving translations to other languages), and should consider other potentially relevant personal factors, such as differences among cultures, parenting experience, levels of educational attainment, partnership status of parents, and gender of parents.
In conclusion, the findings of this study demonstrated the reliability and internal validity of both scales of the BCQ in measuring parenting beliefs about infant care. Our minor modifications implemented in the scoring of both scales of the BCQ satisfied expectations of the unidimensional Rasch model and scale invariance across different country samples, infant and parent age and infant gender, as well as resolved local dependency issues. This allowed us to enhance the accuracy of each scale to approximate interval-level scores using the ordinal-to-interval conversion tables. Researchers can use the conversion tables published here to enhance precision of scores on the structure and attunement scales of the BCQ without changing their original questionnaire format. Overall, the findings from this study contribute to the enhancement of precision in measuring parenting beliefs about infant care, which is important for understanding the complex interactions between parents and their infants and for developing interventions to promote positive parenting practices and support healthy child development.