Key Points for Decision Makers

The findings of 24 included quantitative studies suggest that symptoms tend to be reported as more severe and health as poorer when reported over the last seven days compared to the last day.

The 33 included qualitative studies found that respondents had mixed preferences towards the different recall periods with a slight preference for seven-day recall where symptoms and health impacts varied a lot.

There are research gaps in understanding the impact of a one-day versus seven-day recall period for patients with mental health conditions and when asking positively framed questions.

1 Introduction

Patient-reported outcome measures (PROMs) are validated instruments or questionnaires used to collect information on a patient’s health condition directly from the patient. One class of PROM is designed to assess the multi-dimensional construct of health-related quality of life (HRQoL). Patient-reported outcome measures frame questions to patients within a particular recall period, such as asking about the severity or presence of a symptom within, for example, ‘today’ or ‘the last four weeks’. The choice of recall period may affect the answer. Short recall periods may not pick up symptoms or problems that were not experienced in the specified period, whereas long recall periods may suffer from recall bias and introduce uncertainty regarding what information respondents draw upon to answer them; for example, they may use an assessment of their average symptoms over the time period, their worst symptoms or their recent symptoms [1, 2].

There is ongoing uncertainty around the most suitable recall period for assessing HRQoL [1,2,3]. The optimal recall period is driven by a number of concerns including: the objective of collecting PROM data, the nature and stability of the condition being assessed [4, 5] and the domain of assessment [2].

Patient-reported outcome measures may be collected in order to (i) gain knowledge about a disease trajectory; (ii) monitor and assess individual patients to support clinical decision making; (iii) evaluate care quality; and (iv) assess the effectiveness or cost effectiveness of treatments. The purpose of collecting the PROM data and the information needs of the decision at hand may influence the appropriate recall period [6,7,8,9].

The recall period selected for a PROM may influence the way in which respondents interpret questionnaire items and select relevant information to formulate a response. Poor memory may influence responses when individuals are asked to respond using longer recall periods, and this may differ by health domain (e.g., pain versus fatigue) [3]. For domains influenced by events (e.g., episodes or activities), recall may be impacted by the tendency to remember events as happening more recently than they actually did (referred to as “forward telescoping”), which can influence whether events are considered relevant to the recall period [10].

Longer recall periods may also lead participants to pay increased attention to salient events that are not representative of their general health state throughout the period, which may increase symptom severity reports (see: Kahneman et al. 1993; Stone et al. 2008) [9, 11]. Alternatively, longer recall periods may result in reliance upon overall symptom or domain evaluations, rather than drawing upon specific episodes [12]. Reporting of mood-related symptoms may be influenced by longer recall periods that change the interpretation of emotion frequency questions [9]. For example, when referring to anger symptoms, more serious and intense episodes have been reported over a longer time frame [13].

Characteristics of the questionnaire item format may also interact with the influence of recall period on symptom reports. Participants may be influenced by positive or negative framing of questionnaire items (e.g., feeling energetic vs feeling tired) [14] and framing of outcomes in response options (e.g., symptom severity vs frequency). Repeated questionnaire administration may have carry-over effects where current responses influence future responses [15], which is relevant where the use of a short recall period requires repeated administration.

This review updates and refines the scope of previous reviews by adopting a targeted approach to the comparison of a one-day versus seven-day recall period on PROMs. While previous reviews (Schmier et al. 2004 [6]; Stull et al. 2009 [9]) suggest the presence of recall duration effects, they included little evidence specifically on the one-day versus seven-day recall comparison. A particular motivation of this review was to understand the potential impact of recall period on differences between the EQ-5D [17], which adopts the recall period of ‘today’, and the EQ-HWB (EQ Health and Wellbeing [18]) which adopts a recall period of ‘the last 7 days’. Both measures are generic measures used to estimate utility scores for input into economic evaluations of healthcare [19], with EQ-5D focused on health and EQ-HWB on health and wellbeing or broader quality of life. Thus, the primary aim of this review was to determine recall period effects of a one-day versus seven-day recall period for domains included within the EQ-5D or the EQ-HWB.

2 Methods

This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [20] and was prospectively registered with the PROSPERO database (ID: CRD42021251857).

2.1 Data Sources and Searches

The following sources were searched from database inception to 30 November 2021: MEDLINE, EMBASE, PsycINFO, Web of Science, EconLit, CINAHL Complete, Cochrane Library, and Sociological Abstracts. Search keywords were developed in consultation with an Academic Librarian and included ‘Patient Report’ OR ‘PROMs’ AND ‘Recall Duration’ and related terms, in addition to the recall duration comparator condition (see Supplementary Information SISearch terms). Findings were limited to the English language. Manual searches were conducted across the reference lists of recovered articles and relevant systematic reviews. Unpublished studies were sought from researchers affiliated with the EuroQol Research Group who own the Intellectual Property Rights to the EQ-5D and the EQ-HWB.

2.2 Study Selection

2.2.1 Inclusion Criteria

This review included studies that compared a one-day (or ‘24-h’) versus seven-day (or weekly) recall period condition on patient-reported scores from PROM and HRQoL instruments in adult populations (aged ≥ 18 years) or combined paediatric and adult populations with a majority of respondents aged over 18. The one-day recall condition included recall instructions of “over the last (or past) day”, “over the last (or past) 24 hours”, or “today”. The seven-day recall condition included recall instructions of “over the last (or past) seven days”, “over the last (or past) week”, or “this week”.

Four categories of studies were included:

  a) Studies which made comparisons between single or multiple items on the domains covered in the EQ-HWB or EQ-5D. These included: physical functioning (mobility, self-care [or personal care], daily activities, meaningful activities, hearing, vision), pain (pain and discomfort), cognition (memory and concentration), psychosocial wellbeing (loneliness, belonging, support, coping, self-worth, anxiety, depression, hope, safety/fear, anger/frustration), sleep-related symptoms (sleep disturbance, fatigue).

  b) Studies which made comparisons based on overall summary scores of HRQoL.

  c) Studies which used multi-item instruments to measure disease-specific signs and symptoms where measures had an aggregate score (e.g., respiratory symptom severity in chronic obstructive pulmonary disease) [21]. Although less relevant to the central interest of comparing generic domains from the EQ-5D and EQ-HWB instruments, summary symptom scores were included as a means of confirming findings.

  d) Qualitative studies exploring patient-reported perspectives of the suitability of a one-day versus seven-day recall duration period in PROM and HRQoL instruments.

2.2.2 Exclusion Criteria

Studies were excluded if the sample of participants was exclusively aged < 18 years. Studies were excluded if they did not compare both a one-day and seven-day recall period, or if they assessed health behaviours only (e.g., tobacco smoking, physical activity). Studies were excluded if ecological momentary assessment (EMA [22]) was used to derive an index of daily recall, or if studies incorporated clinician reports of patient symptoms. Single-item condition-specific symptom reports (e.g., vomiting in the context of cancer treatment) [23] were considered less relevant for recall period comparisons of the generic domains of health and wellbeing included in the EQ-HWB and EQ-5D instruments, which are the focus of this review, and were therefore excluded. Studies not available in the English language were excluded.

2.3 Data Extraction

After removal of duplicates, two researchers (JC and TP) independently screened titles and abstracts. Both authors applied eligibility criteria, and a final list of included articles was developed through consensus. Data were extracted from the included articles using a predetermined data extraction form by JC and cross-checked for accuracy by TP. Data extracted from quantitative studies included: participant and study characteristics (diagnosis, symptoms assessed, method of recall condition comparison, analytical approach), questionnaire characteristics (domains assessed, recall instruction, number of items, item framing, response options, score range, administration mode, time of responding, response rates) and recall condition effects on outcome scores as per instrument scales (score means and standard deviations for both recall conditions, and score differences between recall conditions). Where studies included the correlation of scores between recall conditions, this was only extracted where score differences were not reported. In studies where data were reported for more than one time period and results were not averaged, only baseline data were extracted.

Data extracted from the qualitative studies included participant and study characteristics, questionnaire characteristics (including recall instructions), study methods (whether the study was nested in a qualitative study), the study context (instrument development, instrument validation or clinical trial), the methodology (focus group, interviews), the interview technique (cognitive debriefing, concept elicitation, think aloud), the analysis approach and summary results as reported by the author. In addition, any participant quotes relating to the recall period were also extracted.

2.4 Study Quality

As there was no suitable single quality checklist that could be applied to studies comparing recall period, study quality was assessed using a subset of the relevant criteria extracted from the COSMIN checklist for assessing the risk of bias of PROMs (see Supplementary Information ‘SI5. Quality Assessment’ for a full list of criteria used) [24]. To evaluate risk of bias in quantitative studies, we assessed aspects of structural validity (e.g., sample size adequacy and statistical methods) and reliability (e.g., test conditions and study design validity). Some study designs used a standard instrument with a non-standard recall period. While recall period adjustment may have interfered with instrument validity, this was considered preferable to using a different instrument with a different recall period.

For qualitative studies, we assessed the quality of the study design and analysis (e.g., sample size adequacy, probing techniques, and data analyses). There were no exclusion criteria based on quality indicators.

2.5 Data Analysis and Synthesis

Extraction and analysis for the quantitative and qualitative data were undertaken by JC and TP. Characteristics of the quantitative and qualitative studies were summarised from the extracted information including the clinical group, the outcome assessed, sample size and the main findings relating to recall period comparisons. For the quantitative studies, findings were summarised within domains (physical function, pain, cognition, psychosocial wellbeing, fatigue & sleep), overall scores and aggregate measures of disease-specific signs. Within each, extracted scores from the seven-day versus one-day recall (mean of daily scores, maximum of daily scores and the same-day score) were assessed to identify if there were differences and whether these were statistically significant. For the qualitative studies, the extracted information on preferences or views related to the seven-day versus one-day recall period were summarised descriptively within similar domains to the quantitative studies. The extracted quotes were coded thematically.

Comparison of recall period effects on instrument scores were not meta-analysed due to high levels of variability in patient groups, instruments, and methods of data collection and analysis. Instead, the statistical results of recall condition comparisons were synthesised into summary tables to gain insight into the presence of trends and systematic differences within assessment domains.

To present a broad visual overview of any trends in the direction of the differences between scores, differences were flagged where one-day scores (for the mean of the daily scores or the same-day score) were at least 10% lower than weekly scores for symptoms (or at least 10% higher for quality of life or functioning), re-scaling where necessary to start the scoring from zero. A convenient level of 10% was chosen to communicate a difference between scores, given the many different instruments and scales included.

To facilitate comparisons, on the summary table flagged results are coded green. Studies that found the opposite direction of difference, with daily scores higher for symptoms (lower for quality of life) than weekly scores are coded in red. Studies which found daily scores to be lower for symptoms (higher for quality of life) but with a difference of under 10% in the score or for which a percentage change from the weekly score is not possible to calculate (e.g., scores represented as T scores) are coded in amber. No differences between the maximum daily score and the seven-day scores were flagged as this represents a different type of comparison.
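The flagging rule described above can be sketched in code. This is only an illustrative reading of the rule, not the procedure used by the review team: the function name `flag_difference`, its parameters, and the handling of identical scores (returned as "unflagged") are assumptions introduced here. Comparisons based on the maximum daily score are not flagged in the review and are therefore outside the scope of this sketch.

```python
def flag_difference(daily, weekly, scale_min=0.0, higher_is_worse=True,
                    percent_computable=True, threshold=0.10):
    """Classify one daily-versus-weekly result as green, amber or red.

    daily / weekly      : scores for the one-day and seven-day recall conditions
    scale_min           : lowest possible score, used to re-scale to start at zero
    higher_is_worse     : True for symptom scales, False for quality of life
                          or functioning scales
    percent_computable  : False when a percentage change from the weekly score
                          cannot be calculated (e.g., T scores)
    """
    # Re-scale so that the scoring starts from zero.
    d = daily - scale_min
    w = weekly - scale_min

    # Positive when the daily score indicates better health than the weekly score.
    gap = (w - d) if higher_is_worse else (d - w)

    if gap < 0:
        return "red"        # opposite direction of difference
    if gap == 0:
        return "unflagged"  # assumption: identical scores are not covered by the text
    if not percent_computable or w == 0:
        return "amber"      # expected direction, but no percentage available
    # Percentage difference is taken relative to the weekly score.
    return "green" if gap / w >= threshold else "amber"


if __name__ == "__main__":
    print(flag_difference(4.0, 5.0))                           # symptom scale, 20% lower
    print(flag_difference(4.8, 5.0))                           # symptom scale, 4% lower
    print(flag_difference(5.5, 5.0))                           # daily worse than weekly
    print(flag_difference(70.0, 60.0, higher_is_worse=False))  # quality of life scale
```

Re-scaling can change the classification: on a hypothetical 1 to 5 scale, a daily score of 4.55 against a weekly score of 5.0 is an 11% difference once both scores are shifted to start from zero, but only a 9% difference on the raw scores.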

Conclusions drawn from qualitative studies assessing patient perspectives on the suitability of a one-day versus seven-day recall period for PROMs and HRQoL instruments were integrated into a narrative synthesis. Qualitative findings were summarised in a table showing each study’s conclusion on its respondents’ most preferred recall period (one day, seven days or more than seven days) and whether this was drawn from closed-ended questions (which asked for endorsement of a given recall period) or open-ended questions in which multiple given recall periods were discussed or questions were asked about the ideal recall period in that context.

3 Results

3.1 Search Results

In total, 945 records (excluding duplicates) were identified, and the titles and abstracts were screened. Full text versions were retrieved for 82 articles, of which 57 were eligible for inclusion. Of these, 24 reported quantitative comparisons of one-day versus seven-day recall scores. The remaining 33 studies reported patient perspectives of optimal recall duration and were included in the narrative synthesis of qualitative studies. Figure 1 shows the flow of studies through the review and reasons for exclusion.

Fig. 1 Identification and selection of quantitative and qualitative studies for this review. Adapted from Page et al. [20]

3.2 Characteristics of Included Studies

The quantitative and qualitative studies included in this review assessed adults with a diverse range of clinical conditions and from the general population (see Supplementary Information SI2 and SI3).

3.2.1 Characteristics of Quantitative Studies Included

A total of 4701 participants were included across the 24 quantitative studies assessed. Sample sizes of individual studies ranged from 32 to 800 participants (median = 113; mean = 196 [SD = 206]), with 57.9% of the total sample being women. Most (23 of 24) studies included only adults aged ≥ 18 years, while one study [25] included a blended sample comprising 34% paediatric participants aged between 12 and 18 years. Most (22 of 24) studies included participants diagnosed with a clinical condition, while four studies included individuals selected from the general population [26,27,28,29].

The instruments used to evaluate the effect of recall duration assessed either the signs, symptoms and impacts of a disease and its treatment, or quality of life generally. Instruments were mostly disease-specific (21 of 24, e.g., Psoriasis Signs and Symptoms Diary), but also included generic HRQoL instruments (3 [30,31,32] of 24, e.g., EQ-5D). Outcomes assessed in participants selected from the general population included pain [27,28,29, 33]; fatigue [27, 28, 33]; emotional states [27,28,29]; and physical functioning [26].

Recall instructions for the one-day recall condition included “today” or “during the day” (3 of 24 studies), “over the last (or past) 24 hours” (17 of 24), and “over the last (or past) day” (4 of 24). Recall instructions for the seven-day recall condition included “over the last (or past) seven days” (15 of 24) and “over the last (or past) week” (9 of 24). Most (17 of 24) studies used the same instrument adjusted only for recall period instruction.

The data collection period and number of assessments conducted differed between studies. Study data collection periods ranged from 1 [26, 29, 31, 34,35,36,37] to 100 [38] days, with the number of daily recall assessments also ranging from 1 to 100, and weekly recall assessments from 1 to 14. Questionnaire administration formats included paper forms (10 of 24) [30, 31, 36,37,38,39,40,41,42,43], online (9 of 24) [23, 25,26,27,28,29, 32, 33, 38], electronic tablet or palm pilot (4 of 24) [21, 34, 35, 44], or telephone (2 of 24) [45, 46]. Overall, response rates for weekly recall questionnaires ranged from 52% [38] to 100% [26, 29, 31, 34] (based on the lowest reported rate within each study: median = 94%, mean = 90% [SD = 13]), and response rates for daily recall questionnaires ranged from 52% [38] to 100% [26, 29, 31, 34] (median = 95%, mean = 89% [SD = 14]). Nine of 24 studies did not report weekly questionnaire response rates [23, 30, 35,36,37, 40, 42, 43, 45], and six of 24 studies did not report daily questionnaire response rates [23, 30, 35,36,37, 45].

Three methods were used to index one-day recall scores for comparison with seven-day recall scores. These different approaches are shown in Figure 2b. First, in 11 of 24 studies [21, 23, 28, 32, 33, 39, 41, 42, 44,45,46], daily recall scores were averaged over seven consecutive days and compared to the seven-day recall score reported on the final assessment day (i.e., “mean” index). Second, for 6 of 24 studies [21, 23, 25, 38, 42, 46], the single highest daily recall score reported over seven consecutive days was compared to the seven-day recall of maximum (i.e., most severe, or worst) symptoms across the week (i.e., “maximum” index). Third, in 9 of 24 studies [21, 30, 31, 34,35,36, 40, 43, 78], one-day recall scores were compared to scores for the seven-day recall instrument issued on the same day (i.e., “same day” index). One study [27] administered the seven-day and one-day recall questions on two separate days, in a random order, to half the sample and compared the scores; this was classified as a “same day” index.

Fig. 2 Quantitative comparison of daily and weekly recall scores. In most studies, weekly and daily symptoms were assessed over the same consecutive seven-day period (part a in the figure). Three methods of daily symptom indexation were used in the quantitative comparison of daily versus weekly recall scores: (i) the mean of daily recall scores over the seven days (dmean), (ii) the maximum of the daily recall scores (dmax), and (iii) the score reported on the same day as the weekly recall score (d7) (part b in the figure). In some studies, the data collection period was extended beyond seven days to calculate an average of the chosen indexation method (part c in the figure).

For these three methods of recall period comparison, if the one-day recall condition did not differ significantly from weekly recall scores across the sample on average, then the recall period was assumed to not have had a statistically significant effect on patient-reported outcomes. In some studies, the data collection period was extended beyond seven days to calculate an average of the chosen indexation method (see Extended Data Collection Schedule in Fig. 2). For example, in studies that assessed symptoms over 28 consecutive days, the weekly recall score was calculated by averaging across the four consecutive weeks of data collection (i.e., mean of W1, W2, W3, and W4). For the mean daily symptom index, the mean daily score was averaged from Day 1 to Day 28. For the maximum daily symptom index, the maximum daily score for each week was averaged over the four weeks. For the same-day symptom index, scores were averaged across Days 7, 14, 21, and 28.
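As an illustration of the extended schedule described above, the following sketch computes the averaged weekly score and the three daily indices from a run of consecutive daily scores. It is a minimal example under stated assumptions rather than the analysis code of any included study: the function name `recall_indices`, the dictionary keys (borrowed from the labels dmean, dmax and d7 in Fig. 2) and the example scores are all hypothetical.

```python
def recall_indices(daily_scores, weekly_scores, week_length=7):
    """Average the weekly score and the three daily indices over whole weeks.

    daily_scores  : one-day recall scores for consecutive days (Day 1, Day 2, ...)
    weekly_scores : seven-day recall scores, one per week (W1, W2, ...)
    """
    if len(daily_scores) != week_length * len(weekly_scores):
        raise ValueError("expected one weekly score per complete week of daily scores")

    # Split the daily series into consecutive weeks.
    weeks = [daily_scores[i:i + week_length]
             for i in range(0, len(daily_scores), week_length)]

    def average(values):
        values = list(values)
        return sum(values) / len(values)

    return {
        # Mean of W1, W2, W3, W4, ...
        "weekly": average(weekly_scores),
        # Mean daily score averaged from the first to the last day.
        "dmean": average(daily_scores),
        # Maximum daily score of each week, averaged over the weeks.
        "dmax": average(max(week) for week in weeks),
        # Score on the last day of each week (Days 7, 14, 21, 28), averaged.
        "d7": average(week[-1] for week in weeks),
    }


if __name__ == "__main__":
    # Hypothetical 28 days of scores on a 0-10 symptom scale, same pattern each week.
    daily = [3, 7, 2, 4, 5, 1, 6] * 4
    weekly = [5, 6, 5, 6]
    print(recall_indices(daily, weekly))
```

With a single week of data the same function reduces to the basic comparison: the mean, the maximum and the final-day score of seven daily reports set against one weekly report.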

Some studies report only the intraclass correlation (ICC) between scores; where this was the case, using guidelines from Koo and colleagues for ICCs [47] we judged 0.5–0.75 as moderate agreement, 0.75–0.9 as good, and above 0.9 as excellent.
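The agreement bands above can be written as a small lookup. This is a sketch only: the function name `icc_agreement` is hypothetical, the "poor" label for values below 0.5 follows the Koo guideline rather than the text above, the treatment of values falling exactly on a boundary is an assumption, and taking the absolute value (so that a negative correlation between oppositely scored instruments is graded by its strength) is an interpretive choice made here.

```python
def icc_agreement(icc):
    """Map an ICC (or correlation) to the agreement label used in this review."""
    # Assumption: oppositely scored instruments give negative values,
    # so the strength of agreement is judged on the magnitude.
    magnitude = abs(icc)
    if magnitude < 0.5:
        return "poor"        # band taken from the Koo guideline, not from the text above
    if magnitude < 0.75:
        return "moderate"    # 0.5 to 0.75
    if magnitude <= 0.9:
        return "good"        # 0.75 to 0.9
    return "excellent"       # above 0.9


if __name__ == "__main__":
    for value in (0.62, 0.81, 0.95, -0.80):
        print(value, icc_agreement(value))
```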

In 2 [26, 27] of 24 studies, one-day and seven-day recall scores collected on different respondents were assessed for Differential Item Functioning (DIF) within an Item Response Theory (IRT) framework [27]. This method considers whether the responses to items using different recall periods are predicted equally well by knowledge of the underlying construct of interest (e.g., estimated level of pain or mobility).

Overall, across the 24 quantitative studies identified, the unique combinations of clinical condition (e.g., type 2 diabetes, psoriasis), symptom domain (e.g., physical functioning, psychosocial wellbeing), symptom descriptive (e.g., frequency, severity/intensity, impact/interference), and daily recall comparison method (e.g., mean, maximum, same-day, DIF) gave rise to 158 unique results for data extraction.

Most of the 24 quantitative studies reviewed were considered of reasonable quality with only minor methodological flaws (see Supplementary Information ‘SI5. Quality Assessment’). Three studies used different instruments or items to assess the recall condition. Most studies did not control for the effect of repeated questionnaire administration or recall period order. In the studies that did control for effects of repeated administration through study design, participants completed the daily questionnaires and weekly questionnaires across separate time periods, with participants randomly allocated to the order in which they received each recall period. Only four studies randomised participants to recall period order. For the nine studies comparing one-day recall scores with seven-day recall scores reported on the same day, 44% (4 of 9) assessed one-day recall scores after repeated administration, while the remainder assessed one-day recall scores from only a single questionnaire administration.

In half of the studies (12/24) the sample size was judged inadequate to support statistical analyses. Test conditions were similar between environments in most studies; one study did not have similar test conditions and nine had some uncertainty, mostly relating to the evidence provided on the time of day in which questionnaires were completed.

3.2.2 Characteristics of Qualitative Studies Included

In total, 1244 participants were included across the 33 qualitative studies reviewed. Sample sizes of individual studies ranged from 7 [48] to 207 [8] participants (median = 25; mean = 39 [SD = 41]). Of the 33 qualitative studies reviewed, five assessed fatigue and sleep-related symptoms, three assessed pain-related symptoms, one assessed physical functioning, eight assessed HRQoL and 17 assessed disease-specific signs and symptoms.

Qualitative methods included: one-on-one interviews (91%, 30 of 33), focus groups (21%, 7 of 33), and online survey (3%, 1 of 33). Data collection methods included: cognitive debriefing (76%, 25 of 33), concept elicitation (76%, 25 of 33), “think aloud” (15%, 5 of 33), and Delphi consensus (3%, 1 of 33).

Detailed responses to the COSMIN checklist criteria [24] used to assess study quality are provided in the Supplementary Information ‘SI5. Quality Assessment’. Most of the qualitative studies reviewed were considered to be of high quality. All 33 studies used appropriate qualitative study methods (e.g., individual interviews, focus groups, Delphi survey); 48% (16 of 33) of studies used open-ended probing techniques to elicit participant perspectives of recall duration. In contrast, 52% (17 of 33) of studies used closed-ended probes to assess participant endorsement of a predetermined recall period, which may have been subject to framing effects. Most studies (97%, 32 of 33) were conducted with an appropriate number of participants according to the COSMIN criteria (i.e., N ≥ 7 [24]), while one study was not conducted in an adequate sample size (N = 2 [8]).

For the 32 studies that involved participant interviews or focus groups, 41% (13 of 32) indicated the use of skilled moderators or interviewers; however, the majority (59%, 19 of 32) provided no indication of interviewer training or expertise. All 32 studies that involved participant interviews or focus groups indicated using an interview guide, and the majority (94%, 30 of 32) indicated audio recording and verbatim transcription of interviews. Most studies (31 of 33) used appropriate analysis techniques (e.g., thematic or content analyses), and 59% (19 of 33) clearly indicated involvement of at least two researchers in analyses.

3.3 Assessment of Recall Duration Effects

3.3.1 Physical Functioning

Eleven studies compared one-day and seven-day recall on instruments assessing physical functioning, providing 20 unique results for data extraction (see Table 1). For the nine results using the mean daily recall indexation method, the majority (7 [21, 33, 34, 39, 44]) found weekly recall scores were lower than mean daily recall scores and 2 [27, 28] found no evidence of a significant difference. The single result using the maximum daily recall indexation method found that weekly scores were lower than maximum daily recall scores [21]. Nine of the 10 results using the same-day recall indexation method found no significant difference between weekly and same-day recall scores [26, 31, 34, 35, 37, 37], with one study finding that the same-day score was lower [21].

Table 1 Study results assessing the effect of a 7-day versus one-day recall period on patient-reported outcomes

Where the daily scores are at least 10% lower than the weekly score (re-scaling where necessary to start the scoring from zero) for health problems or 10% higher for quality of life (excluding comparisons based on maximum problems) results are colour coded as green, regardless of significance level. Coral indicates less than 10% difference in recall duration score in the same direction, or comparisons in the same direction but for which a percentage increase from the weekly score is not possible to calculate (e.g., scores represented as T scores). Orange flags results showing the reverse relationship.

3.3.2 Pain-Related Symptoms

Sixteen studies compared one-day and seven-day recall on instruments assessing pain symptoms with 37 unique data extraction points (see Table 1). For the 24 results using the mean daily recall indexation method, the majority (79%, 19 of 24 results) found weekly recalled scores were higher than mean daily recalled scores for pain-related symptoms; 21% (5 of 24 results) found no evidence of a significant difference. For the single study that assessed correlations between weekly and mean daily recall scores, a moderate association was identified [43]. For the seven results using the maximum daily recall indexation method, the majority (4 [23, 38, 46] of 7 results) found weekly recalled scores were lower than maximum daily recalled scores. The remaining 3 [23, 42, 46] results found no evidence of a significant difference between weekly and maximum daily recall scores. Of the 5 results using the same-day recall indexation method, 2 [34, 36] found same-day recall scores to be lower than weekly recall scores, 2 found no significant difference [29, 37], while 1 [35] identified a positive (excellent) correlation between same-day and weekly recall scores.

3.3.3 Cognition-Related Symptoms

Five studies compared one-day and seven-day recall on instruments assessing cognition-related symptoms, providing eight unique results for data extraction (see Table 1). For the three results using the mean daily recall indexation method, one found weekly recalled scores were higher than mean daily scores for concentration difficulties [39] but the remaining two results (drawn from one study [27]) found no evidence of a significant difference. The three results using the same-day daily recall indexation method [27, 38] found no evidence of a significant difference between weekly and same-day recall scores for difficulties in remembering and understanding. For the two results using the maximum daily recall indexation method (both drawn from the same study [38]), one found weekly recalled scores were lower than maximum daily recalled concentration problems while the other found no evidence of a significant difference between weekly and maximum daily recalled memory problems.

3.3.4 Psychosocial Wellbeing

Thirteen studies provided 51 unique results comparing one-day and seven-day recall on instruments assessing aspects of psychosocial wellbeing (see Table 1). For the 22 results using the mean daily recall indexation method, the majority (14) found weekly recalled scores were higher than mean daily recalled scores and eight found no evidence of a significant difference between weekly and mean daily recalled psychosocial symptom scores. All 10 results using the maximum daily recall indexation method found weekly recall scores were lower than maximum recall scores for psychosocial symptoms. The majority (14 of 19) of results using the same-day daily recall indexation method found no evidence of a significant difference between weekly and same-day recalled psychosocial symptom scores, while five found weekly recalled scores were higher than same-day recall scores [21, 29, 34]. Three of the same-day to weekly comparisons involved items that were framed positively: two (happy, excited) followed the pattern of weekly scores being higher than the daily recall score, whereas the item asking about feeling ‘calm’ showed daily recall as greater than weekly. None of these three differences was significant.

3.3.5 Fatigue and Sleep-Related Symptoms

Thirteen studies provided 25 unique results comparing one-day and seven-day recall on instruments assessing fatigue and sleep-related symptoms (see Table 1). For the 14 results comparing daily recall scores averaged over seven consecutive days with seven-day recall scores, the majority (13) found weekly recall scores to be higher than mean daily recall scores. The single study using DIF to assess recall period effects identified non-systematic item-level differences between weekly and daily recalled fatigue frequency scores [27]. All six results comparing the maximum daily recall with weekly recall scores found maximum daily scores to be higher than weekly recall scores. No significant effect of recall period was found for the three results comparing the daily recall score with seven-day recall scores reported on the same day. Two studies assessed correlations between same-day and weekly recall scores: one identified a negative (good) correlation between same-day and (oppositely scored) weekly recalled sleep adequacy scores [40], while the other identified a positive (excellent) correlation between same-day and weekly recalled pain interference with sleep [35].

3.3.6 HRQoL Scores

Three studies provided five unique results comparing one-day and seven-day recall on instruments assessing HRQoL (see Table 1) [30,31,32]. The one study comparing mean daily recall scores (using the Short Form 6 Dimensions [SF-6D [49]] measure of utility) averaged over seven consecutive days with seven-day recall scores found that weekly recall HRQoL was significantly lower than mean daily recall scores [32]. Two studies compared daily HRQoL scores with seven-day HRQoL scores assessed on the same day. In one study, controlling for non-recall instrument differences (EQ-5D with a recall of ‘today’ vs Health Utilities Index 2 and 3 [HUI-2 and HUI-3 [50]] with a recall of last week), the weekly recall score was lower than the daily recall score in participants with advanced HIV who had a resolved event during the week [30]. In the other study, no significant difference was identified in participants with brain metastases using the Functional Assessment of Cancer Therapy Brain (FACT-Br) or the FACT-General with different recall periods.

3.3.7 Aggregate Measures of Disease-Specific Signs and Symptoms

Seven studies compared one-day and seven-day recall on instruments assessing aggregated disease-specific sign and symptom scores, providing 12 unique results for data extraction (see Table 1) [21, 23, 25, 31, 34, 40, 43]. For the four results using the mean daily recall indexation method [21, 23, 25, 43], two [21, 25] found that weekly recall scores were higher than mean daily scores, while one [23] found no evidence for a significant difference between weekly and mean daily recall scores. One result using a correlational approach identified an excellent positive association between weekly and mean of daily recall scores [43]. All three results using the maximum daily recall indexation method found that weekly scores were lower than maximum daily scores [21, 25]. For the five results using the same-day daily recall indexation method, two [23, 34] found no significant difference between weekly and same-day recall scores, while one [21] found that same-day scores were lower than weekly recall scores. Two results using a correlational approach identified a negative (moderate and good) association between an instrument using weekly recall and a different, oppositely scored instrument using same-day recall [40].
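The three daily recall indexation methods referred to throughout Section 3.3 (mean daily, maximum daily, and same-day) can be illustrated with a minimal sketch. The function names, the 0 to 10 severity scale, and the scores below are hypothetical and are not taken from any included study; the sketch only shows how one weekly recall score would be set against each index derived from seven one-day recall scores.

```python
from statistics import mean


def daily_indices(daily_scores):
    """Summarise seven one-day recall scores (oldest first) into three indices."""
    if len(daily_scores) != 7:
        raise ValueError("expected seven daily scores")
    return {
        "mean_daily": mean(daily_scores),  # average of the seven daily reports
        "max_daily": max(daily_scores),    # worst single day in the week
        "same_day": daily_scores[-1],      # daily report given on the day of the weekly report
    }


def compare_with_weekly(daily_scores, weekly_score):
    """Return weekly minus each daily index; a positive value means weekly recall is higher."""
    indices = daily_indices(daily_scores)
    return {name: weekly_score - value for name, value in indices.items()}


# Hypothetical symptom severity scores (0-10, higher = worse); illustration only.
daily = [2, 3, 7, 4, 3, 2, 3]
weekly = 5
differences = compare_with_weekly(daily, weekly)
print(differences)
```

In this invented example the weekly score sits above the mean daily and same-day indices but below the maximum daily index, which is the direction of difference most often reported in the results above.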

3.4 Participant Recall Period Preferences

Of the 33 qualitative studies reviewed (see Table 2), 18 assessed disease-specific signs and symptoms, 9 assessed HRQoL, 5 assessed fatigue and sleep-related symptoms, 3 assessed pain-related symptoms, and 1 assessed physical functioning. Most studies (55%, 18 of 33) used closed-ended probes to assess participant perceptions of the suitability of a designated recall period, while 45% (15 of 33) of studies used open-ended probes to elicit participant recall period preferences.

Table 2 Recall period preferred by majority of participants in qualitative studies

Of the 18 studies assessing questions on disease-specific signs and symptoms, 3 found that respondents expressed different preferences depending on context, with a preference for seven-day recall for symptom impact but one-day recall for symptom severity. The remaining 15 reported a broadly equal preference for seven-day recall (8/15) and one-day recall (7/15).

Two of the three studies assessing pain-related symptoms found a preference for a seven-day recall period. The single study assessing physical functioning via work productivity found a preference for a seven-day recall period. A majority of studies (80%, 4 of 5) assessing fatigue and sleep-related symptoms found a preference for a seven-day recall period. Among the included studies considering measurement of the impact on HRQoL, a longer time period was preferred: more studies preferred seven-day recall (3 out of 9) than one-day recall (1 out of 9), while others preferred a period greater than seven days (4 out of 9) or had no clear preference (1 out of 9).

Four themes were identified in these studies: i) duration should capture important effects, ii) accuracy of recall, iii) preference for unambiguous language, and iv) adherence to the stated recall period.


i) duration should capture important effects

The seven-day recall was considered more appropriate for measuring symptoms in subjects with relatively stable symptoms, while those with variable symptoms or undergoing treatment and expecting rapid change may need the shorter one-day recall period to accurately reflect changes in symptoms [51, 52]. Discussions indicated an assumption that one-day recall instruments would be repeatedly administered, with respondents raising the issue of burden of completing the questionnaire on consecutive days [53, 54].

Where single administration was implied, some participants favoured the longer time period, which could be more representative of their overall experience: “I just think you’ll get a bigger picture by looking at it over a course of a week” [55]. In reference to varying asthma symptoms, one participant said, “You have a chance at remembering how you felt on average, because you can have bad days and you can have good days” [56]. The seven-day recall was preferred by some participants for quality-of-life measurement because not all impacted activities occur on every day of the week [57]. Some participants also expressed concern that a seven-day recall might be too short to reflect their symptoms where impactful events occurred at intervals greater than one week [58, 59].


ii) accuracy of recall

Some participants acknowledged the ease of recalling over one day: “24 hours I can really, really remember how bad my itching was and you get more of a bam, to the point, to a real good timeframe” [61]. Others did not find the seven-day recall problematic: “I did not find any great difficulty [recalling the past 7 days]. At first, you have to put yourself back into the situation and look back at the 7 past days. It simply requires a few seconds to remember” [62]. Participants indicated recall accuracy as a concern only for recall periods greater than one week (e.g., 4 weeks [63]). One participant expressed a preference for using one-day recall to measure quality of life because daily activities and stressors could interfere with accurate memory: “I think using ‘today’ is better, I had a hectic week last week, I went to a funeral, I had other things, I was a bit anxious” [60].


iii) preference for unambiguous language

Some participants indicated a preference to revise the 24-hour recall instruction to “since waking” to exclude time spent sleeping [64]. Weekly recall instructions were sometimes misinterpreted as the previous full week (e.g., from Monday to Sunday) [65], or the 5-day working week [66]. Therefore, an explicit seven-day recall instruction was considered preferable to mitigate potential recall period misinterpretations [67].


iv) adherence to the stated recall period

Participants described processes that underpinned their interpretation of recall period instruction, including interpreting health “today” as meaning health generally [67]. Thus, participants reported overlooking temporary issues experienced on the day of reporting to provide a representative picture of their health state (not over the last 24 h per se) [8].

4 Discussion

This systematic review examined the effect of a one-day versus seven-day recall duration on PROM and HRQoL instrument scores in adults with a range of clinical conditions. The 24 quantitative studies included provided 158 unique results. Overall, compared to the average of symptoms reported with a 24-h recall over seven days, a seven-day recall mostly yielded reports of worse symptoms and worse HRQoL across a range of clinical conditions.

Symptoms tended to be reported as more severe when assessed with a weekly recall than with a one-day recall averaged over the same period (76%, 58 of 76 results [two were only reported as correlation and not included in this total]); for the remaining 24% (18 of 76 results), the difference was not statistically significant. This pattern was similar for comparisons based on same-day reporting, although a smaller percentage of results showed a significant difference: 26% (12 of 46 [five were only reported as correlation and are not included in this total]). The weekly recall period tended to report lower symptom severity (i.e., better health) than the maximum of the daily score over the seven-day period (86%, 25 of 29 results), with the remaining four not finding a statistically significant difference.

The three findings on HRQoL instruments used to estimate utility scores [30, 32] suggest that a weekly recall period leads to lower utility values than daily recall, particularly if negative events occurred during the previous seven days and had since been resolved.

Of the 66 results reporting symptoms and HRQoL that compared the mean of one-day recall across seven days, or same-day recall, with weekly recall, 53% (35 of 66) found a one-day recall score that was at least 10% lower for symptoms or 10% higher for HRQoL than the weekly recall score (the green shading in Table 1). Overall, 89% (59 of 66) found one-day recall reporting fewer problems or higher quality of life, and only 6% (4 of 66) found the opposite.

Four themes were identified within the qualitative studies. First, ‘duration should capture important effects’: the preferred recall period varied depending upon the variability of symptoms and impacts and the frequency of measurement. This aligns with findings in the review by Stull and colleagues [68] that there is no “one size fits all” ideal recall period. Second, ‘accuracy of recall’: although participants acknowledged the ease of the one-day recall, they also had minimal concerns with the accuracy of the seven-day recall. Third, participants expressed a ‘preference for unambiguous language’ when describing both recall periods. Finally, some participants noted a failure to ‘adhere to the recall period’, particularly for the framing of ‘today’, which they interpreted as health generally.

This review was intentionally limited in scope to a targeted comparison of a one-day versus seven-day recall period. Therefore, it does not consider longer recall periods that may be more suitable for chronic or variable conditions [56]. Information relevant to the understanding of recall duration effects may have been omitted through the exclusion of studies comparing other recall periods or symptoms reported using EMA. We deviated from the PROSPERO-registered protocol during full-text screening by excluding studies that used EMA to derive an index of daily recall scores, as EMA was considered not to directly reflect one-day recall processes.

The review drew on different methods of exploring the impact of recall period, synthesising findings across many clinical conditions, different outcomes assessed, and different data collection and analysis techniques. The consistency of the findings amid this variability supports triangulation of our main findings.

4.1 Limitations of this Review

The search terms used did not exhaust all possible terms. For example, we did not include terms relating to ‘diaries’ which may have identified more one versus seven-day recall comparisons but would have reduced the precision of the search.

Other limitations of this review relate to the methodological flaws of included studies, such as inadequate control for the effect of repeated questionnaire assessments and the limited statistical power of between-group comparisons made within small samples. Similarly, the use in a few studies of two different instruments for the one-day and seven-day recall periods is likely to have introduced measurement artifacts that may have confounded inferences regarding recall duration effects specifically. Many of the qualitative studies reviewed were limited by closed-ended probing techniques, which may have restricted participant considerations of preferred recall duration.

Assessing the content validity of PROM and HRQoL instruments is inherently limited by the absence of a gold standard marker of patient experience against which recall period effects can be reliably distinguished. More broadly, the quantitative studies assessed in this review do not provide insight into the cognitive mechanisms and recall period actually utilised by participants when considering their health. Additionally, some studies reviewed suggest that people may reinterpret recall period instructions when responding, for example, interpreting ‘today’ as meaning health generally [67].

The potential for differences between seven-day and one-day responses to arise from selection effects, based on when respondents are willing or able to complete questionnaires, has not been well explored. If the last seven-day period includes days on which the respondent would not have engaged in questionnaire completion due to a high level of symptoms (e.g., feeling depressed), this would generate the pattern found here for the same-day index comparison, in which the seven-day recall reports poorer health levels. Similarly, if missing daily reports during the past seven days occur on days with a relatively higher level of symptoms and comparisons are made on incomplete data, this would also generate the pattern found here for the mean of one-day recall versus seven-day recall comparison. Such selection effects may be particularly problematic for conditions affecting motivation, such as mental health conditions.
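The selection mechanism described above can be made concrete with a small simulation. This is a sketch under stated assumptions, not an analysis of any included study: the 0 to 10 symptom scale, the rule that the probability of a missing daily report rises with symptom severity (`s / 12`), and the treatment of the weekly score as the true mean of all seven days are all hypothetical choices made for illustration.

```python
import random


def simulate(n_respondents=5000, seed=0):
    """Compare the true seven-day mean with the mean of completed daily reports."""
    rng = random.Random(seed)
    true_means, observed_means = [], []
    for _ in range(n_respondents):
        # Seven true daily symptom scores on a 0-10 scale (higher = worse).
        scores = [rng.randint(0, 10) for _ in range(7)]
        # Assumed missingness rule: the worse the day, the less likely a report.
        reported = [s for s in scores if rng.random() > s / 12]
        if not reported:
            continue  # no daily reports at all, so no comparison is possible
        true_means.append(sum(scores) / 7)
        observed_means.append(sum(reported) / len(reported))
    n = len(true_means)
    return sum(true_means) / n, sum(observed_means) / n


true_mean, observed_mean = simulate()
print(f"true seven-day mean: {true_mean:.2f}")
print(f"mean of completed daily reports: {observed_mean:.2f}")
```

Under these assumptions the mean of completed daily reports falls below the true seven-day mean even though respondents recall perfectly, so an apparent recall period effect could arise from missing data alone.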

4.2 Future Research

High-quality, sufficiently powered studies that account for repeated questionnaire administration are required to measure the effect of a one-day versus seven-day recall period in PROM and HRQoL instruments. Mixed methods study designs incorporating both quantitative comparison of scores and qualitative exploration of participant recall processing may confer insight into the cognitive mechanisms underpinning potential recall period effects. Of the 57 studies included in this review, only one study assessed recall duration effects in participants with a mental health condition (i.e., Major Depressive Disorder [69]). The absence of psychometric studies assessing the effect of recall duration for psychiatric symptoms and conditions could be addressed in future research.

This review identified few results that compared recall periods for positively framed items; the only ones included came from a single study and were based on three items: happy, excited, and calm. Although the HRQoL instruments are scored positively (a higher score shows better quality of life), they rely mostly upon negatively framed items reporting health problems. The results for recall period on positive items, although not significantly different between recall periods, are interesting in that the items on feeling happy and excited suggest a higher score for the weekly report, whereas the item on feeling calm does not. The interaction between item framing, arousal and recall period could usefully be explored in future research.

The variability in samples and instruments used in this review meant that results could not be pooled, and the magnitude of the impact of recall period remains uncertain. Of the 66 results reporting symptoms and HRQoL comparing mean of one-day recall across seven days or the same day with the weekly recall, the majority (89%) found that one-day recall showed fewer problems or a higher quality of life, although not all these individual findings showed a statistically significant difference. Whilst the direction of difference in recall period is clear, further research could usefully estimate the size of this recall effect more accurately.

5 Conclusion

This review identified a pattern of higher symptom scores and worse quality of life being reported for a seven-day compared to a one-day recall period on PROMs and HRQoL instruments. The review also identified anomalies in this pattern for two positively framed wellbeing items and a need for further research on recall effects in positively framed items. A better understanding of the impact of using different recall periods within PROMs and HRQoL instruments will help contextualise future comparisons between instruments which adopt different recall periods.