Introduction

Lack of physical activity is one of the primary risk factors for cardiovascular disease, which causes an estimated 32% of deaths worldwide [1]. To support people in becoming more physically active, goal-setting is commonly used, as it provides motivation and helps people stay focused on a desired outcome [2, 3]. Accordingly, goal-setting is often part of eHealth applications (e.g., [4]), which have the potential to make behavior change support more effective and widely available [5, 6, 7]. However, to be effective, goals need to meet several criteria, such as being aligned with a person’s other goals [8] and being realistic and achievable [9].

We therefore wanted to incorporate the setting of effective goals into eHealth applications for behavior change. To this end, we designed a goal-setting dialog for running or walking with the virtual coach Jody, as virtual coaches can provide the kind of guidance that therapists have traditionally provided. Virtual coaches can also foster adherence to eHealth applications by improving engagement and connecting with users [10]. In the goal-setting dialog, Jody asked users several questions to encourage the creation of SMART goals [11], that is, goals that are specific, measurable, attainable, relevant, and time-bound. For example, Jody asked users why their goal was important to them, based on a set of questions that therapists use to find out what matters to their clients in their lives [12]. Helping people realize why they want to achieve an outcome has previously been shown to lead to more personalized goals and better results [13].

Effective goals alone are, however, not sufficient for behavior change to be successful. This is because there is a variety of possible barriers relating to users’ capability, opportunity, and motivation [14], such as a lack of self-efficacy [2, 15, 16]. One way to increase users’ self-efficacy is to let them observe another person succeed [15], or, in other words, to let them have vicarious experiences. According to social cognitive theory [17], there are four sources of self-efficacy for a behavior: mastery experiences, vicarious experiences, verbal persuasion, and somatic and affective states. Comparing these sources of self-efficacy in a study with older adults, Warner et al [18] found that mastery experiences, vicarious experiences, and subjective perceptions of health had similar significant effects on exercise self-efficacy. The meta-analysis by Ashford et al [19] further identified feedback on past or others’ performance as producing the highest levels of self-efficacy in the context of promoting lifestyle and recreational physical activity, followed by vicarious experiences. Since goal-setting typically first takes place at the start of a behavior change intervention, when people may not yet have any personal experience with the target behavior, using vicarious experiences to foster self-efficacy is intuitively appealing.

While it can be helpful to observe another person succeed, it matters how relatable this “other” is: a more relatable “other” is more beneficial [15, 20]. In practice, finding such a relatable “other” can be difficult for underserved groups [21] such as older adults [18]. Given a pool of vicarious experiences from diverse people, however, behavior change applications can use user data to find relatable others. In our goal-setting dialog, Jody therefore showed personalized examples of other people who had achieved a running or walking goal.

To evaluate this dialog, we conducted a study in which 39 people set a running or walking goal with Jody. In this study, we tested the following hypotheses:

  • H1: People’s self-efficacy is higher after the dialog with the virtual coach than before.

  • H2: People’s self-efficacy is higher after receiving personalized examples than after receiving generic examples.

  • H3: The personalized examples are perceived as more motivating than generic examples.

  • H4: People have a positive attitude toward the virtual coach.

In addition, we inquired what people found motivating about the examples from other people.

Materials & methods

We conducted our study in March 2022. The Human Research Ethics Committee of Delft University of Technology approved the study (Letter of Approval number: 1707), and we preregistered the study in the Open Science Framework (OSF) [22].

Study design

The study was set up as a mixed-design study with one between- and one within-subject factor. The between-subject factor was the type of examples shown (2 levels: personalized/generic), and the within-subject factor was time (2 levels: pre-/post-measurement).

Materials

We used the Qualtrics platform to host the online questionnaires and a Google Compute Engine virtual machine to host the virtual coach.

Virtual coach

The virtual coach Jody was implemented in Rasa [23]. Jody introduced itself as being there to help users set a goal for becoming more physically active. After describing the benefits of physical activity and goal-setting, Jody asked users for an initial idea for either a running or a walking goal. This step came before the examples from other people to reduce the anchoring effect [24] of these examples. Next, users were shown two examples from other people and then asked to reformulate their goal as specifically as possible. Afterward, users were asked to reflect on the relevance and attainability of their goal and to provide a deadline for reaching it. If a goal’s attainability was very low or very high, Jody suggested adapting the goal to make it less or more challenging, respectively. Finally, Jody summarized the goal, which users could then confirm or change with regard to the behavior or its deadline. To increase the accessibility of the dialog, users could communicate mostly by clicking buttons with answer choices. A psychologist read through the dialog to ensure that its language and structure were easy to follow. The implementation of the virtual coach [25] and a video of the dialog [26] are available online.
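The dialog flow described above can be summarized as an ordered list of steps plus one rule for extreme attainability. This is a minimal Python sketch, not the Rasa implementation [25]: the step names, the 0 to 10 attainability scale, and the cutoffs `LOW_CUTOFF` and `HIGH_CUTOFF` are hypothetical, since the text does not report how “very low” and “very high” attainability were defined.

```python
from typing import Optional

# Order of the dialog steps as described in the text; the step names are our own labels.
DIALOG_STEPS = [
    "introduce_coach",
    "explain_benefits",
    "ask_initial_goal_idea",  # asked before the examples to reduce anchoring
    "show_two_examples",
    "reformulate_goal",
    "reflect_on_relevance",
    "rate_attainability",
    "ask_deadline",
    "summarize_and_confirm",
]

# Hypothetical 0-10 attainability scale and cutoffs; the study does not report them.
LOW_CUTOFF = 2
HIGH_CUTOFF = 9


def attainability_advice(rating: int) -> Optional[str]:
    """Return an adaptation suggestion for extreme attainability ratings, else None."""
    if not 0 <= rating <= 10:
        raise ValueError("rating must be between 0 and 10")
    if rating <= LOW_CUTOFF:
        return "Consider making your goal less challenging."
    if rating >= HIGH_CUTOFF:
        return "Consider making your goal more challenging."
    return None
```

For example, `attainability_advice(1)` suggests an easier goal, while a mid-range rating such as 5 triggers no suggestion and the dialog simply moves on to the deadline.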

Examples from other people

Each participant saw 2 of 72 examples from other people. In a prior study, these examples had been collected from 72 people, 4 for each of 18 combinations of values for sex, age range, and weekly exercise amount. These people were asked to 1) introduce themselves to a new person they meet at a friend’s gathering who is interested in their physical activity behavior, 2) describe a running or walking goal they had achieved in the past year, and 3) describe how they had achieved that goal. The third question was included because it can be motivating to read not only that somebody else achieved a goal but also how they managed to do so [27]. After the prior study, the collected examples were anonymized and corrected for spelling, punctuation, and grammar so that they could be displayed in the goal-setting dialog. Fig. 1 shows how an example was presented in the dialog. The characteristics of the people who provided the examples are presented in Table 4, and all examples are available online [28].

Fig. 1

Screenshot of part of the dialog showing an example of a person who achieved a running goal

Personalized examples

People in the “personalized” condition saw the examples that a linear regression model predicted to be most motivating for them. To create this model, we conducted a second data collection study on Prolific in which 36 individuals each rated 18 examples (see Footnote 1) on how similar they considered the corresponding person to themselves and on how motivating they found the goal and how the person had achieved it. We then set up the model to predict the motivation rating from the absolute differences in individual characteristics between the person who provided an example and the person seeing it (Fig. 2). Using stepwise regression and correlation analysis, we selected 10 of 22 variables associated with the behavioral predictors capability, opportunity, and motivation [29] (Table 1; see Footnote 2). Furthermore, because not only actual but also perceived similarity can play a role [30], we added three predictors: the similarity ratings that the person seeing an example gave for three clusters of example providers, with each cluster represented by two prototype examples. We computed the three clusters from the similarity ratings and chose as prototypes the two most central examples of each cluster. These prototypes were not shown in the goal-setting dialog. The full model, shown in Table 1, had a multiple \(R^2\) of 0.23.
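The prediction and selection steps can be illustrated with ordinary least squares on synthetic data. The sketch below is a simplified stand-in for the model in Table 1, under several assumptions: all characteristics are numeric, the three cluster-similarity predictors are encoded by placing the viewer’s rating in the column of the provider’s cluster (zero elsewhere), and stepwise variable selection is omitted. The names `design_row`, `fit_ols`, and `pick_personalized` are hypothetical, and the data are random numbers, not study data.

```python
import numpy as np

N_CHARS = 10     # number of selected characteristics (10 in the study)
N_CLUSTERS = 3   # prototype-based similarity clusters


def design_row(viewer, provider, cluster_ratings, provider_cluster):
    """Predictor row: intercept, |viewer - provider| per characteristic, and the
    viewer's similarity rating for the provider's cluster (hypothetical encoding)."""
    abs_diff = np.abs(np.asarray(viewer, float) - np.asarray(provider, float))
    cluster_terms = np.zeros(N_CLUSTERS)
    cluster_terms[provider_cluster] = cluster_ratings[provider_cluster]
    return np.concatenate(([1.0], abs_diff, cluster_terms))


def fit_ols(X, y):
    """Ordinary least squares; returns the coefficients and the multiple R^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


def pick_personalized(coef, viewer, cluster_ratings, providers, clusters, k=2):
    """Indices of the k examples with the highest predicted motivation rating."""
    preds = np.array([design_row(viewer, p, cluster_ratings, c) @ coef
                      for p, c in zip(providers, clusters)])
    return np.argsort(preds)[::-1][:k]


# Synthetic data shaped like the study: 72 providers, 36 raters x 18 ratings each.
rng = np.random.default_rng(0)
providers = rng.normal(size=(72, N_CHARS))
clusters = rng.integers(0, N_CLUSTERS, size=72)
true_coef = rng.normal(size=1 + N_CHARS + N_CLUSTERS)
rows, ys = [], []
for _ in range(36):
    viewer = rng.normal(size=N_CHARS)
    ratings = rng.uniform(-3, 3, size=N_CLUSTERS)
    for j in rng.choice(72, size=18, replace=False):
        x = design_row(viewer, providers[j], ratings, clusters[j])
        rows.append(x)
        ys.append(x @ true_coef + rng.normal())
X, y = np.array(rows), np.array(ys)
coef, r2 = fit_ols(X, y)

# Choose two examples for a new (synthetic) participant.
new_viewer = rng.normal(size=N_CHARS)
new_ratings = rng.uniform(-3, 3, size=N_CLUSTERS)
picks = pick_personalized(coef, new_viewer, new_ratings, providers, clusters)
```

Using absolute differences makes each characteristic’s contribution symmetric: a provider who is five years older and one who is five years younger than the viewer are treated as equally distant.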

Fig. 2

Process of obtaining the prediction model used for choosing personalized examples. (A) 72 people wrote about themselves and a goal they had achieved. (B) 36 people each rated 18 examples on similarity and perceived motivational impact. The results from A and B were used to obtain the prediction model.

Table 1 Multiple linear regression model used to predict motivation ratings

Generic examples

Using the same motivation ratings as for the “personalized” condition, people in the “generic” condition were shown two examples randomly selected from the three examples with the highest overall motivation ratings (Table 6).
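Selecting the generic examples amounts to ranking examples by their mean motivation rating and sampling two of the top three. A minimal sketch, assuming that ratings are stored per example ID; the IDs and ratings below are made up for illustration.

```python
import random
from statistics import mean


def pick_generic(ratings_by_example, k_top=3, k_show=2, rng=random):
    """Randomly choose k_show of the k_top examples with the highest mean rating."""
    means = {ex: mean(r) for ex, r in ratings_by_example.items()}
    top = sorted(means, key=means.get, reverse=True)[:k_top]
    return rng.sample(top, k_show)


# Hypothetical motivation ratings (-3..3) per example ID.
ratings = {
    "ex01": [2, 3, 1], "ex02": [0, -1, 1], "ex03": [3, 3, 2],
    "ex04": [1, 2, 2], "ex05": [-2, 0, 0],
}
shown = pick_generic(ratings, rng=random.Random(42))
```

Every participant in this condition thus draws from the same small pool, so the examples do not depend on who is viewing them.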

Measures

Running or walking self-efficacy

Running and walking self-efficacy were each measured on a scale from 0 to 100, adapted from the Exercise Self-Efficacy Scale by McAuley [31] (see Online Resource 1).

Perceived motivational impact of examples

Participants rated how motivating they found the examples from other people on a scale from -3 to 3, with 0 labeled as “Neutral.”

Acceptance of the virtual coach

We used an adaptation of the six questions by Provoost et al [32], each of which was rated on a scale from -3 to 3, with 0 labeled as “Neutral.”

Take-away from examples

Participants provided a free-text response to the question “What can you take away from these examples for yourself?” after seeing the two examples from other people in the dialog.

Motivational factors from examples

Participants gave a free-text response to the question “What do you find motivating about the running or walking goals that other people achieved?”

Transtheoretical Model (TTM)-stage for becoming physically active

Using the World Health Organization’s definition of physical activity [33] and its guidelines on physical activity and sedentary behavior [34], we adapted the stage-of-change question by Norman et al [35] to physical activity.

Participants

We aimed for a sample size of 36 for two reasons. First, a power analysis using G\(^*\)Power 3.1 [36] for an ANOVA with repeated measures and a within-between interaction yielded a required sample size of 34 for a power of 0.8, an effect size of 0.25 (i.e., a medium effect size for ANOVA [37]), and an alpha of 0.05. Second, we wanted to recruit participants from 36 combinations of values for the variables smoking status (see Footnote 3), sex, age, and weekly exercise amount. Eligible participants were fluent English-speaking adults who had not participated in the earlier data-gathering studies. Of the 47 participants who started the study, 39 were included in the analysis; the exclusion criteria are shown in Fig. 3. Participants were paid according to Prolific’s minimum payment rules (i.e., 5 GBP per hour).
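The required sample size can be approximated with the noncentral F distribution for the within-between interaction of a repeated-measures ANOVA. This sketch assumes the formulation we understand G\(^*\)Power to use, with two default settings that the text does not state: a correlation \(\rho = 0.5\) between the repeated measures and a nonsphericity correction \(\epsilon = 1\). The noncentrality parameter is then \(\lambda = f^2 \cdot m \cdot N \cdot \epsilon / (1 - \rho)\) with \(m = 2\) measurements, and under these assumptions the search should land at or very near the reported 34. The function names are hypothetical.

```python
from scipy import stats


def interaction_power(n_total, f=0.25, alpha=0.05, k_groups=2, m_measures=2,
                      rho=0.5, epsilon=1.0):
    """Power of the group x time interaction test in a mixed ANOVA
    (noncentral-F formulation; assumed to match G*Power's defaults)."""
    lam = f ** 2 * m_measures * n_total * epsilon / (1 - rho)
    df1 = (k_groups - 1) * (m_measures - 1) * epsilon
    df2 = (n_total - k_groups) * (m_measures - 1) * epsilon
    f_crit = stats.f.ppf(1 - alpha, df1, df2)   # critical F under the null
    return stats.ncf.sf(f_crit, df1, df2, lam)  # P(F' > f_crit) under H1


def required_n(target_power=0.8, **kwargs):
    """Smallest total sample size whose power reaches target_power."""
    n = kwargs.get("k_groups", 2) + 2  # start where df2 is positive
    while interaction_power(n, **kwargs) < target_power:
        n += 1
    return n


n_needed = required_n()
print(n_needed, round(float(interaction_power(n_needed)), 3))
```

Lowering `rho` shrinks \(\lambda\) and raises the required sample size, which is why the assumed correlation between pre- and post-measurements matters for this calculation.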

Participant characteristics are shown in Table 2. Comparing Bayesian models with and without the condition as a predictor for each characteristic did not indicate systematic differences between the conditions for these characteristics.

Table 2 Participant characteristics for the two conditions

Procedure

The study consisted of 1) a pre-questionnaire, 2) a goal-setting dialog with the virtual coach in which people saw either generic or personalized examples, and 3) a post-questionnaire (Fig. 3). Participants who successfully completed the pre-questionnaire were invited to the dialog about one week later. These participants were randomly assigned to the two conditions for the examples shown in the goal-setting dialog. Using an adaptation of the algorithm by Xiao et al [39], we aimed to balance the two conditions with regard to smoking status, self-efficacy for the preferred goal type, and two ratings of perceived similarity for clusters of people from the examples. The dialog lasted about seven minutes, after which participants were asked to complete the post-questionnaire.
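The text does not detail the adapted algorithm by Xiao et al [39], so the following sketch swaps in a generic Pocock-Simon-style minimization to show how allocation can balance two conditions on several factors at once. It assumes that self-efficacy and the two similarity ratings were dichotomized beforehand (for example, at the median); the factor names and the 0.8 probability of choosing the less imbalanced condition are hypothetical.

```python
import random

CONDITIONS = ("personalized", "generic")


def minimization_assign(new, assigned, factors, p_best=0.8, rng=random):
    """Pocock-Simon-style minimization over categorical factors.

    new: dict mapping factor -> level for the incoming participant.
    assigned: list of (participant_dict, condition) pairs already allocated.
    """
    imbalance = {}
    for candidate in CONDITIONS:
        total = 0
        for factor in factors:
            # Count earlier participants sharing this factor level, per condition.
            counts = {c: 0 for c in CONDITIONS}
            for person, cond in assigned:
                if person[factor] == new[factor]:
                    counts[cond] += 1
            counts[candidate] += 1  # pretend `new` joins `candidate`
            total += max(counts.values()) - min(counts.values())
        imbalance[candidate] = total
    if imbalance[CONDITIONS[0]] == imbalance[CONDITIONS[1]]:
        return rng.choice(CONDITIONS)
    best = min(CONDITIONS, key=imbalance.get)
    other = CONDITIONS[1] if best == CONDITIONS[0] else CONDITIONS[0]
    # A random element keeps the next allocation unpredictable.
    return best if rng.random() < p_best else other


# Hypothetical usage with dichotomized factors.
FACTORS = ["smoker", "high_self_efficacy", "high_sim_cluster1", "high_sim_cluster2"]
gen = random.Random(1)
assigned = []
for _ in range(39):
    person = {f: gen.random() < 0.5 for f in FACTORS}
    assigned.append((person, minimization_assign(person, assigned, FACTORS, rng=gen)))
```

With `p_best=1.0` the procedure becomes deterministic; keeping it below 1 trades a little balance for allocation concealment.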

Fig. 3

Study design including study components, gathered data, and exclusion criteria

Data preparation and analysis strategies

We created two index measures for the perceived-motivational-impact ratings of the two example types (i.e., personalized and generic) and one index measure for acceptance (Cronbach’s \(\alpha\) = 0.57, 0.56, and 0.87, respectively). Moreover, we reversed the scale of the TTM-stage for becoming physically active so that a higher value denotes a higher stage of change.
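Cronbach’s \(\alpha\) and the reversal of an ordinal scale can be computed as follows. This is a minimal sketch with made-up ratings; it assumes that each index is the mean of its items and that the TTM stages are coded 1 to 5, neither of which is stated in the text.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha; rows = participants, columns = items of one index."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the sum score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def reverse_scale(values, lo, hi):
    """Reverse-code a scale so that lo and hi swap (e.g., 1..5 becomes 5..1)."""
    return lo + hi - np.asarray(values)


# Hypothetical data: 5 participants x 6 acceptance items on a -3..3 scale.
acceptance = np.array([
    [2, 1, 2, 3, 1, 2],
    [1, 1, 0, 2, 1, 1],
    [3, 2, 3, 3, 2, 3],
    [0, -1, 1, 0, 0, 1],
    [2, 2, 1, 2, 2, 2],
])
alpha = cronbach_alpha(acceptance)
index = acceptance.mean(axis=1)                      # assumed: index = item mean
ttm_reversed = reverse_scale([1, 3, 5], lo=1, hi=5)  # assumed: stages coded 1..5
```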

Instead of a frequentist analysis, we conducted a Bayesian analysis as it has been described as providing richer information [40]. We used Bayesian paired t-tests on the pre- and post-measurement of self-efficacy for H1 and the perceived motivational impact ratings for the two types of examples for H3. For H2, we used a Bayesian two-sample t-test on the change in self-efficacy between the pre- and post-measurement, and for H4 a Bayesian one-sample t-test comparing the acceptance to a neutral value of zero. We evaluated the posterior probabilities that the hypotheses are true using the guidelines by Chechile [41] and their extension to probabilities below 0.5 by Andraszewicz et al [42]. These guidelines frame posterior probabilities as “bets” one can place with varying levels of confidence. The analyses were conducted in R with the Bayesian First Aid package [43].
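The posterior probabilities reported below come from Bayesian First Aid, which fits a robust t-distributed model by MCMC. As a simpler illustration of what such a posterior probability is, the sketch swaps in a normal model with the noninformative prior \(p(\mu, \sigma) \propto 1/\sigma\), under which \(P(\mu > 0 \mid \text{data})\) has a closed form via the Student t distribution. It covers the paired case (H1, H3) and the one-sample case (H4) but not the two-sample case (H2), and its numbers will differ from those of Bayesian First Aid. The data and function names are hypothetical.

```python
import numpy as np
from scipy import stats


def prob_mean_positive(values, null_value=0.0):
    """P(mu > null_value | data) for a normal model with prior p(mu, sigma) ∝ 1/sigma.

    Under this prior, (mu - mean) / (s / sqrt(n)) has a Student t posterior with
    n - 1 degrees of freedom, so the probability is a t CDF.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    se = x.std(ddof=1) / np.sqrt(n)
    return stats.t.cdf((x.mean() - null_value) / se, df=n - 1)


def prob_from_draws(draws, threshold=0.0):
    """The same quantity estimated from MCMC draws: share of draws above threshold."""
    return float(np.mean(np.asarray(draws) > threshold))


# Hypothetical pre/post self-efficacy scores (0-100) for a paired comparison (as for H1).
pre = np.array([70, 55, 80, 60, 90, 65])
post = np.array([60, 50, 70, 62, 75, 55])
p_h1 = prob_mean_positive(post - pre)  # P(self-efficacy higher after than before)

# One-sample comparison against the neutral value 0 (as for H4).
acceptance_index = np.array([1.5, 2.0, 0.5, 1.8, 2.2])
p_h4 = prob_mean_positive(acceptance_index, null_value=0.0)
```

When a sampler such as JAGS is used instead, `prob_from_draws` applied to the posterior draws of the mean difference gives the analogous probability.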

For exploratory purposes, we additionally used the Bayesian First Aid package to compute the Pearson correlations between the TTM-stage for becoming physically active on the one hand and the change in self-efficacy between the pre- and post-measurement, motivational impact ratings for the two types of examples, and the acceptance on the other hand.

We further conducted a qualitative analysis of people’s take-aways from the examples they saw, as expressed in the dialog, and of what they found motivating about the examples, as described in the post-questionnaire. Triangulating multiple sources serves to increase the validity of qualitative research [44]. Following the thematic analysis steps by Braun and Clarke [45], BH first familiarized herself with the data before creating a draft coding scheme. She discussed the coding scheme with NA to arrive at a final coding scheme with nine codes. BH then coded all responses, using multiple codes where relevant. Double-coding showed substantial agreement (Cohen’s \(\kappa\) = 0.79) [46]. The responses from one participant were subsequently excluded from further analysis because the participant had shared sensitive information. The codes assigned by BH were then used to identify themes.
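Cohen’s \(\kappa\) corrects the observed agreement between two coders for the agreement expected by chance. Because responses could receive multiple codes, the sketch below assumes one possible approach, treating each response-code pair as a binary present/absent decision; the text does not say how the reported value was computed, and the codes and responses are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labeling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(coder_a)
    p_obs = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_exp = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if p_exp == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical multi-code responses: each response can carry several codes.
coder_1 = [{"enjoyment", "consistency"}, {"relatable"}, {"specific_goal"}, {"achievement"}]
coder_2 = [{"enjoyment"}, {"relatable"}, {"specific_goal", "achievement"}, {"achievement"}]
codes = sorted(set().union(*coder_1, *coder_2))

# Assumed approach: one binary present/absent decision per response and code.
flat_1 = [code in resp for resp in coder_1 for code in codes]
flat_2 = [code in resp for resp in coder_2 for code in codes]
kappa = cohens_kappa(flat_1, flat_2)
```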

All data and analysis code are available in the online repository accompanying this article [47].

Results

Figure 4 shows that, on average, participants had lower self-efficacy after the dialog than before it. Our Bayesian analysis quantifies this: the mean drop in self-efficacy was 12.38 scale points, which corresponds to a posterior probability of 0.002 that self-efficacy is higher after the dialog than before (Table 3). This can be evaluated as a very strong bet against H1. Moreover, contrary to H2, Fig. 4 shows not higher but somewhat lower self-efficacy in the “personalized” condition than in the “generic” condition. Nevertheless, with a posterior probability of 0.34 that H2 is true, our Bayesian analysis suggests that betting against H2 is not worthwhile (Table 3). The perceived motivational impact of the personalized examples is, however, on average 0.31 scale points higher than that of the generic examples (Table 3). This leads to a posterior probability of 0.93 that H3 is true, which can be evaluated as a promising but risky bet. Regarding H4, the mean acceptance of Jody is 1.54, with a 95% credible interval from 1.19 to 1.90 (Table 3). With a posterior probability of >0.99995, it is virtually certain that H4 is true.

Our exploratory analysis further provides no strong indication for correlations between the TTM-stage on the one hand and the change in self-efficacy and the motivational impact ratings for the two types of examples on the other hand (Table 7). For the acceptance, however, we see a small correlation [48] of 0.22 with a corresponding posterior probability of 0.92 that the correlation is greater than 0. Thus, we obtain a promising but risky bet that people in higher stages of change have a more positive attitude toward Jody.

Fig. 4

Self-efficacy before and after the dialog with the virtual coach for both conditions

Table 3 Results of Bayesian analyses for the four hypotheses

Figure 5 shows the four themes for what people found motivating about examples from other people with sub-themes and participant quotes. People were motivated by 1) specific, achievable, and challenging goals, 2) people who enjoyed their path to the goal, did not give up, achieved a goal step by step, and stayed consistent, 3) examples from people they could relate to, and 4) goal achievement itself. There are no clear differences between the conditions for the take-away responses (Fig. 6).

Fig. 5

The four themes for what people found motivating about examples from other people with sub-themes and participant quotes

Discussion

The presented study examined the use of personalized vicarious experiences in a goal-setting dialog for physical activity with a virtual coach. Our results lead to a very strong bet against people’s self-efficacy being higher after than before the dialog. This is contrary to our expectation that the dialog and the vicarious experiences therein would increase self-efficacy [2, 15]. One possible explanation for these unexpected results is the Dunning-Kruger effect [49], according to which people with little experience or knowledge regarding a task tend to overestimate their competence. Thinking about their physical activity goal and their ability to achieve it during the dialog may have given people more knowledge about the task. This, in turn, may have led them to judge their competence more realistically, in this case lower. Similar effects have been observed by Kang et al [50] and Rowland et al [51]: after an intervention with vicarious experiences, participants had lower confidence in giving lectures on elementary arithmetic [50] and lower physical activity self-efficacy [51] than before.

Moreover, mirroring the finding by Kang et al [50] that the drop in self-efficacy was stronger in a condition with more personalized vicarious experiences, the drop in our study tended to be larger for personalized than for generic examples: the posterior probability that personalized examples led to a larger increase in self-efficacy than generic examples was only 0.34, although this is not low enough to make betting against H2 worthwhile. This suggests that personalized experiences may have given people a more realistic picture of the task and their competence than generic ones. Interestingly, Kang et al [50] also found that the lower post-measurement of self-efficacy in the more personalized condition was more predictive of self-efficacy and perceived performance after a subsequent lecture than the post-measurement in the less personalized condition. As a more realistic assessment of one’s abilities may help people set attainable goals, which is important for goals to be effective [11], these findings suggest considering not only how high self-efficacy is but also how realistic the assessment of abilities underlying it is. Whether personalized examples indeed lead to more attainable goals is an interesting question for future work.

When comparing personalized and generic vicarious experiences with regard to their perceived motivational impact, our data lead to a promising but risky bet that personalized experiences are more motivating than generic ones. In light of our self-efficacy findings, this motivating effect was possibly not strong enough to also influence self-efficacy positively. To increase the effectiveness of the personalization, the prediction model, with a multiple \(R^2\) of only 0.23, might also need improvement, for example by incorporating other individual characteristics such as culture [52]. Notably, however, we already tested 22 characteristics that are generally associated with the behavioral predictors capability, opportunity, and motivation. Hence, it may be worthwhile to focus on the content of the experiences rather than on individual characteristics, which is comparable to taking a content-based rather than a collaborative filtering approach to recommender systems [53]. Our thematic analysis of what people found motivating about the experiences they saw can serve as a basis for this.

For our last hypothesis, we found that people’s attitude toward the virtual coach Jody is virtually certainly positive. This suggests that Jody has potential for supporting people in setting effective goals, as aspects covered in the attitude assessment, such as a good relationship [54] and ease of use [55], are beneficial. However, since people interacted with Jody only once, a novelty effect could have contributed to this positive attitude [56, 57].

Besides the low multiple \(R^2\) of the prediction model used to personalize the vicarious experiences, several further aspects warrant more research. First, some participants were in the precontemplation or maintenance stage of behavior change and were thus either not yet interested in becoming physically active or already physically active (Table 2). We could hence speculate that the effect of the goal-setting dialog and the personalized vicarious experiences therein is stronger for people who are contemplating or preparing to change. The small positive correlation between the stage of change and the acceptance of Jody provides some support for this. However, we did not find much evidence for such a relationship with the change in self-efficacy or with the motivational impact ratings.

Second, personalization effects might be stronger for people who are intrinsically motivated than for people who have a monetary incentive to participate. Specifically, since social judgment theory posits that a higher degree of involvement is associated with a larger latitude of rejection, that is, a wider range of ideas seen as objectionable [58], intrinsically motivated people may be more likely to “object” to generic examples. Third, we only assessed the effect of the dialog and the experiences on predictors of behavior, namely self-efficacy and reflective motivation [29]. Yet, the dialog and experiences could also affect other predictors of behavior such as automatic motivation (e.g., impulses, reflex responses) [29]. It would, therefore, be interesting to assess how behavior itself is affected. In doing so, one needs to keep in mind that it may take some time before setting goals affects behavior, as people think about how to reach their goal [59]. Adding support for such planning (e.g., [60, 61]) would also be valuable, as people with low self-efficacy in particular may find it difficult to come up with an effective plan [2].

Conclusion

This study examined the effect of a goal-setting dialog, and specifically of personalized vicarious experiences, in the context of a virtual coach for becoming more physically active. Vicarious experiences were provided by showing examples of how other people had reached physical activity goals. The findings suggest that people perceive personalized examples as more motivating than generic ones and that people had a positive attitude toward the virtual coach. Moreover, contrary to our hypothesis, the dialog negatively affected people’s self-efficacy. Our data also provide some support for the idea that this negative effect was stronger for the personalized examples. These findings warrant further research on how self-efficacy can be improved in combination with goal-setting dialogs. Such research should also consider whether lower post-measurements of self-efficacy are associated with more accurate self-assessments of abilities.