1 Introduction

Private supplementary tutoring (PT) refers to the provision of paid, organised, and structured lessons on academic subjects, outside of formal school, while mimicking the formal school curriculum (Bray, 2009; Kuan, 2011; Zhang & Liu, 2016). PT has been reported as being especially popular in East Asian countries, such as China, Japan, and Singapore, partly because of the intense competition in high-stakes examinations, and partly because their culture traditionally holds education in high regard (Byun & Baker, 2015; Byun et al., 2018; Ömeroğulları et al., 2020; Wang & Guo, 2017; Wang et al., 2014). Data from the Chinese National Assessment for Education Quality, an authorised national survey on education, documented that 43.8% of fourth-grade and 23.4% of eighth-grade students participated in mathematics shadow education in 2015, and approximately 45% engaged in PT outside school for more than two hours per week (National Centre of Education Quality, 2018). Over the last two decades, PT has gained popularity in regions outside of East Asia as well and attracted attention from researchers worldwide, including in Europe and the United States (Bray, 2014; Davies, 2004; Mischo & Haag, 2002; Zhang et al., 2021). This study highlights the experience of PT in the Chinese context, for the purpose of gaining insight into the scope of PT internationally.

Although PT might have different characteristics in different cultures, the education research community agrees that it has the potential to significantly influence teaching and learning in schools. Internationally, consumers’ aspirations for PT to enhance students’ academic achievements have made it an educational phenomenon as important as formal school education. Given that PT consumes significant family resources worldwide, stakeholders have clear questions on whether it is successful in improving students’ academic achievements (Bray, 2014), and on how it functions in the context of daily learning to bring about these improvements.

Academic achievement assessments in mathematics, conducted internationally, suggest that Chinese and other East Asian students have clear advantages over their Western peers (Leung, 2001, 2018; Mullis et al., 2004; Wang & Lin, 2005). Typically, this advantage has been explored from the perspective of classroom instruction in formal schooling (Leung, 2018), with a plethora of studies that compare Chinese and Western cultures in this regard (Clarke et al., 2006, 2012). However, this advantage could also be attributed to the possibly greater efforts of Chinese students, both in formal learning in school and in PT outside school (Bray et al., 2020; Wang & Guo, 2017). Therefore, it is important to explore the effect of PT on students’ mathematical achievements when studying mathematics education in China (Wang & Guo, 2017; Zhang et al., 2021).

The effect of PT on students’ achievements should be clarified for the purpose of determining whether the high rate of PT participation is an important factor in the comparatively greater academic achievements of Chinese and other East Asian students. Moreover, results from the Chinese context might have implications for other countries with fast-growing PT participation rates. With regard to the appropriate methodology for evaluating PT effectiveness, there is a marked absence of experimental work, except that of Meyer and Van Klaveren (2013), which examined the effectiveness of extended day programs through a randomised field experiment. Longitudinal research must be conducted for stronger data analysis on causal factors, rather than relying solely on correlation analysis (Bray, 2014). While cross-sectional data have been insufficient for identifying the causal impact of PT on students’ academic achievements, longitudinal studies have focused on the contribution of PT to students’ achievement gains, instead of their achievement at any given time.

Therefore, in the present study we evaluate the effect of PT on mathematical achievement in China, based on a specifically designed longitudinal database, and provide insights into the ‘secret’ behind the greater mathematical achievements and the ‘Chinese experience’ of PT in mathematics, for the benefit of the international educational society.

2 Background

2.1 PT in the Chinese context

Chinese education has been traditionally influenced by Confucian values, which hold education in high regard (Wang et al., 2014). With respect to career planning, education is valued as appropriate preparation for promising jobs and fulfilling lives, which serves as external motivation for education (Leung, 2001; Wong et al., 2012). Moreover, the ‘examination culture’ (Wong et al., 2012) demonstrates that examinations have an important status and high stakes in China (Wang et al., 2014). Chinese parents and students tend to attribute greater achievement to hard work, rather than talent, and study is considered a serious endeavour in which students must expend great effort and perseverance (Leung, 2001). Such a cultural setting potentially motivates Chinese parents to seek out PT as a learning resource that can aid their children in achieving more. In general, PT in mathematics is a common social phenomenon in Mainland China (Wang & Guo, 2017). Parents tend to feel pressure to invest in their children’s education during the early academic years, and PT, alongside formal schooling, presents itself as a profitable avenue for such investment (Zhang, 2018).

A national level survey (National Centre of Education Quality, 2018) showed consistently high participation rates in mathematics PT, especially at the compulsory and lower grade levels, in big cities such as Beijing and Shanghai (Wang & Guo, 2017). Moreover, participation rates in mathematics PT were consistently and considerably higher than those for other school subjects. The only comparable subject was English (the second language in the Chinese school system). Mathematics has clearly emerged as the most important subject in the shadow education system; this is consistent with the findings of studies conducted in other countries (Bray & Lykins, 2012; Byun, 2014; Guill et al., 2019).

Especially in the larger cities, where PT is very prevalent, some schools find that students have already learned the lesson content at an informal tutoring service before it is taught in class (Chen & Chen, 2015). In fact, some teachers express concern about PT possibly becoming the natural process of learning mathematics (Liang, 2014) and posing unnecessary challenges to mathematics teachers in schools.

This review concludes that PT has a vast scale, involving large numbers of students from all grade levels (He et al., 2021). These students and their parents spend long hours and considerable money on PT (especially in big cities and at compulsory levels). A sizeable portion of students’ study hours is spent on PT. The high participation rate in China provides abundant information on the characteristics, effectiveness, and other aspects of PT instruction, especially for countries that are witnessing higher PT participation rates than ever before.

2.2 Theoretical considerations concerning mathematics learning relating to PT in China

Theories connecting PT to students’ mathematical learning have not advanced adequately (Guill et al., 2019; Zhang et al., 2021). Carroll’s model of school learning (1963), which posits learning as a function of the ratio of the time spent on learning to the time required to learn, provides a theoretical foundation for understanding the role of PT in students’ mathematical learning (Guill et al., 2019; He et al., 2021; Zhang et al., 2021). By adding extra learning hours, PT potentially enhances students’ understanding; thus, time must be taken into consideration when examining the factors that impact mathematical learning and achievements. Previous studies have not addressed PT suitably in the analysis of factors impacting academic achievement (see the review by Winne & Nesbit, 2010).
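As a rough illustration, the ratio at the core of Carroll's model, as summarised above, can be written as follows. The symbols \(T_{\text{spent}}\) and \(T_{\text{needed}}\) are illustrative names that do not appear in the original, and \(f\) is left unspecified:

$$\text{degree of learning} = f\left( \frac{T_{\text{spent}}}{T_{\text{needed}}} \right)$$

Under this reading, PT would act mainly by increasing \(T_{\text{spent}}\), which is why the amount of time spent on PT is treated as a relevant factor here.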

In addition, Carroll’s model (1963) states that simply increasing learning time might not sufficiently enhance academic achievement, as the quality of PT instruction is also important for effectiveness. Therefore, the quality of PT should be considered when exploring students’ mathematical learning (achievement), especially when the participation rate is quite high, such as in Eastern countries. It is not enough to simply assume that PT plays an important role; it requires investigation. In the current study we tested the hypothesis that both formal-school and out-of-school mathematics instruction can impact students’ mathematical learning. The following conceptual framework (He et al., 2021) served as a guide for the purpose.

The two modes of instruction, i.e., formal-school and out-of-school, potentially interact with each other. Both could be impacted by student-level variables, such as the socio-economic status of the family, a student’s interest in mathematical learning, and availability of computers and books in the family (Winne & Nesbit, 2010; Zhang et al., 2020) (Fig. 1).

Fig. 1 Conceptual framework

For the covariates controlling the endogenous bias in the PT effect, besides the use of several predictors (e.g., gender and urban areas; Byun et al., 2018; Smyth, 2009), previous studies tended to use ‘popular’ variables in large-scale surveys, such as family SES (e.g., Kuan, 2011) or parents’ education and educational expectation (e.g., Liu, 2012). Students’ basic level of achievement was found to be a core factor in their PT participation (Baker et al., 2001; Zeng & Zhou, 2012), while parents’ education, pre-intervention score in mathematics, and educational expectations (e.g., Kuan, 2011; Liu, 2012; Stevenson & Baker, 1992) were identified as control variables. In addition, classroom factors were identified as having a fundamental impact on students’ achievements (Winne & Nesbit, 2010). These were not comprehensively included in previous studies on the PT effect. Thus, a conceptual framework comprising variables at the individual, family and class levels should guide the analysis of PT effectiveness.

2.3 Measuring the effect of PT on mathematical achievement in China

As part of research on educational effectiveness, quantitative studies have evaluated the effect of PT on students’ performance in mathematics. Typically, researchers have used a regression model, by controlling the selection bias of PT and endogeneity (Zhang, 2013), for achieving a partially causal model. While cross-sectional data have been inefficient in identifying the causal impact of PT on students’ academic achievements, longitudinal studies have mostly focused on the contribution of PT to students’ achievement gains (Zhang et al., 2021). Therefore, detailed data analysis of the causal factors is still required in longitudinal research (Bray, 2014).

Typically, in the measurements of PT, whether or not students received PT is used as a dummy variable (Zhang, 2013) for comparing the achievements of tutored and non-tutored students (e.g., Baker et al., 2001; Bray, 2014; Byun, 2014; Guill et al., 2019; He et al., 2021; Kuan, 2011; Ömeroğulları et al., 2020; Zhang, 2013; Zhang et al., 2021). More recent studies have explored the effect of this dummy variable on mathematical achievement in China, based on different kinds of databases (e.g., Fang et al., 2018; Guo et al., 2020; Sun et al., 2020; Zeng & Zhou, 2012; Zhang, 2011, 2013; Zhang et al., 2021; Zhao & Xue, 2021; Zheng et al., 2020). However, only three of these were systematic, retrospective longitudinal studies. Zhang (2011, 2013) did not find a significant correlation between attending mathematics PT sometimes during the final year of high school, and mathematical performance in the Chinese National College Entrance Exam, even after controlling for the total score of the entrance exam and other covariates. Zhang et al. (2021) also did not find any significant positive effect of regular PT attendance on mathematical achievements at the middle school level, even with continuous PT participation throughout the academic year (not the final year). While He et al. (2021) did find a significant positive effect of regular PT attendance during summer vacation on mathematical achievements at the high school level, they did not find a significant positive effect of attending PT during the school semester.

It must be noted that simply inquiring whether a student received PT is too broad a question for meaningful statistical analyses (Bray, 2014). Factors such as the point in time at which PT was received, the amount of time spent on it (e.g., one semester or one academic year), and systematic or random attendance (He et al., 2021; Ömeroğulları et al., 2020; Zhang et al., 2021) must also be considered, since they might have a substantial impact on the statistical results of the evaluation of PT effectiveness. Thus, ‘Does private supplementary tutoring work?’ is too broad to be a meaningful question (Bray, 2014), and related studies require a more careful research design, especially including the detailed measurement of PT.

2.4 Research questions

In the present study we focused on the following research questions:

  1. What are the PT participation rates across different periods of (almost) a year in China?

  2. Does regular PT participation for (almost) a year have an effect on students’ mathematical achievements?

3 Method

3.1 Participants

Data were collected in a medium-sized city in central China, with intermediate levels of economic and educational development. Two rounds of longitudinal data were collected through a questionnaire survey, conducted in December 2019 and January 2021. Administrative data were obtained from the head of the department of mathematics at the city’s education bureau. The survey tracked PT throughout the second semester of the eighth grade, summer vacation, and the first semester of the ninth grade. Additionally, final mathematics scores for the semesters were collected from the schools. To ensure the validity and accuracy of the survey, the city’s middle schools were divided into three levels, under the supervision of the head of the department of mathematics, because they differed in their quality of schooling. A total of five schools (the number of first, second, and third level schools was 2, 2, and 1, respectively), 52 classes and 2645 students were selected using whole-level sampling. The students answered the questionnaire on their own, under the supervision of the class teacher, and the end-of-semester final examinations were organised by the city’s education bureau. After matching the two data sets with the names of students in the class, data with more than 20% of missing values were excluded, resulting in 2123 valid responses. The proportions of students from the first, second, and third level schools in the final sample were 32.6%, 36.9% and 30.4%, respectively.

3.2 Missing data

The Expectation Maximization (EM) imputation method in SPSS 23.0 was used to impute the missing data. EM is an iterative numerical algorithm that can be used to maximise the likelihood under a wide variety of missing-data models (Dempster et al., 1977).

3.3 Measures

3.3.1 PT participation

The questionnaire asked students primarily about their mathematics PT participation in the following periods: the second semester of eighth grade, the summer vacation and the first semester of ninth grade. They were asked about their PT intensity (never, occasionally, or on an average of 1 h, 2 h, 3 h, 4 h or more each week). In this study, we coded only two situations: 1 = participation, 0 = no participation (including ‘occasionally’). Combinations of PT in the different periods were coded as follows: 000 (no PT in any of the three periods), 100 (PT only during eighth grade), 010 (PT only during summer vacation), 001 (PT only during ninth grade), 110 (PT during eighth grade and summer vacation), 101 (PT in eighth and ninth grades), 011 (PT during summer vacation and ninth grade) and 111 (PT in all three periods).
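The coding scheme described above can be sketched in a few lines of Python. This is only an illustration: the response labels (`"never"`, `"occasionally"`, `"1h"`, and so on), the function names and the three sample students are hypothetical, since the questionnaire's exact wording and the authors' actual coding procedure are not given in the text.

```python
from collections import Counter

# Hypothetical response labels; 'occasionally' is treated as no participation,
# as stated in the coding rule above.
NON_PARTICIPATION = {"never", "occasionally"}


def code_period(intensity: str) -> int:
    """Return 1 for regular weekly PT, 0 for 'never' or 'occasionally'."""
    return 0 if intensity in NON_PARTICIPATION else 1


def code_pattern(grade8: str, summer: str, grade9: str) -> str:
    """Combine the three period codes (eighth grade, summer, ninth grade)
    into a pattern string such as '101'."""
    return "".join(str(code_period(x)) for x in (grade8, summer, grade9))


# Made-up responses for three students (not data from the study).
students = [
    ("never", "occasionally", "never"),
    ("2h", "never", "1h"),
    ("1h", "3h", "4h+"),
]
patterns = [code_pattern(*s) for s in students]
pattern_counts = Counter(patterns)  # frequency of each pattern in the sample
```

Keeping the pattern as a three-character string preserves the order of the periods, so a pattern such as 110 can be read directly as PT in eighth grade and summer vacation but not in ninth grade.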

3.3.2 Outcome variables

The dependent variable was the mathematics score of students at the end of the first school semester of ninth grade. The test was designed by the head of the department of mathematics, to assess students’ mathematical learning in the first school semester of ninth grade; thus, errors caused by inconsistent measuring tools were avoided (Zhang et al., 2021). The total raw score was 120, which was standardised into z-scores across five schools, to eliminate the dimensional relationship between variables and make the data more comparable.
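As a minimal sketch of the standardisation step, the snippet below converts raw scores into z-scores. It assumes that the standardisation pools all students across the five schools and uses the sample standard deviation; the text does not specify either detail, and the raw scores shown are made up for illustration.

```python
from statistics import mean, stdev


def z_scores(raw_scores):
    """Standardise raw scores to mean 0 and standard deviation 1.

    Assumption: one pooled mean and sample SD (n - 1) for all students.
    """
    m = mean(raw_scores)
    s = stdev(raw_scores)
    return [(x - m) / s for x in raw_scores]


# Made-up raw scores out of 120 (not data from the study).
raw = [96, 72, 108, 84, 60]
z = z_scores(raw)
```

After this transformation a coefficient in the model can be read in standard-deviation units of the mathematics score.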

3.3.3 Covariates

Achievement at school is influenced by individual, familial, and classroom factors. Individual factors include the student’s gender, self-expectation of education, and pre-score in mathematics, i.e., the final mathematics examination result of the first semester of eighth grade (e.g., Byun et al., 2018; Liu, 2012). Familial factors include mother’s education, parents’ highest occupation (Li, 2003), and availability of a personal learning desk at home (e.g., Byun et al., 2018; Kuan, 2011), among others. The class-level factors include whether the student is in a key class, the head teacher's management style, classroom learning atmosphere, and whether the head teacher is a mathematics teacher. The controlled covariates are described in Table 1.

Table 1 Definitions and measures of the variables

3.4 Data analysis

Students in the same class share class-level characteristics, so their scores might not be independent of one another. In this study we therefore used a two-level Hierarchical Linear Model (HLM) to evaluate the effect of PT participation on students' mathematical performance. The first level is that of the student and the second level is that of the class, with students nested within classes. The empty model does not contain any student or class variables; it is used to measure differences in students' mathematics scores across classes (if any). The model with student-level and class-level variables is specified as follows:

$$y_{ij} = \beta_{0j} + \beta_{1j} PT_{ij} + \beta_{2j} X_{ij} + \varepsilon_{ij} \tag{1}$$
$$\beta_{0j} = \gamma_{00} + \gamma_{0s} K_{j} + \mu_{0j} \tag{2}$$

Here, \(y_{ij}\) is the score of student \(i\) from class \(j\), \(\beta_{0j}\) is the average score of class \(j\), \(\beta_{1j}\) is the regression coefficient of the independent variable (PT participation), \(\beta_{2j}\) is the regression coefficient of the covariates, \(\varepsilon_{ij}\) is the unexplained residual of the score of student \(i\) from class \(j\), \(\gamma_{00}\) is the average score of all students, \(\gamma_{0s}\) is the regression coefficient of the class-level variable, \(K_{j}\) is the class-level variable, and \(\mu_{0j}\) is the unexplained residual at the class level. Before running the HLM analysis, we examined the variance under the empty model, and found that the variance at the student level was 0.667, and that between classes was 0.336. The intra-class correlation coefficient (ICC) under the empty model for students’ standardised test scores showed strong nesting effects, with an ICC of 0.335. These results showed that multilevel modelling is necessary for this study.
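The reported ICC can be reproduced from the two variance components of the empty model. As a sketch, the empty model and the ICC are written below with \(\sigma^{2}\) for the student-level variance and \(\tau_{00}\) for the between-class variance; both symbols are assumptions introduced here for illustration and are not used elsewhere in the text.

$$y_{ij} = \beta_{0j} + \varepsilon_{ij}, \qquad \beta_{0j} = \gamma_{00} + \mu_{0j}$$

$$\text{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}} = \frac{0.336}{0.336 + 0.667} \approx 0.335$$

In other words, roughly one third of the total variance in the standardised scores lies between classes rather than between students within a class.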

4 Results

4.1 Descriptive statistics and correlation analysis

Table 2 displays the descriptive statistics for all study variables. The proportions of students participating in mathematics PT in eighth grade, summer vacation and ninth grade were 38.8%, 48.1% and 49.5%, respectively. The proportion of students who did not participate in PT in any of the three periods was 39.5%, that of students who participated in PT in only one of the three periods was 1.7% (eighth grade), 5.9% (summer vacation), and 7.7% (ninth grade), and that of students who participated in PT in all three periods was 30.6%.

Table 2 Descriptive statistics of the variables used for analysis

Figure 2 shows participation in PT in the three periods. One can observe that students show a certain degree of continuity in PT participation.

Fig. 2 Dendrogram of the proportion of students participating in PT in the three periods

4.2 HLM analysis

We established Model 1 as a two-level model without any variables, except for pre-score in mathematics. It shows the extent to which the variance (46%) is explained by the pre-score in mathematics, thereby demonstrating the benefits of the longitudinal design. To prevent overfitting, we used manual stepwise regression to pick the most relevant variables for Model 2. Model 3 extends Model 2 by adding relevant variables suggested by previous studies, to make the model more realistic. The results of the HLM analysis indicated that, after controlling for the students’ individual, family and class level variables, regular PT attendance throughout the three periods had a significant effect on students’ mathematical achievement at the end of the first school semester of the final year of middle school, with a small effect size (0.108, p = 0.020). In addition, PT in ninth grade played an important role, as all the combinations that included PT participation in ninth grade seemed to positively impact students’ mathematical achievements, with statistical or marginal significance. However, the effects of patterns 101 and 011 were not statistically significant. Moreover, students who participated in PT during summer vacation, and quit it in the ninth grade, seemed to have a lower mathematical achievement at the end of the first school semester of ninth grade, with marginal significance.

Of all the covariates, self-expectation of education, parents' expectation of their children's education, and being in a key class had significant impacts.

For Models 2 and 3, the proportion of variance explained by the level 1 variables is \(f^{2} = \frac{0.667 - 0.304}{0.667} = 0.543\), and the proportion of variance explained by the level 2 variables is \(f^{2} = \frac{0.336 - 0.092}{0.336} = 0.725\). These results show that the variables have a strong effect (Cohen, 1992) (Table 3).
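The arithmetic behind these two figures is a proportional reduction in variance relative to the empty model, which the short sketch below recomputes. The function name is hypothetical, and the inputs are the rounded variance components printed in the text, so the recomputed values may differ from the reported ones in the third decimal place.

```python
def proportion_reduction(var_empty: float, var_model: float) -> float:
    """Share of the empty-model variance that is removed by the fitted model."""
    return (var_empty - var_model) / var_empty


# Rounded variance components as printed in the text.
level1 = proportion_reduction(0.667, 0.304)  # student level
level2 = proportion_reduction(0.336, 0.092)  # class level
```

Computing the reduction separately per level shows how much of the within-class and between-class variance, respectively, the model variables account for.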

Table 3 Result from the HLM analysis for PT effectiveness

5 Discussion

In this study we conducted a longitudinal analysis of the effectiveness of long-term regular PT attendance on students’ mathematical achievements in the final grade of Chinese middle school.

5.1 Scale of PT participation in China

The results of this study support those of previous studies, showing that approximately 40% of students regularly participated in mathematics PT, even during the COVID-19 pandemic. In addition, the longitudinal database reflected the variations in PT participation across three periods in one year. Students tend to participate continuously or not participate in PT during different periods in the same year (PT in the second half of the academic year of eighth grade, summer vacation, and the first half of ninth grade); thus, PT participation can be considered a ‘habit’ among some students, and PT participation in a particular period influences participation in the successive periods (Zhang et al., 2020). Future studies could focus on how families decide on PT participation.

5.2 The effect of PT on mathematical achievement in China

The results of the HLM analysis indicate that regular PT attendance throughout a year could enhance students’ mathematical achievements by the middle of the final year of middle school, with a small effect size.

According to Carroll’s (1963) model of school learning, extending the time spent on PT (up to a whole academic year) could enhance PT effectiveness; however, PT participation for half a year was not found to be effective (Zhang et al., 2021). It must be noted that the COVID-19 pandemic struck China during the spring of 2020, and in the city where data were collected, students returned to school after two months, i.e., in May. Hence, the effect of PT might have been enhanced by the suspension of formal school instruction during the pandemic. Therefore, the small PT effect might disappear after the pandemic. However, these results compensate for the limitations of previous studies, which typically used PT as a dummy variable, and indicate that PT might be effective alongside formal schooling, for instance in scenarios like a pandemic.

In addition, several other variables, such as PT participation during the final year of middle school, could have enhanced academic performance by the final year, when students took the high-stakes examination for entrance to well-ranked high schools. This suggests that students in the final academic year tend to be more motivated and make better use of PT; motivation might thus function as a moderating variable between PT and mathematical achievement. This hypothesis requires further testing.

This study shows how the evaluation of PT effectiveness is a rather complex issue (Guill et al., 2019; Zhang et al., 2021), as the measurement of PT participation could vary on several aspects, for example, the point of time at which PT was received, the amount of time spent on it, and whether students are in their final year of middle school. Moreover, this study derived its results from a medium-sized city in central China, with a moderate level of economic and educational development. Therefore, replicating it for larger or smaller cities, with higher or lower levels of development in formal and informal schooling, requires the exercise of caution.

5.3 PT and mathematics learning among Chinese students

As mentioned in Sect. 1, the evaluation of PT effectiveness should be discussed in the context of the scope of mathematical learning, especially in East Asian countries like China with high PT participation rates.

This study partly supports the statement made in the introductory section, that PT should be considered in the exploration of mathematical learning (i.e., greater mathematical achievement) in the Chinese context. However, while previous studies did not find an overall positive effect of PT on students’ mathematical achievements (Zhang, 2011, 2013; Zhang et al., 2021), this study found that regular PT attendance throughout the year had only a small effect on students’ mathematical achievements in the final year of middle school. Therefore, by considering the results of previous studies as well, one could state that the high PT participation rate is not a key factor in the greater mathematical achievements of Chinese students, internationally. Rather, it could be attributed to formal school instruction.

5.4 Strengths and limitations

A major strength of this study is the analysis of longitudinal data sets specific to out-of-school mathematical learning, involving a large sample size of hundreds of tutored students in the final year of middle school. Longitudinal data allowed controlling for systematic differences between tutored and non-tutored students, and those among related control variables. Moreover, the information available on PT was quite precise, and included the intensity of PT (Zhang et al., 2021) in three consecutive periods, the lack of which was a limitation of previous studies (Ömeroğulları et al., 2020). Achievement in mathematics was generally consistent with school curriculum and PT instruction. This allowed controlling for students' actual level of knowledge exactly before and after tutoring, which solved the problem of methodology found in previous studies (Bray, 2014).

A limitation of this study is the sample of five schools in one city, which partly limits the generalisability of the results. Furthermore, it focused on PT in different periods, rather than on PT intensity, which would require the use of advanced statistical methods. Finally, the study neither discussed the quality of PT, such as the qualification of the tutors (Wang & Wang, 2021), nor other variables that reflect the quality of PT instruction (Zhang et al., 2021). Future studies could survey students from different areas for a detailed analysis of the effects of different types of PT. They could also pay more attention to the quality of PT, for a more comprehensive understanding of PT effectiveness.

Moreover, it cannot be definitely stated whether the small PT effect found during the COVID-19 pandemic would continue to exist after it ends, since no control group in the study was unaffected by the pandemic. However, since the sample city did not suffer severely due to the pandemic, students returned to school comparatively early. We could thus argue that the small PT effect might outlast the pandemic, but further investigation is required to support this statement.

6 Conclusion and implications

In this study we analysed the results of a specially designed longitudinal survey of private supplementary mathematics tutoring among middle school students in China. We concluded that regularly attending PT throughout the year could have a small effect on students’ mathematical achievements by the final year of middle school. Moreover, the effectiveness of PT in the final academic year depends on students’ motivation for learning. The study indicates that parents should carefully select PT for their children since it might not always yield the expected results. Mathematics PT does provide students with additional opportunities for enhancing knowledge, but this does not imply that participation in mathematics PT will necessarily have a positive impact on students’ achievement. PT participation should be carefully considered by parents, in consultation with schoolteachers, and based on students' personal characteristics such as prior achievement, knowledge gaps, grade levels, and personal PT needs. The opinions of individual students must be respected while acknowledging the unique advantages of tutoring for personalised education. In addition, the study suggests that the government must issue comprehensive and professional guidelines for regulating the large, but mostly unregulated, PT industry in China. Such guidelines would improve the standards of implementing local PT policies, clarify the responsible stakeholders, recognise the vitality of local regulatory agencies in the policy implementation process, and positively direct the development of PT institutions. Finally, the study’s findings provide important insight to Western countries with significantly expanding PT rates, on the prudent use of PT in improving students' mathematical achievements.