Development of a Japanese version of the Epworth Sleepiness Scale (JESS) based on Item Response Theory

doi:10.1016/j.sleep.2008.04.015

Sleep Medicine

Volume 10, Issue 5, May 2009, Pages 556-565

https://doi.org/10.1016/j.sleep.2008.04.015

Abstract

Background

Various Japanese versions of the Epworth Sleepiness Scale (ESS) have been used, but none was developed via standard procedures. Here we report on the construction and testing of the developer-authorized Japanese version of the ESS (JESS).

Methods

Developing the JESS involved translations, back translations, a pilot study, and psychometric testing. We identified questions in the ESS that were difficult to answer or were inappropriate in Japan, proposed possible replacements for those questions, and tested them with analyses based on item response theory (IRT) and classical test theory. The subjects were healthy people and patients with narcolepsy, idiopathic hypersomnia, or obstructive sleep apnea syndrome.

Results

We identified two of our proposed questions as appropriate replacements for two problematic questions in the ESS. The JESS had very few missing data. Internal consistency reliability and test–retest reliability were high. The patients had significantly higher JESS scores than did the healthy people, and higher JESS scores were associated with worse daytime function, as measured with the Pittsburgh Sleep Quality Index.

Conclusions

In Japan, the JESS provides reliable and valid information on daytime sleepiness. Researchers who use the ESS with other populations should combine their knowledge of local conditions with the results of psychometric tests.

Introduction

Daytime sleepiness is an important manifestation of sleep disorders. It can disrupt a patient’s social life and threaten public health and safety [1], [2], [3], [4]. It is also an important marker in assessments of sleep disorders, and is measured both subjectively and objectively. The “gold standard” index of sleepiness is provided by the Multiple Sleep Latency Test (MSLT) [5], but this test is costly and time-consuming. The Epworth Sleepiness Scale (ESS), a self-report instrument for measuring a patient’s perception of sleepiness, requires much less time and money. Guidelines for the control of Obstructive Sleep Apnea Syndrome (OSAS), narcolepsy, and insomnia recommend the ESS [6], [7], [8], and it has also been used in occupational and community-based studies [9], [10].

The ESS comprises questions about subjective sleepiness in eight situations [11], [12]. Respondents use a 4-point scale (scored 0–3) to respond to each of the eight questions, and the scores are summed to give an overall score of 0–24. Higher scores indicate stronger subjective daytime sleepiness, and scores below 10 are considered to indicate no problem [9].
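The scoring rule described above can be sketched in a few lines of Python. This is only an illustration and not part of the published instrument: the function name `ess_total` and the choice to return `None` when any item is unanswered are assumptions made here, since the text does not say how incomplete questionnaires should be scored.

```python
from typing import Optional, Sequence

N_ITEMS = 8                 # the ESS asks about eight situations
MIN_ITEM, MAX_ITEM = 0, 3   # each response is scored 0-3


def ess_total(responses: Sequence[Optional[int]]) -> Optional[int]:
    """Sum the eight item scores into an overall score of 0-24.

    Returns None when any item is missing. That policy is an assumption
    for illustration; the source does not specify one.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r is None for r in responses):
        return None
    for r in responses:
        if not MIN_ITEM <= r <= MAX_ITEM:
            raise ValueError(f"item score out of range: {r}")
    return sum(responses)


if __name__ == "__main__":
    print(ess_total([1, 2, 0, 3, 1, 0, 2, 1]))
```

Returning `None` rather than a partial sum keeps an unanswered item, such as the question about being in a car, from silently lowering a respondent's total.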

Several Japanese language versions of the ESS are available, but their relation to the original version and their acceptability are questionable because none was developed in accordance with standard procedures or in coordination with the developer of the original (English language). An advisory committee on sleep apnea syndrome within the Japanese Respiratory Society reported in 2004 that 165 of 277 hospitals used various Japanese language versions of the ESS, and used them in different ways (self-administration or interview). This inconsistency hampers comparisons among hospitals [13]. In addition, although many papers published from Japan, including one by us [14], have reported ESS data, the possibility remains that the sleepiness measured with those versions differs in important ways from that measured with the original ESS. Furthermore, the questions in the original ESS ask about sleepiness in various daily life situations, but we should not assume that all of those situations are familiar to respondents in Japan. Such a lack of familiarity could explain the reported high rates of missing data (9.5–19.2%) [15]. For example, the ubiquity of public transportation in Japan could account for the fact that 19.2% of the subjects in one study did not answer question 8, which asks about sleepiness “In a car, while stopped for a few minutes.”

Item Response Theory (IRT) is increasingly used in the construction of scales for measuring subjective attributes, particularly in research on Quality of Life. Common applications of IRT are the construction of shorter versions of existing scales, establishment of scoring algorithms, and Computerized Adaptive Testing (CAT) [16], [17], [18], [19]. In IRT, the probability of a particular response to a question is described as a function of a latent trait assumed to underlie the manifest response. In IRT the value of the latent trait is called “theta” [20], [21], and in this study it is the actual subjective daytime sleepiness. The probability of a particular response is typically described with a function that has two parameters: “location” is the value of the latent trait about which a response provides the most information (in educational testing this is called “difficulty”), and “slope” is the degree to which responses can be used to distinguish between small differences in the latent trait [22]. In the ESS, questions with higher values of the location parameter provide more information about people whose daytime sleepiness is severe, and questions with lower values of the location parameter provide more information about people whose daytime sleepiness is milder. Questions with higher values of the slope parameter allow one to make fine distinctions between severities of daytime sleepiness, i.e., to measure small differences in daytime sleepiness, while questions with lower values of the slope parameter allow one to measure only relatively larger differences in daytime sleepiness. Analyses based on IRT allow more precise examinations of the characteristics of each question item than do those based on Classical Test Theory (CTT) [20], [21]. In addition to analyses of each question, IRT allows construction of an information function for each question and also for the scale as a whole. That test information function reflects the accuracy of measurement at different values of the latent trait [20], [21]. Given a sufficiently large and wide-ranging group of questions (the “item pool”), and knowledge of each question’s location and slope, a scale with a desired test information function can be constructed.
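The roles of the location and slope parameters can be illustrated with a short Python sketch. It assumes the dichotomous two-parameter logistic (2PL) model, which is a simplification: ESS questions have four response categories, so the study itself presumably used a polytomous model, and the text here does not say which one. The function names and the example parameter values are hypothetical.

```python
import math
from typing import Iterable, Tuple


def response_probability(theta: float, slope: float, location: float) -> float:
    """2PL logistic curve: probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-slope * (theta - location)))


def item_information(theta: float, slope: float, location: float) -> float:
    """Fisher information of one 2PL item: slope^2 * P * (1 - P).

    It peaks where theta equals the item's location, and a larger slope
    gives a taller, narrower peak.
    """
    p = response_probability(theta, slope, location)
    return slope ** 2 * p * (1.0 - p)


def scale_information(theta: float, items: Iterable[Tuple[float, float]]) -> float:
    """Test information: the sum of item information over (slope, location) pairs."""
    return sum(item_information(theta, a, b) for a, b in items)


if __name__ == "__main__":
    # Hypothetical items: one aimed at mild sleepiness, one at severe sleepiness.
    items = [(1.5, -1.0), (1.5, 1.0)]
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(theta, round(scale_information(theta, items), 3))
```

Under these assumptions, an item contributes most to measurement precision near its own location, which is why a scale built from items with well-spread locations measures both mild and severe sleepiness with reasonable accuracy.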

Here, at the request of the Japanese Respiratory Society, we report the development of a Japanese version of the ESS (the JESS). We were able to ensure that the JESS was as close as practically possible to the ESS, because one of us (M.J.) is the developer of the ESS. Our purposes were to translate the ESS into Japanese using commonly accepted methods, to clarify problems with unauthorized Japanese-language versions of the ESS, to use IRT to develop a better Japanese-language version (the JESS), and to study the reliability and validity of the JESS.

Translation and preparation of the item pool

To develop the Japanese version we used a method that has been used in many countries and for many self-report scales [23]. The process includes translation from the source language into the target language (i.e., forward translation), translation from the target language back into the source language (i.e., back translation) so the developer of the source-language version can participate fully, and examination of translation quality. We also included a pilot test.

At the forward translation

Translation and item pool

Discussions between the Japanese team and the developer of the original version resulted in some differences between the JESS and previous Japanese-language versions.

In our first translation and in several previous Japanese-language versions the instructions and response options used the expression “nemutteshimau,” which means “fall asleep.” Because “dozing off” was meant to indicate sleeping for a short time, we instead used “utoutosuru (suubyou∼suufun nemutteshimau),” which means “doze off

Discussion

This study showed that the authorized Japanese translation of the ESS (the JESS) measures a construct similar to that measured by the original ESS. Moreover, we showed how the original ESS question about being in a car was problematic, and we identified an appropriate replacement for that question.

While the original ESS asked questions about probabilities (“how likely”), previous Japanese versions asked factual questions (“how often”) and measured more severe sleepiness. To avoid that content

Conclusion

Using standard, internationally recognized methods we developed and tested a version of the ESS for use in Japan (the JESS). To our knowledge, this study is the first application of IRT to the selection of questions for the ESS. Two questions from the original ESS were replaced. The two new questions were psychometrically similar to, or better than, the originals. The JESS is characterized by content equivalence with the original ESS and appropriateness for use in Japan, and its use is expected

Acknowledgements

This study was supported by Grants from the Institute for Health Outcomes and Process Evaluation Research (iHope International). We are grateful to Itsunari Minami and Yuriko Nakayama for recruiting subjects and collecting data. We also wish to thank Tsutomu Namikawa for his suggestions on our analysis.

References (36)

T. Akashiba et al.
Relationship between quality of life and mood or depression in patients with severe obstructive sleep apnea syndrome
Chest
(2002)
X. Liu et al.
Sleep loss and daytime sleepiness in the general adult population of Japan
Psychiatry Res
(2000)
G.R. Wunderlich et al.
An item response analysis of the international restless legs syndrome study group rating scale for restless legs syndrome
Sleep Med
(2005)
D.J. Buysse et al.
The Pittsburgh sleep quality index: a new instrument for psychiatric practice and research
Psychiatry Res
(1989)
J. Backhaus et al.
Test–retest reliability and validity of the Pittsburgh sleep quality index in primary insomnia
J Psychosom Res
(2002)
Y. Doi et al.
Psychometric assessment of subjective sleep quality using the Japanese version of the Pittsburgh sleep quality index (PSQI-J) in psychiatric disordered and control subjects
Psychiatry Res
(2000)
L.J. Findley et al.
Severity of sleep apnea and automobile crashes
N Engl J Med
(1989)
J.M. Lyznicki et al.
Sleepiness, driving, and motor vehicle crashes. Council on Scientific Affairs, American Medical Association
JAMA
(1998)
S. Melamed et al.
Excessive daytime sleepiness and risk of occupational injuries in non-shift daytime workers
Sleep
(2002)
M.A. Carskadon et al.
Guidelines for the multiple sleep latency test (MSLT): a standard measure of sleepiness
Sleep
(1986)

Scottish Intercollegiate Guidelines Network. Management of obstructive sleep apnoea/hypopnoea syndrome in adults. A...
M. Littner et al.
Practice parameters for the treatment of narcolepsy: an update for 2000
Sleep
(2001)
A. Chesson et al.
Practice parameters for the evaluation of chronic insomnia. An American academy of sleep medicine report. Standards of practice committee of the American academy of sleep medicine
Sleep
(2000)
M. Johns et al.
Daytime sleepiness and sleep habits of Australian workers
Sleep
(1997)
M.W. Johns
A new method for measuring daytime sleepiness: the Epworth sleepiness scale
Sleep
(1991)
M.W. Johns
Reliability and factor analysis of the Epworth sleepiness scale
Sleep
(1992)
T. Akashiba et al.
Current situation of the SAS diagnosis and treatment in facilities recognized by the Japanese respiratory society: results of the questionnaire survey
Nihon Kokyuki Gakkai Zasshi
(2004)
K. Chin et al.
Response shift in perception of sleepiness in obstructive sleep apnea–hypopnea syndrome before and after treatment with nasal CPAP
Sleep
(2004)

^☆: Disclosure statement: This study was supported by grants from the Institute for Health Outcomes and Process Evaluation Research (iHope International). All authors have indicated no financial conflict of interest.
