Almost all commonly used statistical methods in psychology and other social sciences are based on the assumption that the collected data are normally distributed. For example, t- and F-distributions for mean comparison, Fisher Z-transformation for inferring correlation, and standard errors and confidence intervals in multivariate statistics are all based on the normality assumption (Tabachnick and Fidell 2012). Researchers rely on these methods to accurately portray the effects under investigation, but may not be aware that their data do not meet the normality assumption behind these tests or what repercussions they face when the assumption is violated. From a methodological perspective, if quantitative researchers know the type and severity of nonnormality that researchers are facing, they can examine the robustness of normal-based methods as well as develop new methods that are better suited for the analysis of nonnormal data. It is thus critical to understand whether practical data satisfy the normality assumption and if not, how severe the nonnormality is, what type of nonnormality it is, what the consequences are, and what can be done about it.

To understand normality or nonnormality, we need to first define a measure of it. Micceri (1989) evaluated deviations from normality based on arbitrary cut-offs of various measures of nonnormality, including asymmetry, tail weight, outliers, and modality. He found that all 440 large-sample achievement and psychometric measures distributions were nonnormal, 90 % of which had sample sizes larger than 450. More recently, Blanca et al. (2013) evaluated nonnormality using the skewness and kurtosisFootnote 1 of 693 small samples, with sample size ranging from 10 to 30. The study includes many psychological variables, and the authors found that 94.5 % of distributions were outside the range of [-0.25, 0.25] on either skewness or kurtosis and therefore violated the normality assumption. However, neither Micceri nor Blanca et al. discuss the distribution of skewness or kurtosis, how to test violations of normality, or how much effect they can have on the typically used methods such as t-test and factor analysis.

Scheffe (1959, p.333) has commented that kurtosis and skewness are “the most important indicators of the extent to which nonnormality affects the usual inferences made in the analysis of variance.” Skewness and kurtosis are also an intuitive means to understand normality. If skewness is different from 0, the distribution deviates from symmetry. If kurtosis is different from 0, the distribution deviates from normality in tail mass and shoulder (DeCarlo 1997b).Footnote 2

In practice, normality measures such as skewness and kurtosis are rarely reported. In order to study nonnormality, we have contacted and obtained responses from 124 researchers, among whom only three reported skewness and kurtosis in their papers. The under-report of normality measures can be due to several reasons. First, many researchers are still not aware of the prevalence and influence of nonnormality. Second, not every researcher is familiar with skewness and kurtosis or their interpretation. Third, extra work is needed to compute skewness and kurtosis than the commonly used summary statistics such as means and standard deviations. Fourth, researchers might worry about the consequences of reporting large skewness and kurtosis.

This paper provides a simple and practical response to the continuing under-report of nonnormality measures in published literature by elucidating the problem of nonnormality and offering feasible recommendations. We begin with an easy-to-follow introduction to univariate and multivariate skewness and kurtosis, their calculations, and interpretations. We then report on a review we conducted assessing the prevalence and severity of univariate and multivariate skewness and kurtosis in recent psychology and education publications. We also show the influence of skewness and kurtosis on commonly used statistical tests in our field using data of typical skewness, kurtosis, and sample size found in our review. In addition, we offer a tutorial on how to compute the skewness and kurtosis measures we report here through commonly used software including SAS, SPSS, R, and a Web application. Finally, we offer practical recommendations for our readers to follow in their own research, including a guideline on how to report sample statistics in empirical research and some possible solutions for nonnormality.

Univariate and multivariate skewness and kurtosis

Different formulations for skewness and kurtosis exist in the literature. Joanes and Gill (1998) summarize three common formulations for univariate skewness and kurtosis that they refer to as g 1 and g 2, G 1 and G 2, and b 1 and b 2. The R package moments (Komsta and Novomestky 2015), SAS proc means with vardef=n, Mplus, and STATA report g 1 and g 2. Excel, SPSS, SAS proc means with vardef=df, and SAS proc univariate report G 1 and G 2. Minitab reports b 1 and b 2, and the R package e1071 (Meyer et al. 2015) can report all three. There are also several measures of multivariate skewness and kurtosis, though Mardia’s measures (Mardia 1970) are by far the most common. These are currently available in STATA, or as add-on macros multnorm in SAS or mardia in SPSS (DeCarlo 1997a).

Univariate skewness and kurtosis

For the univariate case, we adopt Fisher’s skewness (G 1) and kurtosis (G 2). Specifically, the skewness, G 1, is calculated as

$$ G_{1} = \frac{\sqrt{n(n-1)}}{n-2}\cdot\frac{m_{3}}{m_{2}^{3/2}}, $$
(1)

and the kurtosis, G 2, as

$$ G_{2}=\frac{n-1}{(n-2)(n-3)}\cdot\left[(n+1)\left( \frac{m_{4}}{{m_{2}^{2}}}-3\right)+6\right], $$
(2)

where \(m_{r}={\sum }_{i=1}^{n}(x_{i}-\bar {x})^{r}/n\) is the rth central moment with \(\bar {x}\) being the sample mean and n the sample size. The sample skewness G 1 can take any value between negative infinity and positive infinity. For a symmetric distribution such as a normal distribution, the expectation of skewness is 0. A non-zero skewness indicates that a distribution “leans” one way or the other and has an asymmetric tail. Distributions with positive skewness have a longer right tail in the positive direction, and those with negative skewness have a longer left tail in the negative direction.

Fig. 1
figure 1

Illustration of positive and negative skewness

Figure 1 portrays three distributions with different values of skewness. The one in the middle is a normal distribution and its skewness is 0. The one on the left is a lognormal distribution with a positive skewness = 0.95. A commonly used example of a distribution with a long positive tail is the distribution of income where most households make around $53,000 a yearFootnote 3 and fewer and fewer make more. In psychology, typical response time data often show positive skewness because much longer response time is less common (Palmer et al. 2011). The distribution on the right in Fig. 1 is a skew-normal distribution with a negative skewness = -0.3. For example, high school GPA of students who apply for colleges often shows such a distribution because students with lower GPA are less likely to seek a college degree. In psychological research, scores on easy cognitive tasks tend to be negatively skewed because the majority of participants can complete most tasks successfully (Wang et al. 2008).

Kurtosis is associated with the tail, shoulder and peakedness of a distribution. Generally, kurtosis increases with peakedness and decreases with flatness. However, as DeCarlo (1997b) explains, it has as much to do with the shoulder and tails of a distribution as it does with the peakedness. This is because peakedness can be masked by variance. Figure 2a and b illustrate this relationship clearly. Figure 2a shows the densities of three normal distributions each with kurtosis of 0 but different variances, and Fig. 2b shows three distributions with different kurtosis but the same variance. Normal distributions with low variance have high peaks and light tails as in Fig. 2a, while distributions with high kurtosis have high peaks and heavy tails as in Fig. 2b. Hence, peakedness alone is not indicative of kurtosis, but rather it is the overall shape that is important. Kurtosis must increase as skewness increases, because of the relationship: kurtosis=skewness2-2 (Shohat 1929).

Fig. 2
figure 2

Illustration of the relationship between kurtosis and variance. In Fig 2a each population has a kurtosis of 0, and variance varies from 0.5 to 2.0. In Fig 2b each population has a variance of 1, and kurtosis varies from -1 to 3.

Kurtosis has a range of [-2(n-1)/(n-3),8) in a sample of size n and a range of [-2, 8] in the population.Footnote 4 The expectation of kurtosis of a normal distribution is 0. If a distribution is leptokurtic, meaning it has positive kurtosis, the distribution has a fatter tail than the normal distribution with the same variance. Generally speaking, if a data set is contaminated or contains extreme values, its kurtosis is positive. If a distribution is platykurtic, meaning it has negative kurtosis, the distribution has a relatively flat shoulder and short tails (e.g., see Fig. 2b). For example, the distribution of age of the US population has negative kurtosis because there are generally the same number of people at each age.Footnote 5

Because for a normal distribution both skewness and kurtosis are equal to 0 in the population, we can conduct hypothesis testing to evaluate whether a given sample deviates from a normal population. Specifically, the hypothesis testing can be conducted in the following way.Footnote 6 We first calculate the standard errors of skewness (SES) and kurtosis (SEK) under the normality assumption (Bliss 1967, p.144-145),

$$\begin{array}{@{}rcl@{}} SES & = & \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}}, \end{array} $$
(3)
$$\begin{array}{@{}rcl@{}} SEK & = & 2(SES)\sqrt{\frac{n^{2}-1}{(n-3)(n+5)}}. \end{array} $$
(4)

Note that the standard errors are functions of sample size. In particular, standard error decreases as sample size increases, and the strictness with which we call a distribution “normal” becomes more and more rigid. This is a natural consequence of statistical inference. With these standard errors, two statistics,

$$ Z_{G1}=G_{1}/SES $$
(5)

and

$$ Z_{G2}=G_{2}/SEK, $$
(6)

can be formed for skewness and kurtosis, respectively. Both of these statistics can be compared against the standard normal distribution, N(0,1), to obtain a p-value to test a distribution’s departure from normality (Bliss 1967). If there is a significant departure, the p-value is smaller than .05 and we can infer that the underlying population is nonnormal. If neither test is significant, there is not enough evidence to reject normality based on skewness or kurtosis although it may still be nonnormal in other characteristics.

Multivariate skewness and kurtosis

The univariate skewness and kurtosis have been extended to multivariate data. Multivariate skewness and kurtosis measure the same shape characteristics as in the univariate case. However, instead of making the comparison of the distribution of one variable against a univariate normal distribution, they are comparing the joint distribution of several variables against a multivariate normal distribution.

In this study, we use Mardia’s measures (Mardia 1970) of multivariate skewness and kurtosis, because they are most often included in software packages. Mardia defined multivariate skewness and kurtosis, respectively, as

$$\begin{array}{@{}rcl@{}} b_{1,p}&=&\frac{1}{n^{2}}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}\left[\left( \mathbf{x}_{i}-\bar{\mathbf{x}}\right) \mathbf{{~}^{\prime}S}^{-1}\left( \mathbf{x}_{j}-\bar{\mathbf{x}}\right)\right]^{3}, \end{array} $$
(7)
$$\begin{array}{@{}rcl@{}} b_{2,p}&=&\frac{1}{n}{\sum}_{i=1}^{n}\left[\left( \mathbf{x}_{i}-\bar{\mathbf{x}}\right)\mathbf{{~}^{\prime}S}^{-1} \left( \mathbf{x}_{i}-\bar{\mathbf{x}}\right)\right]^{2}, \end{array} $$
(8)

where x is a p×1 vector of random variables and S is the biased sample covariance matrix of x defined as

$$ \mathbf{S}=\frac{1}{n}{\sum}_{i=1}^{n}\left[(\mathbf{x}_{i}-\bar{\mathbf{x}})(\mathbf{x}_{i}-\bar{\mathbf{x}})^{\prime}\right]. $$
(9)

Both measures have a p subscript, so they are specific to a set of p variables. The expected Mardia’s skewness is 0 for a multivariate normal distribution and higher values indicate a more severe departure from normality. The expected Mardia’s kurtosis is p(p+2) for a multivarite normal distribution of p variables. As in the univariate case, values under this expectation indicate platykurtism and higher values indicate leptokurtism.Two statistics,

$$ z_{1,p} = \frac{n}{6}b_{1,p} $$
(10)

and

$$ z_{2,p}=\frac{b_{2,p}-\left[p(p+2)(n-1)\right]/(n+1)}{\sqrt{\left[8p(p+2)\right]/n}} $$
(11)

can be formed for multivariate skewness and kurtosis, respectively. The statistic z 1,p can be compared against the chi-squared distribution \(\chi _{p(p+1)(p+2)/6}^{2}\), and the statistic z 2,p can be compared against the standard normal distribution N(0,1) to test a distribution’s departure from normality. If the test statistic z 1,p is significant, e.g. the p-value is smaller than .05, the joint distribution of the set of p variables has significant skewness; if the test statistic z 2,p is significant, the joint distribution has significant kurtosis. If at least one of these tests is significant, it is inferred that the underlying joint population is nonnormal. As in the univariate case, non-significance does not necessarily imply normality.

Table 1 Univariate skewness and kurtosis

Review of skewness and kurtosis in practical data

Although Micceri (1989) and Blanca et al. (2013) have studied univariate nonnormality, we are not aware of any study that has investigated multivariate skewness and kurtosis with empirical data or has tested the significance of nonnormality. Therefore, we conducted a study to further evaluate the severity of nonnormality of empirical data, especially in the multivariate case. Focusing on published research, we contacted 339 researchers with publications that appeared in Psychological Science from January 2013 to June 2014 and 164 more researchers with publications that appeared in the American Education Research Journal from January 2010 to June 2014. The two journals were chosen due to their prestige in their corresponding fields. We asked the researchers to provide the univariate and multivariate skewness and kurtosis of continuous variables used in their papers. Binary, categorical, and nominal variables were excluded, but likert items were included because they are often treated as normal in data analysis. To help the researchers compute the skewness and kurtosis, we provided a tutorial for different software as we will present later in this paper. Our data collection ended in November, 2014, by which point we had obtained 1,567 univariate measures and 254 multivariate measures of skewness and kurtosis from 194 studies. Some authors submitted univariate results without multivariate results so some variables for which we have univariate measures are not included in a multivariate measure. The median sample size for these studies was 106, and the sample size ranged from 10 to 200,000. The median number of variables included in a multivariate measure was 3, and ranged from 1 to 36. Since researchers had the option to submit skewness and kurtosis anonymously, it is unclear how many authors responded to our request or what their study characteristics may be.

Univariate skewness and kurtosis

As shown in Table 1, univariate skewness ranged from -10.87 to 25.54 and univariate kurtosis from -2.20 to 1,093.48, far wider than previously reported or tested. Because these most extreme values may be outliers, we also report 1st through 99th percentiles of univariate skewness and kurtosis. Percentiles can be interpreted as the percent of samples with lower skewness or kurtosis than that value. There is clearly a large range from the 1st to the 99th percentile, especially for kurtosis. The correlation between sample size and skewness is r=-0.005, and with kurtosis is r=0.025. These are comparable to what Blanca et al. (2013) have reported in which correlations between sample size and skewness and kurtosis were .03 and -.02, respectively. The results in Table 1 include skewness and kurtosis when the sample size is smaller and larger than 106, the median sample size of all collected data. As shown in this table, negative skewness and kurtosis are much more common than previously reported: 38 % of distributions have negative skewness and 47 % have negative kurtosis. This could be due to the number of likert measures provided, but because of the anonymous submission option this cannot be confirmed. The mean univariate skewness is 0.51, and the sample size-weighted mean is 0.47. The mean univariate kurtosis is 4.29, and the sample size-weighted mean is 8.41. Sample size-weighted means are helpful because we expect sample measures to better-reflect that of the population as sample size increases. To account for this, measures from large samples are given higher weight than those from smaller samples. Therefore, on average, the skewness and kurtosis are larger than that of a normal distribution. To further visualize what these distributions look like, Fig. 3 shows histograms of 20 randomly selected distributions from our review. Note that there is no common shape that explains skewness or kurtosis.

Fig. 3
figure 3

Histograms of 20 randomly selected distributions collected for review

Percentages of univariate distributions with significant skewness or kurtosis by sample size are presented in Table 1. About 66 % of univariate distributions had significant skewness and 54 % had significant kurtosis. Almost 74 % of distributions had either significant skewness or kurtosis and were therefore classified as nonnormal. As expected, it becomes easier for tests to become significant with larger sample sizes. Over 95 % of distributions with sample sizes greater than the median sample size, 106, were tested as nonnormal. Conversely, when the sample size was less than 106 only 56 % of distributions were significantly nonnormal.

Multivariate Skewness and Kurtosis

The 254 collected Mardia’s multivariate skewness ranged from 0 to 1,332 and multivariate kurtosis from 1.80 to 1,476. Percentiles of Mardia’s skewness and kurtosis split by median sample size and median number of variables used in their calculation are presented in Table 2. The correlation between sample size and Mardia’s skewness is r=-0.01 and with Mardia’s kurtosis is r=0.02. The correlation between the number of variables and Mardia’s skewness is r=0.58 and with Mardia’s kurtosis is r=0.73. After centering Mardia’s kurtosis on p(p+2), the expected value under normality, the correlation between kurtosis and the number of variables becomes r=0.05. The mean multivariate skewness is 32.94, and the sample size-weighted mean is 28.26. The mean multivariate kurtosis is 78.70, and the sample size-weighted mean is 92.03. Therefore, the average skewness and kurtosis are greater than that of a multivariate normal distribution. This has important ramifications especially for SEM, for which multiple outcome measures are often used and for which multivariate kurtosis can asymptotically affect standard errors.

Table 2 Mardia’s measures by sample size and number of variables

Percentages of multivariate distributions with significant Mardia’s skewness and kurtosis are presented in Table 3. About 58 % of multivariate skewness measures and 57 % of multivariate kurtosis measures reached significance. Combining these, 68 % of multivariate distributions were significantly nonnormal. In particular, 94 % of Mardia’s measures were statistically significant when the sample size was larger than 106. Similarly, more Mardia’s measures became significant with more variables.

Table 3 Percent significant Mardia’s skewness and kurtosis at significance level 0.05

To summarize, based on the test of 1,567 univariate and 254 multivariate skewness and kurtosis from real data, we conclude that 74 % of univariate data and 68 % of multivariate data significantly deviated from a univariate or multivariate normal distribution. In examining only those univariate measures included in a multivariate measure, 68 % have significant nonnormality. Therefore, nonnormality is a severe problem in real data, though multivariate nonnormality does not appear to be a severe problem above and beyond that of univariate normality. However, this relationship requires further study to evaluate.

Influences of skewness and kurtosis

In order to clearly show the influence of skewness and kurtosis, we conducted simulations on the one-sample t-test, simple regression, one-way ANOVA, and confirmatory factor analysis (CFA). Simulation studies are useful because when data is generated from a specific model we know what results the statistical tests should show, and so we can evaluate how nonnormality affects those results. Note that for all of these models, the interest is in the normality of the dependent variable(s). There are no normality assumptions put on the independent variables.

Influence of univariate skewness and kurtosis

Yuan et al. (2005) show that the properties of mean estimates are not affected by either skewness or kurtosis asymptotically, but that the standard error of sample variance is a function of kurtosis. If normality is assumed (kurtosis = 0), the standard error of sample variance will be underestimated when kurtosis is positive and overestimated when kurtosis is negative. In other words, kurtosis will still have an effect on variance estimates at very large sample sizes while mean estimates are only affected in small samples. For example, Yanagihara and Yuan (2005) found that the expectation and variance of the t-statistic depends on skewness, but that the effect lessens as sample size increases.

To concretely demonstrate the influence of univariate skewness and kurtosis, we conducted a simulation study on a one-sample t-test. In the simulation, we set the skewness to the 1st, 5th, 25th, 50th, 75th, 95th, and 99th percentiles of univariate skewness found in our review of practical data. These were tested in sample sizes of the 5th, 25th, 50th, 75th, and 95th percentiles of sample size found in our review. Therefore, these conditions should represent typical results found in our field. Because kurtosis has little influence on the t-test, it was kept at the 99th percentile, 95.75, throughout all conditions. In total, we considered 35 conditions for each test. Under each condition, we generated 10,000 sets of data with mean 0, variance 1, and the specified skewness and kurtosis from a Pearson distribution in R (R Core Team 2016) using the package PearsonDS (Becker and Klößner 2016).Footnote 7 Then, we obtained the empirical type I error rate to reject the null hypothesis that the population mean is equal to 0 using the significance level 0.05 in a two-tailed, a lower-tail, and an upper-tail one-sample t-test.

Table 4 displays the empirical type I error rate for each condition. For brevity, type I error rates of just the lowest sample size are presented for conditions with skewness between -1.17 and 0.94 because these conditions did not present any problems. To better understand the empirical type I error rate, we bold those that are outside of the range [0.025, 0.075]. When the skewness and kurtosis are 0, the generated data are from a normal distribution and the empirical type I error rate is close to 0.05 even when the sample size is as small as 18 for all three tests. When data deviate from normality, the results show that a two-sided test is more robust than a one-sided test. The two-sided test only has increased type I error rate for a skewness of 6.32, for which a sample size of 554 is necessary to dissipate the effect. A lower tail t-test has even higher type 1 error rates at this skewness, and an upper tail t-test has an increased type I error rate with negative skewness and very low rates with high positive skewness.

Table 4 Type I error rates of the one-sample t-test

A simple regression and a one-way ANOVA with three groups were also tested at all of these conditions. The regression was robust to all conditions, even at the lowest sample size. Type 1 error rate for the ANOVA gets as low as 0.022 under the most extreme skewness (6.32) at the lowest sample size (18). However, all other type I error rates were within the [0.025,0.075] robustness range. Type I error rates can increase if each population is from a different distribution, but as long as each distribution has equal variances the departures are still not too severe. Influence on power, however, can be immense (See Levine and Dunlap (1982), for example).

Influence of multivariate skewness and kurtosis

In order to show the influence of multivariate skewness and kurtosis, we conducted simulation studies on CFAs. First, we focus on a one-factor model with four manifest variables. For each manifest variable, the factor loading is fixed at 0.8 and the uniqueness factor variance is 0.36. The variance of the factor is set to 1. Note that the expected Mardia’s kurtosis is p(p + 2). When kurtosis = 24 data are from a multivariate normal distribution, and the centered kurtosis is 0. Although in our review of practical data about half of the data sets had centered Mardia’s kurtosis less than 0, 21 is the only multivariate kurtosis less than 24 we were able to successfully simulate. Hence, we used these two values of Mardia’s kurtosis (21 and 24) along with the 75th, 95th, and 99th percentiles of Mardia’s kurtosis found in our review of practical data of four manifest variables (30, 60, and 100). Sample sizes of 48, 106, 554, and 1489 were used to evaluate these conditions. Because skewness does not influence SEM asymptotically, it was kept at 0 throughout all conditions. In total, 20 conditions were considered. 1,000 data sets were used to evaluate each condition. The authors are currently unaware of any method to simulate data with a particular multivariate skewness and kurtosis, so instead we used the R package lavaan (Rosseel 2012) to simulate data from a model with certain univariate skewness and kurtosis. Appropriate univariate values were found to simulate multivariate values of a population by trial and error.

The influence of skewness and kurtosis is evaluated through the empirical type I error rate of rejecting the factor model using the normal-distribution-based chi-squared goodness-of-fit test. This test is significant when the model does not fit the data. Because the true one-factor model was fit to the simulated data, one would expect the empirical type I error rate to be close to the nominal level 0.05. Deviation from it indicates the influence of skewness and kurtosis. The empirical type I error rates at different levels of Mardia’s kurtosis are summarized in Table 5.

Table 5 Type I error rates of the χ 2 test for factor analysis with 4 manifest variables

The results show that when the data are from a multivariate normal distribution (kurtosis = 24), the empirical type I error rates were close to the nominal level 0.05. However, when the data deviate from a multivariate normal distribution to a Mardia’s kurtosis of 60, the empirical type I error rates are all greater than 0.05. Unsurprisingly, the problem becomes worse with an increase in sample size. For example, when the multivariate kurtosis is 100 and the sample size is 1489, the normal-distribution-based chi-squared test rejects the correct one-factor model 29.8 % of the time.

Type 1 error rates were also obtained in a one-factor model with eight manifest variables and a two-factor model with four manifest variables each to investigate the effects of an increase in the number of manifest variables or number of factors. Factor loadings were adjusted to maintain uniqueness factor variance at 0.36 and total variance at 1. The same conditions were tested as in the simulation study above, with the exception of those with a sample size of 48. The same univariate kurtoses were used to simulate the data, though they result in different multivariate kurtosis for eight variables than they do for four. The resulting empirical type I error rates of these multivariate kurtoses for both of these models are given in Table 6.

Table 6 Empirical Type I error rates of the χ 2 test for factor analysis with 8 manifest variables

Once again, type I error is maintained when the distribution is multivariate normal (kurtosis = 80), but once kurtosis reaches 150 all type I errors are above 0.05. As sample size increases, the problem worsens. In comparison to the results shown in Table 5, type I errors are worse with an increase in the number of manifest variables. However, holding the number of manifest variables constant, an increase in the number of factors lowers type I error rate.

In summary, if either univariate or multivariate nonnormal data are analyzed using normal-distribution-based methods, it will lead to incorrect statistical inference. Given the prevalence of nonnormality as we have shown in the previous section, it is very important to quantify the nonnormality. We suggest using skewness and kurtosis to measure nonnormality and we will show how to obtain both univariate and multivariate skewness and kurtosis in the next section.

Computing univariate and multivariate skewness and kurtosis

In this section, we illustrate how to compute univariate and multivariate skewness and kurtosis in popular statistical software including SAS, SPSS, and R as well as a newly developed Web application. As previously mentioned, different softwares produce different types of univariate skewness and kurtosis. Furthermore, most don’t report tests or multivariate measures. Using our software and macros for SAS, SPSS, and R produces consistent and full results across software. Some software requires macros that can be downloaded from our website at http://w.psychstat.org/nonnormal. Our Web application can be found at http://w.psychstat.org/kurtosis. All tools provided perform listwise deletion before assessing nonnormaltiy. As an example, we use a subset of data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K) to show the use of different software. The ECLS-K is a longitudinal study with data collected in kindergarten in the fall and spring of 1998-99, in 1st grade in the fall and spring of 1999-2000, in 3rd grade in the spring of 2002, in 5th grade in the spring of 2004, and in 8th grade in the spring of 2007. The data used here consist of four consecutive mathematical ability measures of 563 children from kindergarten to 1st grade. To simplify our discussion, we assume that all files to be used are in the folder of “C:\nonnormal”, which needs to be changed accordingly.

SAS

To use SAS for computing the univariate and multivariate skewness and kurtosis, first download the mardia.sas macro file from our website. Our macro was modified from a SAS macro MULTNORM provided by the SAS company. After saving the sas macro file, the following code can be used to get the skewness and kurtosis for the ECLS-K data.Footnote 8

figure a

In the SAS input, Line 1 through Line 4 read the ECLS-K data in the file “eclsk563.txt” into SAS. Line 5 includes the SAS macro file downloaded from our website for use within SAS. The sixth line uses the function mardia in the macro to calculate skewness and kurtosis. The argument “data=” specifies the SAS database to use and “var=” specifies the variables to use in calculating the skewness and kurtosis.

The SAS output from the analysis of the ECLS-K data is given below. The first part of the output, from Line 1 to Line 8, displays the univariate skewness and kurtosis as well as their corresponding standard error. For example, the skewness for the ECLS-K data at time 1 is 0.69 with a standard error 0.10 (Line 5). Based on a z-test, one would conclude that the skewness is significantly larger than 0. For another example, the kurtosis for the data at time 4 is 1.29 witha standard error 0.21 (Line 8), indicating the kurtosis is significantly larger than 0.

The second part of the output, from Line 10 to Line 23 includes the information on multivariate skewness and kurtosis. First, the multivariate skewness is 2.26 (Line 16) with a standardized measure of 212.24 (Line 17). The p-value for a chi-squared test is approximately 0 (Line 18). Therefore, the multivariate skewness is significantly larger than 0. Second, the multivariate kurtosis is 25.47 (Line 21) with the standardized measure of 2.51 (Line 22). The p-value for a z-test is approximately 0.01 (Line 23). Therefore, the multivariate kurtosis is significantly different from that of a multivariate normal distribution with 4 variables (24). Consequently, the data do not follow a multivariate normal distribution and therefore violate the normality assumption if used in multivariate analysis.

figure b

SPSS

DeCarlo (1997b) has developed an SPSS macro to calculate multivariate skewness and kurtosis.Footnote 9 We slightly modified the macro to make the output of univariate skewness and kurtosis consistent to other software. To use the SPSS macro, first download the macro file mardia.sps to your computer from our website. Then, open a script editor (File- >New- >Syntax) within SPSS and include the following SPSS script.

The code on the first eight lines in the input is used to read the ECLS-K data into SPSS. These lines are not necessary if your data are already imported into SPSS. Line 10 gets the SPSS macro into SPSS for use. The function mardia calculates univariate and multivariate skewness and kurtosis for the variables specified by the vars option on Line 11. Note that the folder to the data file and the SPSS macro file needs to be modified to reflect the actual location of them.

figure c

The SPSS output from the analysis of the ECLS-K data is given below. Similar to the SAS output, the first part ofthe output includes univariate skewness and kurtosis and the second part is for the multivariate skewness and kurtosis. SPSS obtained the same skewness and kurtosis as SAS because the same definition for skewness and kurtosis was used.

figure d

R

To use R, first download the R code file mardia.r to your computer from our website. Then, in the editor of R, type the following code. The code on Line 1 gets the ECLS-K data into R and Line 2 provides names for the variables in the data. The third line loads the R function mardia into R. Finally, the last line uses the function mardia to carry out the analysis on Line 4.

figure e

The output from the R analysis is presented below. Clearly, it obtains the same univariate and multivariate skewness and kurtosis as SAS and SPSS.

figure f

Web Application for Skewness and Kurtosis

To further ease the calculation of univariate and multivariate skewness and kurtosis, we also developed a Web application that can work within a Web browser and does not require knowledge of any specific software. The Web application utilizes the R function discussed in the previous section to obtain skewness and kurtosis on a Web server and produces the same results as SAS, SPSS, and R.

figure g

To access the Web application, type the URL http://psychstat.org/kurtosis in a Web browser and a user will see an interface as shown in Fig. 4. To use the Web application, the following information needs to be provided on the interface.

Fig. 4
figure 4

Interface of the Web application

Data

The data file can be chosen by clicking the “Choose File” buttonFootnote 10 and locating the data set of interest on the local computer.

Type of data

The Web application allows commonly used data types such as SPSS, SAS, Excel, and text data. To distinguish the data used, it recognizes the extension names of the data file. For example, a SPSS data file ends with the extension name .sav, a SAS data file with the extension name .sas7bdat, and an Excel data file with the extension name .xls or .xlsx. In addition, a CSV file (comma separated value data file) with the extension name .csv and a TXT file (text file) with the extension name .txt can also be used. If a .csv or .txt file is used, the user needs to specify whether variable names are included in the file. For Excel data, it requires the first row of the data file to be the variable names.

Select variables to be used

Skewness and kurtosis can be calculated on either all the variables or a subset of variables in the data. To use all the variables, leave this field blank. To select a subset of variables, provide the column numbers separated by comma “,”. Consecutive variables can be specified using “-”. For example, 1, 2-5, 7-9, 11 will select variables 1, 2, 3, 4, 5, 7, 8, 9, 11.

Missing data

Missing data are allowed in the data although they will be removed before the calculation of skewness and kurtosis. This field should be left blank if the data file has no missing values. If multiple values are used to denote missing data, they can be specified all together separated by a comma (,). For example, -999, -888, NA will specify all three values as missing data.

After providing the required information, clicking the “Calculate” button will start the calculation of skewness and kurtosis. The output of the analysis is provided below. The output is identical to the R output except for the variable names for univariate skewness and kurtosis. This is because by default the variable names are constructed using “V” and an integer in R.

Discussion and recommendations

The primary goals of this study were to assess the prevalence of nonnormality in recent psychology and education publications and its influence on statistical inference, as well as to provide a software tutorial on how to compute univariate and multivariate skewness and kurtosis. First, nonnormality clearly exists in real data. Based on the test of skewness and kurtosis of data from 1,567 univariate variables, much more than tested in previous reviews, we found that 74 % of either skewness or kurtosis were significantly different from that of a normal distribution. Furthermore, 68 % of 254 multivariate data sets had significant Mardia’s multivariate skewness or kurtosis. Our results together with those of Micceri (1989) and Blanca et al. (2013) strongly suggest the prevalence of nonnormality in real data.

Our investigation on the influence of skewness and kurtosis involved simulation studies on the one-sample t-test and factor analysis. Through simulation, we concretely showed that nonnormality, as measured by skewness and kurtosis, exerted great influence on statistical tests that bear the normality assumption. For example, the use of the t-test incorrectly rejected a null hypothesis 17 % of the time and the chi-squared test incorrectly rejected a correct factor model 30 % of the time under some conditions. Therefore, nonnormality can cause severe problems. For example, a significant result might be simply an artificial effect caused by nonnormality.

Given the prevalence of nonnormality and its influence on statistical inference, it is critical to report statistics such as skewness and kurtosis to understand the violation of normality. Thus, we highly recommend that journal editors and reviewers encourage authors to report skewness and kurtosis in their papers. In Table 7, we list the summary statistics that are critical to different statistical methods in empirical data analysis. For example, mean comparisons would be influenced by skewness while factor analysis is more influenced by kurtosis. To facilitate the report of univariate and multivariate skewness and kurtosis, we have provided SAS, SPSS, and R code as well as a Web application to compute them.

Table 7 Critical summary statistics for different methods in empirical research

Once nonnormality has been identified as a problem, the main options for handling it in a statistical analysis include transformation, nonparametric methods, and robust analysis. Transforming data so that they are closer to normally distributed is a relatively easy option, because after transformation the researcher can proceed with whichever normality-based method they desire. In psychology, log transformation is a common way to get rid of positive skewness, for example. More generally, the class of Box-Cox transformations (Box and Cox 1964) is also very popular because it’s easy to use and can accomodate many types of nonnormality. However, it has been suggested that Box-Cox and other transformations seldom maintain linearity, normality, and homoscedasticity simultaneously (Sakia 1992, for example), and even if transformation is successful the resulting parameter estimates might be difficult to interpret.

Corder and Foreman (2014) offer an easy-to-follow review of nonparametric techniques, including the Mann–Whitney U-test, Kruskal-Wallis test, and Spearman rank order correlation, among others. The basic premise of most of these methods is to perform analysis on ranks rather than the raw data. This is, of course, a more robust procedure than assuming normality of raw data, but can be less powerful in some circumstances and the results can be less meaningful. However, for data that is already ordinal or ranked these methods can be very useful, and can still be advantageous in other circumstances, as well.

Robust analysis can work better than transformation and non-parametric methods in many situations, though historically it has also been the most difficult to conduct. Robust analysis can address three points of concern: parameter estimates, standard errors of those estimates, and test statistics. Within the context of SEM the methods that perform best in dealing with each of these issues, respectively, are robust estimation using Huber-type weights (Huber 1967), sandwich-type standard errors, and a rescaled chi-squared statistic following robust estimation (Yuan and Bentler 1998). Alternatively, one can also obtain sandwich-type standard errors and rescaled test statistics following normal distribution-based maximum likelihood (Satorra and Bentler 1988), and such a procedure will still yield consistent results when data are of heavier tails but do not contain outliers.

Recently, some software packages have begun to include these procedures, making robust analysis a much easier option than it has ever been before. Currently, EQS (Multivariate Software, Inc.), WebSEM (Zhang and Yuan 2012), and the R package rsem (Yuan and Zhang 2012) are the only softwares to offer truly robust methods that address all three points of concern (parameter estimates, standard errors, and test statistics), and WebSEM and rsem offer them for free. Additionally, WebSEM has a user-friendly interface in which researchers can draw the path diagram they wish to fit.

As shown in Fig. 3, there is no common distribution of practical data in psychology and education. With such diversity in data shapes and research goals, it is impossible to create one universal solution. However, we hope that through this paper we were able to elucidate the problem through our review of practical data and simulation and offer some feasible recommendations to researchers in our field. It is our hope that researchers begin to take nonnormality seriously and start to report them along with means and variances that have already been established in data analysis. We believe that reporting skewness and kurtosis in conjunction with moving toward robust analysis offer two high-impact changes that can be made in the literature at this time. These actions will not only increase the transparency of data analysis, but will also encourage quantitative methodologists to develop better techniques to deal with nonnormality, improve statistical practices and conclusions in empirical analysis, and increase awareness and knowledge of the nonormality problem for all researchers in our field.