Abstract
Statistical hypothesis testing plays a central role in applied research for determining whether theories or expectations are supported by the data. Such expectations are often formulated using order constraints. For example, an executive board may expect that sales representatives who wear a smart watch respond faster to their emails than sales representatives who do not. In addition, it may be expected that this difference becomes more pronounced over time, because representatives need to learn how to use the smart watch effectively. By translating these expectations into statistical hypotheses with equality and/or order constraints, we can determine whether the expectations receive evidence from the data. In this chapter we show how a Bayesian statistical approach can effectively be used for this purpose. The Bayesian approach is more flexible than the traditional p-value test in the sense that multiple hypotheses with equality as well as order constraints can be tested against each other in a direct manner. The methodology can straightforwardly be applied by practitioners using the freely downloadable software package BIEMS. An application in human-computer interaction is used for illustration.
Notes
1. Note that the Jeffreys prior is not a proper probability distribution (it is improper). This implies that it does not integrate to 1. Improper priors can be used in Bayesian estimation when there is enough information in the data to obtain a proper posterior. In this application this is the case when \(n_A+n_B\ge 5\) (i.e., \(P=3\) plus the number of groups/teams) and \(n_A,n_B\ge 2\).
References
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) 2nd International Symposium on Information Theory, Akademiai Kiado, Budapest, pp 267–281
Barlow R, Bartholomew D, Bremner J, Brunk H (1972) Statistical inference under order restrictions. John Wiley, New York
Berger JO, Pericchi LR (1996) The intrinsic Bayes factor for model selection and prediction. J Am Stat Assoc 91:109–122
Braeken J, Mulder J, Wood S (2015) Relative effects at work: Bayes factors for order hypotheses. J Manage 41:544–573
Dickey J (1971) The weighted likelihood ratio, linear hypotheses on normal location parameters. Ann Math Stat 42:204–223
Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian data analysis, 2nd edn. Chapman & Hall, London
Gu X, Mulder J, Dekovic M, Hoijtink H (2014) Bayesian evaluation of inequality constrained hypotheses. Psychol Methods 19:511–527
Hoijtink H (2011) Informative hypotheses: theory and practice for behavioral and social scientists. Chapman & Hall/CRC, New York
Hubbard R, Armstrong J (2006) Why we don’t really know what ’statistical significance’ means: a major educational failure. J Mark Educ 28:114–120
Jeffreys H (1961) Theory of probability, 3rd edn. Oxford University Press, New York
Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795
Kato BS, Hoijtink H (2006) A Bayesian approach to inequality constrained linear mixed models: estimation and model selection. Stat Model 6:231–249
Klugkist I, Laudy O, Hoijtink H (2005) Inequality constrained analysis of variance: a Bayesian approach. Psychol Methods 10:477–493
Liang F, Paulo R, Molina G, Clyde MA, Berger JO (2008) Mixtures of \(g\) priors for Bayesian variable selection. J Am Stat Assoc 103(481):410–423
Lynch SM (2007) Introduction to applied Bayesian statistics and estimation for social scientists. Springer Science & Business Media
Mulder J (In Press) Bayes factors for testing order-constrained hypotheses on correlations. J Math Psychol
Mulder J, Klugkist I, van de Schoot R, Meeus W, Selfhout M, Hoijtink H (2009) Bayesian model selection of informative hypotheses for repeated measurements. J Math Psychol 53:530–546
Mulder J, Hoijtink H, Klugkist I (2010) Equality and inequality constrained multivariate linear models: objective model selection using constrained posterior priors. J Stat Plan Inference 140:887–906
Mulder J (2014) Prior adjusted default Bayes factors for testing (in)equality constrained hypotheses. Comput Stat Data Anal 71:448–463
Mulder J, Hoijtink H, de Leeuw C (2012) BIEMS: a Fortran 90 program for calculating Bayes factors for inequality and equality constrained models. J Stat Softw 46(2)
O’Hagan A (1995) Fractional Bayes factors for model comparison (with discussion). J Roy Stat Soc Ser B 57:99–138
Robertson T, Wright FT, Dykstra R (1988) Order restricted statistical inference. Wiley, New York
Schwarz GE (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Sellke T, Bayarri MJ, Berger JO (2001) Calibration of \(p\) values for testing precise null hypotheses. Am Stat 55(1):62–71
Silvapulle MJ, Sen PK (2004) Constrained statistical inference: inequality, order, and shape restrictions, 2nd edn. Wiley, Hoboken
Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit. J Roy Stat Soc Ser B 64(2):583–639
van de Schoot R, Hoijtink H, Romeijn J-W, Brugman D (2011) A prior predictive loss function for the evaluation of inequality constrained hypotheses. J Math Psychol 16:225–237
van de Schoot R, Hoijtink H, Hallquist MN (2012) Bayesian evaluation of inequality-constrained hypotheses in SEM models using Mplus. Struct Equ Modeling 19:593–609
Verdinelli I, Wasserman L (1995) Computing Bayes factors using a generalization of the Savage-Dickey density ratio. J Am Stat Assoc 90:614–618
Wagenmakers E-J (2007) A practical solution to the pervasive problem of p values. Psychon Bull Rev 14:779–804
Wetzels R, Grasman RPPP, Wagenmakers EJ (2010) An encompassing prior generalization of the Savage-Dickey density ratio. Comput Stat Data Anal 54:2094–2102
Wetzels R, Matzke D, Lee M, Rouder JN, Iverson GJ, Wagenmakers EJ (2011) Statistical evidence in experimental psychology: an empirical comparison using 855 t tests. Perspect Psychol Sci 6:291–298
Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with \(g\)-prior distributions. In: Goel PK, Zellner A (eds) Bayesian inference and decision techniques: essays in honor of Bruno de Finetti. North-Holland, Amsterdam, pp 233–243
Appendices
Appendix 1: Gibbs sampler (theory)
We consider the general case of P repeated measurements. In the example discussed above, P was equal to 3. The following semi-conjugate prior is used for the model parameters,

$$\varvec{\mu }_A\sim N(\mathbf m _{A0},\mathbf S _{A0}),\quad \varvec{\mu }_B\sim N(\mathbf m _{B0},\mathbf S _{B0}),\quad p(\varvec{\varSigma })\propto |\varvec{\varSigma }|^{-(P+1)/2},\qquad (9.17)$$

where \(\mathbf m _{A0}\) and \(\mathbf m _{B0}\) are the prior means of \(\varvec{\mu }_A\) and \(\varvec{\mu }_B\), respectively, \(\mathbf S _{A0}\) and \(\mathbf S _{B0}\) are the respective prior covariance matrices of \(\varvec{\mu }_A\) and \(\varvec{\mu }_B\), and the (improper) Jeffreys prior is used for \(\varvec{\varSigma }\) (see Note 1).
The data are stored in the \((n_A+n_B)\times P\) data matrix \(\mathbf Y =[\mathbf Y _A'~\mathbf Y _B']'\), where the ith row of \(\mathbf Y \) contains the P measurements of the ith sales executive, the first \(n_A\) rows correspond to the responses of the executives in team A, and the remaining \(n_B\) rows contain the responses of the executives in team B. The likelihood of the data can be written as

$$p(\mathbf Y |\varvec{\mu }_A,\varvec{\mu }_B,\varvec{\varSigma })=\prod _{i=1}^{n_A} N(\mathbf y _i;\varvec{\mu }_A,\varvec{\varSigma })\prod _{i=n_A+1}^{n_A+n_B} N(\mathbf y _i;\varvec{\mu }_B,\varvec{\varSigma })\propto N(\bar{\mathbf{y }}_A;\varvec{\mu }_A,\varvec{\varSigma }/n_A)\,N(\bar{\mathbf{y }}_B;\varvec{\mu }_B,\varvec{\varSigma }/n_B)\,IW_{\varvec{\varSigma }}(\mathbf S _A+\mathbf S _B,n_A+n_B-2),$$

where \(\bar{\mathbf{y }}_A\) and \(\bar{\mathbf{y }}_B\) denote the sample means of team A and team B over the P measurements, the sums of squares equal

$$\mathbf S _A=(\mathbf Y _A-\mathbf 1 _{n_A}\bar{\mathbf{y }}_A')'(\mathbf Y _A-\mathbf 1 _{n_A}\bar{\mathbf{y }}_A')\quad \text{and}\quad \mathbf S _B=(\mathbf Y _B-\mathbf 1 _{n_B}\bar{\mathbf{y }}_B')'(\mathbf Y _B-\mathbf 1 _{n_B}\bar{\mathbf{y }}_B'),$$
and \(IW_{\varvec{\varSigma }}(\mathbf S ,n)\) denotes an inverse Wishart probability density for \(\varvec{\varSigma }\). Note that the likelihood function of \(\varvec{\varSigma }\) given \(\varvec{\mu }_A\) and \(\varvec{\mu }_B\) is proportional to an inverse Wishart density \(IW(\mathbf S _{\varvec{\mu }},n_A+n_B)\), where \(\mathbf S _{\varvec{\mu }}=(\mathbf Y _A-\mathbf 1 _{n_A}\varvec{\mu }_A')'(\mathbf Y _A-\mathbf 1 _{n_A}\varvec{\mu }_A')+(\mathbf Y _B-\mathbf 1 _{n_B}\varvec{\mu }_B')'(\mathbf Y _B-\mathbf 1 _{n_B}\varvec{\mu }_B')\). These results can be found in most classic Bayesian textbooks, for example Gelman et al. (2004).
Because the prior in (9.17) is semi-conjugate, the conditional posterior distributions of each model parameter given the other parameters have known distributions from which we can easily sample,

$$\varvec{\mu }_A|\mathbf Y ,\varvec{\varSigma }\sim N(\mathbf m _{A1},\mathbf S _{A1}),\quad \text{with }\mathbf S _{A1}=(\mathbf S _{A0}^{-1}+n_A\varvec{\varSigma }^{-1})^{-1}\text{ and }\mathbf m _{A1}=\mathbf S _{A1}(\mathbf S _{A0}^{-1}\mathbf m _{A0}+n_A\varvec{\varSigma }^{-1}\bar{\mathbf{y }}_A),$$

$$\varvec{\mu }_B|\mathbf Y ,\varvec{\varSigma }\sim N(\mathbf m _{B1},\mathbf S _{B1}),\quad \text{with }\mathbf S _{B1}=(\mathbf S _{B0}^{-1}+n_B\varvec{\varSigma }^{-1})^{-1}\text{ and }\mathbf m _{B1}=\mathbf S _{B1}(\mathbf S _{B0}^{-1}\mathbf m _{B0}+n_B\varvec{\varSigma }^{-1}\bar{\mathbf{y }}_B),$$

$$\varvec{\varSigma }|\mathbf Y ,\varvec{\mu }_A,\varvec{\mu }_B\sim IW(\mathbf S _{\varvec{\mu }},n_A+n_B).$$
We can use a Gibbs sampler to get a sample from the joint posterior of \((\varvec{\mu }_A,\varvec{\mu }_B,\varvec{\varSigma })\). In a Gibbs sampler we sequentially draw each model parameter from its conditional posterior given the remaining parameters. The Gibbs sampler algorithm can be written as
1. Set initial values for the model parameters: \(\varvec{\mu }_A^{(0)}\), \(\varvec{\mu }_B^{(0)}\), and \(\varvec{\varSigma }^{(0)}\).
2. Draw \(\varvec{\mu }_A^{(s)}\) from its conditional posterior \(p(\varvec{\mu }_A|\mathbf Y ,\varvec{\varSigma }^{(s-1)})\).
3. Draw \(\varvec{\mu }_B^{(s)}\) from its conditional posterior \(p(\varvec{\mu }_B|\mathbf Y ,\varvec{\varSigma }^{(s-1)})\).
4. Draw \(\varvec{\varSigma }^{(s)}\) from its conditional posterior \(p(\varvec{\varSigma }|\mathbf Y ,\varvec{\mu }_A^{(s)},\varvec{\mu }_B^{(s)})\).
5. Repeat steps 2–4 for \(s=1,\ldots ,S\).
In the software program R, drawing from a multivariate normal distribution can be done using the function ‘rmvnorm’ in the ‘mvtnorm’-package and drawing from an inverse Wishart distribution can be done using the function ‘riwish’ in the ‘MCMCpack’-package.
It may be that the initial values, \(\varvec{\mu }_A^{(0)}\), \(\varvec{\mu }_B^{(0)}\), and \(\varvec{\varSigma }^{(0)}\), are chosen far away from the subspace where the posterior is concentrated. If this is the case, a burn-in period of, say, 100 draws is needed. After the burn-in period convergence is reached and the remaining draws come from the actual posterior of the model parameters.
Appendix 2: Gibbs sampler (R code)
Conditional posteriors for \(\varvec{\mu }_A\), \(\varvec{\mu }_B\), and \(\varvec{\varSigma }\).
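A minimal R sketch of these draws, assuming the semi-conjugate prior in (9.17) with a Jeffreys prior on \(\varvec{\varSigma }\), is given below; the function and variable names (e.g., draw_mu, draw_Sigma) are illustrative rather than the original listing.

```r
# Conditional-posterior draws derived in Appendix 1
library(mvtnorm)   # provides rmvnorm()
library(MCMCpack)  # provides riwish()

# Draw a group mean from N(m1, S1), where
#   S1 = (S0^{-1} + n * Sigma^{-1})^{-1}
#   m1 = S1 (S0^{-1} m0 + n * Sigma^{-1} ybar)
draw_mu <- function(ybar, n, Sigma, m0, S0) {
  S0inv  <- solve(S0)
  Siginv <- solve(Sigma)
  S1 <- solve(S0inv + n * Siginv)
  m1 <- S1 %*% (S0inv %*% m0 + n * Siginv %*% ybar)
  c(rmvnorm(1, mean = c(m1), sigma = S1))
}

# Draw Sigma from IW(S_mu, n_A + n_B), where S_mu sums the cross
# products of the residuals around the current group means
draw_Sigma <- function(YA, YB, muA, muB) {
  RA <- sweep(YA, 2, muA)  # residuals of team A
  RB <- sweep(YB, 2, muB)  # residuals of team B
  riwish(nrow(YA) + nrow(YB), crossprod(RA) + crossprod(RB))
}
```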
Gibbs sampler
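Building on the two draw functions above, a sketch of the full sampler (steps 1–5 of Appendix 1) might look as follows; the storage format and argument names are illustrative.

```r
gibbs_sampler <- function(YA, YB, m0A, S0A, m0B, S0B, S = 10000) {
  P  <- ncol(YA)
  nA <- nrow(YA); nB <- nrow(YB)
  ybarA <- colMeans(YA); ybarB <- colMeans(YB)

  muA_draws   <- matrix(NA, S, P)
  muB_draws   <- matrix(NA, S, P)
  Sigma_draws <- array(NA, c(S, P, P))

  # step 1: initial value (the means are drawn first given Sigma,
  # so only Sigma needs to be initialized)
  Sigma <- cov(rbind(YA, YB))

  for (s in 1:S) {                                 # step 5: repeat
    muA   <- draw_mu(ybarA, nA, Sigma, m0A, S0A)   # step 2
    muB   <- draw_mu(ybarB, nB, Sigma, m0B, S0B)   # step 3
    Sigma <- draw_Sigma(YA, YB, muA, muB)          # step 4
    muA_draws[s, ]     <- muA
    muB_draws[s, ]     <- muB
    Sigma_draws[s, , ] <- Sigma
  }
  list(muA = muA_draws, muB = muB_draws, Sigma = Sigma_draws)
}
```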
Generate data matrix Y
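Example data for two teams with \(P=3\) repeated measurements can be generated as follows; the true parameter values are assumptions chosen only for illustration.

```r
set.seed(123)
P <- 3; nA <- 10; nB <- 10
muA_true   <- c(10, 9, 8)          # team A speeds up over time
muB_true   <- c(10, 10, 10)        # team B stays constant
Sigma_true <- 0.5 * diag(P) + 0.5  # 1 on the diagonal, 0.5 elsewhere
YA <- rmvnorm(nA, mean = muA_true, sigma = Sigma_true)
YB <- rmvnorm(nB, mean = muB_true, sigma = Sigma_true)
Y  <- rbind(YA, YB)  # the (nA + nB) x P data matrix of Appendix 1
```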
Compute classical estimates
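The classical (maximum likelihood) estimates of the group means and the pooled covariance matrix serve as a check on the posterior summaries below.

```r
ybarA <- colMeans(YA)
ybarB <- colMeans(YB)
SA <- crossprod(sweep(YA, 2, ybarA))  # sums of squares, team A
SB <- crossprod(sweep(YB, 2, ybarB))  # sums of squares, team B
Sigma_pooled <- (SA + SB) / (nA + nB - 2)
```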
Set priors for Gibbs sampler
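Vague prior settings can be specified by centering the prior means at zero with large prior variances, which approximates a flat prior on the group means; the values below are illustrative.

```r
m0A <- rep(0, P); S0A <- 1000 * diag(P)
m0B <- rep(0, P); S0B <- 1000 * diag(P)
```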
Run Gibbs sampler
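The sampler is then run for, say, \(S=10{,}000\) iterations, discarding a burn-in period of 100 draws as discussed in Appendix 1.

```r
S <- 10000; burnin <- 100
out  <- gibbs_sampler(YA, YB, m0A, S0A, m0B, S0B, S = S)
keep <- (burnin + 1):S
muA_post <- out$muA[keep, ]
muB_post <- out$muB[keep, ]
```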
Compute descriptive statistics from Gibbs output
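Posterior means and 95% credible intervals of the difference between the teams at each measurement occasion can be computed directly from the retained draws.

```r
diff_post <- muA_post - muB_post
post_mean <- colMeans(diff_post)
post_ci   <- apply(diff_post, 2, quantile, probs = c(0.025, 0.975))
round(rbind(mean = post_mean, post_ci), 2)
```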
Create data matrix for BIEMS
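A sketch of writing the data to a plain text file is given below, with the P dependent variables followed by a grouping variable; the exact column layout expected by BIEMS is described in Mulder et al. (2012), so treat this layout as an assumption for illustration.

```r
group <- c(rep(1, nA), rep(2, nB))
write.table(cbind(Y, group), file = "data_biems.txt",
            row.names = FALSE, col.names = FALSE)
```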
Appendix 3: Derivation of the Bayes factor
The Bayes factor is derived for a one-sided hypothesis \(H_1:\delta <0\) versus the unconstrained hypothesis \(H_u:\delta \in \mathbb {R}\). In the encompassing prior approach, the prior under \(H_1\), \(p_1(\delta ,\sigma ^2)\), is a truncation of the unconstrained (or encompassing) prior under \(H_u\), \(p_u(\delta ,\sigma ^2)\), in the region where \(\delta <0\), i.e., \(p_1(\delta ,\sigma ^2)=p_u(\delta ,\sigma ^2)I(\delta <0)/\text{ Pr }(\delta <0|H_u)\), where \(I(\cdot )\) is the indicator function and \(\text{ Pr }(\delta <0|H_u)=\int _{\delta <0}p_u(\delta )d\delta \) is the prior probability that the constraint holds. Note that \(\text{ Pr }(\delta <0|H_u)=\frac{1}{2}\) if the unconstrained prior is centered at 0, such as \(p_u(\delta )=N(0,\sigma _0^2)\). Also note that the likelihood under \(H_1\) is a truncation of the likelihood under \(H_u\), i.e., \(p_1(\mathbf y |\delta ,\sigma ^2)=p_u(\mathbf y |\delta ,\sigma ^2)I(\delta <0)\). For this reason we can omit the hypothesis index u in the likelihood functions in the derivation below. The Bayes factor of \(H_1\) versus \(H_u\) can then be derived as follows

$$B_{1u}=\frac{p_1(\mathbf y )}{p_u(\mathbf y )}=\frac{\iint p(\mathbf y |\delta ,\sigma ^2)\,p_1(\delta ,\sigma ^2)\,d\delta \,d\sigma ^2}{p_u(\mathbf y )}=\frac{\iint _{\delta <0} p(\mathbf y |\delta ,\sigma ^2)\,p_u(\delta ,\sigma ^2)\,d\delta \,d\sigma ^2}{\text{ Pr }(\delta <0|H_u)\,p_u(\mathbf y )}=\frac{\iint _{\delta <0} p_u(\delta ,\sigma ^2|\mathbf y )\,d\delta \,d\sigma ^2}{\text{ Pr }(\delta <0|H_u)}=\frac{\text{ Pr }(\delta <0|\mathbf y ,H_u)}{\text{ Pr }(\delta <0|H_u)},$$
which corresponds to (9.15), where \(\text{ Pr }(\delta <0|\mathbf y ,H_u)=\int _{\delta <0}p_{u}(\delta |\mathbf y )\,d\delta \) is the posterior probability that the constraints hold under \(H_u\). For \(H_2:\delta >0\) versus the unconstrained hypothesis \(H_u:\delta \in \mathbb {R}\) we can follow the same steps to obtain (9.16).
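As an illustration of (9.15) and (9.16), the posterior probability can be estimated by the proportion of unconstrained Gibbs draws (Appendix 2) that satisfy the constraint. The contrast \(\delta \) chosen below (the difference between the teams at the last occasion) and the prior probability of \(\frac{1}{2}\) (a prior centered at zero) are assumptions for this sketch.

```r
# Estimate Pr(delta < 0 | y, H_u) from the unconstrained Gibbs output
delta_draws <- muA_post[, 3] - muB_post[, 3]
post_prob <- mean(delta_draws < 0)  # Pr(delta < 0 | y, H_u)
BF_1u <- post_prob / 0.5            # H_1: delta < 0 versus H_u
BF_2u <- (1 - post_prob) / 0.5      # H_2: delta > 0 versus H_u
```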
For \(H_0:\delta =0\) versus the unconstrained hypothesis \(H_u:\delta \in \mathbb {R}\), the encompassing prior approach implies that \(p_0(\sigma ^2)=p_u(\sigma ^2|\delta =0)\). Consequently,

$$B_{0u}=\frac{p_0(\mathbf y )}{p_u(\mathbf y )}=\frac{\int p(\mathbf y |\delta =0,\sigma ^2)\,p_u(\sigma ^2|\delta =0)\,d\sigma ^2}{p_u(\mathbf y )}=\frac{p_u(\delta =0|\mathbf y )}{p_u(\delta =0)},$$
which is equal to the Savage-Dickey density ratio in (9.14).
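As an illustration of (9.14), the unconstrained posterior density at \(\delta =0\) can be approximated by a kernel density estimate of the Gibbs draws from the sketch above; the normal prior with standard deviation sd0 is an assumption, not the chapter's default prior.

```r
# Savage-Dickey ratio: posterior over prior density at delta = 0
sd0  <- 1  # illustrative prior standard deviation for delta
dens <- density(delta_draws)
post_dens0  <- approx(dens$x, dens$y, xout = 0)$y
prior_dens0 <- dnorm(0, mean = 0, sd = sd0)
BF_0u <- post_dens0 / prior_dens0   # H_0: delta = 0 versus H_u
```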