
Bayesian Testing of Constrained Hypotheses

Modern Statistical Methods for HCI

Part of the book series: Human–Computer Interaction Series (HCIS)

Abstract

Statistical hypothesis testing plays a central role in applied research for determining whether theories or expectations are supported by the data. Such expectations are often formulated as order constraints. For example, an executive board may expect that sales representatives who wear a smart watch respond faster to their emails than sales representatives who do not. In addition, it may be expected that this difference becomes more pronounced over time, because representatives need to learn to use the smart watch effectively. By translating these expectations into statistical hypotheses with equality and/or order constraints, we can determine the degree to which the expectations are supported by the data. In this chapter we show how a Bayesian statistical approach can be used effectively for this purpose. This Bayesian approach is more flexible than the traditional p-value test in the sense that multiple hypotheses with equality as well as order constraints can be tested against each other directly. The methodology can be used straightforwardly by practitioners via the freely downloadable software package BIEMS. An application in human-computer interaction is used for illustration.


Notes

  1. Note that the Jeffreys prior is not a proper probability distribution (it is improper): it does not integrate to 1. Improper priors can be used in Bayesian estimation when there is enough information in the data to obtain a proper posterior. In this application this is the case when \(n_A+n_B\ge 5\) (i.e., \(P=3\) plus the number of groups/teams) and \(n_A,n_B\ge 2\).

References

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) 2nd International Symposium on Information Theory. Akademiai Kiado, Budapest, pp 267–281

  • Barlow R, Bartholomew D, Bremner J, Brunk H (1972) Statistical inference under order restrictions. Wiley, New York

  • Berger JO, Pericchi LR (1996) The intrinsic Bayes factor for model selection and prediction. J Am Stat Assoc 91:109–122

  • Braeken J, Mulder J, Wood S (2015) Relative effects at work: Bayes factors for order hypotheses. J Manage 41:544–573

  • Dickey J (1971) The weighted likelihood ratio, linear hypotheses on normal location parameters. Ann Math Stat 42:204–223

  • Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian data analysis, 2nd edn. Chapman & Hall, London

  • Gu X, Mulder J, Dekovic M, Hoijtink H (2014) Bayesian evaluation of inequality constrained hypotheses. Psychol Methods 19:511–527

  • Hoijtink H (2011) Informative hypotheses: theory and practice for behavioral and social scientists. Chapman & Hall/CRC, New York

  • Hubbard R, Armstrong J (2006) Why we don't really know what "statistical significance" means: a major educational failure. J Mark Educ 28:114–120

  • Jeffreys H (1961) Theory of probability, 3rd edn. Oxford University Press, New York

  • Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795

  • Kato BS, Hoijtink H (2006) A Bayesian approach to inequality constrained linear mixed models: estimation and model selection. Stat Model 6:231–249

  • Klugkist I, Laudy O, Hoijtink H (2005) Inequality constrained analysis of variance: a Bayesian approach. Psychol Methods 10:477–493

  • Liang F, Paulo R, Molina G, Clyde MA, Berger JO (2008) Mixtures of \(g\) priors for Bayesian variable selection. J Am Stat Assoc 103(481):410–423

  • Lynch SM (2007) Introduction to applied Bayesian statistics and estimation for social scientists. Springer, New York

  • Mulder J (in press) Bayes factors for testing order-constrained hypotheses on correlations. J Math Psychol

  • Mulder J, Klugkist I, van de Schoot R, Meeus W, Selfhout M, Hoijtink H (2009) Bayesian model selection of informative hypotheses for repeated measurements. J Math Psychol 53:530–546

  • Mulder J, Hoijtink H, Klugkist I (2010) Equality and inequality constrained multivariate linear models: objective model selection using constrained posterior priors. J Stat Plan Inference 140:887–906

  • Mulder J (2014) Prior adjusted default Bayes factors for testing (in)equality constrained hypotheses. Comput Stat Data Anal 71:448–463

  • Mulder J, Hoijtink H, de Leeuw C (2012) BIEMS: a Fortran 90 program for calculating Bayes factors for inequality and equality constrained models. J Stat Softw 46(2)

  • O'Hagan A (1995) Fractional Bayes factors for model comparison (with discussion). J Roy Stat Soc Ser B 57:99–138

  • Robertson T, Wright FT, Dykstra R (1988) Order restricted statistical inference. Wiley, New York

  • Schwarz GE (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Sellke T, Bayarri MJ, Berger JO (2001) Calibration of \(p\) values for testing precise null hypotheses. Am Stat 55(1):62–71

  • Silvapulle MJ, Sen PK (2004) Constrained statistical inference: inequality, order, and shape restrictions, 2nd edn. Wiley, Hoboken

  • Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit. J Roy Stat Soc Ser B 64(2):583–639

  • van de Schoot R, Hoijtink H, Romeijn J-W, Brugman D (2011) A prior predictive loss function for the evaluation of inequality constrained hypotheses. J Math Psychol 16:225–237

  • van de Schoot R, Hoijtink H, Hallquist MN (2012) Bayesian evaluation of inequality-constrained hypotheses in SEM models using Mplus. Struct Equ Model 19:593–609

  • Verdinelli I, Wasserman L (1995) Computing Bayes factors using a generalization of the Savage-Dickey density ratio. J Am Stat Assoc 90:614–618

  • Wagenmakers E-J (2007) A practical solution to the pervasive problem of \(p\) values. Psychon Bull Rev 14:779–804

  • Wetzels R, Grasman RPPP, Wagenmakers E-J (2010) An encompassing prior generalization of the Savage-Dickey density ratio test. Comput Stat Data Anal 54:2094–2102

  • Wetzels R, Matzke D, Lee MD, Rouder JN, Iverson GJ, Wagenmakers E-J (2011) Statistical evidence in experimental psychology: an empirical comparison using 855 t tests. Perspect Psychol Sci 6:291–298

  • Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with \(g\)-prior distributions. In: Goel PK, Zellner A (eds) Bayesian inference and decision techniques: essays in honor of Bruno de Finetti. North-Holland, Amsterdam, pp 233–243


Author information

Correspondence to Joris Mulder.

Appendices

Appendix 1: Gibbs sampler (theory)

We consider the general case of \(P\) repeated measurements. In the example discussed above, \(P=3\). The following semi-conjugate prior is used for the model parameters,

$$\begin{aligned} p(\boldsymbol{\mu}_A,\boldsymbol{\mu}_B,\boldsymbol{\Sigma}) &= p(\boldsymbol{\mu}_A)\times p(\boldsymbol{\mu}_B)\times p(\boldsymbol{\Sigma})\\ &\propto N_{\boldsymbol{\mu}_A}(\mathbf{m}_{A0},\mathbf{S}_{A0}) \times N_{\boldsymbol{\mu}_B}(\mathbf{m}_{B0},\mathbf{S}_{B0})\times |\boldsymbol{\Sigma}|^{-\frac{P+1}{2}}, \end{aligned}$$
(9.17)

where \(\mathbf{m}_{A0}\) and \(\mathbf{m}_{B0}\) are the prior means of \(\boldsymbol{\mu}_A\) and \(\boldsymbol{\mu}_B\), respectively, and \(\mathbf{S}_{A0}\) and \(\mathbf{S}_{B0}\) are the respective prior covariance matrices of \(\boldsymbol{\mu}_A\) and \(\boldsymbol{\mu}_B\).

The data are stored in the \((n_A+n_B)\times P\) data matrix \(\mathbf{Y}=[\mathbf{Y}_A'~\mathbf{Y}_B']'\), where the ith row of \(\mathbf{Y}\) contains the \(P\) measurements of the ith sales executive; the first \(n_A\) rows correspond to the responses of the executives in team A and the remaining \(n_B\) rows to those of the executives in team B. The likelihood of the data can be written as

$$\begin{aligned} p(\mathbf{Y}|\boldsymbol{\mu}_A,\boldsymbol{\mu}_B,\boldsymbol{\Sigma}) &= \prod_{i=1}^{n_A} p(\mathbf{y}_i|\boldsymbol{\mu}_A,\boldsymbol{\Sigma}) \times \prod_{i=n_A+1}^{n_A+n_B} p(\mathbf{y}_i|\boldsymbol{\mu}_B,\boldsymbol{\Sigma})\\ &\propto N_{\boldsymbol{\mu}_A|\boldsymbol{\Sigma}}(\bar{\mathbf{y}}_A,\boldsymbol{\Sigma}/n_A)\times N_{\boldsymbol{\mu}_B|\boldsymbol{\Sigma}}(\bar{\mathbf{y}}_B,\boldsymbol{\Sigma}/n_B)\times IW_{\boldsymbol{\Sigma}}(\mathbf{S},n_A+n_B-2), \end{aligned}$$

where \(\bar{\mathbf{y}}_A\) and \(\bar{\mathbf{y}}_B\) denote the sample means of team A and team B over the \(P\) measurements, and the sum-of-squares matrix equals

$$ \mathbf{S}=(\mathbf{Y}_A-\mathbf{1}_{n_A}\bar{\mathbf{y}}_A')'(\mathbf{Y}_A-\mathbf{1}_{n_A}\bar{\mathbf{y}}_A')+(\mathbf{Y}_B-\mathbf{1}_{n_B}\bar{\mathbf{y}}_B')'(\mathbf{Y}_B-\mathbf{1}_{n_B}\bar{\mathbf{y}}_B'), $$

and \(IW_{\boldsymbol{\Sigma}}(\mathbf{S},n)\) denotes an inverse Wishart probability density for \(\boldsymbol{\Sigma}\). Note that the likelihood function of \(\boldsymbol{\Sigma}\) given \(\boldsymbol{\mu}_A\) and \(\boldsymbol{\mu}_B\) is proportional to an inverse Wishart density \(IW(\mathbf{S}_{\boldsymbol{\mu}},n_A+n_B)\), where \(\mathbf{S}_{\boldsymbol{\mu}}=(\mathbf{Y}_A-\mathbf{1}_{n_A}\boldsymbol{\mu}_A')'(\mathbf{Y}_A-\mathbf{1}_{n_A}\boldsymbol{\mu}_A')+(\mathbf{Y}_B-\mathbf{1}_{n_B}\boldsymbol{\mu}_B')'(\mathbf{Y}_B-\mathbf{1}_{n_B}\boldsymbol{\mu}_B')\). These results can be found in most classic Bayesian textbooks, for example Gelman et al. (2004).

Because the prior in (9.17) is semi-conjugate, the conditional posterior distribution of each model parameter given the other parameters has a known form from which we can easily sample,

$$\begin{aligned} p(\boldsymbol{\mu}_A|\mathbf{Y},\boldsymbol{\Sigma}) &= N\left(\left(\mathbf{S}_{A0}^{-1}+n_A\boldsymbol{\Sigma}^{-1}\right)^{-1}\left(\mathbf{S}_{A0}^{-1}\mathbf{m}_{A0}+n_A\boldsymbol{\Sigma}^{-1}\bar{\mathbf{y}}_A\right),\left(\mathbf{S}_{A0}^{-1}+n_A\boldsymbol{\Sigma}^{-1}\right)^{-1}\right)\\ p(\boldsymbol{\mu}_B|\mathbf{Y},\boldsymbol{\Sigma}) &= N\left(\left(\mathbf{S}_{B0}^{-1}+n_B\boldsymbol{\Sigma}^{-1}\right)^{-1}\left(\mathbf{S}_{B0}^{-1}\mathbf{m}_{B0}+n_B\boldsymbol{\Sigma}^{-1}\bar{\mathbf{y}}_B\right),\left(\mathbf{S}_{B0}^{-1}+n_B\boldsymbol{\Sigma}^{-1}\right)^{-1}\right)\\ p(\boldsymbol{\Sigma}|\mathbf{Y},\boldsymbol{\mu}_A,\boldsymbol{\mu}_B) &= IW(\mathbf{S}_{\boldsymbol{\mu}},n_A+n_B). \end{aligned}$$

We can use a Gibbs sampler to get a sample from the joint posterior of \((\varvec{\mu }_A,\varvec{\mu }_B,\varvec{\varSigma })\). In a Gibbs sampler we sequentially draw each model parameter from its conditional posterior given the remaining parameters. The Gibbs sampler algorithm can be written as

  1. Set initial values for the model parameters: \(\boldsymbol{\mu}_A^{(0)}\), \(\boldsymbol{\mu}_B^{(0)}\), and \(\boldsymbol{\Sigma}^{(0)}\).

  2. Draw \(\boldsymbol{\mu}_A^{(s)}\) from its conditional posterior \(p(\boldsymbol{\mu}_A|\mathbf{Y},\boldsymbol{\Sigma}^{(s-1)})\).

  3. Draw \(\boldsymbol{\mu}_B^{(s)}\) from its conditional posterior \(p(\boldsymbol{\mu}_B|\mathbf{Y},\boldsymbol{\Sigma}^{(s-1)})\).

  4. Draw \(\boldsymbol{\Sigma}^{(s)}\) from its conditional posterior \(p(\boldsymbol{\Sigma}|\mathbf{Y},\boldsymbol{\mu}_A^{(s)},\boldsymbol{\mu}_B^{(s)})\).

  5. Repeat steps 2–4 for \(s=1,\ldots,S\).

In R, drawing from a multivariate normal distribution can be done using the function 'rmvnorm' in the 'mvtnorm' package, and drawing from an inverse Wishart distribution using the function 'riwish' in the 'MCMCpack' package.
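For example (a minimal illustration; the dimension and parameter values are arbitrary):

```r
library(mvtnorm)   # provides rmvnorm
library(MCMCpack)  # provides riwish

rmvnorm(1, mean = rep(0, 3), sigma = diag(3))  # one draw from N(0, I_3)
riwish(v = 10, S = diag(3))                    # one draw from IW(I_3, 10)
```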

It may be that the initial values \(\boldsymbol{\mu}_A^{(0)}\), \(\boldsymbol{\mu}_B^{(0)}\), and \(\boldsymbol{\Sigma}^{(0)}\) are chosen far away from the region where the posterior is concentrated. In that case a burn-in period of, say, 100 draws is needed: after the burn-in period the sampler has converged and the remaining draws come from the actual posterior of the model parameters.

Appendix 2: Gibbs sampler (R code)

Conditional posteriors for \(\boldsymbol{\mu}_A\), \(\boldsymbol{\mu}_B\), and \(\boldsymbol{\Sigma}\).

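A minimal R sketch of these draws, implementing the conditional posteriors derived in Appendix 1 (the function names draw.mu and draw.Sigma are illustrative):

```r
# Draws from the conditional posteriors derived in Appendix 1.
library(mvtnorm)   # rmvnorm: multivariate normal draws
library(MCMCpack)  # riwish: inverse Wishart draws

draw.mu <- function(ybar, n, Sigma, m0, S0) {
  # p(mu | Y, Sigma) = N((S0^-1 + n Sigma^-1)^-1 (S0^-1 m0 + n Sigma^-1 ybar),
  #                      (S0^-1 + n Sigma^-1)^-1)
  prec  <- solve(S0) + n * solve(Sigma)
  pcov  <- solve(prec)
  pmean <- c(pcov %*% (solve(S0) %*% m0 + n * solve(Sigma) %*% ybar))
  c(rmvnorm(1, mean = pmean, sigma = pcov))
}

draw.Sigma <- function(YA, YB, muA, muB) {
  # p(Sigma | Y, muA, muB) = IW(S_mu, nA + nB), with S_mu as in Appendix 1
  Smu <- crossprod(sweep(YA, 2, muA)) + crossprod(sweep(YB, 2, muB))
  riwish(nrow(YA) + nrow(YB), Smu)
}
```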

Gibbs sampler

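A sketch of the sampler loop, following steps 1–5 above and using draw.mu and draw.Sigma from the previous block (the function name and default settings are illustrative):

```r
gibbs.sampler <- function(YA, YB, m0A, S0A, m0B, S0B, S = 10000, burnin = 100) {
  P <- ncol(YA); nA <- nrow(YA); nB <- nrow(YB)
  ybarA <- colMeans(YA); ybarB <- colMeans(YB)
  muA <- ybarA; muB <- ybarB; Sigma <- diag(P)      # step 1: initial values
  draws <- matrix(NA, S, 2 * P + P * P)
  for (s in 1:(S + burnin)) {
    muA   <- draw.mu(ybarA, nA, Sigma, m0A, S0A)    # step 2
    muB   <- draw.mu(ybarB, nB, Sigma, m0B, S0B)    # step 3
    Sigma <- draw.Sigma(YA, YB, muA, muB)           # step 4
    if (s > burnin) draws[s - burnin, ] <- c(muA, muB, Sigma)  # discard burn-in
  }
  colnames(draws) <- c(paste0("muA", 1:P), paste0("muB", 1:P),
                       paste0("Sigma", rep(1:P, P), ".", rep(1:P, each = P)))
  draws
}
```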

Generate data matrix Y

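A sketch that simulates example data; the true values below are illustrative assumptions, chosen so that team A responds faster and the difference grows over the \(P=3\) measurements:

```r
library(mvtnorm)
set.seed(1)
P <- 3; nA <- 10; nB <- 10
muA.true   <- c(9, 8, 7)       # team A: response times that improve over time
muB.true   <- c(10, 10, 10)    # team B: stable response times
Sigma.true <- diag(P)
YA <- rmvnorm(nA, mean = muA.true, sigma = Sigma.true)
YB <- rmvnorm(nB, mean = muB.true, sigma = Sigma.true)
Y  <- rbind(YA, YB)            # (nA + nB) x P data matrix, team A rows first
```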

Compute classical estimates

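A sketch of the classical estimates: the sample means per team and the pooled covariance estimate, i.e., \(\mathbf{S}\) from Appendix 1 divided by its degrees of freedom:

```r
ybarA <- colMeans(YA)
ybarB <- colMeans(YB)
S.pooled <- (crossprod(sweep(YA, 2, ybarA)) +
             crossprod(sweep(YB, 2, ybarB))) / (nA + nB - 2)
```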

Set priors for Gibbs sampler

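A sketch using vague but proper normal priors for \(\boldsymbol{\mu}_A\) and \(\boldsymbol{\mu}_B\) (the values are illustrative; the Jeffreys prior for \(\boldsymbol{\Sigma}\) is already built into draw.Sigma):

```r
m0A <- rep(0, P); S0A <- diag(P) * 1e3
m0B <- rep(0, P); S0B <- diag(P) * 1e3
```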

Run Gibbs sampler

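Running the sampler with the data and priors defined above:

```r
draws <- gibbs.sampler(YA, YB, m0A, S0A, m0B, S0B, S = 10000, burnin = 100)
```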

Compute descriptive statistics from Gibbs output

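A sketch computing posterior means, standard deviations, and 95% credible intervals from the stored draws:

```r
post.mean <- colMeans(draws)
post.sd   <- apply(draws, 2, sd)
post.ci   <- apply(draws, 2, quantile, probs = c(0.025, 0.975))
round(cbind(mean = post.mean, sd = post.sd, t(post.ci)), 2)
```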

Create data matrix for BIEMS

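A sketch that appends a team indicator and writes a plain-text data file; the exact column layout expected by BIEMS is an assumption here, so consult the BIEMS documentation (Mulder et al. 2012) for the required format:

```r
# Append a grouping variable (1 = team A, 2 = team B) and write a text file.
# The layout below is an assumption, not the documented BIEMS format.
biems.data <- cbind(Y, rep(1:2, times = c(nA, nB)))
write.table(biems.data, file = "data_biems.txt",
            row.names = FALSE, col.names = FALSE)
```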

Appendix 3: Derivation of the Bayes factor

The Bayes factor is derived for a one-sided hypothesis \(H_1:\delta<0\) versus the unconstrained hypothesis \(H_u:\delta\in\mathbb{R}\). In the encompassing prior approach, the prior under \(H_1\), \(p_1(\delta,\sigma^2)\), is a truncation of the unconstrained (or encompassing) prior under \(H_u\), \(p_u(\delta,\sigma^2)\), to the region where \(\delta<0\), i.e., \(p_1(\delta,\sigma^2)=p_u(\delta,\sigma^2)I(\delta<0)/\Pr(\delta<0|H_u)\), where \(I(\cdot)\) is the indicator function and \(\Pr(\delta<0|H_u)=\int_{\delta<0}p_u(\delta)\,d\delta\) is the prior probability that the constraint holds. Note that \(\Pr(\delta<0|H_u)=\frac{1}{2}\) if the unconstrained prior is centered at 0, such as \(p_u(\delta)=N(0,\sigma_0^2)\). Also note that the likelihood under \(H_1\) is a truncation of the likelihood under \(H_u\), i.e., \(p_1(\mathbf{y}|\delta,\sigma^2)=p_u(\mathbf{y}|\delta,\sigma^2)I(\delta<0)\). For this reason we can omit the hypothesis index \(u\) from the likelihood functions in the derivation below. The Bayes factor of \(H_1\) versus \(H_u\) can then be derived as follows:

$$\begin{aligned} B_{1u} &= \frac{\iint_{\delta<0}p(\mathbf{y}|\delta,\sigma^2)\,p_1(\delta,\sigma^2)\,d\delta\,d\sigma^2}{\iint p(\mathbf{y}|\delta,\sigma^2)\,p_u(\delta,\sigma^2)\,d\delta\,d\sigma^2}\\ &= \frac{1}{\Pr(\delta<0|H_u)}\iint_{\delta<0}\frac{p(\mathbf{y}|\delta,\sigma^2)\,p_u(\delta,\sigma^2)}{\iint p(\mathbf{y}|\delta,\sigma^2)\,p_u(\delta,\sigma^2)\,d\delta\,d\sigma^2}\,d\delta\,d\sigma^2\\ &= \frac{1}{\Pr(\delta<0|H_u)}\iint_{\delta<0}p_u(\delta,\sigma^2|\mathbf{y})\,d\delta\,d\sigma^2\\ &= \frac{\Pr(\delta<0|\mathbf{y},H_u)}{\Pr(\delta<0|H_u)}, \end{aligned}$$

which corresponds to (9.15), where \(\Pr(\delta<0|\mathbf{y},H_u)=\iint_{\delta<0}p_u(\delta,\sigma^2|\mathbf{y})\,d\delta\,d\sigma^2\) is the posterior probability that the constraint holds under \(H_u\). For \(H_2:\delta>0\) versus the unconstrained hypothesis \(H_u:\delta\in\mathbb{R}\) we can follow the same steps to obtain (9.16).
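In practice this ratio can be estimated directly from MCMC output. A sketch, assuming delta.post holds unconstrained posterior draws of \(\delta\) and the prior is centered at 0 so that \(\Pr(\delta<0|H_u)=\frac{1}{2}\):

```r
# Monte Carlo estimate of B_1u = Pr(delta < 0 | y, H_u) / Pr(delta < 0 | H_u);
# delta.post is an assumed vector of unconstrained posterior draws of delta.
prior.prob <- 0.5                   # prior probability for a prior centered at 0
post.prob  <- mean(delta.post < 0)  # posterior probability that delta < 0
B1u <- post.prob / prior.prob       # cf. (9.15)
```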

For \(H_0:\delta=0\) versus the unconstrained hypothesis \(H_u:\delta\in\mathbb{R}\), the encompassing prior approach implies that \(p_0(\sigma^2)=p_u(\sigma^2|\delta=0)\). Consequently,

$$\begin{aligned} B_{0u} &= \frac{\int p(\mathbf{y}|\delta=0,\sigma^2)\,p_0(\sigma^2)\,d\sigma^2}{\iint p(\mathbf{y}|\delta,\sigma^2)\,p_u(\delta,\sigma^2)\,d\delta\,d\sigma^2}\\ &= \frac{\int p(\mathbf{y}|\delta=0,\sigma^2)\,p_u(\sigma^2|\delta=0)\,d\sigma^2}{\iint p(\mathbf{y}|\delta,\sigma^2)\,p_u(\delta,\sigma^2)\,d\delta\,d\sigma^2}\\ &= \frac{1}{p_u(\delta=0)}\int\frac{p(\mathbf{y}|\delta=0,\sigma^2)\,p_u(\delta=0,\sigma^2)}{\iint p(\mathbf{y}|\delta,\sigma^2)\,p_u(\delta,\sigma^2)\,d\delta\,d\sigma^2}\,d\sigma^2\\ &= \frac{1}{p_u(\delta=0)}\int p_u(\delta=0,\sigma^2|\mathbf{y})\,d\sigma^2\\ &= \frac{p_u(\delta=0|\mathbf{y})}{p_u(\delta=0)}, \end{aligned}$$

which is equal to the Savage-Dickey density ratio in (9.14).
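A corresponding sketch of the Savage-Dickey estimate, assuming an unconstrained prior \(p_u(\delta)=N(0,\sigma_0^2)\) and approximating the posterior density of \(\delta\) at 0 by a kernel density estimate of the draws:

```r
# Savage-Dickey estimate of B_0u = p_u(delta = 0 | y) / p_u(delta = 0);
# sigma0 is an illustrative prior standard deviation, delta.post as before.
sigma0      <- 1
prior.dens0 <- dnorm(0, mean = 0, sd = sigma0)
post.dens0  <- approxfun(density(delta.post))(0)  # kernel estimate at delta = 0
B0u <- post.dens0 / prior.dens0                   # cf. (9.14)
```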


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Mulder, J. (2016). Bayesian Testing of Constrained Hypotheses. In: Robertson, J., Kaptein, M. (eds) Modern Statistical Methods for HCI. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-319-26633-6_9


  • DOI: https://doi.org/10.1007/978-3-319-26633-6_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26631-2

  • Online ISBN: 978-3-319-26633-6

  • eBook Packages: Computer Science, Computer Science (R0)
