Maximally selected chi-square statistics and binary splits of nominal variables

Biom J. 2006 Aug;48(5):838-48. doi: 10.1002/bimj.200510191.

Abstract

We address the problem of maximally selected chi-square statistics in the case of a binary Y variable and a nominal X variable with several categories. The distribution of the maximally selected chi-square statistic has already been derived when the best cutpoint is chosen from a continuous or an ordinal X, but not when the best split is chosen from a nominal X. In this paper, we derive the exact distribution of the maximally selected chi-square statistic in this case using a combinatorial approach. Applications of the derived distribution to variable selection and hypothesis testing are discussed based on simulations. As an illustration, our method is applied to a birth data set.
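To make the quantity under study concrete, here is a minimal sketch of how a maximally selected chi-square statistic over binary splits of a nominal X could be computed by brute-force enumeration. It illustrates only the statistic, not the exact distribution derived in the paper. The function names (`chi2_2x2`, `max_chi2_nominal`), the 0/1 coding of Y, and the toy data are assumptions made for illustration; the Pearson statistic is used without continuity correction.

```python
from itertools import combinations


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no association signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_chi2_nominal(x, y):
    """Return (max statistic, left-hand category tuple) over all binary splits of x.

    x: sequence of category labels (nominal); y: sequence of 0/1 outcomes.
    """
    cats = sorted(set(x))
    # counts[c] = [number of observations with y == 0, number with y == 1]
    counts = {c: [0, 0] for c in cats}
    for xi, yi in zip(x, y):
        counts[xi][yi] += 1
    n0 = sum(v[0] for v in counts.values())
    n1 = sum(v[1] for v in counts.values())

    # Keep the first category on the left side so each split is visited once.
    first, rest = cats[0], cats[1:]
    best_stat, best_left = 0.0, None
    for r in range(len(rest)):  # r < len(rest) keeps the right side non-empty
        for extra in combinations(rest, r):
            left = (first,) + extra
            a = sum(counts[c][0] for c in left)
            b = sum(counts[c][1] for c in left)
            stat = chi2_2x2(a, b, n0 - a, n1 - b)
            if stat > best_stat:
                best_stat, best_left = stat, left
    return best_stat, best_left


# Made-up toy data: category "b" always has y = 1, the others always y = 0.
x = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
y = [0] * 10 + [1] * 10 + [0] * 10
print(max_chi2_nominal(x, y))  # (30.0, ('a', 'c'))
```

A nominal X with K categories admits 2^(K-1) - 1 distinct binary splits, so this exhaustive search is practical only for a moderate number of categories. Because the maximum is taken over many candidate splits, comparing it to an ordinary chi-square reference distribution with one degree of freedom would overstate significance, which is the motivation for deriving the distribution of the maximum.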

MeSH terms

  • Biometry / methods*
  • Cesarean Section
  • Chi-Square Distribution*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Female
  • Humans
  • Parturition / physiology
  • Pregnancy