Welch's Test

The Welch's Test tab tests whether the means of a response variable differ across three or more groups. It generalizes the two-sample Welch t-test to k groups and does not assume equal variances.

Basic Usage

Open the Tab

Select Analysis > Welch's Test... from the menu bar.

Run an Analysis

Configure the following in the settings panel:

  1. Select a dataset from Dataset
  2. Select a categorical variable for Factor
  3. Select a numeric variable for Response Variable
  4. Click Run Analysis

Welch's Test setup with Iris dataset, species x sepal_length

Data Format

Data must be in long format with one row per observation. Use Reshape to convert wide-format data.
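To illustrate the difference between the two layouts outside the application, the following pandas sketch converts a wide table (one column per group) into long format. The column names and values are invented for the example, and the Reshape feature itself may work differently.

```python
import pandas as pd

# Hypothetical wide-format data: one column per group, one row per replicate.
wide = pd.DataFrame({
    "setosa": [5.1, 4.9, 4.7],
    "versicolor": [7.0, 6.4, 6.9],
    "virginica": [6.3, 5.8, 7.1],
})

# melt() stacks the group columns into a factor column and a response column,
# giving one row per observation.
long = wide.melt(var_name="species", value_name="sepal_length")

print(long.shape)  # (9, 2)
```

In the long table, species would be selected as Factor and sepal_length as Response Variable.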

Null Hypothesis

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

The test evaluates whether all k group population means are equal. Each group's variance is estimated separately to compute the F* statistic, so equal variances are not assumed. See Hypothesis Testing Concepts for theoretical background.

Variable Selection

Factor (Categorical): Select a categorical variable that defines the groups. Columns with nominal or ordinal measurement scale appear as options.

Response Variable (Numeric): Select the numeric variable to analyze. Columns with interval or ratio measurement scale appear as options.

Significance Level

Set the significance level for the F* test and Games-Howell post-hoc comparisons. Choose from α = 0.10, 0.05 (default), or 0.01. The Games-Howell confidence intervals use a confidence level of 1 − α.

Reading the Results

Observations

The total number of observations used in the analysis appears at the top. If rows were excluded due to missing values, the count of excluded rows is also shown.

Group Statistics

A summary table of descriptive statistics for each group.

Column | Description
Group | Group name
N | Number of observations
Mean | Group mean
SD | Standard deviation (square root of unbiased variance, denominator n − 1)
Min | Minimum value
Max | Maximum value
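The columns above can be reproduced with the Python standard library. This is a minimal sketch with made-up data, not the application's own code; note that statistics.stdev uses the n − 1 denominator described for the SD column.

```python
from statistics import mean, stdev

# Hypothetical long-format observations as (group, value) pairs.
rows = [("a", 2.0), ("a", 4.0), ("a", 6.0),
        ("b", 1.0), ("b", 2.0), ("b", 3.0), ("b", 6.0)]

# Collect the values of each group.
groups = {}
for name, value in rows:
    groups.setdefault(name, []).append(value)

# One summary record per group, mirroring the table columns.
summary = {
    name: {
        "N": len(vals),
        "Mean": mean(vals),
        "SD": stdev(vals),  # sample SD, denominator n - 1
        "Min": min(vals),
        "Max": max(vals),
    }
    for name, vals in groups.items()
}

for name, stats_row in summary.items():
    print(name, stats_row)
```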

Welch's Test

The Welch F* test results table.

Column | Description
F* | Inverse-variance-weighted F statistic, distinct from the classical ANOVA F statistic
df1 | Numerator degrees of freedom (k − 1)
df2 | Denominator degrees of freedom from the Satterthwaite approximation, generally non-integer
Pr(>F*) | p-value: the probability of observing an F* statistic as extreme as, or more extreme than, the observed value under the null hypothesis

Welch's Test result for the Iris dataset, testing the effect of species

Below the table, the conclusion about rejecting or not rejecting the null hypothesis at the selected significance level is displayed.
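For readers who want to check the table values by hand, the sketch below computes F*, df1, df2, and the p-value following the formulas in Welch (1951). It assumes NumPy and SciPy are available; the function name welch_anova and the sample data are hypothetical, and the application's internal implementation may differ in detail.

```python
import numpy as np
from scipy import stats

def welch_anova(groups):
    """Welch's heteroscedastic F* test for a list of k samples."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])

    w = n / variances                         # inverse-variance weights
    w_sum = w.sum()
    grand_mean = (w * means).sum() / w_sum    # weighted grand mean

    # Correction term shared by the F* denominator and df2
    lam = (((1 - w / w_sum) ** 2) / (n - 1)).sum()

    numerator = (w * (means - grand_mean) ** 2).sum() / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    f_star = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)            # generally non-integer
    p_value = stats.f.sf(f_star, df1, df2)    # upper tail of F(df1, df2)
    return f_star, df1, df2, p_value

# Made-up samples with visibly different spreads.
a = [4.1, 5.0, 6.2, 5.5, 4.8]
b = [6.9, 7.4, 8.8, 7.1, 9.0, 8.2]
c = [5.9, 6.1, 6.0, 7.5]
f_star, df1, df2, p_value = welch_anova([a, b, c])
print(f"F* = {f_star:.3f}, df1 = {df1}, df2 = {df2:.2f}, p = {p_value:.4f}")
```

With exactly two groups, F* equals the square of the Welch t statistic and df2 reduces to the Welch-Satterthwaite degrees of freedom, which makes a convenient sanity check.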

Games-Howell Post-Hoc Test

The F* test determines whether at least one group mean differs from the others, but does not identify which pairs of groups differ. The Games-Howell post-hoc test compares all pairs of group means.

It is computed automatically regardless of the F* test result. For each pair, the test computes the same standard error and degrees of freedom as the Welch t-test, then uses the studentized range distribution to control the family-wise error rate. See Hypothesis Testing Concepts for computation details and FWER control.

Results Table

Column | Description
Comparison | The two groups being compared
Diff | Difference in means (Group 1 mean − Group 2 mean)
SE | Standard error of the difference, using each group's own variance
t | t statistic (Diff / SE)
df | Welch-Satterthwaite degrees of freedom for this pair, generally non-integer
p-value | Adjusted p-value from the studentized range distribution
CI Lower / CI Upper | Simultaneous confidence interval for the mean difference

Games-Howell post-hoc comparisons for all pairs of Iris species
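The pairwise computation described in this section can be sketched as follows. The sketch assumes SciPy 1.7 or later (for scipy.stats.studentized_range); the function name games_howell, the dictionary keys, and the data are hypothetical. It uses the common convention that the studentized range statistic is |t|·√2, so the application's output may differ in rounding or minor details.

```python
from itertools import combinations

import numpy as np
from scipy import stats

def games_howell(groups, alpha=0.05):
    """Pairwise Games-Howell comparisons; `groups` maps name -> values."""
    k = len(groups)
    results = []
    for (name_i, x_i), (name_j, x_j) in combinations(groups.items(), 2):
        n_i, n_j = len(x_i), len(x_j)
        v_i = np.var(x_i, ddof=1) / n_i   # squared standard error of mean i
        v_j = np.var(x_j, ddof=1) / n_j   # squared standard error of mean j
        diff = np.mean(x_i) - np.mean(x_j)
        se = np.sqrt(v_i + v_j)           # same SE as the Welch t-test
        t = diff / se
        # Welch-Satterthwaite degrees of freedom for this pair
        df = (v_i + v_j) ** 2 / (v_i ** 2 / (n_i - 1) + v_j ** 2 / (n_j - 1))
        # Adjusted p-value: upper tail of the studentized range at |t| * sqrt(2)
        p = stats.studentized_range.sf(abs(t) * np.sqrt(2), k, df)
        # Simultaneous CI half-width from the studentized range quantile
        q_crit = stats.studentized_range.ppf(1 - alpha, k, df)
        half_width = q_crit / np.sqrt(2) * se
        results.append({
            "comparison": f"{name_i} - {name_j}",
            "diff": float(diff), "se": float(se), "t": float(t),
            "df": float(df), "p": float(p),
            "ci_lower": float(diff - half_width),
            "ci_upper": float(diff + half_width),
        })
    return results

# Made-up samples for three groups.
data = {
    "a": [4.1, 5.0, 6.2, 5.5, 4.8],
    "b": [6.9, 7.4, 8.8, 7.1, 9.0, 8.2],
    "c": [5.9, 6.1, 6.0, 7.5],
}
results = games_howell(data)
for row in results:
    print(row)
```

Because the number of groups k enters the studentized range distribution, the adjusted p-values are larger than the unadjusted Welch t-test p-values whenever k is greater than 2.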

Assumptions

Welch's Test and the Games-Howell post-hoc test assume the following:

  • Independence: Observations are independent of each other
  • Normality: The response variable follows a normal distribution within each group. With large sample sizes, the central limit theorem keeps the Type I error rate close to the nominal level. See Hypothesis Testing Concepts for normality diagnostics.

Equal variances are not assumed. Use this test when group variances may differ or when group sizes are unequal.

Error Messages

Message | Cause and resolution
Not enough valid observations (at least 3 required) | Fewer than 3 valid observations remain after excluding missing values
At least 2 groups are required | The factor variable has only one level
Welch ANOVA could not be computed. Each group must have at least 2 observations with non-zero variance. | A group has 1 or fewer observations, or all values within a group are identical

If the Games-Howell section shows "not available", the cause is the same: a group has fewer than 2 observations or zero variance. The analysis runs with as few as 2 observations per group, but Monte Carlo simulations by Games and Howell (1976) confirmed that the Type I error rate stays near the nominal level when each group has at least 6 observations. Interpret results with caution for smaller samples.

Missing Values

Rows with missing values in the factor or response variable are automatically excluded. The number of excluded rows is displayed in the results panel.
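The exclusion rule corresponds to listwise deletion on the two selected columns. The pandas sketch below shows the idea with invented data; the assumption that gaps in other, unselected columns do not trigger exclusion is ours and is not stated by the application.

```python
import pandas as pd

# Hypothetical data with gaps in both the factor and the response column.
df = pd.DataFrame({
    "species": ["setosa", "setosa", None, "virginica", "virginica"],
    "sepal_length": [5.1, None, 6.0, 6.3, 5.8],
    "note": [None, "x", "y", None, "z"],  # not selected for the analysis
})

# Keep only rows where both selected columns are present.
complete = df.dropna(subset=["species", "sepal_length"])
n_excluded = len(df) - len(complete)

print(len(complete), n_excluded)  # 3 2
```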

References

  • Games, P. A., & Howell, J. F. (1976). Pairwise multiple comparison procedures with unequal N's and/or variances: A Monte Carlo study. Journal of Educational Statistics, 1(2), 113–125. https://www.jstor.org/stable/1164979
  • Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38(3/4), 330–336. https://www.jstor.org/stable/2332579