Welch's Test

The Welch's Test tab tests whether the means of a response variable differ across k groups, typically three or more. It generalizes the two-sample Welch t-test to k groups (for k = 2 the two tests are equivalent, with F* = t²) and does not assume equal variances.

Basic Usage

Open the Tab

Select Analysis > Welch's Test... from the menu bar.

Run an Analysis

Configure the following in the settings panel:

Select a dataset from Dataset
Select a categorical variable for Factor
Select a numeric variable for Response Variable
Click Run Analysis

Welch's Test setup with Iris dataset, species x sepal_length

Data Format

Data must be in long format with one row per observation. Use Reshape to convert wide-format data.
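As a rough illustration of what "long format" means (this is not the Reshape tool itself), the sketch below turns a hypothetical wide table, with one column per group, into one (group, value) row per observation. The column names and the `wide_to_long` helper are made up for this example; with pandas, `DataFrame.melt` does the same job.

```python
# Hypothetical wide-format data: one column per group (names are illustrative).
wide = {
    "A": [5.1, 4.9, 4.7],
    "B": [7.0, 6.4, None],  # None pads the shorter column; it is not an observation
    "C": [6.3, 5.8, 7.1],
}


def wide_to_long(columns):
    """Return a list of (group, value) rows, one per non-padding cell."""
    rows = []
    for group, values in columns.items():
        for value in values:
            if value is not None:  # skip padding cells from unequal group sizes
                rows.append((group, value))
    return rows


long_rows = wide_to_long(wide)
for group, value in long_rows:
    print(group, value)
```

Each printed line corresponds to one row of the long-format dataset: the first element goes in the Factor column and the second in the Response Variable column.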

Null Hypothesis

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

The test asks whether all $k$ group population means are equal. Because each group's variance is estimated separately when computing the F* statistic, equal variances are not assumed. See Hypothesis Testing Concepts for theoretical background.
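For reference, the textbook form of Welch's (1951) statistic is shown below, where $n_j$, $\bar{x}_j$, and $s_j^2$ are the size, mean, and unbiased variance of group $j$. This standard formulation is assumed, not confirmed, to match the tab's computation exactly. Under $H_0$, $F^*$ approximately follows an $F(\mathrm{df}_1, \mathrm{df}_2)$ distribution.

$$
\begin{aligned}
w_j &= \frac{n_j}{s_j^2}, \qquad W = \sum_{j=1}^{k} w_j, \qquad \tilde{x} = \frac{1}{W}\sum_{j=1}^{k} w_j \bar{x}_j \\
\Lambda &= \sum_{j=1}^{k} \frac{(1 - w_j / W)^2}{n_j - 1} \\
F^* &= \frac{\dfrac{1}{k-1}\displaystyle\sum_{j=1}^{k} w_j (\bar{x}_j - \tilde{x})^2}{1 + \dfrac{2(k-2)}{k^2-1}\,\Lambda}, \qquad
\mathrm{df}_1 = k - 1, \qquad \mathrm{df}_2 = \frac{k^2 - 1}{3\Lambda}
\end{aligned}
$$

The weights $w_j$ are the inverse variances of the group means, which is why a group with a large variance or few observations pulls less on $\tilde{x}$.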

Variable Selection

Factor (Categorical): Select a categorical variable that defines the groups. Columns with nominal or ordinal measurement scale appear as options.

Response Variable (Numeric): Select the numeric variable to analyze. Columns with interval or ratio measurement scale appear as options.

Significance Level

Set the significance level used for the F* test and the Games-Howell post-hoc comparisons. Choose α = 0.10, 0.05 (default), or 0.01. The Games-Howell simultaneous confidence intervals use a confidence level of 1 − α.

Reading the Results

Observations

The total number of observations used in the analysis appears at the top. If rows were excluded due to missing values, the count of excluded rows is also shown.

Group Statistics

A summary table of descriptive statistics for each group.

Column	Description
Group	Group name
N	Number of observations
Mean	Group mean
SD	Standard deviation (square root of unbiased variance, denominator n − 1)
Min	Minimum value
Max	Maximum value
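The columns above can be reproduced with the standard library, as in this minimal sketch on hypothetical long-format rows. `statistics.stdev` uses the n − 1 denominator, matching the SD definition in the table; the `group_statistics` helper is illustrative only.

```python
import statistics
from collections import defaultdict

# Hypothetical long-format rows: (group, value).
rows = [
    ("A", 4.0), ("A", 5.0), ("A", 6.0),
    ("B", 7.0), ("B", 9.0), ("B", 8.0), ("B", 10.0),
]


def group_statistics(rows):
    """Summarize each group: N, mean, SD (n - 1 denominator), min, max."""
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    table = []
    for group, values in by_group.items():
        table.append({
            "Group": group,
            "N": len(values),
            "Mean": statistics.mean(values),
            # stdev needs at least 2 values and divides by n - 1
            "SD": statistics.stdev(values) if len(values) > 1 else float("nan"),
            "Min": min(values),
            "Max": max(values),
        })
    return table


for row in group_statistics(rows):
    print(row)
```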

Welch's Test

The Welch F* test results table.

Column	Description
F*	Inverse-variance-weighted F statistic, distinct from the classical ANOVA F statistic
df1	Numerator degrees of freedom (k − 1)
df2	Denominator degrees of freedom from Welch's Satterthwaite-type approximation; generally non-integer
Pr(>F*)	p-value: the probability, under the null hypothesis, of an F* statistic at least as large as the observed value

Welch's Test result for the Iris dataset, testing the effect of species

Below the table, a statement reports whether the null hypothesis is rejected at the selected significance level.
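The sketch below computes F*, df1, df2, and the p-value from the textbook Welch (1951) formulas. To stay within the standard library it evaluates the F-distribution upper tail through a hand-written regularized incomplete beta function (continued fraction, modified Lentz method). It is an illustration, not the tab's implementation; with SciPy installed, `scipy.stats.f.sf(f_star, df1, df2)` should give the same p-value. The closing decision rule (reject when p < α) is the usual convention and is assumed to match the displayed conclusion.

```python
import statistics
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even term
        d = 1.0 / (1.0 + aa * d if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 / (1.0 + aa * d if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) distribution."""
    if f <= 0:
        return 1.0
    return betainc(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def welch_anova(groups):
    """Welch's F* test on a list of samples; returns (F*, df1, df2, p)."""
    k = len(groups)
    n = [len(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]  # n - 1 denominator
    w = [nj / vj for nj, vj in zip(n, variances)]  # inverse variance of each mean
    W = sum(w)
    grand = sum(wj * mj for wj, mj in zip(w, means)) / W  # weighted grand mean
    A = sum(wj * (mj - grand) ** 2 for wj, mj in zip(w, means)) / (k - 1)
    lam = sum((1 - wj / W) ** 2 / (nj - 1) for wj, nj in zip(w, n))
    B = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    f_star, df1, df2 = A / B, k - 1, (k ** 2 - 1) / (3 * lam)
    return f_star, df1, df2, f_sf(f_star, df1, df2)


groups = [
    [4.0, 5.0, 6.0, 5.5],
    [7.0, 9.0, 8.0, 10.0],
    [5.0, 7.5, 6.0, 9.5, 8.0],
]
f_star, df1, df2, p = welch_anova(groups)
alpha = 0.05
print(f"F* = {f_star:.3f}, df1 = {df1}, df2 = {df2:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "do not reject H0")
```

A quick sanity check on any implementation: with two groups, F* should equal the square of the Welch t statistic and df2 should equal the Welch-Satterthwaite degrees of freedom.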

Games-Howell Post-Hoc Test

The F* test determines whether at least one group mean differs from the others, but does not identify which pairs of groups differ. The Games-Howell post-hoc test compares all pairs of group means.

It is computed automatically regardless of the F* test result. For each pair, the test computes the same standard error and degrees of freedom as the Welch t-test, then uses the studentized range distribution to control the family-wise error rate. See Hypothesis Testing Concepts for computation details and FWER control.
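In the usual textbook formulation of Games-Howell (assumed here, not confirmed from the tab's source), groups $i$ and $j$ are compared as follows, where $Q_{k,\nu}$ is the studentized range distribution for $k$ groups and $\nu$ degrees of freedom and $q_{1-\alpha;\,k,\nu}$ is its $1-\alpha$ quantile:

$$
\begin{aligned}
\mathrm{SE}_{ij} &= \sqrt{\frac{s_i^2}{n_i} + \frac{s_j^2}{n_j}}, \qquad
t_{ij} = \frac{\bar{x}_i - \bar{x}_j}{\mathrm{SE}_{ij}}, \qquad
\nu_{ij} = \frac{\left(s_i^2/n_i + s_j^2/n_j\right)^2}{\dfrac{(s_i^2/n_i)^2}{n_i - 1} + \dfrac{(s_j^2/n_j)^2}{n_j - 1}} \\
p_{ij} &= \Pr\!\left(Q_{k,\nu_{ij}} \ge \sqrt{2}\,\lvert t_{ij} \rvert\right), \qquad
\mathrm{CI}_{ij} = (\bar{x}_i - \bar{x}_j) \pm \frac{q_{1-\alpha;\,k,\nu_{ij}}}{\sqrt{2}}\,\mathrm{SE}_{ij}
\end{aligned}
$$

The factor $\sqrt{2}$ converts the t scale to the studentized range scale. Because every pair is referred to the range distribution of all $k$ means, the family-wise error rate is controlled without a separate correction.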

Results Table

Column	Description
Comparison	The two groups being compared
Diff	Difference in means (Group 1 mean − Group 2 mean)
SE	Standard error of the difference, using each group's own variance
t	t statistic (Diff / SE)
df	Welch-Satterthwaite degrees of freedom for this pair, generally non-integer
p-value	Adjusted p-value from the studentized range distribution
CI Lower / CI Upper	Simultaneous confidence interval for the mean difference

Games-Howell post-hoc comparisons for all pairs of Iris species
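The first five columns of the results table can be sketched with the standard library alone, as below, using hypothetical data. The adjusted p-value and the confidence limits need the studentized range distribution, which the standard library lacks. With SciPy 1.7 or later, `scipy.stats.studentized_range.sf(q, k, df)` and `scipy.stats.studentized_range.ppf(1 - alpha, k, df)` provide them, where `q` is the value stored under the key of the same name. The `games_howell_pairs` helper is illustrative only.

```python
import statistics
from itertools import combinations
from math import sqrt


def games_howell_pairs(groups):
    """Per-pair Diff, SE, t, and Welch-Satterthwaite df (no p-values)."""
    summary = {
        name: (len(v), statistics.mean(v), statistics.variance(v))
        for name, v in groups.items()
    }
    results = []
    for g1, g2 in combinations(groups, 2):
        n1, m1, s1 = summary[g1]
        n2, m2, s2 = summary[g2]
        v1, v2 = s1 / n1, s2 / n2  # squared standard error of each group mean
        se = sqrt(v1 + v2)  # each group keeps its own variance
        diff = m1 - m2
        df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
        results.append({
            "Comparison": f"{g1} - {g2}",
            "Diff": diff,
            "SE": se,
            "t": diff / se,
            "df": df,
            # studentized-range statistic; its upper tail gives the adjusted p-value
            "q": abs(diff / se) * sqrt(2),
        })
    return results


groups = {
    "A": [4.0, 5.0, 6.0, 5.5],
    "B": [7.0, 9.0, 8.0, 10.0],
    "C": [5.0, 7.5, 6.0, 9.5, 8.0],
}
for row in games_howell_pairs(groups):
    print(row)
```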

Assumptions

Welch's Test and the Games-Howell post-hoc test assume the following:

Independence: Observations are independent of each other
Normality: The response variable follows a normal distribution within each group. With large samples, the central limit theorem tends to keep the Type I error rate close to the nominal level even when this assumption is violated. See Hypothesis Testing Concepts for normality diagnostics

Equal variances are not assumed. Use this test when group variances may differ or when group sizes are unequal.

Error Messages

Message	Cause and resolution
Not enough valid observations (at least 3 required)	Fewer than 3 valid observations remain after excluding missing values
At least 2 groups are required	The factor variable has only one level
Welch ANOVA could not be computed. Each group must have at least 2 observations with non-zero variance.	A group has 1 or fewer observations, or all values within a group are identical

If the Games-Howell section shows "not available", the cause is the same: a group has fewer than 2 observations or zero variance. The analysis runs with as few as 2 observations per group, but in the Monte Carlo simulations of Games and Howell (1976) the Type I error rate stayed near the nominal level when each group had at least 6 observations. Interpret results from smaller groups with caution.
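The checks in the error table can be mirrored in a short pre-flight function, sketched below under the assumption that missing values are stored as `None`. The `check_inputs` helper is hypothetical; only its error strings come from the table above. The small-group warning is an illustrative addition based on the 6-observation guidance, not something the tab is documented to display.

```python
import statistics
from collections import defaultdict


def check_inputs(rows, min_recommended=6):
    """Hypothetical re-creation of the documented checks; returns (errors, warnings)."""
    valid = [(g, y) for g, y in rows if g is not None and y is not None]
    if len(valid) < 3:
        return ["Not enough valid observations (at least 3 required)"], []
    by_group = defaultdict(list)
    for g, y in valid:
        by_group[g].append(y)
    if len(by_group) < 2:
        return ["At least 2 groups are required"], []
    for values in by_group.values():
        # short-circuit keeps variance() from being called on a single value
        if len(values) < 2 or statistics.variance(values) == 0:
            return ["Welch ANOVA could not be computed. Each group must have at least "
                    "2 observations with non-zero variance."], []
    small = sorted(g for g, v in by_group.items() if len(v) < min_recommended)
    warnings = [f"Group {g} has fewer than {min_recommended} observations" for g in small]
    return [], warnings


print(check_inputs([("A", 1.0), ("A", 2.0), ("B", 3.0), ("B", 3.0)]))
```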

Missing Values

Rows with missing values in the factor or response variable are automatically excluded. The number of excluded rows is displayed in the results panel.
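This exclusion rule (listwise deletion on the two selected columns) can be sketched as follows. Treating both `None` and float NaN as missing is an assumption about how blank cells are stored, and `drop_missing` is a hypothetical helper.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption about blank cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(rows):
    """Keep rows where both factor and response are present; count the rest."""
    kept = [(g, y) for g, y in rows if not is_missing(g) and not is_missing(y)]
    return kept, len(rows) - len(kept)


rows = [("A", 5.1), ("A", None), (None, 6.0), ("B", float("nan")), ("B", 7.2)]
kept, excluded = drop_missing(rows)
print(f"Observations: {len(kept)} (excluded {excluded} rows with missing values)")
```

A row is dropped if either column is missing, so the excluded count can exceed the number of missing cells in any single column.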

Related Pages

ANOVA: one-way and two-way ANOVA assuming equal variances
Two-Sample Test / Paired Test: t-tests for comparing two groups
Hypothesis Testing Concepts: background on p-values and statistical testing

References

Games, P. A., & Howell, J. F. (1976). Pairwise multiple comparison procedures with unequal N's and/or variances: A Monte Carlo study. Journal of Educational Statistics, 1(2), 113–125. https://www.jstor.org/stable/1164979
Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38(3/4), 330–336. https://www.jstor.org/stable/2332579