ANOVA (Analysis of Variance)

The ANOVA tab analyzes whether the means of a response variable differ across groups defined by categorical variables. Both one-way and two-way designs are supported.

Basic Usage

Open the Tab

Select Analysis > ANOVA... from the menu bar.

Run an Analysis

Configure the following in the settings panel:

  1. Select a dataset from Dataset
  2. Choose One-Way or Two-Way under Analysis Type
  3. Select a categorical variable for Factor A
  4. Select a numeric variable for Response Variable
  5. Click Run Analysis

Data Format

Data must be in long format with one row per observation. Each row contains the factor value and the response variable value. Use Reshape to convert wide-format data.
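
For readers preparing data outside the application, the wide-to-long conversion can be sketched with pandas as follows. This is only an illustration of the target layout, not how the Reshape feature is implemented; the column names and values are made up for the example.

```python
import pandas as pd

# Illustrative wide-format data: one column per group, one row per replicate.
# (Made-up values, not the real Iris dataset.)
wide = pd.DataFrame({
    "setosa": [5.1, 4.9, 4.7],
    "versicolor": [7.0, 6.4, 6.9],
    "virginica": [6.3, 5.8, 7.1],
})

# melt() stacks the group columns into (factor, response) pairs,
# producing one row per observation.
long = wide.melt(var_name="species", value_name="sepal_length")

print(long)
```

Each row of the result holds one factor value and one response value, which is the layout described above.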

One-Way ANOVA

Analyzes differences in the response variable means across groups defined by a single categorical factor. Use this when you have one grouping factor.

Statistical Model

$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

$y_{ij}$ is the $j$-th observation in group $i$, $\mu$ is the overall mean, $\alpha_i$ is the effect of group $i$, and $\varepsilon_{ij}$ is the error term.

Null Hypothesis

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$

Tests whether all $k$ group population means are equal.

Variable Selection

Factor A: Select a categorical variable that defines the groups. Columns with nominal or ordinal measurement scale appear as options.

Response Variable: Select the numeric variable to analyze. Columns with interval or ratio measurement scale appear as options.

Example

To analyze whether sepal length differs among the three Iris species (setosa, versicolor, virginica) in the Iris sample data:

  1. Dataset: Iris
  2. Analysis Type: One-Way
  3. Factor A: species
  4. Response Variable: sepal_length
  5. Click Run Analysis

One-way ANOVA setup with Iris dataset, species x sepal_length

Confidence Level

For one-way ANOVA, set the confidence level for Tukey HSD confidence intervals. Choose from 90%, 95% (default), or 99%.

Two-Way ANOVA

Analyzes the effects of two categorical factors and their interaction on the response variable. Use this when you have two grouping factors.

Statistical Model

With interaction:

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$

$\alpha_i$ is the effect of factor A, $\beta_j$ is the effect of factor B, and $(\alpha\beta)_{ij}$ is the interaction effect.

Additional Settings

Factor B: Select a second categorical variable, different from Factor A.

Include interaction term (A x B): Whether to include the interaction term in the model. Enabled by default. Include the interaction when the effect of one factor may depend on the level of the other. If the interaction is known to be absent, excluding it increases the power of main effect tests.

Sum of Squares Type: Choose the method for computing sums of squares.

Sum of Squares Types

Type I computes sums of squares sequentially based on the order factors enter the model. Each factor's contribution depends on which factors are already in the model.

Type III computes sums of squares for each factor as if it were the last one entered. Each factor's contribution is adjusted for all other factors.

For balanced designs (equal sample sizes in all cells), Type I and Type III produce identical results. For unbalanced designs, Type III is generally preferred because results do not depend on factor ordering.
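
The order dependence of Type I can be illustrated with a small NumPy sketch. It fits nested least-squares models to a made-up, unbalanced 2x2 dataset without an interaction term and takes differences of residual sums of squares. The data, variable names, and the `rss` helper are hypothetical, and this is not a description of how MIDAS computes its tables.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Made-up unbalanced data: cell sizes are 3, 1, 1, 3.
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # factor A coded 0/1
b = np.array([0, 0, 0, 1, 0, 1, 1, 1])  # factor B coded 0/1
y = np.array([4.0, 5.0, 6.0, 9.0, 7.0, 12.0, 13.0, 11.0])

one = np.ones_like(y)
X0 = np.column_stack([one])         # intercept only
XA = np.column_stack([one, a])      # intercept + A
XB = np.column_stack([one, b])      # intercept + B
XAB = np.column_stack([one, a, b])  # intercept + A + B

# Type I (sequential), A entered first: A unadjusted, then B given A.
ss_a_type1 = rss(X0, y) - rss(XA, y)
ss_b_type1 = rss(XA, y) - rss(XAB, y)

# Type III in this main-effects model: each factor adjusted for the other,
# as if it were entered last.
ss_a_type3 = rss(XB, y) - rss(XAB, y)
ss_b_type3 = rss(XA, y) - rss(XAB, y)

print(f"A: Type I = {ss_a_type1:.3f}, Type III = {ss_a_type3:.3f}")
print(f"B: Type I = {ss_b_type1:.3f}, Type III = {ss_b_type3:.3f}")
```

In this sketch the factor entered last (B) gets the same sum of squares under both types, while the factor entered first (A) does not, because the unequal cell sizes make A and B correlated.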

Type III Interpretation with Interaction

When the interaction term is included, the Type III test for a main effect tests the effect of that factor with the other factor held at its reference level. MIDAS uses treatment coding, with the first level in alphabetical order as the reference category. With balanced data, this coincides with the test of marginal means averaged across all levels of the other factor. With unbalanced data, the two may differ.
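
What "effect at the reference level" means under treatment coding can be seen in a small NumPy sketch. It fits a treatment-coded model with interaction to made-up 2x2 data and compares the coefficient for A with differences of cell means. The data and the `cell_mean` helper are hypothetical. The sketch only inspects the fitted coefficient and does not reproduce the MIDAS test itself.

```python
import numpy as np

# Made-up 2x2 data; 0 marks the reference level of each factor.
a = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1])
b = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1])
y = np.array([10.0, 12.0, 15.0, 14.0, 16.0, 17.0, 25.0, 27.0, 26.0])

# Treatment-coded design with interaction: intercept, A, B, A*B.
X = np.column_stack([np.ones_like(y), a, b, a * b])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def cell_mean(i, j):
    """Mean response in the cell with A == i and B == j."""
    return y[(a == i) & (b == j)].mean()

# Effect of A with B held at its reference level.
simple_effect = cell_mean(1, 0) - cell_mean(0, 0)

# Effect of A averaged over both levels of B.
averaged_effect = 0.5 * (
    (cell_mean(1, 0) - cell_mean(0, 0)) + (cell_mean(1, 1) - cell_mean(0, 1))
)

print("coefficient for A:", beta[1])
print("effect at reference level of B:", simple_effect)
print("effect averaged over B:", averaged_effect)
```

The coefficient for A matches the difference of cell means at the reference level of B, which in this made-up data is not the same as the effect averaged over the levels of B.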

Reading the Results

Observations

The total number of observations used in the analysis appears at the top. If rows were excluded due to missing values, the count of excluded rows is also shown.

Group Statistics

A summary table of descriptive statistics for each group.

Column   Description
Group    Group name
N        Number of observations
Mean     Group mean
SD       Standard deviation
Min      Minimum value
Max      Maximum value
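
To cross-check this table outside the application, the same per-group summaries can be sketched with pandas. The values are illustrative, and the use of the sample standard deviation (n - 1 denominator, the pandas default) is an assumption about what the SD column reports.

```python
import pandas as pd

# Illustrative long-format data (made-up values).
df = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3,
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9],
})

# One row per group, with columns named like the Group Statistics table.
# "std" is the sample standard deviation (ddof=1).
group_stats = df.groupby("species")["sepal_length"].agg(
    N="count", Mean="mean", SD="std", Min="min", Max="max"
)

print(group_stats)
```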

ANOVA Table

The main results table. Decomposes the total variance of the response variable into contributions from each factor and residual error.

Column   Description
Source   Source of variation
SS       Sum of squares -- the amount of variation attributable to each source
df       Degrees of freedom
MS       Mean square (SS / df)
F        F statistic (MS of the source / MS of residuals)
Pr(>F)   p-value -- the probability of observing an F statistic as extreme as, or more extreme than, the observed value under the null hypothesis
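
The relationships between the columns can be traced in a short sketch that builds a one-way ANOVA table by hand with NumPy and SciPy. The three groups are made-up values and the variable names are chosen for this example only. It is a sketch of the textbook computation, not of the MIDAS code.

```python
import numpy as np
from scipy import stats

# Made-up measurements for three groups.
groups = [
    np.array([5.1, 4.9, 4.7, 5.0]),
    np.array([7.0, 6.4, 6.9, 5.5]),
    np.array([6.3, 5.8, 7.1, 6.5]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)       # number of groups
n = all_values.size   # total number of observations

# SS: variation between group means, and variation within groups (residual).
ss_factor = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_resid = sum(((g - g.mean()) ** 2).sum() for g in groups)

# df, MS = SS / df, and F = MS of the source / MS of residuals.
df_factor, df_resid = k - 1, n - k
ms_factor = ss_factor / df_factor
ms_resid = ss_resid / df_resid
f_stat = ms_factor / ms_resid

# Pr(>F): upper-tail probability of the F distribution.
p_value = stats.f.sf(f_stat, df_factor, df_resid)

print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>10}{'Pr(>F)':>12}")
print(f"{'Factor':<10}{ss_factor:>10.4f}{df_factor:>5}{ms_factor:>10.4f}"
      f"{f_stat:>10.4f}{p_value:>12.6f}")
print(f"{'Residual':<10}{ss_resid:>10.4f}{df_resid:>5}{ms_resid:>10.4f}")
```

The F statistic and p-value computed this way should agree with `scipy.stats.f_oneway` applied to the same groups.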

ANOVA table showing the species effect on sepal_length in the Iris dataset

Tukey HSD Post-Hoc Comparisons

The ANOVA F-test determines whether at least one group mean differs from the others, but does not identify which pairs of groups differ. Tukey HSD post-hoc tests compare all pairs of group means to identify where the differences lie.

For one-way ANOVA, Tukey HSD is computed automatically regardless of the F-test result. If the F-test p-value is large, treat the Tukey HSD results as exploratory. The Tukey-Kramer method is used, which handles unequal group sizes.

Tukey HSD tests all pairwise mean differences simultaneously while controlling the family-wise error rate. Compared to running individual t-tests for each pair, this reduces the inflation of false positives from multiple comparisons.

Column                 Description
Comparison             The two groups being compared
Diff                   Difference in means (Group 1 mean − Group 2 mean)
SE                     Standard error of the difference
q                      Studentized range statistic
p-value                p-value from the studentized range distribution
CI Lower / CI Upper    Simultaneous confidence interval for the mean difference, based on the selected confidence level

The critical value $q_{\text{critical}}$, MSE, and residual degrees of freedom are displayed below the table.
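
How these quantities fit together can be sketched with SciPy's `studentized_range` distribution, using the Tukey-Kramer formula for unequal group sizes. The data are made up, and the standard error here is defined on the scale of the q statistic, as sqrt(MSE / 2 * (1/n1 + 1/n2)). This is an assumption and may differ by a factor of sqrt(2) from the SE column that MIDAS reports.

```python
from itertools import combinations

import numpy as np
from scipy.stats import studentized_range

# Made-up groups with unequal sizes.
groups = {
    "A": np.array([5.1, 4.9, 4.7, 5.0, 5.4]),
    "B": np.array([7.0, 6.4, 6.9, 5.5]),
    "C": np.array([6.3, 5.8, 7.1, 6.5, 7.6, 6.7]),
}

k = len(groups)
n_total = sum(g.size for g in groups.values())
df_resid = n_total - k

# MSE: pooled within-group variance from the one-way ANOVA.
mse = sum(((g - g.mean()) ** 2).sum() for g in groups.values()) / df_resid

conf_level = 0.95
q_critical = studentized_range.ppf(conf_level, k, df_resid)

rows = []
for (name1, g1), (name2, g2) in combinations(groups.items(), 2):
    diff = g1.mean() - g2.mean()
    # Tukey-Kramer standard error on the q scale (assumed definition).
    se_q = np.sqrt(mse / 2 * (1 / g1.size + 1 / g2.size))
    q = abs(diff) / se_q
    p_value = studentized_range.sf(q, k, df_resid)
    half_width = q_critical * se_q
    rows.append((f"{name1} - {name2}", diff, q, p_value,
                 diff - half_width, diff + half_width))

for label, diff, q, p_value, lower, upper in rows:
    print(f"{label}: diff={diff:.3f} q={q:.3f} p={p_value:.4f} "
          f"CI=({lower:.3f}, {upper:.3f})")
```

A pair's confidence interval excludes zero exactly when its p-value falls below 1 minus the confidence level, since both rest on the same critical value.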

Tukey HSD post-hoc comparisons for all pairs of Iris species

Assumptions

ANOVA assumes the following. Verify that these are reasonable when interpreting results.

  • Independence: Observations are independent of each other
  • Normality: The response variable follows a normal distribution within each group. With large sample sizes, the central limit theorem makes the F-test robust to moderate departures from normality
  • Homogeneity of variance: The variance is equal across all groups

The current implementation does not provide diagnostic tools for checking these assumptions.
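
Until such tools are available, the assumptions can be checked outside the application. The following is a minimal sketch using SciPy on made-up data: Levene's test for homogeneity of variance and the Shapiro-Wilk test on within-group residuals. The choice of these two tests is an assumption of this example, not a recommendation from MIDAS.

```python
import numpy as np
from scipy import stats

# Made-up measurements for three groups.
groups = [
    np.array([5.1, 4.9, 4.7, 5.0, 5.4]),
    np.array([7.0, 6.4, 6.9, 5.5, 6.5]),
    np.array([6.3, 5.8, 7.1, 6.5, 7.6]),
]

# Homogeneity of variance: Levene's test (SciPy centers on the median by
# default). A small p-value suggests unequal variances.
levene_stat, levene_p = stats.levene(*groups)

# Normality: Shapiro-Wilk on the residuals (each value minus its group mean).
# A small p-value suggests departure from normality.
residuals = np.concatenate([g - g.mean() for g in groups])
shapiro_stat, shapiro_p = stats.shapiro(residuals)

print(f"Levene:  W={levene_stat:.3f}, p={levene_p:.3f}")
print(f"Shapiro: W={shapiro_stat:.3f}, p={shapiro_p:.3f}")
```

Independence cannot be tested this way; it depends on how the data were collected.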

Error Messages

In two-way ANOVA, if any combination of factor levels has no observations, the model with interaction cannot be estimated. The error "The design matrix is rank deficient" is displayed. Turn off the interaction term or check whether your data has empty cells.
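
Empty cells can be located before running the analysis by cross-tabulating the two factors. A minimal pandas sketch, with hypothetical column names and data:

```python
import pandas as pd

# Made-up data in which the combination (diet="c", sex="m") never occurs.
df = pd.DataFrame({
    "diet": ["a", "a", "b", "b", "c"],
    "sex": ["f", "m", "f", "m", "f"],
    "weight": [60.1, 72.4, 58.3, 75.0, 61.7],
})

# Count observations for every combination of observed factor levels.
counts = pd.crosstab(df["diet"], df["sex"])

# Collect the combinations with no observations.
empty_cells = [
    (row, col)
    for row in counts.index
    for col in counts.columns
    if counts.loc[row, col] == 0
]

print(counts)
print("Empty cells:", empty_cells)
```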

Missing Values

Rows containing missing values are automatically excluded. The number of excluded rows is displayed in the results panel. For two-way ANOVA, rows with missing values in either factor or the response variable are excluded.
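
The exclusion rule for two-way ANOVA corresponds to dropping every row that lacks a value in any of the three analysis columns. A minimal pandas sketch with hypothetical column names, useful for predicting the excluded-row count before running the analysis:

```python
import numpy as np
import pandas as pd

# Made-up data with a missing value in each of the three analysis columns.
df = pd.DataFrame({
    "factor_a": ["x", "x", None, "y", "y"],
    "factor_b": ["p", "q", "p", None, "q"],
    "response": [1.0, np.nan, 3.0, 4.0, 5.0],
})

# Keep only rows that are complete in both factors and the response.
used = df.dropna(subset=["factor_a", "factor_b", "response"])
n_excluded = len(df) - len(used)

print("Observations used:", len(used))
print("Rows excluded:", n_excluded)
```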