Orthogonal Polynomials

The Orthogonal Polynomials tab generates orthogonal polynomial columns from a numeric column. With the raw polynomial basis $x, x^2, \dots, x^d$, the columns become more and more strongly correlated as the degree increases. The design matrix becomes ill-conditioned, and coefficients and fitted values lose significant digits. Using orthogonal polynomial columns as predictors in Linear Regression reduces the condition number of the design matrix to approximately 1, which improves numerical precision.
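As a quick illustration of the raw-basis problem, the sketch below computes the Pearson correlation between adjacent raw columns $x^k$ and $x^{k+1}$ for the illustrative points $x = 1, \dots, 20$. The points and the helper pearson are assumptions made for this example only. The printed correlations are all close to 1 and approach 1 as $k$ grows, which is what makes the raw design matrix ill-conditioned.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


x = [float(i) for i in range(1, 21)]  # illustrative points 1, 2, ..., 20
corrs = []
for k in range(1, 6):
    raw_k = [xi ** k for xi in x]           # raw column x^k
    raw_next = [xi ** (k + 1) for xi in x]  # raw column x^(k+1)
    corrs.append(pearson(raw_k, raw_next))
    print(f"corr(x^{k}, x^{k + 1}) = {corrs[-1]:.6f}")
```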

Basic Usage

Opening Orthogonal Polynomials

Select Data > Orthogonal Polynomials... from the menu bar to open a new Orthogonal Polynomials tab.

Generating Columns

  1. Select the target dataset from the Dataset dropdown
  2. Select the numeric column to transform in Source column
  3. Set the maximum polynomial degree in Degree (1 to 30). The degree must be less than the number of valid data points in the source column, that is, the rows that remain after excluding null, NaN, and Infinity values
  4. Click Preview to inspect the result
  5. Enter a name for the output dataset in Output Name
  6. Click Save as Dataset
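The Degree rule in step 3 can be sketched as a simple validity check. This is a minimal sketch, assuming the source column arrives as a Python list with None for null; count_valid and check_degree are hypothetical helper names, not part of the tool.

```python
import math

MAX_DEGREE = 30  # upper limit of the Degree field


def count_valid(values):
    """Count entries that are not None, NaN, or +/-Infinity."""
    return sum(1 for v in values if v is not None and math.isfinite(v))


def check_degree(values, degree):
    """Return the valid-point count, or raise ValueError if the degree is not allowed."""
    if not 1 <= degree <= MAX_DEGREE:
        raise ValueError(f"degree must be between 1 and {MAX_DEGREE}")
    n_valid = count_valid(values)
    if degree >= n_valid:
        raise ValueError(f"degree {degree} must be less than the {n_valid} valid points")
    return n_valid


# Illustrative column: 4 of the 7 entries are valid, so degrees 1 to 3 are allowed.
column = [1.0, 2.0, None, float("nan"), 3.0, float("inf"), 4.0]
print(check_degree(column, 3))
```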

The original dataset is not modified; saving creates a new derived dataset. Rows with null, NaN, or Infinity in the source column are excluded from the derived dataset, so it may have fewer rows than the original. Each remaining row keeps all of its original columns, and poly_1, poly_2, ..., poly_{degree} columns are appended.

Each orthogonal polynomial column is normalized so that $\|P_j\|^2 = \sum_{i=1}^{n} P_j(x_i)^2 = n$, where $n$ is the number of valid data points.
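The derived-dataset behavior and the normalization described above can be sketched as follows. The tool's actual algorithm is not documented here; this sketch assumes a Gram-Schmidt construction (each new column is the previous one multiplied by $x$, then orthogonalized against all earlier columns, including the constant column), scaled so that each squared norm equals $n$. The dataset is modeled as a list of dicts, and orth_poly_columns and derive_dataset are hypothetical names. Signs follow a positive-leading-coefficient convention, which may differ from the tool's output.

```python
import math


def orth_poly_columns(x, degree):
    """Return [P_1, ..., P_degree] evaluated at x, each scaled so sum(P_j**2) == n.

    Assumes x has at least degree + 1 distinct finite values.
    """
    n = len(x)
    basis = [[1.0] * n]  # P_0 = 1, whose squared norm is already n
    for _ in range(degree):
        # Multiplying the newest column by x raises the polynomial degree by one.
        v = [xi * pi for xi, pi in zip(x, basis[-1])]
        # Modified Gram-Schmidt: remove the projection onto every earlier column.
        for q in basis:
            coef = sum(vi * qi for vi, qi in zip(v, q)) / n  # <q, q> == n up to rounding
            v = [vi - coef * qi for vi, qi in zip(v, q)]
        # Rescale so that the new column's squared norm equals n.
        scale = math.sqrt(n / sum(vi * vi for vi in v))
        basis.append([vi * scale for vi in v])
    return basis[1:]  # P_0 is dropped; the regression intercept plays its role


def derive_dataset(rows, source, degree):
    """Copy rows whose source value is finite and append poly_1..poly_degree."""
    kept = [dict(r) for r in rows
            if r.get(source) is not None and math.isfinite(r[source])]
    if not 1 <= degree < len(kept):
        raise ValueError("degree must be at least 1 and less than the number of valid rows")
    columns = orth_poly_columns([float(r[source]) for r in kept], degree)
    for j, column in enumerate(columns, start=1):
        for row, value in zip(kept, column):
            row[f"poly_{j}"] = value
    return kept


# Illustrative input: two rows have an invalid x and are dropped.
rows = [{"x": 0.0, "y": 1.0}, {"x": 1.0, "y": 2.1}, {"x": None, "y": 3.0},
        {"x": 2.0, "y": 4.8}, {"x": float("nan"), "y": 0.0},
        {"x": 3.0, "y": 9.2}, {"x": 4.0, "y": 16.5}]
derived = derive_dataset(rows, "x", 2)
for row in derived:
    print(row)
```

Copying each row with dict(r) keeps the input list untouched, mirroring the rule that the original dataset is not modified.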

Generating degree-3 orthogonal polynomials from the x column

Polynomial Regression Workflow

To use orthogonal polynomials instead of the raw polynomial basis:

  1. In the Orthogonal Polynomials tab, generate degree-$d$ polynomial columns from the x column and save the dataset
  2. Open a Linear Regression tab and select the saved derived dataset
  3. Set y as the response variable
  4. Set poly_1, poly_2, ..., poly_d as explanatory variables

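R-squared, residual SD, fitted values, and prediction intervals are the same as those from raw polynomial regression of the same degree (apart from rounding error), because both bases span the same polynomials. The coefficients, however, are expressed in the orthogonal polynomial basis and differ in both value and interpretation from the raw-basis coefficients: each poly_j coefficient measures how much the $j$-th orthogonal polynomial component contributes to the response variable. Because the columns are orthogonal, the coefficient estimates are uncorrelated, and adding or removing other poly columns leaves each estimate unchanged; only the residual SD, and hence the standard errors, change. As a result, each $t$-test can be read on its own. If the $p$-value of the highest-degree term poly_d is large, the data provide little evidence that the degree-$d$ component contributes to the model, and a lower degree may be sufficient.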
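These properties can be checked numerically. The sketch below uses illustrative data, a Gram-Schmidt construction of the orthogonal columns (an assumption about how such columns can be built, not necessarily the tool's method), and a plain normal-equations least-squares solver; orth_poly_columns, solve, and ols are hypothetical helper names. It fits a cubic in the raw basis and in the orthogonal basis, then refits the orthogonal model with only poly_1 and poly_2.

```python
import math


def orth_poly_columns(x, degree):
    """Orthogonal polynomial columns P_1..P_degree with sum(P_j**2) == n (sketch)."""
    n = len(x)
    basis = [[1.0] * n]
    for _ in range(degree):
        v = [xi * pi for xi, pi in zip(x, basis[-1])]  # raise the degree by one
        for q in basis:  # modified Gram-Schmidt against earlier columns
            coef = sum(vi * qi for vi, qi in zip(v, q)) / n
            v = [vi - coef * qi for vi, qi in zip(v, q)]
        scale = math.sqrt(n / sum(vi * vi for vi in v))
        basis.append([vi * scale for vi in v])
    return basis[1:]


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (aug[r][m] - sum(aug[r][c] * z[c] for c in range(r + 1, m))) / aug[r][r]
    return z


def ols(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, fitted values)."""
    design = [[1.0] * len(y)] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    coefs = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(coefs, design)) for i in range(len(y))]
    return coefs, fitted


x = [i / 2 for i in range(8)]                 # illustrative inputs
y = [1.0, 1.3, 2.2, 2.9, 4.1, 5.2, 7.0, 8.8]  # illustrative responses

raw_coefs, raw_fit = ols([[xi ** k for xi in x] for k in (1, 2, 3)], y)
P = orth_poly_columns(x, 3)
orth3, orth_fit = ols(P, y)  # intercept, poly_1, poly_2, poly_3
orth2, _ = ols(P[:2], y)     # intercept, poly_1, poly_2

print("max fitted-value difference:", max(abs(a - b) for a, b in zip(raw_fit, orth_fit)))
print("degree-2 coefficients:", orth2)
print("degree-3 coefficients:", orth3)
```

The fitted values agree up to rounding, the intercept, poly_1, and poly_2 coefficients are the same in both orthogonal fits, and the intercept equals the mean of y, since every poly column sums to zero.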

To choose the polynomial degree, run regressions with different degrees and compare AIC or Adj. R-squared in Linear Regression. Orthogonal polynomials improve numerical precision but do not prevent overfitting.
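Degree selection by AIC can be sketched for orthogonal columns. This sketch assumes AIC $= n \ln(\mathrm{RSS}/n) + 2(d+1)$ for a degree-$d$ model with an intercept; the tool's exact formula is not stated here. Common Gaussian forms differ from this one only by terms that are the same for every degree, so under that assumption they would rank degrees identically. Because each poly column is orthogonal with squared norm $n$, adding poly_d lowers the RSS by $n b_d^2$, so a single highest-degree fit gives the RSS of every lower degree. orth_poly_columns and aic_by_degree are hypothetical names, and the data are illustrative.

```python
import math


def orth_poly_columns(x, degree):
    """Orthogonal polynomial columns P_1..P_degree with sum(P_j**2) == n (sketch)."""
    n = len(x)
    basis = [[1.0] * n]
    for _ in range(degree):
        v = [xi * pi for xi, pi in zip(x, basis[-1])]  # raise the degree by one
        for q in basis:  # modified Gram-Schmidt against earlier columns
            coef = sum(vi * qi for vi, qi in zip(v, q)) / n
            v = [vi - coef * qi for vi, qi in zip(v, q)]
        scale = math.sqrt(n / sum(vi * vi for vi in v))
        basis.append([vi * scale for vi in v])
    return basis[1:]


def aic_by_degree(x, y, max_degree):
    """Return {d: AIC} for degrees 1..max_degree, using an assumed AIC form."""
    n = len(y)
    ybar = sum(y) / n
    rss = sum((yi - ybar) ** 2 for yi in y)  # RSS of the intercept-only model
    aics = {}
    for d, p in enumerate(orth_poly_columns(x, max_degree), start=1):
        b = sum(pi * yi for pi, yi in zip(p, y)) / n  # OLS coefficient of poly_d
        rss -= n * b * b  # in exact arithmetic, poly_d lowers the RSS by n * b**2
        if rss <= 0.0:
            raise ValueError("exact fit: RSS is zero, so AIC is undefined")
        aics[d] = n * math.log(rss / n) + 2 * (d + 1)  # d slopes plus the intercept
    return aics


x = [i / 2 for i in range(12)]                                        # illustrative inputs
y = [0.9, 1.6, 1.9, 3.1, 3.8, 5.2, 6.1, 7.9, 9.6, 11.8, 13.9, 16.4]  # illustrative responses
aics = aic_by_degree(x, y, 3)
best = min(aics, key=aics.get)
print(aics, "lowest AIC at degree", best)
```

A lower AIC favors that degree, but the penalty term is what keeps higher degrees from winning automatically; the orthogonal basis only makes the RSS values numerically reliable.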
