Enum Definitions

The Enum type is a data type for categorical data with a predefined set of valid values. Each Enum definition specifies the allowed values.

Enum types are not inferred automatically when loading CSV files. You create an Enum definition first, then convert string columns to that Enum type. Enum definitions are shared across datasets within the same project, but they are not carried over to other projects. Each Enum definition can contain up to 50 values. For columns with more categories than this limit, use the string type instead.

Opening the Manage Enums Tab

Select Data > Manage Enums... from the menu bar to open the Manage Enums tab. Use this tab to create, edit, and delete Enum definitions.

Manage Enums tab

Creating an Enum Manually

New Enum

  1. Click + New Enum
  2. Enter the Enum name
  3. Enter the values. The order in which you enter them is recorded as the Enum's value order, which is applied to graph axes and sorting when the column is set to the ordinal scale
  4. Click + Add Value to add more values
  5. Click Save

Validation Rules

  • Enum name is required and must be unique among existing Enum names
  • At least one value is required
  • Duplicate values are not allowed
  • Leading and trailing whitespace is automatically trimmed. Values are case-sensitive, so High and high are treated as distinct values
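
The rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name validate_enum_definition, its parameters, and the error messages are hypothetical and are not part of the application. It also assumes that the name is trimmed like the values and that duplicates are checked after trimming, which this document does not state explicitly.

```python
MAX_ENUM_VALUES = 50  # per-definition limit stated in this document


def validate_enum_definition(name, values, existing_names):
    """Return the trimmed (name, values) pair, or raise ValueError.

    Hypothetical helper mirroring the documented rules; not the app's code.
    """
    name = name.strip()  # assumption: the name is trimmed like the values
    if not name:
        raise ValueError("Enum name is required")
    if name in existing_names:
        raise ValueError("Enum name must be unique")

    # Leading and trailing whitespace is trimmed from every value.
    trimmed = [v.strip() for v in values]
    if not trimmed:
        raise ValueError("At least one value is required")
    if len(trimmed) > MAX_ENUM_VALUES:
        raise ValueError("An Enum definition can contain up to 50 values")
    # Comparison is case-sensitive, so "High" and "high" do not collide.
    if len(set(trimmed)) != len(trimmed):
        raise ValueError("Duplicate values are not allowed")
    return name, trimmed
```

For example, the values " High ", "high", and "Low" pass validation and are stored as High, high, and Low, while "a" and " a" are rejected as duplicates under the trimming assumption above.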

Creating an Enum from a Column

You can auto-generate an Enum definition from the unique values of an existing string column.

  1. Click Create from Column
  2. Select a dataset
  3. Select a column. Only string-type columns are available
  4. Enter the Enum name. The default is {column_name}_enum
  5. Review the preview showing unique values with their counts and percentages
  6. Click Create Enum

Values are sorted by frequency in descending order. If the column has more than 50 unique values, only the top 50 most frequent values are used. A warning is displayed in the preview so you can verify which values are included.
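
As a rough illustration of this behavior, the following Python sketch ranks unique values by frequency and keeps at most 50. The function name enum_values_from_column is hypothetical. The handling of NULL cells (skipped here), the denominator used for percentages (non-NULL cells here), and the tie-breaking between equally frequent values (first-seen order here) are assumptions, since this document does not specify them.

```python
from collections import Counter

MAX_ENUM_VALUES = 50  # per-definition limit stated in this document


def enum_values_from_column(column, limit=MAX_ENUM_VALUES):
    """Return (preview, truncated) for a list of string cells.

    preview is a list of (value, count, percentage) tuples sorted by
    descending frequency; truncated is True when some unique values
    were dropped because the column has more than `limit` of them.
    """
    # Assumption: NULL (None) cells are ignored when counting.
    counts = Counter(v for v in column if v is not None)
    total = sum(counts.values())
    # most_common() sorts by descending count; ties keep first-seen order.
    ranked = counts.most_common()
    preview = [(value, n, 100.0 * n / total) for value, n in ranked[:limit]]
    truncated = len(ranked) > limit
    return preview, truncated
```

When truncated is True, the preview corresponds to the situation in which the warning is displayed.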

For categories with a meaningful order, always verify the value order by editing the Enum after creation. To reflect the order in graphs and statistics, convert the column to the Enum type and then change it to the ordinal scale.

Create Enum from Column dialog

Editing and Deleting Enums

Editing

Click Edit on an Enum card to enter edit mode. You can change the name, add or remove values, and change the value order. To reorder values, use the up and down arrow buttons next to each value.

When columns have already been converted to this Enum type, the following restrictions apply:

  • The name cannot be changed. To rename, first convert the dependent columns back to string, then change the Enum name, and convert the columns to Enum again
  • Values still present in the column data cannot be removed. First use Column Type Conversion to convert the dependent columns back to string, exclude or replace the unwanted values, and then convert them back to Enum
  • When you change the order, a confirmation dialog is shown if any dependent column is set to the ordinal scale
  • Values can be added without restrictions. However, the limit of 50 values per Enum definition still applies

When you save a new value order, it is immediately reflected in Basic Statistics, Data Table sorting, and graph legend ordering for columns set to the ordinal scale. The axis order of existing graphs does not change automatically: each graph keeps the value order captured when the column was assigned to the axis as its Category Order. To apply the new order to the axis, change the Category Order in the graph settings panel.

Deleting

Click Delete on an Enum card to delete the Enum definition. Deletion is not allowed when dependent columns exist. Use Column Type Conversion to change the data type of dependent columns first.

Converting String Columns to Enum Type

After creating an Enum definition, use Column Type Conversion to convert string columns to the Enum type.

  1. Right-click a column header in Data Table and select Convert Column Types...
  2. In the To dropdown for the column you want to convert, select the Enum name you created
  3. Click Preview to review the conversion result
  4. Click Apply to execute the conversion

The conversion result is created as a new dataset. The original dataset is not modified.

If the string column contains values not included in the Enum definition, they are handled according to the error handling option in column type conversion. NULL replaces such cells with NULL, Exclude row removes the affected rows from the resulting dataset, and Fail aborts the conversion. Values are matched case-sensitively, so a value that differs from the definition only in case (for example, high when the definition contains High) is treated as not included in the Enum definition.
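
The three error handling options can be sketched in Python as follows. The function convert_to_enum and its row representation (a list of dicts) are hypothetical and only illustrate the documented behavior; passing existing NULL cells through unchanged is an assumption.

```python
def convert_to_enum(rows, column, enum_values, on_error="NULL"):
    """Return a new list of rows with `column` restricted to enum_values.

    rows is a list of dicts; the input list and its dicts are not modified,
    mirroring the fact that conversion produces a new dataset.
    """
    allowed = set(enum_values)  # membership test is case-sensitive
    result = []
    for row in rows:
        value = row[column]
        # Assumption: cells that are already NULL are kept as NULL.
        if value is None or value in allowed:
            result.append(dict(row))
        elif on_error == "NULL":
            new_row = dict(row)
            new_row[column] = None  # replace the invalid cell with NULL
            result.append(new_row)
        elif on_error == "Exclude row":
            continue  # drop the affected row from the result
        elif on_error == "Fail":
            raise ValueError(f"{value!r} is not in the Enum definition")
        else:
            raise ValueError(f"Unknown error handling option: {on_error!r}")
    return result
```

With the definition Low, High and a column containing Low, high, High, the NULL option yields Low, NULL, High; Exclude row yields Low, High; and Fail raises an error at high.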

For categories with a meaningful order, you also need to change the column to the ordinal scale after conversion.

Ordinal Scale and Graph Ordering

Changing to Ordinal Scale

Enum columns default to the nominal scale. If the value order is meaningful, right-click the column header in Data Table and select Edit Scale of Measurement to change it to the ordinal scale. The measurement scale determines which statistics are shown in Basic Statistics. For more on measurement scales in general, see Data Types and Measurement Scales.

Graph Axis Ordering

When an Enum column is set to the ordinal scale, both the graph axis order and the legend order for aesthetics (color, fill, shape, etc.) default to the Enum definition order. With the nominal scale, both axes and legends are sorted alphabetically. You can manually change the axis display order from the Category Order section in the graph settings panel.
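
The default ordering can be pictured with the following Python sketch. The function category_order is hypothetical. Restricting the result to values that actually appear in the data is an assumption, and Python's sorted() compares by code point, which may differ from the alphabetical collation the application uses.

```python
def category_order(present_values, enum_values, scale):
    """Return the default category order for an Enum column.

    present_values: values found in the column data
    enum_values:    the Enum definition, in its defined order
    scale:          "ordinal" or "nominal"
    """
    present = set(present_values)
    if scale == "ordinal":
        # Follow the Enum definition order.
        return [v for v in enum_values if v in present]
    # Nominal: alphabetical (code-point order in this sketch).
    return sorted(present)
```

For a definition ordered Low, Mid, High, the ordinal result is Low, Mid, High, while the nominal result is High, Low, Mid.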

The table of statistics by measurement scale in Basic Statistics assumes numeric columns (int64, float64). For Enum columns on the nominal scale, only frequency counts are shown, the same as for string columns (unlike string columns, however, Enum columns reject values outside the definition). On the ordinal scale, min / max / median / quartiles are also computed, based on each value's position in the Enum definition, in addition to the frequency counts. mean / std / skewness / ex. kurt are not computed because distances between ordinal categories are not defined. For the same reason, IQR (= Q3 − Q1) is not computed. See Basic Statistics - String Type and Enum Type for how these are computed.
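
To illustrate what "position order" means, the following Python sketch maps each value to its index in the Enum definition and derives min, max, and median from those indices. The function ordinal_summary is hypothetical, and using the lower median for an even number of values is an assumption made only for this sketch. Quartiles are omitted because the exact method is described in Basic Statistics - String Type and Enum Type, not here.

```python
import statistics


def ordinal_summary(column, enum_values):
    """Return min, max, and median of an ordinal Enum column.

    Values are ranked by their position in enum_values, so no numeric
    distance between categories is ever used.
    """
    position = {value: i for i, value in enumerate(enum_values)}
    # Assumption: NULL (None) cells are ignored.
    ranks = sorted(position[v] for v in column if v is not None)
    if not ranks:
        return None
    return {
        "min": enum_values[ranks[0]],
        "max": enum_values[ranks[-1]],
        # Assumption: the lower median is used so the result is always
        # an actual category, never a point between two categories.
        "median": enum_values[statistics.median_low(ranks)],
    }
```

With the definition Low, Mid, High and the data Mid, Low, High, Low, Mid, the sketch returns Low as the min, High as the max, and Mid as the median.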

See also