Data Types and Measurement Scales

MIDAS automatically infers data types and measurement scales when loading data. Data type and measurement scale are independent concepts: data type describes how the value is represented (number, date, text, etc.), while measurement scale describes its analytical properties (nominal, ordinal, interval, ratio). Measurement scales filter which items appear in Basic Statistics — statistics that do not fit the scale are not displayed — so verify the scale is correct after loading.

See Data Preparation and Import for instructions on loading data and changing types.

Data Types

MIDAS automatically determines the type of loaded data.

boolean: Boolean values represented as true/false, yes/no, or y/n (case-insensitive). The values 0/1 are treated as int64. To treat a column recorded as 0/1 as boolean, convert it with Column Type Conversion.

int64 (integer): Numbers without a decimal point (e.g., 1, 42, -10). Integers whose absolute value exceeds 9,007,199,254,740,991 ($2^{53}-1$) cannot be represented exactly as numbers, so auto-inference loads them as string.
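The limit above is the largest integer that a 64-bit floating-point number can hold without gaps. The following Python sketch illustrates the precision loss beyond it. It assumes that numbers pass through a 64-bit float at some point, which is typical for browser-based tools but is not confirmed here for MIDAS; the function names are hypothetical.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # 9,007,199,254,740,991


def within_safe_range(n: int) -> bool:
    """Return True if |n| does not exceed the limit quoted above."""
    return abs(n) <= MAX_SAFE_INTEGER


def survives_float64(n: int) -> bool:
    """Return True if n is unchanged after a round trip through a 64-bit float."""
    return int(float(n)) == n


# 2**53 + 1 has no exact 64-bit float representation, so the round trip changes it.
print(survives_float64(2**53 - 1))  # the limit itself is exact
print(survives_float64(2**53 + 1))  # one step past the gap is not
```

Loading such values as string keeps every digit intact, which matters for identifiers such as long account or order numbers.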

float64 (floating point): Numbers with a decimal point (e.g., 3.14, 0.5, -2.71). Columns that mix integers and decimals are loaded as float64.

date: Date data. Write dates in CSV as YYYY-MM-DD (e.g., 2025-11-17). YYYY/MM/DD, MM/DD/YYYY, and MM-DD-YYYY are also accepted. Zero-padding of month and day is optional, so values like 2025/1/15 are accepted. Loaded dates are displayed as YYYY-MM-DD regardless of the separator or the browser timezone. When the year comes last, the value is always interpreted as month-day-year (MM/DD/YYYY). DD/MM/YYYY is not detected, so reformat CSVs recorded in that style to YYYY-MM-DD before loading.
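The accepted date layouts can be mimicked when checking a CSV before loading. The Python sketch below is an approximation under stated assumptions, not MIDAS's parser: it requires the same separator in both positions and returns None for impossible dates, neither of which is specified above. The function name is hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Year-first layouts: YYYY-MM-DD or YYYY/MM/DD (zero-padding optional).
_YEAR_FIRST = re.compile(r"^(\d{4})([-/])(\d{1,2})\2(\d{1,2})$")
# Year-last layouts: MM/DD/YYYY or MM-DD-YYYY, always read as month-day-year.
_YEAR_LAST = re.compile(r"^(\d{1,2})([-/])(\d{1,2})\2(\d{4})$")


def normalize_date(text: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if the text is not an accepted date."""
    text = text.strip()
    m = _YEAR_FIRST.match(text)
    if m:
        year, month, day = int(m[1]), int(m[3]), int(m[4])
    else:
        m = _YEAR_LAST.match(text)
        if not m:
            return None
        month, day, year = int(m[1]), int(m[3]), int(m[4])
    try:
        return date(year, month, day).isoformat()
    except ValueError:  # e.g. month 15, which is what a DD/MM/YYYY value can produce
        return None
```

Under these rules `15/01/2025` is rejected, but `03/04/2025` is read as March 4 even if the author meant 3 April. That silent misreading is the reason to convert day-first files to YYYY-MM-DD before loading.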

datetime: Data including both a date and a time (e.g., 2025-11-17 14:30:00). Values with a time component are inferred as datetime even when the time is zero, as in 2025-01-15 00:00:00. Values are converted to UTC internally and displayed in the browser's local timezone, so the displayed time depends on that timezone. When date-only values and datetime values are mixed in one column, the whole column is treated as datetime, and rows without a time are stored as 00:00 UTC. For example, a browser in JST displays the date-only value 2025-01-15 as 2025/1/15 9:00:00. See Datetime Data and Timezones for details.

string: Text data that does not match any of the above. MIDAS treats only empty cells as missing values; strings such as NA, ., and - are not treated as missing and are loaded as text. These values can make type inference for the column fail and fall back to string, so represent missing values as empty cells.
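One way to follow this advice is to blank out placeholder tokens in the CSV before loading it. The Python sketch below uses only the standard library; the token set and the function name are assumptions to adapt to your own data, and the header row is copied unchanged.

```python
import csv
import io

# Placeholder tokens to replace with empty cells (an assumed set; extend as needed).
MISSING_TOKENS = {"NA", ".", "-"}


def blank_missing_tokens(csv_text: str) -> str:
    """Return csv_text with placeholder tokens in data rows replaced by empty cells."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, row in enumerate(reader):
        if index == 0:
            writer.writerow(row)  # keep the header row as written
            continue
        writer.writerow(["" if cell.strip() in MISSING_TOKENS else cell for cell in row])
    return out.getvalue()
```

For example, `blank_missing_tokens("id,score\n1,NA\n2,3.5\n")` leaves the `score` column with one empty cell and one decimal, so the column is no longer forced to string by the `NA` token.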

enum: Categorical data with a restricted set of valid values. Enums are not auto-inferred; define them manually and convert string columns. Because an enum can define the order of its values, use the enum type for ordered categories. With ordinal scale, enum columns get order-based statistics such as the median and quartiles, but statistics that assume distances between values, such as the mean and standard deviation, are not computed. See Enum Definitions for creation steps and details on statistics.

Data types appear in the second row of the column header in the Data Table: the first row shows the column name, and the second row shows the data type and measurement scale (e.g., int64 interval). If a data type is not determined correctly, fix it with Column Type Conversion. Type conversion does not modify the original dataset; the result is created as a new dataset. Changing the measurement scale (Edit Scale of Measurement, described below) only updates column metadata and does not create a new dataset.
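The inference rules in the Data Types list can be approximated in a few lines. The following Python sketch is a simplification, not MIDAS's actual implementation: it omits date and datetime detection, accepts only plain decimal notation, and assumes that a single non-matching cell makes the whole column fall back to string. All function names are hypothetical.

```python
import re
from typing import Iterable

BOOLEAN_TOKENS = {"true", "false", "yes", "no", "y", "n"}
MAX_SAFE_INTEGER = 2**53 - 1

_INTEGER = re.compile(r"-?\d+")
_DECIMAL = re.compile(r"-?\d+\.\d+")  # assumption: plain decimal notation only


def classify_cell(text: str) -> str:
    """Classify one non-empty cell as boolean, int64, float64, or string."""
    lowered = text.strip().lower()
    if lowered in BOOLEAN_TOKENS:
        return "boolean"
    digits = lowered.lstrip("-")
    # Numeric strings such as 001 or 0060001 keep their leading zeros as text.
    has_leading_zero = len(digits) > 1 and digits[0] == "0" and digits[1].isdigit()
    if _INTEGER.fullmatch(lowered):
        if has_leading_zero or abs(int(lowered)) > MAX_SAFE_INTEGER:
            return "string"
        return "int64"
    if _DECIMAL.fullmatch(lowered) and not has_leading_zero:
        return "float64"
    return "string"


def infer_column_type(cells: Iterable[str]) -> str:
    """Infer a column type, treating empty cells as missing values."""
    kinds = {classify_cell(cell) for cell in cells if cell.strip() != ""}
    if not kinds:
        return "string"  # assumption: an all-empty column stays text
    if kinds == {"boolean"}:
        return "boolean"
    if kinds == {"int64"}:
        return "int64"
    if kinds <= {"int64", "float64"}:
        return "float64"  # integers mixed with decimals
    return "string"
```

Under this sketch a column of `0` and `1` comes out as int64 rather than boolean, and a single `NA` among numbers turns the column into string, matching the behavior described above.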

Datetime Data and Timezones

MIDAS stores datetime data in UTC. Timezone information is not preserved.

If a value includes a timezone offset, it is converted to UTC using that offset. Datetimes that include a time component but no offset are interpreted in the browser's timezone. Date-only values are interpreted as 00:00 UTC. Values are displayed in the browser's local timezone. Aggregations are based on the internal UTC values, so opening an MDS file on devices in different timezones does not change aggregation results (counts, means, grouping at time boundaries, etc.). Only the local-timezone display changes.

The following examples assume the browser timezone is JST (+09:00).

| CSV value | Interpretation on load | Data Table display |
|---|---|---|
| 2025-01-15 14:30:00 | Interpreted as 14:30 JST | 2025/1/15 14:30:00 |
| 2025-01-15T14:30:00+09:00 | Interpreted as 14:30 JST | 2025/1/15 14:30:00 |
| 2025-01-15T14:30:00Z | Interpreted as 14:30 UTC | 2025/1/15 23:30:00 |

In a column that mixes date-only and datetime values, date-only rows are interpreted as 00:00 UTC. In the examples above, 2025-01-15 would be displayed as 2025/1/15 9:00:00.
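The interpretation rules can be reproduced with Python's standard `datetime` module. This is a sketch of the described behavior, not MIDAS code: the browser timezone is modeled as a fixed +09:00 offset, and the function names and display format string are assumptions chosen to match the examples above.

```python
from datetime import date, datetime, time, timedelta, timezone

JST = timezone(timedelta(hours=9))  # assumed browser timezone for the examples


def to_utc(text: str, browser_tz: timezone = JST) -> datetime:
    """Convert a CSV value to the UTC instant that would be stored."""
    text = text.strip().replace(" ", "T")
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # spell out UTC for older Python versions
    if "T" not in text:
        # Date-only value: interpreted as 00:00 UTC.
        return datetime.combine(date.fromisoformat(text), time(0), tzinfo=timezone.utc)
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # No offset: interpreted in the browser's timezone.
        parsed = parsed.replace(tzinfo=browser_tz)
    return parsed.astimezone(timezone.utc)


def display(utc_value: datetime, browser_tz: timezone = JST) -> str:
    """Format a stored UTC instant in the browser's local timezone."""
    local = utc_value.astimezone(browser_tz)
    return f"{local.year}/{local.month}/{local.day} {local.hour}:{local.minute:02d}:{local.second:02d}"


for value in ["2025-01-15 14:30:00", "2025-01-15T14:30:00Z", "2025-01-15"]:
    print(value, "->", display(to_utc(value)))
```

Passing a different `browser_tz` to `display` changes only the formatted text; the UTC instant returned by `to_utc` for offset-qualified values stays the same.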

Datetimes with a time component and no timezone information are interpreted in the browser's timezone. On the same device, times appear exactly as written in the CSV, but because they are stored in UTC, the display changes when the MDS file is opened on a device in a different timezone. Loading the same CSV on devices in different timezones also stores different UTC values for datetimes without an offset. When timezone consistency matters, add timezone offsets to the datetimes in the CSV or normalize them to UTC before loading.
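A preprocessing step along these lines can make offsets explicit before loading. The Python sketch below is one possible approach with hypothetical function names; it uses a fixed offset, so it does not account for daylight saving time, and regions that observe it need a real timezone database.

```python
from datetime import datetime, timedelta, timezone


def _parse(text: str) -> datetime:
    """Parse an ISO-like datetime string, with or without an offset."""
    text = text.strip().replace(" ", "T")
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def with_explicit_offset(text: str, source_tz: timezone) -> str:
    """Attach source_tz to a datetime that has no offset; keep existing offsets."""
    parsed = _parse(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=source_tz)
    return parsed.isoformat()


def normalized_to_utc(text: str, source_tz: timezone) -> str:
    """Return the same instant expressed in UTC."""
    return _parse(with_explicit_offset(text, source_tz)).astimezone(timezone.utc).isoformat()


jst = timezone(timedelta(hours=9))
print(with_explicit_offset("2025-01-15 14:30:00", jst))
print(normalized_to_utc("2025-01-15 14:30:00", jst))
```

Either output form carries its own offset, so the stored UTC value no longer depends on the timezone of the device that loads the file.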

Measurement Scales

Measurement scales classify which operations are meaningful for a given kind of data. The classification follows Stevens' (1946) four levels of measurement. The scales form a hierarchy (nominal < ordinal < interval < ratio) in which each higher scale supports every operation available to the scales below it and adds more. MIDAS uses this classification to filter which statistics are displayed. The scale level alone does not uniquely determine the analysis method, so set the scale according to the meaning of your data and change it after loading if needed.
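The hierarchy can be pictured as an ordered enumeration in which each level adds one kind of operation to those inherited from the levels below it. The Python sketch below is illustrative only; the class, the operation labels, and the function name are hypothetical and do not describe how MIDAS stores scales.

```python
from enum import IntEnum
from typing import List


class Scale(IntEnum):
    """Stevens' four levels, ordered so that comparisons follow the hierarchy."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The operation that each level adds on top of the levels below it.
ADDED_OPERATION = {
    Scale.NOMINAL: "equality",
    Scale.ORDINAL: "ordering",
    Scale.INTERVAL: "difference",
    Scale.RATIO: "ratio",
}


def meaningful_operations(scale: Scale) -> List[str]:
    """Return every operation meaningful at the given scale, lowest level first."""
    return [op for level, op in ADDED_OPERATION.items() if level <= scale]


print(meaningful_operations(Scale.ORDINAL))
print(meaningful_operations(Scale.RATIO))
```

A statistics filter built on this idea would show a statistic only when the operations it relies on are all in the list for the column's scale.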

Nominal Scale

Data representing categories with no meaningful order. Only equality ($=$, $\neq$) is meaningful.

Examples: Gender (male/female), colors (red/blue/green), country names

Ordinal Scale

Categories with a meaningful order, but no defined interval between values. Comparisons ($<$, $>$) are meaningful.

Examples: Satisfaction (low/medium/high), grade level (1st/2nd/3rd year), grades (A/B/C/D)

Interval Scale

Equally spaced numeric data where differences are meaningful, but ratios are not. The zero point is arbitrary.

Examples: Temperature (Celsius), year (AD)

  • The difference between 20°C and 10°C is a meaningful 10°C
  • However, 20°C is not "twice as warm" as 10°C

Ratio Scale

Equally spaced numeric data with a true zero point. Both differences and ratios are meaningful.

Examples: Height, weight, price, age

  • The difference between 20kg and 10kg is a meaningful 10kg
  • Furthermore, 20kg is "twice as heavy" as 10kg
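A quick way to see the difference between interval and ratio scales is to change units. In the Python sketch below (function names are illustrative), the ratio of two temperatures changes when Celsius is converted to Fahrenheit because the zero point moves, while the ratio of two weights is unchanged when kilograms are converted to pounds.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Interval-scale conversion: rescales and shifts the zero point."""
    return c * 9 / 5 + 32


def kg_to_lb(kg: float) -> float:
    """Ratio-scale conversion: rescales only, zero stays zero."""
    return kg * 2.20462


def ratio_after_conversion(a: float, b: float, convert) -> float:
    """Return convert(a) / convert(b)."""
    return convert(a) / convert(b)


# 20 / 10 is 2 in Celsius, but the same temperatures are 68 and 50 in Fahrenheit.
print(ratio_after_conversion(20, 10, celsius_to_fahrenheit))
# 20 kg remains twice 10 kg after conversion to pounds.
print(ratio_after_conversion(20, 10, kg_to_lb))
```

Differences behave consistently on both scales: the 10°C gap becomes an 18°F gap for any pair of temperatures 10°C apart, which is why differences are meaningful for interval data even though ratios are not.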

Ratio scale is not assigned by auto-inference. Whether zero represents a true origin depends on the meaning of the data and cannot be determined from values alone. For data with a true zero point such as height or weight, right-click the column header in the Data Table and set ratio scale from Edit Scale of Measurement. Setting ratio scale adds the coefficient of variation (cv) and the geometric mean (geo mean) to basic statistics. See Basic Statistics for the display conditions.
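Both added statistics depend on a true zero, which is why they are tied to ratio scale. The Python sketch below uses the standard `statistics` module (`geometric_mean` requires Python 3.8 or later). Using the sample standard deviation in the coefficient of variation is an assumption; the formula MIDAS uses is not stated here.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Standard deviation divided by the mean (sample standard deviation assumed)."""
    return statistics.stdev(values) / statistics.mean(values)


def geometric_mean(values: Sequence[float]) -> float:
    """The n-th root of the product of n positive values."""
    return statistics.geometric_mean(values)


weights_kg = [10.0, 20.0, 30.0]
weights_g = [w * 1000 for w in weights_kg]
# Rescaling the unit leaves the coefficient of variation unchanged.
print(coefficient_of_variation(weights_kg), coefficient_of_variation(weights_g))

celsius = [10.0, 20.0, 30.0]
kelvin = [c + 273.15 for c in celsius]
# Shifting the zero point changes it, so it is not meaningful for interval data.
print(coefficient_of_variation(celsius), coefficient_of_variation(kelvin))
```

The same three temperatures give very different coefficients of variation in Celsius and in Kelvin, which shows why the statistic is withheld until a column is marked as ratio scale.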

Auto-Inference from Data Types to Measurement Scales

MIDAS determines data types on import and automatically assigns measurement scales. Auto-inference is only an initial value — after loading, data type and measurement scale can be changed independently.

| Data Type | Inferred Scale | Reason |
|---|---|---|
| boolean | Nominal | true/false are unordered categories |
| int64 | Interval | Differences between numbers are meaningful, but whether zero represents a true origin depends on the data |
| float64 | Interval | Same as integers |
| date | Interval | Date differences are meaningful, but ratios of dates are not |
| datetime | Interval | Same as dates |
| string | Nominal | Text is treated as categories |
| enum | Nominal | Even when an order is defined, whether the analysis uses it is made explicit by the scale setting |

Auto-inference may not match the actual meaning of the data. For example, postal codes and ID columns are loaded as numeric and assigned interval scale, but they are semantically nominal. For these columns, right-click the column header in the Data Table and select Edit Scale of Measurement to change to the appropriate scale. Note that numeric strings with leading zeros (0060001, 001, etc.) are automatically loaded as string, so leading zeros are preserved.
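The default assignment amounts to a lookup from data type to scale, with manual corrections applied afterwards. The Python sketch below restates the mapping as a dictionary; the function name and the column names in the example are hypothetical, and the override step stands in for the Edit Scale of Measurement action rather than any MIDAS API.

```python
from typing import Dict, Optional

# Initial scale assigned to each data type by auto-inference.
DEFAULT_SCALE = {
    "boolean": "nominal",
    "int64": "interval",
    "float64": "interval",
    "date": "interval",
    "datetime": "interval",
    "string": "nominal",
    "enum": "nominal",
}


def initial_scales(column_types: Dict[str, str],
                   overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map each column to its inferred scale, then apply manual corrections."""
    scales = {name: DEFAULT_SCALE[dtype] for name, dtype in column_types.items()}
    scales.update(overrides or {})
    return scales


# Hypothetical columns: an ID that is really nominal and a height with a true zero.
columns = {"customer_id": "int64", "height_cm": "float64", "signup": "date"}
print(initial_scales(columns))
print(initial_scales(columns, {"customer_id": "nominal", "height_cm": "ratio"}))
```

Note that no data type maps to ratio or ordinal by default, so both scales only ever appear through a manual change.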

For ordered categories stored as text, create an Enum definition and convert the column to Enum type. A string column cannot define the order of its values, so changing the scale to ordinal only shows frequency counts; order-based statistics such as the median are not computed. Data type and measurement scale are independent, so defining the order in an Enum does not automatically change the scale to ordinal. After converting to Enum type, change the scale to ordinal from Edit Scale of Measurement.
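The reason a defined order is required can be seen by computing a median over categories. In the Python sketch below, the explicit `order` list plays the role of an Enum definition. Taking the lower of the two middle values for an even count is an assumption, and the function name is hypothetical.

```python
from typing import List, Sequence


def ordinal_median(values: Sequence[str], order: List[str]) -> str:
    """Return the median category, ranking values by their position in order."""
    if not values:
        raise ValueError("median of an empty column is undefined")
    rank = {category: position for position, category in enumerate(order)}
    ranked = sorted(values, key=rank.__getitem__)
    return ranked[(len(ranked) - 1) // 2]  # lower median (assumed convention)


answers = ["high", "low", "medium", "low", "high"]

# With the intended order, the middle value is "medium".
print(ordinal_median(answers, ["low", "medium", "high"]))

# Plain text sorts alphabetically (high < low < medium), which gives a misleading middle.
print(sorted(answers)[len(answers) // 2])
```

Without a declared order the only ordering available for text is alphabetical, which rarely matches the meaning of the categories, so order-based statistics are withheld for string columns.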

References

See also