Data Types and Measurement Scales
MIDAS automatically infers data types and measurement scales when loading data. Data type and measurement scale are independent concepts: data type describes how the value is represented (number, date, text, etc.), while measurement scale describes its analytical properties (nominal, ordinal, interval, ratio). Measurement scales filter which items appear in Basic Statistics — statistics that do not fit the scale are not displayed — so verify the scale is correct after loading.
See Data Preparation and Import for instructions on loading data and changing types.
Data Types
MIDAS automatically determines the type of loaded data.
boolean
Boolean values represented as true/false, yes/no, or y/n (case-insensitive). 0/1 are treated as int64. To treat a column recorded as 0/1 as boolean, convert it with Column Type Conversion.
int64 (integer)
Numbers without a decimal point (e.g., 1, 42, -10). Integers whose absolute value exceeds 9,007,199,254,740,991 ($2^{53} - 1$) cannot be represented exactly as numbers, so auto-inference loads them as string.
float64 (floating point)
Numbers with a decimal point (e.g., 3.14, 0.5, -2.71). Columns that mix integers and decimals are loaded as float64.
date
Date data. Write dates in CSV as YYYY-MM-DD (e.g., 2025-11-17). YYYY/MM/DD, MM/DD/YYYY, and MM-DD-YYYY are also accepted. Zero-padding of month and day is optional, so values like 2025/1/15 are accepted. Loaded dates are displayed as YYYY-MM-DD regardless of the separator or the browser timezone. Month-day-year order is always interpreted as MM/DD/YYYY. DD/MM/YYYY is not detected, so reformat CSVs recorded in that style to YYYY-MM-DD before loading.
datetime
Data including both date and time (e.g., 2025-11-17 14:30:00). Values with a time component are inferred as datetime even when the time is zero, as in 2025-01-15 00:00:00. Values are converted to UTC internally and displayed in the browser's local timezone, so the displayed time depends on that timezone. When date-only values and datetime values are mixed in one column, the whole column is treated as datetime, and rows without a time are stored as 00:00 UTC. For example, a browser in JST displays the date-only value 2025-01-15 as 2025/1/15 9:00:00. See Datetime Data and Timezones for details.
string
Text data that does not match any of the above. MIDAS treats only empty cells as missing values; strings such as NA, ., and - are not treated as missing and are loaded as text. These values can make type inference for the column fail and fall back to string, so represent missing values as empty cells.
enum
Categorical data with a restricted set of valid values. Enums are not auto-inferred; define them manually and convert string columns. Because enums can define the order of their values, use enum type for ordered categories. With ordinal scale, enum columns get order-based statistics such as the median and quartiles, but statistics that assume distances between values, such as the mean and standard deviation, are not computed. See Enum Definitions for creation steps and details on statistics.
Data types appear in the second row of the column header in the Data Table: the first row shows the column name, and the second row shows the data type and measurement scale (e.g., int64 interval). If a data type is not determined correctly, fix it with Column Type Conversion. Type conversion does not modify the original dataset; the result is created as a new dataset. Changing the measurement scale (Edit Scale of Measurement, described below) only updates column metadata and does not create a new dataset.
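The inference rules described in this section can be approximated in a few lines of Python. This is a minimal sketch, not MIDAS's actual implementation: the names `classify` and `infer_type`, the regular expressions, the order in which rules are tried, and the handling of an all-empty column are assumptions made for illustration.

```python
import re

MAX_SAFE_INT = 2**53 - 1  # 9,007,199,254,740,991

BOOL_WORDS = {"true", "false", "yes", "no", "y", "n"}
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")
# YYYY-MM-DD, YYYY/MM/DD, MM/DD/YYYY, MM-DD-YYYY; zero-padding is optional.
DATE_RE = re.compile(r"\d{4}[-/]\d{1,2}[-/]\d{1,2}|\d{1,2}[-/]\d{1,2}[-/]\d{4}")
DATETIME_RE = re.compile(
    r"\d{4}-\d{1,2}-\d{1,2}[ T]\d{1,2}:\d{2}(:\d{2})?(Z|[+-]\d{2}:\d{2})?"
)


def classify(value):
    """Classify a single non-empty cell."""
    v = value.strip()
    if v.lower() in BOOL_WORDS:  # case-insensitive; 0/1 are NOT boolean
        return "boolean"
    if INT_RE.fullmatch(v):
        digits = v.lstrip("+-")
        if len(digits) > 1 and digits.startswith("0"):
            return "string"  # leading zeros are preserved as text
        if abs(int(v)) > MAX_SAFE_INT:
            return "string"  # cannot be represented exactly as a number
        return "int64"
    if FLOAT_RE.fullmatch(v):
        return "float64"
    if DATE_RE.fullmatch(v):
        return "date"
    if DATETIME_RE.fullmatch(v):
        return "datetime"
    return "string"  # includes placeholders such as NA, ., and -


def infer_type(cells):
    """Infer a column type; only empty cells count as missing."""
    kinds = {classify(c) for c in cells if c != ""}
    if not kinds:
        return "string"  # assumption: an all-empty column falls back to string
    if len(kinds) == 1:
        return kinds.pop()
    if kinds == {"int64", "float64"}:
        return "float64"
    if kinds == {"date", "datetime"}:
        return "datetime"
    return "string"
```

Under these assumptions, `infer_type(["1", "NA", "3"])` falls back to string because `NA` is not treated as missing, while `infer_type(["1", "", "3"])` stays int64.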
Datetime Data and Timezones
MIDAS stores datetime data in UTC. Timezone information is not preserved.
If a value includes a timezone offset, it is converted to UTC using that offset. Datetimes that include a time component but no offset are interpreted in the browser's timezone. Date-only values are interpreted as 00:00 UTC. Values are displayed in the browser's local timezone. Aggregations are based on the internal UTC values, so opening an MDS file on devices in different timezones does not change aggregation results (counts, means, grouping at time boundaries, etc.). Only the local-timezone display changes.
The following examples assume the browser timezone is JST (+09:00).
| CSV value | Interpretation on load | Data Table display |
|---|---|---|
| 2025-01-15 14:30:00 | Interpreted as 14:30 JST | 2025/1/15 14:30:00 |
| 2025-01-15T14:30:00+09:00 | Interpreted as 14:30 JST | 2025/1/15 14:30:00 |
| 2025-01-15T14:30:00Z | Interpreted as 14:30 UTC | 2025/1/15 23:30:00 |
In a column that mixes date-only and datetime values, date-only rows are interpreted as 00:00 UTC. In the examples above, 2025-01-15 would be displayed as 2025/1/15 9:00:00.
Datetimes with a time component and no timezone information are interpreted in the browser's timezone. On the same device, times appear exactly as written in the CSV, but because they are stored in UTC, the display changes when the MDS file is opened on a device in a different timezone. Loading the same CSV on devices in different timezones also stores different UTC values for datetimes without an offset. When timezone consistency matters, add timezone offsets to the datetimes in the CSV or normalize them to UTC before loading.
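The interpretation rules in this section can be reproduced with Python's standard `datetime` module. The following is a minimal sketch under stated assumptions: a fixed +09:00 offset stands in for the browser timezone, and `to_utc` and `display` are hypothetical helpers, not MIDAS functions.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # stand-in for the browser timezone


def to_utc(text, browser_tz):
    """Convert one CSV value to UTC following the rules described above."""
    if len(text) <= 10:
        # Date-only value (YYYY-MM-DD): interpreted as 00:00 UTC.
        return datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Rewrite a trailing Z so that older Python versions can parse it.
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # No offset: interpreted in the browser's timezone.
        parsed = parsed.replace(tzinfo=browser_tz)
    return parsed.astimezone(timezone.utc)


def display(utc_value, browser_tz):
    """Format a stored UTC value in the browser's local timezone."""
    local = utc_value.astimezone(browser_tz)
    return f"{local.year}/{local.month}/{local.day} {local:%H:%M:%S}"


for value in ["2025-01-15 14:30:00", "2025-01-15T14:30:00Z", "2025-01-15"]:
    print(value, "->", display(to_utc(value, JST), JST))
```

Calling `to_utc("2025-01-15 14:30:00", timezone.utc)` instead of passing `JST` yields a different stored value, which illustrates why offset-less datetimes loaded on devices in different timezones do not agree.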
Measurement Scales
Measurement scales classify which operations are meaningful for a given kind of data. The classification is based on Stevens' (1946) four levels of measurement. The scales form a hierarchy (nominal < ordinal < interval < ratio) in which each higher scale supports every operation available to the scales below it, plus additional ones. MIDAS uses this classification to filter which statistics are displayed. The scale level alone does not uniquely determine the analysis method, so set the scale according to the meaning of your data and change it after loading if needed.
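The cumulative nature of the hierarchy can be sketched as a small lookup. This is an illustration only: it lists just the statistics named on this page, the placement of frequency counts at the nominal level is an assumption, and `available_statistics` is a hypothetical helper. The actual display conditions also depend on the data type and are described in Basic Statistics.

```python
SCALES = ["nominal", "ordinal", "interval", "ratio"]  # lowest to highest

# Statistics that first become meaningful at each level (partial list).
INTRODUCED_AT = {
    "nominal": ["frequency counts"],
    "ordinal": ["median", "quartiles"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["coefficient of variation", "geometric mean"],
}


def available_statistics(scale):
    """Collect statistics for a scale and every scale below it."""
    level = SCALES.index(scale)
    stats = []
    for name in SCALES[: level + 1]:
        stats.extend(INTRODUCED_AT[name])
    return stats


print(available_statistics("ordinal"))
```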
Nominal Scale
Data representing categories with no meaningful order. Only equality (=, ≠) is meaningful.
Examples: Gender (male/female), colors (red/blue/green), country names
Ordinal Scale
Categories with meaningful order, but no defined interval between values. Comparisons (<, >) are meaningful.
Examples: Satisfaction (low/medium/high), grade level (1st/2nd/3rd year), grades (A/B/C/D)
Interval Scale
Equally spaced numeric data where differences are meaningful, but ratios are not. The zero point is arbitrary.
Examples: Temperature (Celsius), year (AD)
- The difference between 20°C and 10°C is a meaningful 10°C
- However, 20°C is not "twice as warm" as 10°C
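One way to see why ratios are not meaningful on an interval scale is to change the unit. In the short Python sketch below (the helper `c_to_f` and the sample temperatures are illustrative), the ratio of two temperatures changes when Celsius is converted to Fahrenheit, while the ratio of two differences stays the same.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


a, b, c = 10.0, 20.0, 40.0

# Ratio of values: depends on the arbitrary zero point.
ratio_c = b / a                  # 20 / 10
ratio_f = c_to_f(b) / c_to_f(a)  # 68 / 50

# Ratio of differences: unaffected by the zero point.
diff_ratio_c = (c - b) / (b - a)
diff_ratio_f = (c_to_f(c) - c_to_f(b)) / (c_to_f(b) - c_to_f(a))

print(ratio_c, ratio_f)            # the two ratios disagree
print(diff_ratio_c, diff_ratio_f)  # the two ratios agree
```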
Ratio Scale
Equally spaced numeric data with a true zero point. Both differences and ratios are meaningful.
Examples: Height, weight, price, age
- The difference between 20kg and 10kg is a meaningful 10kg
- Furthermore, 20kg is "twice as heavy" as 10kg
Ratio scale is not assigned by auto-inference. Whether zero represents a true origin depends on the meaning of the data and cannot be determined from values alone. For data with a true zero point such as height or weight, right-click the column header in the Data Table and set ratio scale from Edit Scale of Measurement. Setting ratio scale adds the coefficient of variation (cv) and the geometric mean (geo mean) to basic statistics. See Basic Statistics for the display conditions.
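Both statistics added under ratio scale rely on a true zero point. The sketch below computes them with Python's `statistics` module on made-up weights. Whether MIDAS uses the sample or the population standard deviation for cv is not stated on this page, so the sample version used here is an assumption, and `coefficient_of_variation` is a hypothetical helper.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


weights_kg = [10.0, 20.0, 40.0]
weights_lb = [w * 2.20462 for w in weights_kg]  # same data in another unit

cv_kg = coefficient_of_variation(weights_kg)
cv_lb = coefficient_of_variation(weights_lb)
geo_mean = statistics.geometric_mean(weights_kg)  # cube root of 8000, about 20

print(cv_kg, cv_lb, geo_mean)
```

Because a unit change on a ratio scale is a pure multiplication, cv is the same in kilograms and pounds. On an interval scale such as Celsius, shifting the zero point changes the mean but not the standard deviation, so cv would depend on the unit.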
Auto-Inference from Data Types to Measurement Scales
MIDAS determines data types on import and automatically assigns measurement scales. The inferred scale is only an initial value; after loading, data type and measurement scale can be changed independently.
| Data Type | Inferred Scale | Reason |
|---|---|---|
| boolean | Nominal | true/false are unordered categories |
| int64 | Interval | Differences between numbers are meaningful, but whether zero represents a true origin depends on the data |
| float64 | Interval | Same as integers |
| date | Interval | Date differences are meaningful, but ratios of dates are not |
| datetime | Interval | Same as dates |
| string | Nominal | Text is treated as categories |
| enum | Nominal | Even when an order is defined, whether the analysis uses it is made explicit by the scale setting |
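In code form, the table amounts to a simple lookup. The dictionary below is a sketch of that mapping; `DEFAULT_SCALE` and `initial_scale` are hypothetical names, not part of MIDAS.

```python
# Initial scale assigned to each data type, as listed in the table above.
DEFAULT_SCALE = {
    "boolean": "nominal",
    "int64": "interval",
    "float64": "interval",
    "date": "interval",
    "datetime": "interval",
    "string": "nominal",
    "enum": "nominal",
}


def initial_scale(data_type):
    """Return the scale assigned on load; it can be changed afterwards."""
    return DEFAULT_SCALE[data_type]


print(initial_scale("int64"))
```

Neither ordinal nor ratio appears among the values: both require knowledge of what the data means, so they are always set manually.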
Auto-inference may not match the actual meaning of the data. For example, postal codes and ID columns are loaded as numeric and assigned interval scale, but they are semantically nominal. For these columns, right-click the column header in the Data Table and select Edit Scale of Measurement to change to the appropriate scale. Note that numeric strings with leading zeros (0060001, 001, etc.) are automatically loaded as string, so leading zeros are preserved.
For ordered categories stored as text, create an Enum definition and convert the column to Enum type. A string column cannot define the order of its values, so changing the scale to ordinal only shows frequency counts; order-based statistics such as the median are not computed. Data type and measurement scale are independent, so defining the order in an Enum does not automatically change the scale to ordinal. After converting to Enum type, change the scale to ordinal from Edit Scale of Measurement.
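A small sketch shows why the order definition matters. It assumes a hypothetical satisfaction column and an explicit order list standing in for an Enum definition. Taking the lower median for even-sized samples is also an assumption; this page does not state how MIDAS handles ties.

```python
ORDER = ["low", "medium", "high"]  # stand-in for an Enum definition
RANK = {label: position for position, label in enumerate(ORDER)}


def ordinal_median(values):
    """Return the (lower) median category according to the defined order."""
    ranked = sorted(values, key=RANK.__getitem__)
    return ranked[(len(ranked) - 1) // 2]


responses = ["high", "low", "medium", "high", "medium"]

by_order = ordinal_median(responses)                        # uses low < medium < high
by_alphabet = sorted(responses)[(len(responses) - 1) // 2]  # plain string sort

print(by_order, by_alphabet)
```

Sorting the raw strings alphabetically places "high" before "low", so the middle element is not a meaningful median. With the explicit order the result is the middle category by rank.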
References
- Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677-680. https://www.jstor.org/stable/1671815
See also
- Data Preparation and Import - File formats and import steps
- Basic Statistics - Statistics displayed by measurement scale