Basic Statistics

The Statistics tab displays statistical information for the selected columns.

See also the "View Basic Statistics" section in Getting Started.

Statistics by Data Type

The statistics displayed vary depending on the column's data type.

Numeric Type (int64, float64)

When you select a numeric column, the following statistics are displayed.

[Figure: Numeric column statistics]

Measurement Scales and Displayed Statistics

For numeric types, MIDAS displays only statistically meaningful items based on the column's measurement scale (Nominal, Ordinal, Interval, Ratio).

Statistic                           Nominal  Ordinal  Interval  Ratio
Category breakdown (Most frequent)  o        o        -         -
min / max                           -        o        o         o
Quantiles (median, etc.)            -        o        o         o
mean / std                          -        -        o         o
skewness / ex. kurt                 -        -        o         o
iqr / range                         -        -        o         o

(o = displayed, - = not displayed)

For example, postal codes should be treated as Nominal scale. Their numerical magnitude carries no meaning, so mean and standard deviation are not displayed when the column is treated as nominal.

See Data Preparation and Import for how to change measurement scales.

For numeric columns with Nominal or Ordinal scale, a breakdown of counts per value (Most frequent) is displayed.
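The scale-dependent display can be pictured as a simple lookup. The following is only a sketch, assuming the groupings read from the table above plus the ratio-only items described later on this page; the names `DISPLAYED_STATS` and `displayed_stats` are hypothetical and are not MIDAS identifiers.

```python
# Hypothetical lookup: measurement scale -> statistic groups shown.
# The groupings follow the table above; this is not MIDAS source code.
_INTERVAL_STATS = [
    "min / max",
    "quantiles",
    "mean / std",
    "skewness / ex. kurt",
    "iqr / range",
]

DISPLAYED_STATS = {
    "nominal": ["category breakdown"],
    "ordinal": ["category breakdown", "min / max", "quantiles"],
    "interval": _INTERVAL_STATS,
    # Ratio columns additionally show cv and geo mean.
    "ratio": _INTERVAL_STATS + ["cv / geo mean"],
}


def displayed_stats(scale):
    """Return a copy of the statistic groups shown for a measurement scale."""
    return list(DISPLAYED_STATS[scale.lower()])


# A postal-code column treated as nominal gets no mean / std.
print(displayed_stats("nominal"))
```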

Basic Information

The column header shows data type, measurement scale, valid count (n), and missing count (miss).

Example: FLOAT64 · interval · n=150 · miss=0

Data Distribution (Histogram)

Visualize the distribution of your data.

  • Bin count: Adjust the number of histogram bins
  • Show density: When checked, overlays a kernel density estimation curve on the histogram

Use the buttons at the top right of the chart to switch operation modes:

  • Pan mode: Drag to pan the chart
  • Select mode: Drag to select a range and highlight corresponding rows

Moments

  • mean: Average value $\bar{x}$
  • std: Sample standard deviation $s = \sqrt{\frac{1}{n-1}\sum(x_i - \bar{x})^2}$
  • skewness: Skewness $G_1 = \frac{n}{(n-1)(n-2)} \sum\!\left(\frac{x_i - \bar{x}}{s}\right)^3$, where $s$ is the sample standard deviation defined above (bias-corrected, $n \geq 3$). 0 indicates symmetry; positive values indicate a right-skewed distribution
  • ex. kurt: Excess kurtosis $G_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum\!\left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$, where $s$ is the sample standard deviation (bias-corrected, $n \geq 4$). 0 matches the normal distribution; positive values indicate heavier tails
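To check a displayed value by hand, the formulas above can be transcribed into plain Python. This is a minimal sketch, not MIDAS code: the function name `moments` is hypothetical, and returning `None` when a statistic is undefined (too few values, or zero spread) is an assumption.

```python
import math


def moments(xs):
    """Mean, sample std, bias-corrected skewness (G1) and excess kurtosis (G2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values for the sample std")
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    result = {"mean": mean, "std": s, "skewness": None, "ex_kurt": None}
    if s == 0:
        # Standardized values are undefined for a constant column (assumption).
        return result
    z = [(x - mean) / s for x in xs]
    if n >= 3:
        result["skewness"] = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    if n >= 4:
        result["ex_kurt"] = (
            n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
        )
    return result


# Symmetric data: skewness is 0 and ex. kurt is -1.2, up to rounding.
print(moments([1, 2, 3, 4, 5]))
```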

For columns with ratio scale, the following statistics are also displayed:

  • cv: Coefficient of variation $\text{CV} = s / \bar{x} \times 100\%$. Expresses the variability relative to the mean
  • geo mean: Geometric mean $\left(\prod_i x_i\right)^{1/n}$. Defined only when all values are strictly positive. Hidden for columns that contain zero or negative values
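The two ratio-scale statistics can be sketched with the standard library as follows. The function names are hypothetical, and returning `None` when the mean is zero is an assumption, since the text does not say how that case is handled.

```python
import math
import statistics


def cv_percent(xs):
    """Coefficient of variation in percent: sample std / mean * 100."""
    mean = statistics.fmean(xs)
    if mean == 0:
        return None  # assumption: CV is left undefined for a zero mean
    return statistics.stdev(xs) / mean * 100  # stdev uses the n - 1 denominator


def geo_mean(xs):
    """Geometric mean, or None when any value is zero or negative."""
    if any(x <= 0 for x in xs):
        return None
    # Averaging logs avoids overflow from multiplying many values.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


print(cv_percent([10, 20, 30]))
print(geo_mean([1, 4, 16]))
print(geo_mean([0, 1, 2]))
```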

Spread

  • iqr: Interquartile range (75th percentile - 25th percentile)
  • range: Range (maximum - minimum)

Quantiles

For a probability $p$, quantiles are calculated on the sorted data $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$ by computing $h = (n-1)p + 1$ and linearly interpolating as $Q_p = x_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor)\bigl(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\bigr)$. The following quantiles of the data, sorted in ascending order, are displayed:

  • 0%(min): Minimum value
  • 1%, 5%, 10%: Lower percentiles
  • 25%: First quartile
  • 50%: Median
  • 75%: Third quartile
  • 90%, 95%, 99%: Upper percentiles
  • 100%(max): Maximum value
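The interpolation rule above, together with iqr and range from the Spread section, can be sketched as below. The function name `quantile` is hypothetical. The same rule is, to the best of our knowledge, what NumPy's default "linear" quantile method computes, which offers a way to cross-check results.

```python
import math


def quantile(xs, p):
    """Quantile Q_p using h = (n - 1) * p + 1 and linear interpolation."""
    if not xs:
        raise ValueError("quantile of empty data")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be within [0, 1]")
    data = sorted(xs)
    n = len(data)
    h = (n - 1) * p + 1      # 1-based fractional position in the sorted data
    lo = math.floor(h)
    if lo >= n:              # p == 1: there is no upper neighbour to interpolate with
        return data[-1]
    frac = h - lo
    # data[lo - 1] is x_(floor(h)) and data[lo] is x_(floor(h) + 1), 1-based.
    return data[lo - 1] + frac * (data[lo] - data[lo - 1])


values = [1, 2, 3, 4]
median = quantile(values, 0.5)                         # 2.5
iqr = quantile(values, 0.75) - quantile(values, 0.25)  # 75th minus 25th percentile
value_range = quantile(values, 1.0) - quantile(values, 0.0)
print(median, iqr, value_range)
```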

String Type and Enum Type

When you select a string or Enum column, the following are displayed:

[Figure: String column statistics]

  • Category Distribution: Bar chart showing count for each category
  • Unique values: Number of unique values
  • Most frequent: Most frequent values and their counts (click to select corresponding rows)

When an Enum column's measurement scale is changed to ordinal, min / max / median / Q1 / Q3 are also computed based on the position order of the Enum definition, in addition to the frequency counts above. IQR / mean / std / skewness / ex. kurt are not displayed because ordinal categories do not have a defined distance and arithmetic has no defined meaning. See Enum Definitions for details.
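The difference between definition order and alphabetical order is easy to see in a small example. This is only an illustrative sketch with a hypothetical function name and made-up categories; it covers min / max only, because the text does not state how the median and quartiles are resolved between two categories.

```python
def ordinal_min_max(values, categories):
    """Min and max of values by their position in the Enum definition order."""
    position = {category: i for i, category in enumerate(categories)}
    present = [v for v in values if v is not None]  # skip missing values
    if not present:
        return None, None
    return (
        min(present, key=position.__getitem__),
        max(present, key=position.__getitem__),
    )


levels = ["low", "medium", "high"]          # hypothetical Enum definition order
observed = ["medium", "high", "low", None, "medium"]

print(ordinal_min_max(observed, levels))    # ordered by definition position
print(min(v for v in observed if v))        # alphabetical order picks "high"
```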

Boolean Type

When you select a True/False column, the following are displayed:

  • True: Count and percentage of True values
  • False: Count and percentage of False values

Datetime Type

When you select a datetime column, the following are displayed:

[Figure: Datetime column statistics]

  • Date Distribution: Chart showing data distribution over time
    • Interval: Select aggregation interval (Auto, 1 minute, 1 hour, 1 day, 1 week, 1 month, etc.)
    • Show trend: Display trend line
  • Earliest: The oldest datetime
  • Latest: The most recent datetime
  • Time span: Duration (e.g., "29 days, 22 hours")
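Earliest, Latest, and Time span can be reproduced with Python's `datetime` module. This is a sketch with a hypothetical function name; the output format imitates the example above, and how MIDAS handles singular units or spans shorter than an hour is not specified here.

```python
from datetime import datetime


def time_span(values):
    """Return (earliest, latest, formatted span) for a list of datetimes."""
    earliest, latest = min(values), max(values)
    delta = latest - earliest
    hours = delta.seconds // 3600   # whole hours beyond the full days
    return earliest, latest, f"{delta.days} days, {hours} hours"


stamps = [
    datetime(2024, 1, 15, 9, 30),
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 30, 22, 0),
]
print(time_span(stamps))
```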

Comparing Multiple Columns (Relationships)

When you select two or more numeric columns (Interval or Ratio scale), the Relationships section appears. The display varies depending on the number of selected columns.

[Figure: Relationships section]

Scatter Plot Matrix (2-4 columns)

Displays combinations of selected columns in a scatter plot matrix:

  • Diagonal: Histogram for each column
  • Off-diagonal: Scatter plots for column pairs

Correlation Matrix (5+ columns)

When 5 or more columns are selected, a Pearson correlation coefficient heatmap is displayed instead of the scatter plot matrix. Color intensity indicates correlation strength.
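The values behind such a heatmap are pairwise Pearson coefficients. The sketch below uses hypothetical function names and made-up column data; it assumes columns of equal length without missing values and returns NaN for a constant column, neither of which is specified in this document.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")   # undefined for a constant column (assumption)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson coefficients, keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up columns for illustration only.
matrix = correlation_matrix({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
})
print(matrix["a"]["b"], matrix["a"]["c"])
```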

Comparison Table

Compare statistics of selected columns side by side. Displays type, scale, n, and miss as basic information, along with mean, std, skewness, ex. kurt, quantiles (min through max), iqr, and range for each column.

Grouping Feature

Select a column from the Show stats by dropdown to group data by that column's values and view statistics for each group.

How to Use

  1. Select a column to use for grouping from the Show stats by dropdown in the Statistics tab (e.g., species)
  2. Statistics are displayed for each value in the selected column

Usage Example

When you select the sepal_length column in the Iris dataset and group by species, statistics are displayed separately for each species, enabling comparison between them:

  • Statistics for sepal_length of Iris-setosa
  • Statistics for sepal_length of Iris-versicolor
  • Statistics for sepal_length of Iris-virginica
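Conceptually, grouping splits the selected column by the values of the grouping column and computes the statistics once per group. The sketch below illustrates that idea with a hypothetical function name and a few made-up rows (not actual Iris measurements); it shows only a handful of statistics.

```python
from collections import defaultdict
import statistics


def stats_by_group(rows, value_col, group_col):
    """Compute a few summary statistics of value_col per value of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {
        key: {
            "n": len(vals),
            "mean": statistics.fmean(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for key, vals in groups.items()
    }


# Illustrative rows only; the values are invented for this example.
rows = [
    {"species": "Iris-setosa", "sepal_length": 5.0},
    {"species": "Iris-setosa", "sepal_length": 5.2},
    {"species": "Iris-versicolor", "sepal_length": 6.0},
    {"species": "Iris-virginica", "sepal_length": 6.5},
    {"species": "Iris-virginica", "sepal_length": 7.1},
]
summary = stats_by_group(rows, "sepal_length", "species")
for species, stats in summary.items():
    print(species, stats)
```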

Row Selection Integration

You can select data rows from charts in the Statistics tab. See Row Selection for an overview of how selection works across tabs.

[Figure: Row selection from histogram]

Selection from Histogram

Use the buttons at the top right of the chart to switch operation modes:

  • Pan mode: Drag to pan the chart (default)
  • Select mode: Drag to select a range

Click a bar: Click a histogram bar to select rows within that bin (range).

Rectangle selection: In Select mode, drag to specify a range and select data within that area.

Selected rows can be viewed in the Selected Rows tab.

Adding to selection: Hold Ctrl (Mac: Cmd) while clicking to add to existing selection.
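The selection behavior described above can be modeled as sets of row indices. This is a conceptual sketch only: the function names are hypothetical, the data is made up, and treating a bin as the half-open range [lo, hi) is an assumption, since this document does not state how bin edges are handled.

```python
def rows_in_range(values, lo, hi):
    """Indices of rows whose value falls in [lo, hi); missing values are skipped."""
    return {i for i, v in enumerate(values) if v is not None and lo <= v < hi}


def update_selection(current, new, add=False):
    """Replace the selection, or extend it when Ctrl/Cmd is held (add=True)."""
    return current | new if add else set(new)


values = [4.9, 5.1, 5.8, 6.3, None, 7.0]   # made-up column values

# Clicking the bar covering 5.0 to 6.0 selects the rows in that bin.
selection = update_selection(set(), rows_in_range(values, 5.0, 6.0))

# Ctrl-clicking another bar adds its rows to the existing selection.
selection = update_selection(selection, rows_in_range(values, 6.0, 7.5), add=True)
print(sorted(selection))
```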

Selection from Scatter Plot

The scatter plot matrix displayed when multiple numeric columns are selected also supports clicking and rectangle selection for row selection.

Opening Filtered Data Tab

Double-click on a data point or bar in the chart to automatically open a Filtered Data tab displaying the selected data.