Sample Datasets
MIDAS includes sample data that you can use to learn data analysis and visualization.
How to Open Sample Data
- Open MIDAS to see the launcher screen
- Click the dataset you want from the "Sample Data" section in the left sidebar
- The data loads and the project screen opens
Palmer Penguins
Measurement data of three penguin species observed in Antarctica (344 rows, 8 columns).
Columns
- species: Penguin species (Adelie, Chinstrap, Gentoo)
- island: Island name
- bill_length_mm: Bill length
- bill_depth_mm: Bill depth
- flipper_length_mm: Flipper length
- body_mass_g: Body mass
- sex: Sex
- year: Survey year
Contains some missing values.
Data source: https://allisonhorst.github.io/palmerpenguins/
License: CC0 (Public Domain)
Gapminder
Country-level data from 1952 to 2007 (1,704 rows, 6 columns). Analyze trends in life expectancy, population, and GDP.
Columns
- country: Country name
- continent: Continent
- year: Year
- lifeExp: Life expectancy
- pop: Population
- gdpPercap: GDP per capita
Data source: https://www.gapminder.org/data/
License: CC BY 4.0
Attribution: "Data from Gapminder Foundation, https://www.gapminder.org/data/, CC BY 4.0"
Auto MPG
Automobile fuel efficiency data from 1970 to 1982 (398 rows, 9 columns).
Columns
- mpg: Fuel efficiency (miles per gallon)
- cylinders: Number of cylinders (4, 6, 8)
- displacement: Engine displacement (cubic inches)
- horsepower: Horsepower
- weight: Vehicle weight (pounds)
- acceleration: Acceleration (0-60 mph time in seconds)
- model_year: Model year (70 = 1970, 82 = 1982)
- origin: Country of origin (usa, europe, japan)
- name: Vehicle model name
Contains some missing values.
Data source: https://archive.ics.uci.edu/dataset/9/auto+mpg
License: Public Domain
World Bank
Development indicators for 50 major countries (50 rows, 10 columns, 2021-2022 data).
Columns
- country: Country name
- country_code: Country code
- region: Region
- income_group: Income group
- population_2022: Population (2022)
- gdp_usd_billions_2022: GDP (billions USD, 2022)
- gdp_per_capita_2022: GDP per capita (2022)
- life_expectancy_2021: Life expectancy (2021)
- urban_population_percent_2022: Urban population percentage (2022)
- internet_users_percent_2021: Internet usage rate (2021)
Data source: https://data.worldbank.org/
License: CC BY 4.0
Attribution: "Data from World Bank Open Data, https://data.worldbank.org/, CC BY 4.0"
Bike Sharing
Washington D.C. bike sharing data (2011-2012). Available in two versions: daily (731 rows) and hourly (17,379 rows).
Time Variables
- instant: Record ID
- dteday: Date (YYYY-MM-DD)
- season: Season (1: Spring, 2: Summer, 3: Fall, 4: Winter)
- yr: Year (0: 2011, 1: 2012)
- mnth: Month (1-12)
- hr: Hour (0-23, hourly data only)
- weekday: Day of week (0: Sunday, 6: Saturday)
- holiday: Holiday flag (0: Regular day, 1: Holiday)
- workingday: Working day flag (1: Weekday, 0: Weekend or holiday)
Weather Variables
- weathersit: Weather condition
  - 1: Clear, few clouds, partly cloudy
  - 2: Mist + cloudy, mist + broken clouds
  - 3: Light snow, light rain + thunderstorm + scattered clouds
  - 4: Heavy rain + ice pellets + thunderstorm + mist
- temp: Normalized temperature (Celsius divided by 41)
- atemp: Normalized feeling temperature (Celsius divided by 50)
- hum: Normalized humidity (humidity divided by 100)
- windspeed: Normalized wind speed (wind speed divided by 67)
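The weather scalings listed above can be inverted to recover approximate physical values. The following is a minimal sketch, assuming each record is available as a dict keyed by the column names. The function name `denormalize_weather`, the output keys, and the sample row are hypothetical and not part of MIDAS, and the wind speed unit is not stated in this document.

```python
def denormalize_weather(row):
    """Undo the scalings described for the Bike Sharing weather columns."""
    return {
        "temp_c": row["temp"] * 41,             # temp = Celsius / 41
        "atemp_c": row["atemp"] * 50,           # atemp = Celsius / 50
        "humidity_percent": row["hum"] * 100,   # hum = humidity / 100
        "windspeed_raw": row["windspeed"] * 67, # windspeed = wind speed / 67
    }

# Hypothetical record, not taken from the dataset.
sample_row = {"temp": 0.5, "atemp": 0.5, "hum": 0.8, "windspeed": 0.2}
restored = denormalize_weather(sample_row)
print(restored)
```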
Usage Counts
- casual: Casual user count
- registered: Registered user count
- cnt: Total count (casual + registered)
The usage counts are count data and are expected to be overdispersed (variance greater than the mean).
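One rough way to see overdispersion is to compare the sample variance of a count column with its mean: under a Poisson model the two are equal, so a ratio well above 1 points to overdispersion. The sketch below assumes the counts have already been read into a list. The helper `dispersion_ratio` and the sample values are hypothetical, and this marginal check ignores covariates, so treat it as a first look rather than a formal test.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    return variance(counts) / mean(counts)

# Hypothetical cnt values, not taken from the dataset.
sample_counts = [120, 340, 95, 410, 220]
ratio = dispersion_ratio(sample_counts)
print(f"variance / mean = {ratio:.1f}")
```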
Data source: https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset
License: CC0 (Public Domain)
Earthquakes
Worldwide earthquake data from September 2024 (1,041 rows, 7 columns, magnitude 4.0+).
Columns
- time: Occurrence datetime
- latitude, longitude: Location
- depth: Depth
- mag: Magnitude
- place: Location description
Data source: https://www.usgs.gov/programs/earthquake-hazards
License: Public Domain (USGS Data)
Iris
Measurement data of three iris species, a classic classification dataset (150 rows, 5 columns).
Columns
- sepal_length, sepal_width: Sepal dimensions
- petal_length, petal_width: Petal dimensions
- species: Species
Data source: https://archive.ics.uci.edu/dataset/53/iris
License: Public Domain
Heart Failure
Clinical records of 299 heart failure patients (299 rows, 13 columns).
Columns
- age: Age
- anaemia: Anaemia status (0: No, 1: Yes)
- creatinine_phosphokinase: CPK enzyme level (mcg/L)
- diabetes: Diabetes status (0: No, 1: Yes)
- ejection_fraction: Ejection fraction (%)
- high_blood_pressure: High blood pressure status (0: No, 1: Yes)
- platelets: Platelet count (kiloplatelets/mL)
- serum_creatinine: Serum creatinine (mg/dL)
- serum_sodium: Serum sodium (mEq/L)
- sex: Sex (0: Female, 1: Male)
- smoking: Smoking status (0: No, 1: Yes)
- time: Follow-up period (days)
- DEATH_EVENT: Death event (0: Survived, 1: Died)
In the Survival Analysis tab, select time as the Time Variable and DEATH_EVENT as the Event Variable to generate Kaplan-Meier survival curves.
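To make the roles of the two variables concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimate in plain Python. It illustrates the general technique and is not a description of how MIDAS computes the curve. The function name `kaplan_meier` and the toy records are hypothetical. `times` plays the role of time, and `events` plays the role of DEATH_EVENT (1 = died, 0 = censored).

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with at least one death."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            # Multiply by the fraction of at-risk subjects surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up records, not taken from the dataset.
toy_times = [5, 8, 8, 12, 20, 20]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(t, round(s, 4))
```

Censored subjects (event 0) never cause a drop in the curve, but they do shrink the at-risk count for later time points.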
Data source: https://archive.ics.uci.edu/dataset/519/heart+failure+clinical+records
License: CC BY 4.0
Attribution: "Chicco, D., Jurman, G. (2020). BMC Medical Informatics and Decision Making. https://doi.org/10.1186/s12911-020-1023-5"
Dose Response
Insecticide dose-response data (8 rows, 4 columns).
Columns
- dose: Insecticide concentration (mg/L)
- exposed: Number of insects exposed at each dose (trials)
- dead: Number of insects that died (successes)
- mortality_rate: Mortality rate (for reference)
In the GLM tab, select the Binomial family, switch Response format to Grouped, and set dead as Successes and exposed as Trials. See the Grouped Binomial GLM Tutorial for step-by-step instructions.
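For readers who want to see what a grouped binomial fit does with successes and trials, the sketch below fits logit(p) = b0 + b1 * dose by Newton-Raphson in plain Python. It is an illustration under stated assumptions, not the routine MIDAS uses: the function `fit_grouped_logit`, the logit link on untransformed dose, and the counts are all hypothetical. The counts are made up for illustration and are not the bundled dataset.

```python
import math

def fit_grouped_logit(doses, trials, successes, iterations=25):
    """Fit logit(p) = b0 + b1 * dose to grouped binomial counts (Newton-Raphson)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, y in zip(doses, trials, successes):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)      # binomial variance weight
            g0 += y - n * p            # score for the intercept
            g1 += (y - n * p) * x      # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical illustration values (8 dose levels), not the MIDAS data.
toy_dose = [1, 2, 3, 4, 5, 6, 7, 8]
toy_exposed = [50] * 8
toy_dead = [2, 5, 11, 20, 30, 39, 45, 48]

b0, b1 = fit_grouped_logit(toy_dose, toy_exposed, toy_dead)
ld50 = -b0 / b1  # dose at which the fitted mortality is 50%
print(f"intercept={b0:.3f} slope={b1:.3f} LD50={ld50:.2f}")
```

Each row contributes `dead` successes out of `exposed` trials, which is why the Grouped response format needs both columns rather than a single 0/1 outcome.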
Data source: Synthetic data (inspired by Bliss, 1935)
License: CC0 (Public Domain)
Student's Sleep
Data published in 1908 by William Sealy Gosset under the pseudonym "Student", in the same paper that introduced the t-test (20 rows, 3 columns). It records the extra hours of sleep gained by 10 subjects under each of two soporific drugs, compared to a control.
Columns
- ID: Subject identifier (1-10)
- extra: Increase in hours of sleep compared to control
- group: Drug administered (Drug 1, Drug 2)
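Because each subject appears once per drug, the natural analysis is a paired t-test on the within-subject differences in extra. The sketch below computes the test statistic by hand in plain Python. The values are those distributed in R's built-in `sleep` dataset, ordered by ID. It is an assumption, not verified here, that the copy bundled with MIDAS matches them.

```python
import math
from statistics import mean, stdev

# extra hours of sleep per subject (ID 1-10), as in R's `sleep` dataset.
drug1 = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug2 = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

# Paired design: work with the within-subject differences.
diffs = [b - a for a, b in zip(drug1, drug2)]
standard_error = stdev(diffs) / math.sqrt(len(diffs))
t_stat = mean(diffs) / standard_error
df = len(diffs) - 1

print(f"mean difference = {mean(diffs):.2f}, t = {t_stat:.3f}, df = {df}")
```

The statistic is then compared against a t distribution with `df` degrees of freedom to obtain a p-value.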
Data source: Student (1908). The Probable Error of a Mean. Biometrika, 6(1), 1-25.
License: Public domain (published 1908)