Sample Datasets
MIDAS includes sample data that you can use to learn data analysis and visualization.
How to Open Sample Data
- Open MIDAS to see the launcher screen
- Click the dataset you want from the "Sample Data" section in the left sidebar
- The data loads and the project screen opens
Palmer Penguins
Measurement data of three penguin species observed in Antarctica (344 rows, 8 columns).
Columns
- species: Penguin species (Adelie, Chinstrap, Gentoo)
- island: Island name
- bill_length_mm: Bill length
- bill_depth_mm: Bill depth
- flipper_length_mm: Flipper length
- body_mass_g: Body mass
- sex: Sex
- year: Survey year
Contains some missing values.
Data source: https://allisonhorst.github.io/palmerpenguins/
License: CC0 (Public Domain)
Gapminder
Data for 142 countries from 1952 to 2007 (1,704 rows, 6 columns, 5-year intervals). Analyze trends in life expectancy, population, and GDP.
Columns
- country: Country name
- continent: Continent
- year: Year
- lifeExp: Life expectancy
- pop: Population
- gdpPercap: GDP per capita (PPP, constant 2005 international dollars)
Data source: https://www.gapminder.org/data/
License: CC BY 4.0
Attribution: "Data from Gapminder Foundation, https://www.gapminder.org/data/, CC BY 4.0"
Auto MPG
Automobile fuel efficiency data from 1970 to 1982 (398 rows, 9 columns).
Columns
- mpg: Fuel efficiency (miles per gallon)
- cylinders: Number of cylinders (4, 6, 8)
- displacement: Engine displacement (cubic inches)
- horsepower: Horsepower
- weight: Vehicle weight (pounds)
- acceleration: Time to accelerate from 0 to 60 mph (seconds)
- model_year: Model year (70 = 1970, 82 = 1982)
- origin: Country of origin (usa, europe, japan)
- name: Vehicle model name
Contains some missing values.
Data source: https://archive.ics.uci.edu/dataset/9/auto+mpg
License: Public Domain
World Bank
Development indicators for 50 major countries (50 rows, 10 columns, 2021-2022 data).
Columns
- country: Country name
- country_code: Country code
- region: Region
- income_group: Income group
- population_2022: Population (2022)
- gdp_usd_billions_2022: GDP (billions USD, 2022)
- gdp_per_capita_2022: GDP per capita (2022)
- life_expectancy_2021: Life expectancy (2021)
- urban_population_percent_2022: Urban population percentage (2022)
- internet_users_percent_2021: Internet usage rate (2021)
Data source: https://data.worldbank.org/
License: CC BY 4.0
Attribution: "Data from World Bank Open Data, https://data.worldbank.org/, CC BY 4.0"
Bike Sharing
Washington D.C. bike sharing data (2011-2012). Available in two versions: daily (731 rows) and hourly (17,379 rows).
Time Variables
- instant: Record ID
- dteday: Date (YYYY-MM-DD)
- season: Season (1: Spring, 2: Summer, 3: Fall, 4: Winter)
- yr: Year (0: 2011, 1: 2012)
- mnth: Month (1-12)
- hr: Hour (0-23, hourly data only)
- weekday: Day of week (0: Sunday, 6: Saturday)
- holiday: Holiday flag (0: Regular day, 1: Holiday)
- workingday: Working day flag (1: Weekday, 0: Weekend or holiday)
Weather Variables
- weathersit: Weather condition
  - 1: Clear, few clouds, partly cloudy
  - 2: Mist + cloudy, mist + broken clouds
  - 3: Light snow, light rain + thunderstorm + scattered clouds
  - 4: Heavy rain + ice pellets + thunderstorm + mist
- temp: Normalized temperature (Celsius divided by 41)
- atemp: Normalized feeling temperature (Celsius divided by 50)
- hum: Normalized humidity (humidity divided by 100)
- windspeed: Normalized wind speed (divided by max wind speed of 67 km/h)
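The normalized weather columns can be converted back to physical units by multiplying by the divisors listed above. The following is a minimal Python sketch of that arithmetic; the function name `denormalize_weather`, the output key names, and the input values are hypothetical and are not part of MIDAS.

```python
def denormalize_weather(temp, atemp, hum, windspeed):
    """Undo the normalization using the divisors stated in the column list."""
    return {
        "temp_c": temp * 41,           # temp = Celsius / 41
        "atemp_c": atemp * 50,         # atemp = Celsius / 50
        "humidity_pct": hum * 100,     # hum = humidity / 100
        "windspeed_kmh": windspeed * 67,  # windspeed = speed / 67
    }


# Hypothetical normalized values, chosen only to illustrate the conversion.
row = denormalize_weather(temp=0.5, atemp=0.5, hum=0.5, windspeed=0.5)
print(row)
```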
Usage Counts
- casual: Casual user count
- registered: Registered user count
- cnt: Total count (casual + registered)
The usage counts are count data and are expected to be overdispersed (variance greater than the mean).
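One quick way to check for overdispersion is to compare the sample variance of a count column with its mean; a ratio well above 1 suggests that a plain Poisson model would understate the variability. The sketch below uses Python's standard library. The function name `dispersion_ratio` and the counts are made up for illustration and are not taken from the Bike Sharing files.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by sample mean (about 1 for Poisson-like data)."""
    return variance(counts) / mean(counts)


# Made-up rental counts, for illustration only.
cnt = [12, 85, 240, 31, 5, 410, 156, 60]
ratio = dispersion_ratio(cnt)
print(f"mean={mean(cnt):.1f} variance={variance(cnt):.1f} ratio={ratio:.1f}")
```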
Data source: https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset
License: CC0 (Public Domain)
Earthquakes
Worldwide earthquake data from September 2024 (1,041 rows, 7 columns, magnitude 4.0+).
Columns
- time: Occurrence datetime
- latitude, longitude: Location
- depth: Depth
- mag: Magnitude
- magType: Magnitude type (mb, mww, etc.)
- place: Location description
Data source: https://www.usgs.gov/programs/earthquake-hazards
License: Public Domain (USGS Data)
Iris
Measurement data of three iris species, a classic classification dataset (150 rows, 5 columns).
Columns
- sepal_length, sepal_width: Sepal dimensions
- petal_length, petal_width: Petal dimensions
- species: Species
Data source: https://archive.ics.uci.edu/dataset/53/iris
License: Public Domain
Heart Failure
Clinical records of 299 heart failure patients (299 rows, 13 columns).
Columns
- age: Age
- anaemia: Anaemia status (0: No, 1: Yes)
- creatinine_phosphokinase: CPK enzyme level (U/L)
- diabetes: Diabetes status (0: No, 1: Yes)
- ejection_fraction: Ejection fraction (%)
- high_blood_pressure: High blood pressure status (0: No, 1: Yes)
- platelets: Platelet count (kiloplatelets/mL)
- serum_creatinine: Serum creatinine (mg/dL)
- serum_sodium: Serum sodium (mEq/L)
- sex: Sex (0: Female, 1: Male)
- smoking: Smoking status (0: No, 1: Yes)
- time: Follow-up period (days)
- DEATH_EVENT: Death event (0: Survived, 1: Died)
In the Survival Analysis tab, select time as the Time Variable and DEATH_EVENT as the Event Variable to generate Kaplan-Meier survival curves.
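To make the roles of the Time Variable and the Event Variable concrete, here is a minimal Python sketch of the Kaplan-Meier product-limit estimate. It is not MIDAS's implementation; the function name `kaplan_meier` and the five toy records are hypothetical. An event value of 1 marks a death and 0 marks a censored record, mirroring the coding of DEATH_EVENT.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where at least one event occurred."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Toy follow-up times (days) and event flags, for illustration only.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.2f}")
```

Censored records (event 0) never trigger a drop in the curve, but they still count toward the number at risk until their follow-up ends.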
Data source: https://archive.ics.uci.edu/dataset/519/heart+failure+clinical+records
License: CC BY 4.0
Attribution: "Chicco, D., Jurman, G. (2020). BMC Medical Informatics and Decision Making. https://doi.org/10.1186/s12911-020-1023-5"
Dose Response
Insecticide dose-response data (8 rows, 4 columns).
Columns
- dose: Insecticide concentration (mg/L)
- exposed: Number of insects exposed at each dose (trials)
- dead: Number of insects that died (successes)
- mortality_rate: Mortality rate (calculated from dead / exposed)
In the GLM tab, select the Binomial family, switch Response format to Grouped, and set dead as Successes and exposed as Trials. See the Grouped Binomial GLM Tutorial for step-by-step instructions.
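For readers who want to see what a grouped binomial fit does with Successes and Trials, the sketch below fits logit(p) = b0 + b1 * dose by Newton-Raphson in plain Python. It rests on several assumptions: the logit link and the untransformed dose are choices made here for illustration, the function name `fit_grouped_logit` is hypothetical, and the eight rows are made-up values, not the contents of the MIDAS sample file.

```python
import math


def fit_grouped_logit(x, successes, trials, iterations=25):
    """Fit logit(p) = b0 + b1 * x to grouped binomial data by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p        # observed minus expected successes
            w = ni * p * (1.0 - p)     # binomial variance weight
            g0 += resid
            g1 += xi * resid
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 Newton system (information matrix times step = score).
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up grouped data, for illustration only.
dose = [1, 2, 3, 4, 5, 6, 7, 8]
exposed = [50] * 8
dead = [2, 5, 11, 20, 30, 39, 45, 48]

mortality_rate = [d / n for d, n in zip(dead, exposed)]
b0, b1 = fit_grouped_logit(dose, dead, exposed)
fitted = [n / (1.0 + math.exp(-(b0 + b1 * d))) for d, n in zip(dose, exposed)]
print(f"intercept={b0:.3f} slope={b1:.3f}")
```

Each row contributes `exposed` trials rather than a single observation, which is the practical difference between the Grouped response format and a 0/1 response.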
Data source: Synthetic data created by the MIDAS project
License: CC0 (Public Domain)
Student's Sleep
Data published in 1908 by William Sealy Gosset under the pseudonym "Student", in the same paper that introduced the t-test (20 rows, 3 columns). Each of 10 subjects received two soporific drugs, and the increase in sleep relative to an unmedicated baseline was recorded. The design is paired: the same subject received both drugs.
Columns
- ID: Subject identifier (1-10)
- extra: Increase in hours of sleep compared to unmedicated baseline
- group: Drug administered (Drug 1, Drug 2)
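Because each subject appears once per drug, the natural analysis is a paired t-test on the within-subject differences. The sketch below computes the t statistic with Python's standard library. The values are those distributed as R's built-in `sleep` dataset, listed in subject order 1-10; that the MIDAS file contains the same values is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

# extra hours of sleep per subject (IDs 1-10), as in R's `sleep` dataset.
drug1 = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug2 = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

# Paired design: analyze the per-subject difference, not the two groups separately.
diffs = [b - a for a, b in zip(drug1, drug2)]
mean_diff = mean(diffs)
t_stat = mean_diff / (stdev(diffs) / sqrt(len(diffs)))
df = len(diffs) - 1

print(f"mean difference = {mean_diff:.2f} h, t = {t_stat:.3f}, df = {df}")
```

Treating the two groups as independent samples would ignore the subject-to-subject variation that the pairing removes, and would give a smaller t statistic for the same data.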
Data source: Student (1908). The Probable Error of a Mean. Biometrika, 6(1), 1-25.
License: Public domain (published 1908)