SQL Editor

Use SQL to filter, aggregate, and join data.

Overview

SQL Editor allows you to execute SQL queries against datasets in MIDAS and save the results as new datasets (derived datasets). It is useful for joining multiple datasets or filtering with complex conditions.

SQL Editor basic

Basic Usage

Opening SQL Editor

Select Data > SQL Query Editor from the menu bar to open a new SQL Editor tab.

Running Queries

Enter a SQL query in the editor and click the Execute Query button, or press Cmd/Ctrl+Enter to execute.

SQL Editor query

Results are displayed in the "Query Result" section. The header shows the total row count, and the first 10 rows are shown as a preview. The saved dataset contains all rows from the query result.

SQL Editor result

If the query contains an error, such as a syntax error or a nonexistent table or column name, an error message appears below the Execute Query button. Fix the query and run it again.

Cancelling a Query

While a query is executing, the Cancel Query button is displayed. Click this button to cancel the query. After cancellation, you can execute a new query.

Saving Results

Save query results as a new dataset. Enter a name in the Output Name field and click Save as Dataset to add the derived dataset to your project.

SQL Editor save

If you enter the name of an existing dataset, a dialog appears and lets you choose to save under a suggested name with a number appended, enter a different name, or replace the existing dataset.

Writing Queries

Basic Query Examples

To retrieve specific columns, specify column names in the SELECT clause.

SELECT species, island, body_mass_g
FROM penguins

Use the WHERE clause to extract only rows matching conditions.

SELECT *
FROM penguins
WHERE body_mass_g > 4000

Combine GROUP BY with aggregate functions for group-level calculations.

SELECT species, COUNT(*) as count, AVG(body_mass_g) as avg_mass
FROM penguins
GROUP BY species

Use ORDER BY to sort results and LIMIT to restrict row count.

SELECT *
FROM penguins
ORDER BY body_mass_g DESC
LIMIT 10

Joining Multiple Tables

Use JOIN to combine multiple datasets.

SELECT a.*, b.category
FROM sales a
JOIN products b ON a.product_id = b.id

To concatenate datasets with the same column structure vertically, use UNION ALL. The number and types of the columns must match.

SELECT * FROM sales_2024
UNION ALL
SELECT * FROM sales_2025

Table Names

Datasets in your project can be referenced as SQL tables. Table names match dataset names, so write the dataset name directly in the FROM clause. You can check dataset names in Project Overview. If your project has no datasets yet, import your data first by following Data Preparation and Import.

SELECT * FROM penguins

Dataset names are case-insensitive. FROM penguins and FROM Penguins resolve to the same dataset. Names containing special characters like hyphens or spaces must be enclosed in double quotes.

SELECT * FROM "bike-sharing"

Supported SQL Features

MIDAS SQL Editor is based on DuckDB, a SQL database engine for analytical workloads, and supports standard SQL features.

  • SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
  • INNER/LEFT/RIGHT/FULL/CROSS JOIN
  • Subqueries
  • UNION, INTERSECT, EXCEPT
  • Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.)
  • WITH clause for CTEs (Common Table Expressions)
  • CASE expressions
  • Aggregate functions (COUNT, SUM, AVG, MIN, MAX, STDDEV, etc.)

See the DuckDB SQL syntax documentation for details.

Autocomplete

SQL Editor displays autocomplete suggestions as you type. The suggestions change depending on the cursor position.

  • Dataset names appear immediately after FROM or JOIN.
  • Column names appear elsewhere, based on the datasets referenced in FROM/JOIN. Column suggestions are shown together with SQL keywords (SELECT, FROM, WHERE, GROUP BY, and so on) and aggregate functions (COUNT, SUM, AVG, MIN, MAX).

SQL Editor column autocomplete

Table aliases and qualified names

Declare a table alias with FROM "dataset" AS t or the shorter form FROM "dataset" t. Typing the alias followed by a dot, such as t., shows only the columns of that dataset. SQL keywords are excluded from this list.

If you omit the alias, you can use the dataset name as a qualifier. Both the quoted form "dataset-name". and the unquoted form dataset. trigger column completion.

Triggering completion explicitly

If suggestions do not appear automatically, press Ctrl + Space to trigger completion on demand. macOS also uses Control + Space because Cmd + Space is reserved by Spotlight.

Keyboard Shortcuts

ShortcutAction
Cmd/Ctrl + EnterRun query
Ctrl + SpaceTrigger autocomplete explicitly
Cmd/Ctrl + FSearch
TabInsert indentation

Ctrl + Space uses the same key combination on macOS. Cmd + Space is not bound because it conflicts with Spotlight.

Derived Datasets

Datasets created in SQL Editor are saved as "Derived Datasets". Derived datasets record dependencies on source datasets, which can be viewed in the Project Lineage tab. When source data is updated, the derived dataset cache is invalidated and automatically recalculated the next time the data is needed (Lazy Evaluation and Caching). The measurement scales and data types of the result columns are inherited from the parent datasets. See Datasets for details.

Editing Derived Datasets

Existing derived datasets created with SQL can be edited. Right-click a dataset in the Project Lineage tab, or open the dataset menu (⋮) in Project Overview, and select Edit Operation... to open SQL Editor in edit mode.

In edit mode, if other datasets, models, or reports depend on this derived dataset, a warning is displayed showing the type and count of affected items. Modify the query and run it with Execute Query; the Update Query button then appears in the result preview. Click it to apply the changes. Derived datasets that depend on this dataset are recalculated the next time their data is needed, and dependent models are automatically re-estimated.

Limitations

MIDAS SQL Editor only supports SELECT statements (data retrieval). Data modification commands like INSERT, UPDATE, DELETE, and DDL commands like CREATE TABLE are not available. Only one statement can be executed at a time. Multiple queries separated by semicolons are not supported.

The ICU extension (timestamp-with-time-zone handling) and the JSON extension are built in. Other DuckDB extensions are not loaded, so functions that depend on the httpfs extension, such as read_csv('https://...'), cannot fetch data from external URLs in SQL queries. Outbound connections are also blocked by the browser's security mechanisms; see Privacy and Security for details. To load data from a file or URL, use the Data > Import Data... menu instead.

Additionally, since it runs in the browser, there are memory limitations that may restrict processing of very large datasets.

See also