Predicate Pushdown Demystified: From Power Query Folding to Starburst EXPLAIN Plans
Practical lessons, forum insights, and SQL rewrites to make your queries truly efficient
Have you ever wondered why some of your reports are lightning-fast while others take forever to load? The answer often comes down to an optimization technique called predicate pushdown. While it sounds technical, the concept is simple and incredibly powerful. It’s the invisible work that makes your data pipelines run faster.
Predicate vs. Pushdown: The Core Difference
Think of it this way:
A predicate is your filter. It's the "what." When you write
WHERE sales_year = 2024
, that condition is your predicate.Pushdown is the action of applying that filter as early as possible. It's the "how." The goal is to filter out irrelevant data at the source before it's even read into memory.
When pushdown works, you're not moving a massive amount of raw data across the network just to throw most of it away later. You're telling the data source, "Hey, I only need the data for 2024," and it gives you back a much smaller, pre-filtered dataset. This saves time, network bandwidth, and processing power.
Why Predicate Pushdown Matters
When we build reports or pipelines, a lot of inefficiency comes from dragging too much data to the query engine and making it do work that the source system could have done faster.
That’s where predicate pushdown comes in. Instead of filtering, aggregating, or transforming after pulling rows into Power BI, Tableau, or Starburst, you let the underlying source (SQL Server, Delta Lake, etc.) do the work.
This simple shift can mean the difference between scanning millions of rows unnecessarily versus returning only what’s needed.
Pushdown in Power Query
In Power BI Power Query, pushdown is achieved through query folding.
When folding happens, your M transformations are translated back into SQL (or the native query language of the source).
When folding breaks (due to certain operations or custom functions), the engine retrieves all data and applies transformations locally — much slower.
How to enforce pushdown in Power Query:
Apply filters and aggregations as early as possible.
Prefer reference queries over duplicates (to maximize folding).
When needed, use native SQL queries directly in Power Query.
Folding indicators in Power Query (View Native Query
) are your best friend to confirm pushdown.
What Starburst Forums Say
Sometimes it’s hard to know which operators or functions are eligible for pushdown. A very useful discussion from the Starburst community forum makes this clear:
Question:
How do we know which operator/function is eligible for pushdown and which is not? For example, are the following operators/functions eligible for pushdown:
CONCAT(col, '-', partitionCol)
col || '-' || partitionCol
ARRAY_INTERSECTS()
ANY_MATCH()
Answer:
You can runEXPLAIN
and see if the expression is still in the query plan. That is the definitive way. It’s also specific to the connector.If an aggregate function is successfully pushed down to the connector, the explain plan does not show that Aggregate operator.
… this query does not show any Aggregate operator with the count function, as this operation is now performed by the connector.
Also, you can take a look at EXPLAIN output, the Scan line (e.g., “TableScan”, “ScanFilter” or “ScanFilterProject”). The scan line will indicate the agg pushdown did happen.
This is one of the most practical checks EXPLAIN is your truth source for pushdown verification.
Real Example: Aggregate Pushdown on Delta Lake
In the Starburst forum, Lester Martin shared an example validating predicate pushdown on Delta Lake tables: Here again, the EXPLAIN plan is used as a diagnostic tool. If the Aggregate operator is missing in the plan, that means the pushdown worked and the connector itself handled the operation instead of Starburst.
The Practical Problem: An Inefficient Query Plan
Let’s look at a concrete case using Azure SQL Database with the AdventureWorksLT dataset.We want to find the MIN and MAX dates from the BigTransactionHistory
table to build a dynamic date dimension.
A common but inefficient query might look like this:
SELECT
MIN(CAST(transactiondate AS DATE)) AS mindate,
MAX(CAST(transactiondate AS DATE)) AS maxdate
FROM "bigTransactionHistory";
This works but if you check the EXPLAIN plan, you’ll see an expensive table scan.
Why? Because the CAST()
function on the indexed column prevents the connector from leveraging the existing index. The query engine is forced to pull all data and compute locally.
The Solution: Writing Pushdown-Enabled SQL
The solution is to write SQL that the query planner can easily optimize. We want to avoid functions that prevent pushdown from happening.
In this case, the CAST
function on the indexed column is the issue. A better approach is to perform the MIN
and MAX
operations directly on the indexed column and then apply the CAST
function on the result.
The Optimized SQL Query:
WITH min_max_dates AS (
SELECT
MIN(transactiondate) AS min_datetime,
MAX(transactiondate) AS max_datetime
FROM "bigTransactionHistory"
)
SELECT
CAST(min_datetime AS DATE) AS mindate,
CAST(max_datetime AS DATE) AS maxdate
FROM min_max_dates;
This small rewrite allows the planner to push MIN and MAX down to the database. The index on transactiondate
is used, and only two values are returned. The CAST is applied locally but just on two rows, not millions
Lessons Learned for Data Professionals
Optimization is a team sport. Sometimes the bottleneck isn’t your BI tool or Starburst, but the way the source system query is written.
Order matters. Prioritize filters and aggregations before transformations that break folding (like casts or custom functions).
Reference vs Duplicate. In Power Query, use references to preserve folding chains.
Check EXPLAIN. Whether in SQL or Starburst, EXPLAIN tells you if pushdown really happened.
Write pushdown-friendly SQL. Avoid wrapping indexed columns in functions when possible.
Final Takeaway
Predicate pushdown isn’t just a backend optimization trick. It’s a practical skill for data engineers, BI developers, and analysts. By writing queries and transformations that let the source system do its work, you save time, bandwidth, resources and deliver faster dashboards and pipelines.
Next time you’re tuning a Power Query step or debugging a Starburst query, check the EXPLAIN plan and ask: Did I really push this down?