[Analytics Engineering] Turbo-charge Snowflake Query Performance by Avoiding Spilling to Storage by John Ryan Snowflake Feb, 2024 Medium

Sun Mar 17 13:58:41 UTC 2024

A nice blog post by John Ryan on how to turbo-charge Query Performance in Snowflake by avoiding spilling to storage: https://medium.com/snowflake/turbo-charge-snowflake-query-performance-by-avoiding-spilling-to-storage-d693def762c2

However this strategy must be used with care, lest you consume a lot of Snowflake Credits with nominal query performance improvement. One case I recently ran into was JOINing to a history table which did not have a Cluster Key defined on the as_of_date field that was being used for the JOIN. There was spillage to local storage. Moving to a large WH avoided Spillage to Local Storage, but the query performance did not improve that much. After analyzing the query plan, we noticed the JOIN to the history table was taking 60% of the execution time. We added a Cluster Key on the field that was used for the JOIN, and the query time was cut in less than half. So just avoiding the Storage Spillage is not enough, you have to analyze the query plan to ensure that the JOINs are not causing the slowness.

Note that Automatic Clustering was enabled on the history table. But this doesn’t alway work. If you know the column that is used in the JOINs, define that as the Cluster Key on the table.

Saqib

Previous message (by thread): [Wranglers] A Goldilocks Story: How Snowflake automatically optimizes micro-partition size for “just right” query performance
Next message (by thread): [Wranglers] Now you can execute an Azure Function in response to a data change event Azure SQL DB Table

More information about the Wranglers mailing list