How Much is Your Snowflake Workload Configuration Costing You?
Did you know a significant chunk of your Snowflake costs could be incurred within your first hour of use?
Okay, this is a bit of an exaggeration. But the decisions you make during your initial Snowflake workload configuration can impact costs over time. Here are some ways to mitigate unnecessary spend on inefficient workloads—both before and after onboarding.
What is Snowflake workload configuration?
Snowflake workload configuration is the process by which data teams either assign workloads to specific warehouses or adjust warehouse settings to better handle the workloads assigned to them. To understand how this works, let’s first look at a few common Snowflake workloads:
- Ad-hoc analytics: Self-service analytics or irregular heavy queries
- Data loading: A COPY command that pulls in data from external sources
- Data transformations: Series of commands that convert raw data into more consumable formats
- Reporting: Creating real-time dashboards and other reports on a recurring or on-demand basis
- Applications: End-user platforms displaying data from query results
These different workloads—as well as other types not listed here—each consume different amounts of compute resources. For example, an ad-hoc analytics request may run comfortably on an XS warehouse due to its low frequency, while an application serving tens of thousands of daily users will require an XL or 2XL warehouse to function without performance lags.
In addition to compute resources consumed, Snowflake workloads can be configured to run on specific warehouses based on the following factors:
- Frequency
- Concurrency
- Scan size
- Number of files copied (for data loading)
- Minutes required by SLAs
The ultimate goal of Snowflake workload configuration is to ensure queries run on the warehouse best able to handle them while consuming the fewest resources.
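To make this concrete, here’s a minimal sketch of what workload-specific warehouses might look like in SQL. The warehouse names, sizes, and suspend intervals are hypothetical placeholders, not recommendations; your own frequency, concurrency, and SLA numbers should drive the actual settings.

```sql
-- Hypothetical workload-specific warehouses; names and sizes are illustrative.
CREATE WAREHOUSE IF NOT EXISTS adhoc_wh
  WAREHOUSE_SIZE = 'XSMALL'   -- infrequent, self-service queries
  AUTO_SUSPEND = 60           -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE;

CREATE WAREHOUSE IF NOT EXISTS loading_wh
  WAREHOUSE_SIZE = 'MEDIUM'   -- COPY jobs loading many files in parallel
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

CREATE WAREHOUSE IF NOT EXISTS app_wh
  WAREHOUSE_SIZE = 'XLARGE'   -- high-concurrency application traffic
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;
```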
Why does Snowflake workload configuration impact costs?
Snowflake workload configuration, in a nutshell, describes how you set up your various Snowflake warehouses to meet your data needs.
For example, one of our customers uses Snowflake both for their internal data team and to run queries and provide services to their end clients. This customer uses a warehouse cluster specifically for internal data, while also designating separate warehouses for each individual client. This makes it easier to track usage and billing on the back end, and no client’s workload impacts the internal data team or the other clients.
Because the biggest driver of Snowflake costs is compute resources, the size and number of warehouses within a given cluster significantly impact your credit consumption and, thus, your cloud spend.
And this can be a problem when your usage is spread out across multiple warehouses. Let’s say you have a cluster of three warehouses: A, B, and C. On average, Warehouse A uses 30% of its compute capacity, Warehouse B uses 40%, and Warehouse C runs at 110%. So not only are you paying to run two warehouses (A and B) whose combined load would fit on one, but queries on the overloaded Warehouse C queue up and run longer than they should, consuming extra credits in the process.
This is a fairly basic scenario, but imagine you have dozens of warehouses and tens of thousands of queries. You can easily see just how quickly your spend can get out of control due to poor workload configuration.
How do organizations end up in this situation?
Not surprisingly, there’s no single reason why Snowflake workloads end up consuming too many resources. Usually there are multiple factors at play. Here are some of the most common.
Under-provisioning due to poor planning
No one sets out to intentionally build inefficient Snowflake configurations. Typically it’s the result of poor planning (or no planning at all). In many cases, we’ve seen data teams start small and build small; then, as the organization grows and data requests increase, their initial configurations struggle with the added complexity and overall performance starts to suffer.
Over-provisioning
Sometimes organizations have the foresight to think ahead to what their data resource needs will be over the life of their business. But then they run into another problem: overestimating their resource needs and, in turn, overprovisioning.
This common issue arises for one of two reasons. The first is simply overestimating how fast the business will grow and user demands will accelerate. The second is a failure to consider solutions that can dynamically adjust Snowflake workloads and resource provisioning in real time. More on that below.
Data explosion
One of the major advantages of Snowflake as a platform is its ability to handle vast amounts of data from diverse sources. As your organization ingests new data sets, manipulating and transforming that data to work interoperably requires more resources. This problem only becomes more pronounced over time.
Technical rigidity
Another major flaw in most Snowflake workload configurations is their rigidity. We all know, of course, that user demands are dynamic and often unpredictable. However, most out-of-the-box Snowflake workload configuration tools are built to change from one static state to another.
So if you try to manually adjust workload configurations to match the situation, you’ll be making changes every hour, and often every minute, of every day. Not only is this unrealistic, but it’s a poor use of data resources.
How to configure a warehouse in Snowflake
Now that we’ve walked through the inherent challenges in Snowflake workload configuration, let’s see what tools Snowflake offers to assist with warehouse configuration.
Whether you’re using Snowsight or the Classic Console, you can perform the following functions natively within Snowflake (illustrated in the sketch after this list):
- Increase or decrease warehouse sizes
- Set maximum warehouse cluster sizes (Enterprise Edition or higher)
- Set a scaling policy to conserve credit consumption (Enterprise Edition or higher)
- Set warehouses to auto-suspend after a specified period of inactivity and auto-resume when a new query is submitted
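Here’s roughly what those levers look like in SQL; the warehouse name is a hypothetical placeholder:

```sql
-- Resize a warehouse up or down:
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Multi-cluster settings (Enterprise Edition or higher):
ALTER WAREHOUSE reporting_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4          -- cap on how far the cluster scales out
  SCALING_POLICY = 'ECONOMY';    -- favor conserving credits over spinning up clusters

-- Auto-suspend and auto-resume:
ALTER WAREHOUSE reporting_wh SET
  AUTO_SUSPEND = 120             -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE;            -- wake up when a new query arrives
```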
Relying solely on these functions to manage Snowflake workloads can present a number of challenges:
- Continual need for manual adjustment—as mentioned above, dynamic user demands and workload needs will require constant manual adjustment to avoid over-provisioning or under-performance
- High risk of performance degradation—if you “set and forget” your auto-suspend settings or a Snowflake resource monitor (a sketch of one follows this list), a sudden spike in activity can cause your queries to slow down significantly, as Snowflake will prevent your warehouses from deploying additional resources to execute the workload
- Cost savings aren’t obvious—either you deploy personnel to manually adjust Snowflake (which incurs costs) or you drag out query execution time through performance lags (which also incurs costs); in both cases, you may not be saving as much as you think
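For reference, here’s a minimal sketch of the kind of resource monitor that can cause that slowdown when left unattended. The quota, thresholds, and names are hypothetical:

```sql
-- Hypothetical monthly credit cap; quota and names are placeholders.
CREATE RESOURCE MONITOR monthly_cap
  WITH CREDIT_QUOTA = 100
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 90 PERCENT DO NOTIFY     -- warn as the quota approaches
           ON 100 PERCENT DO SUSPEND;  -- suspend once the quota is hit

ALTER WAREHOUSE reporting_wh SET RESOURCE_MONITOR = monthly_cap;
```

Once that SUSPEND trigger fires, no amount of user demand will bring the warehouse back until the quota resets or someone intervenes, which is exactly the “set and forget” risk described above.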
Potential solutions to inefficient Snowflake workload configuration
Thankfully, there are a variety of solutions at your disposal that can help mitigate the risks mentioned above. Many of them use Snowflake-native tools as a starting point, but go a step further in automating and managing the workload.
Active workload monitoring
The first step is to get a handle on what your Snowflake workload actually looks like and, more to the point, find out which queries, users, and warehouses are consuming the most compute resources.
You have three options for monitoring your Snowflake workload:
- Manually query via SQL (a sample query follows this list)
- Use the (limited) visibility tools provided by Snowsight
- Use Snowflake Workload Intelligence—a Marketplace app that runs natively in Snowflake—to identify heavy hitters and other major drags on performance in real time
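If you go the manual-SQL route, a starting point might look like the sketch below, which pulls from the SNOWFLAKE.ACCOUNT_USAGE share (note its data can lag by up to a few hours). The 30-day window and 20-row limit are arbitrary choices:

```sql
-- Top credit-consuming warehouses over the last 30 days:
SELECT warehouse_name,
       SUM(credits_used) AS total_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;

-- Longest-running queries (often the heavy hitters) over the same window:
SELECT query_id,
       user_name,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_seconds
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;
```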
Dynamic warehouse optimization
Identifying heavy hitter queries and users is only the first step. Unless you adjust your Snowflake workload configuration in response, you’ll just keep consuming unnecessary resources and spending way too much on Snowflake.
The first option is to dynamically adjust the warehouse size to accommodate the resource demands, then re-adjust the warehouse back to its original (default) state when those demands have abated.
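Done by hand, that adjustment is just a pair of statements; the warehouse name and sizes here are hypothetical:

```sql
-- Scale up ahead of a known heavy window...
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- ...then restore the original size once demand abates:
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'SMALL';
```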
Obviously, responding to every uptick in real time isn’t feasible for a human engineer. That’s why you need an AI-powered solution like Keebo that handles this in the background, 24/7. So even while you’re sleeping, you can rest assured that your workloads will always respond dynamically to user demands.
Query routing
The second option is to attack the problem from the opposite angle: instead of changing the warehouse to fit the query, send the query to an under-utilized warehouse. That way, you’re using resources you’ve already paid for.
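Manually, routing is as simple as pointing a session or a user at a different warehouse; the names below are hypothetical:

```sql
-- Route the current session's queries to an under-utilized warehouse:
USE WAREHOUSE spare_wh;

-- Or change where a user's queries land by default:
ALTER USER analyst_user SET DEFAULT_WAREHOUSE = 'spare_wh';
```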
Like warehouse optimization, query routing really only works at scale when automated by an AI-powered solution like Keebo.
Query optimization
In the same vein as query routing, you can (as a best practice) change not only where queries are deployed, but how they’re written. Certain SQL commands consume more compute resources than others, and often you can achieve the same result faster and cheaper by simply adjusting how you write your code.
There are plenty of ways to do this. One obvious example: instead of using SELECT *, select only the columns you need for the task at hand, as in the sketch below. Reducing query redundancies and utilizing stored procedures are two other examples.
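Here’s that SELECT * rewrite on a hypothetical orders table; because Snowflake stores data by column, naming only the columns you need reduces the data scanned:

```sql
-- Before: reads every column in the table
SELECT *
FROM orders
WHERE order_date >= '2024-01-01';

-- After: reads only the three columns the report actually uses
SELECT order_id, customer_id, order_total
FROM orders
WHERE order_date >= '2024-01-01';
```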
Obviously this is an involved endeavor that requires extensive training and monitoring. But if you can do it well, you can seriously cut back on the resources you consume—especially when paired with the AI-powered solutions mentioned above.
Efficient table layout
A good number of Snowflake workloads consume more resources than they need to because the data tables aren’t efficiently laid out. Specifying column lengths, storing semi-structured data in a VARIANT column, and utilizing transient tables are all recommended table layout best practices for reducing data scans within Snowflake.
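A sketch combining those three practices in one table definition (the table and columns are hypothetical):

```sql
CREATE TRANSIENT TABLE staging_events (  -- transient: skips Fail-safe storage costs
  event_id   NUMBER NOT NULL,
  event_time TIMESTAMP_NTZ,
  user_email VARCHAR(320),               -- explicit length documents the expected width
  payload    VARIANT                     -- semi-structured JSON kept in a single VARIANT column
);
```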
Micro-partitioning data
Another tactic for optimizing Snowflake workload configuration operates at the structural level, but it’s often worth the effort to set up. Snowflake automatically stores table data in micro-partitions; by defining clustering keys that group similar rows together, you help Snowflake prune the micro-partitions it doesn’t need to read, minimizing data scans without sacrificing the quality of the data retrieved.
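Assuming a hypothetical orders table that’s usually filtered by date and region, a clustering key might look like this:

```sql
-- Co-locate rows that are queried together so irrelevant micro-partitions get pruned:
ALTER TABLE orders CLUSTER BY (order_date, region);

-- Check how well the table is clustered on those columns:
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)');
```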
Maintain referential integrity
When referential integrity isn’t maintained in your Snowflake workload configuration, you either spend extensive resources correcting the problem or write workaround queries that are much longer than necessary and consume more resources. Either way, a lack of referential integrity in any database, but especially Snowflake, can be costly.
Making use of CONSTRAINT clauses when you CREATE TABLE or ALTER TABLE helps document referential integrity as you go (note that, aside from NOT NULL, Snowflake treats these constraints as informational metadata rather than enforcing them, so your pipelines still need to uphold them). You can retrieve a list of all table constraints by schema by querying the TABLE_CONSTRAINTS view in the Information Schema.
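A brief sketch with hypothetical tables, plus the Information Schema lookup:

```sql
CREATE TABLE customers (
  customer_id NUMBER NOT NULL,
  CONSTRAINT pk_customers PRIMARY KEY (customer_id)
);

CREATE TABLE orders (
  order_id    NUMBER NOT NULL,
  customer_id NUMBER,
  CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)  -- informational in Snowflake
);

-- List the constraints declared in a given schema:
SELECT constraint_name, table_name, constraint_type
FROM information_schema.table_constraints
WHERE constraint_schema = 'PUBLIC';
```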
Key Snowflake workload configuration principles
Now that we’ve covered both the challenges and the potential solutions, let’s wrap things up with some principles that, whether you’re just starting out or are a Snowflake veteran, can be helpful as you continually optimize your resource consumption.
- Don’t over-provision. Instead, use dynamic warehouse optimization tools to acquire additional compute resources only when you need them.
- Use multiple optimization strategies. The most effective configuration strategies involve deploying multiple tactics in tandem. For example, implementing query best practices and efficient table layout can keep your usage small, and automated warehouse monitoring and query routing can keep it even smaller.
- Get granular. The devil is in the details. You need monitoring and optimization tools that can give specific answers to specific questions—that way you’ll know what you need to fix.
- Be conscious of personnel resources. It’s easy to forget that your team is its own cost—not only directly, but in terms of opportunity costs. You want your high-value engineers building new data pipelines and shipping new user resources, not spending all day optimizing what you currently have.
If you’re looking to optimize your Snowflake workload configuration dynamically and in real time, contact us to get started!