What is an SLA and how do I get started? Service-Level Agreements Explained
For many enterprise users, Snowflake touches multiple business functions: product, engineering, marketing, sales, customer success, and even external clients. Each function has its own use cases, KPIs, and demands on the platform. As data teams start optimizing Snowflake, small changes in one area can significantly impact the others. To reduce confusion, organizations deploy service-level agreements (SLAs) to set clear expectations around performance, data latency, and other business-critical needs.
If you’re wondering what an SLA is and how to get started with one, you’re in the right place. Read on to find answers to common questions about SLAs and our recommendations for successful Snowflake optimization that doesn’t violate your performance expectations.
What is a service-level agreement (SLA)?
A service-level agreement (SLA) defines the terms and conditions of a particular service, the KPIs and metrics used to measure that service, and the penalties incurred if those KPIs and metrics go unmet. SLAs can exist internally among teams or departments or externally among customers or vendors.
For Snowflake users, the most common SLA structure lays out specific conditions for cost and performance. It can set a limit on the Snowflake credits that data teams may consume. Alternatively, it can establish guaranteed performance metrics for querying, retrieving, and running ETL jobs on data.
For example, a software company may require its data teams to guarantee a maximum query latency to maintain desired performance levels. That maximum latency can be placed into an SLA, with penalties incurred for every query that exceeds that maximum.
Why are service-level agreements important for Snowflake users?
Service-level agreements can be a double-edged sword. On the one hand, they offer clarity around performance expectations. This keeps all parties on the same page and can help avoid unnecessary miscommunication and friction.
On the other hand, SLAs place constraints on Snowflake engineers and their ability to manage and optimize Snowflake. For example, if a Snowflake engineer wants to use AI to automatically and dynamically adjust Snowflake warehouse size to optimize spend, performance requirements spelled out in the SLA may limit their ability to do so.
But at the end of the day, the case for SLAs outweighs the case against them:
- Clear expectations around performance and why those expectations are important to the recipient(s), which helps to drive accountability on all sides
- Defined, objective metrics that determine expected levels of service and whether those expectations are met
- Spelled-out remediation possibilities should either party fail to meet their responsibilities
- Built-in conflict resolution frameworks to help address disruptions in service and fix them as quickly as possible
- Improved alignment among various teams, ensuring mutual success across the board
- Higher retention rates as customers’ expectations are met and Snowflake contributes to their overall success
What are the key components of an SLA?
The key components of a service-level agreement include specifics of services provided, each party’s responsibilities, escalation and conflict resolution procedures, and cost-service tradeoffs.
If you’re looking for a rundown of an SLA, here’s a bare-bones outline:
- Goals and objectives to be covered in the SLA
- Stakeholders involved (typically a “provider” and “recipient” of the service)
- Review period, outlining the effective date, expiration date, and specific review timelines within the SLA
- Service agreement, typically the longest part of the SLA, which covers the key components for which the provider is responsible and the KPIs or metrics used to measure those components
- Service management elements, which include dispute resolution processes, an indemnification clause to protect the customer from litigation, mechanisms for updating the agreement, and more
For Snowflake users specifically, here are some areas your SLA will need to cover:
- Data availability
- Change management processes
- Compliance standards
- Data location, access, & portability
- Disaster recovery expectations
- Exit strategies
- Governance
- Performance and uptime statistics
- Security specifications, including specific encryption practices for data protection and privacy
Most SLAs that Snowflake users have to contend with focus on performance, especially uptime and query latency.
How do SLAs impact Snowflake cost optimization?
Despite these advantages, service-level agreements can be an obstacle to Snowflake cost optimization. Because SLAs require Snowflake users to meet specific expectations, those users have to avoid any adjustment that could keep them from meeting those expectations.
Any time you’re working to reduce Snowflake spend, there’s a risk that those cost-saving measures will hinder performance. Most of the time, these performance hits are negligible, or only impact unutilized or underutilized warehouses. In these cases, data leaders will simply make the judgment call that a slight hit to performance is worth the larger amount that they can save.
If there’s an SLA in place, however, data team leaders don’t really have the authority to make that call. They’re bound by the agreement to meet certain performance criteria. This ties their hands and limits what they’re able to do.
Sometimes, SLAs can rule out certain cost optimization techniques entirely. For example, in some cases, an SLA may prohibit the use of resource monitors in Snowflake. Because a resource monitor can suspend a warehouse once it reaches a certain credit usage threshold, you run the risk of violating your SLA with an unexpected warehouse suspension.
In other cases, the limitations are more subtle. Consider a common strategy of scaling warehouses up and down based on usage and demand requirements. If you have an SLA that specifies that queries can’t run longer than 15 seconds, automatically scaling down the warehouse could end up extending query run time and violating the SLA.
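The 15-second scenario above can be sketched as a quick pre-check before a downsize. This is a rough illustration, not a sizing model: the size list and function names are hypothetical, and it assumes run time scales inversely with compute, which real workloads only approximate.

```python
# Hypothetical size ladder, smallest to largest; each step doubles compute.
SIZES = ["XS", "S", "M", "L", "XL"]


def estimated_runtime(runtime_s: float, current: str, target: str) -> float:
    """Estimate run time after a resize.

    ASSUMPTION: run time is inversely proportional to compute, so moving
    one size down doubles it. Real queries rarely scale this cleanly.
    """
    steps = SIZES.index(target) - SIZES.index(current)
    return runtime_s / (2 ** steps)


def violates_sla(runtime_s: float, current: str, target: str,
                 sla_limit_s: float = 15.0) -> bool:
    """True if the estimated run time after resizing exceeds the SLA limit."""
    return estimated_runtime(runtime_s, current, target) > sla_limit_s
```

Under that assumption, a query that takes 9 seconds on a Large warehouse would be estimated at 18 seconds on a Medium, which breaks a 15-second limit even though the query was comfortably compliant before the change.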
So before you start optimizing your Snowflake warehouses, it’s crucial to take into account all the SLAs you have with both internal and external clients. That way, you have a clear understanding of the limitations at play.
How to optimize query performance without violating your SLAs
Once you have your SLA requirements in place, you need to choose a cost optimization strategy that takes those expectations into account. At this point, you have a few solutions at your disposal:
- Increase the warehouse size temporarily, then decrease it after the high latency query has run
- Route the query to another warehouse that has more available compute resources
- Optimize queries to reduce run time
Each of these approaches has its pros and cons. Ideally, you can use all of them in tandem to leverage the unique advantages of each.
Warehouse optimization: Dynamically adjust to compute resource demands
We’ve already briefly mentioned warehouse optimization. There’s no need to run a warehouse at a 5XL if you only need that compute power 10% of the time. Instead, you can dynamically scale warehouses up and down so you only provision the resources you need when you need them. This is a major factor in controlling Snowflake costs.
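To put rough numbers on the 10% example, here is a back-of-the-envelope comparison. The credit rates assume Snowflake's standard warehouse pricing, where each size doubles the hourly credits of the one below it, so check them against your own contract. The choice of an XL warehouse for the remaining 90% of the day is purely illustrative, and the sketch ignores auto-suspend and per-second billing.

```python
# Credits per hour by warehouse size.
# ASSUMPTION: standard warehouses, doubling with each size; verify against
# current Snowflake pricing before relying on these figures.
CREDITS_PER_HOUR = {
    "XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16,
    "2XL": 32, "3XL": 64, "4XL": 128, "5XL": 256,
}


def daily_credits(schedule):
    """Sum credits for a day given (size, hours) pairs."""
    return sum(CREDITS_PER_HOUR[size] * hours for size, hours in schedule)


# Running a 5XL around the clock.
always_large = daily_credits([("5XL", 24)])

# Running a 5XL for 10% of the day (2.4 hours) and an XL the rest of the time.
scaled = daily_credits([("5XL", 2.4), ("XL", 21.6)])

savings_ratio = 1 - scaled / always_large
```

With these illustrative inputs, the scaled schedule uses well under a quarter of the credits of the always-on 5XL. The actual savings depend entirely on how small a warehouse the off-peak workload can tolerate.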
However, any time you optimize Snowflake spend, you run the risk of a performance hit. Often this is worth the trade-off. But if you have an SLA, that judgment call is out of your hands. At the same time, you can’t afford to keep overspending on cloud data. So what’s the solution?
Keebo offers built-in performance guardrails for this. They let you automate your cost optimizations so that they back off whenever your query latency, queuing time, or number of queued queries exceeds a threshold you set.
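As a rough illustration of the guardrail idea (not Keebo's actual implementation), the check can be thought of as comparing three live metrics against limits derived from your SLA. All class, field, and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Guardrails:
    """Hypothetical limits, chosen to sit safely inside the SLA."""
    max_latency_s: float
    max_queue_wait_s: float
    max_queued_queries: int


@dataclass
class Metrics:
    """Hypothetical snapshot of current warehouse behavior."""
    latency_s: float
    queue_wait_s: float
    queued_queries: int


def should_back_off(metrics: Metrics, limits: Guardrails) -> bool:
    """Return True if any metric has crossed its guardrail."""
    return (
        metrics.latency_s > limits.max_latency_s
        or metrics.queue_wait_s > limits.max_queue_wait_s
        or metrics.queued_queries > limits.max_queued_queries
    )
```

An automation loop would call `should_back_off` before and after each cost-saving change and revert the change (for example, by scaling the warehouse back up) as soon as it returns `True`.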
This solution offers the best of both worlds: minimize your Snowflake spend while protecting yourself from SLA violation.
Query routing: Maximize available compute resources
While warehouse optimization is an excellent way to control Snowflake costs, it does have its limitations. One limitation is that you actually don’t have much control over individual queries themselves.
This can be a challenge if your warehouse management logic is too closely tied to your business logic. For example, suppose two applications each run on their own dedicated warehouse. This can be great for overall business processes, but it becomes a problem when one warehouse is overutilized and the other is underutilized.
Rewriting the queries themselves isn’t always an option, either. For example, if you’re using Looker’s drag-and-drop interface, which generates the query for you, you don’t really have the opportunity to optimize that query.
Query routing can close this gap. This Snowflake optimization solution matches each query to the warehouse with the most available resources to handle it, no matter how your Snowflake architecture is structured. As the name suggests, each incoming query is analyzed and, depending on the resources it is expected to need, sent to the appropriate warehouse.
Unlike warehouse optimization, query routing doesn’t provision more or fewer resources. It simply looks at the resources you already have, and figures out how to best distribute your query workload to take advantage of them. As such, the risk of violating your SLA by using query routing is minimal.
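A minimal sketch of the routing decision described above, assuming you already have some estimate of each query's resource needs and each warehouse's spare capacity. The `Warehouse` class, its capacity units, and the `route` function are hypothetical and only illustrate the selection logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warehouse:
    name: str
    capacity: int  # hypothetical units of compute
    in_use: int    # units currently consumed by running queries

    @property
    def available(self) -> int:
        return self.capacity - self.in_use


def route(required: int, warehouses: List[Warehouse]) -> Optional[Warehouse]:
    """Pick the warehouse with the most spare capacity that can fit the query.

    Returns None when no warehouse has enough headroom, in which case the
    caller would have to queue the query or fall back to another strategy.
    """
    candidates = [w for w in warehouses if w.available >= required]
    if not candidates:
        return None
    return max(candidates, key=lambda w: w.available)
```

Note that the function never changes `capacity`: it only redistributes work across what is already provisioned, which is why routing carries little SLA risk.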
Query optimization: Reduce overall compute resource needs
Another option for cutting costs without risking an SLA violation is to implement query optimization. This tactic involves making real-time changes to queries themselves, phrasing them in such a way that they use fewer compute resources to run.
One of the most basic examples in SQL is to avoid using SELECT * to retrieve every column in a table. If you don’t need all that data, you’re wasting compute resources retrieving thousands or even millions of values you’ll never use. In this case, a query optimization tactic would be to name only the specific columns you need in the SELECT statement.
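The difference is easy to see in a small, runnable example. This sketch uses Python's built-in sqlite3 module as a stand-in for Snowflake, and the `orders` table and its columns are made up for illustration. The same principle applies in Snowflake, where reading fewer columns means scanning less data.

```python
import sqlite3

# In-memory database standing in for a real warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, total REAL, notes TEXT)"
)
conn.execute(
    "INSERT INTO orders VALUES (1, 'acme', 99.5, 'a long free-text field')"
)

# Retrieves every column, including ones the report never uses.
all_columns = conn.execute("SELECT * FROM orders").fetchall()

# Retrieves only the columns the report actually needs.
needed_columns = conn.execute("SELECT id, total FROM orders").fetchall()

conn.close()
```

Here `all_columns` carries four values per row while `needed_columns` carries two. On a wide table with millions of rows, that gap is where the compute savings come from.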
Like query routing, query optimization doesn’t run the risk of violating your SLA. In fact, by making existing queries run more efficiently, you can meet your SLA requirements more readily. The problem is that, in many cases, query optimization isn’t a feasible solution.
Service-level agreement FAQs
What are the three types of service-level agreements?
Generally speaking, there are three types of service-level agreements used by Snowflake users and data and IT teams more broadly:
- Customer-based SLA, where the SLA is written to meet the needs of a specific customer
- Service-based SLA, where the SLA is written to specify a service provided to a range of internal or external customers
- Operational SLA, which sets expectations for daily service operations (e.g. uptime, maintenance scheduling)
Additionally, organizations can write multi-level SLAs, which incorporate elements of all three types into a single agreement.
What is the purpose of an SLA?
The purpose of a service-level agreement is to define the terms and conditions of a particular service. An SLA also sets specific KPIs and metrics used to measure that service. If the service provider violates those terms, an SLA will also spell out the penalties or remedies required.
What is a common SLA example?
For Snowflake users, an example of a service-level agreement could be the following: the data team agrees to maintain a maximum query latency of 12 seconds at the 95th percentile. This means that no more than 5% of queries will have a latency of more than 12 seconds. The penalty for violating this agreement could be a discount on the customer’s next billing cycle.
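A check like this one can be sketched in a few lines, assuming you can export per-query latencies for the reporting period. The function names are hypothetical, and the sketch uses the nearest-rank definition of a percentile; your SLA should state which definition applies.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_sla(latencies_s, limit_s=12.0, pct=95):
    """True if the pct-th percentile latency is within the agreed limit."""
    return percentile(latencies_s, pct) <= limit_s
```

With 100 queries, the agreement holds if up to five of them run longer than 12 seconds. A sixth slow query pushes the 95th percentile over the limit and triggers the penalty.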
What should an SLA include?
A service-level agreement should include the following elements: goals and objectives, identification of parties, periodic review period, service agreement, and service management elements.
What are some common SLA mistakes?
Common SLA mistakes include failing to establish and agree upon SLAs up front, having too many or contradictory SLAs, failing to accommodate the provider’s point of view, leaving KPIs and metrics unclear, and treating SLAs as a one-time exercise.
Final thoughts on service-level agreements
For data teams, and Snowflake users in particular, service-level agreements can be a mixed bag. Yes, they provide needed clarity around expectations of all parties involved. At the same time, they can hinder effective Snowflake cost optimizations.
If reducing Snowflake spend is a priority, having a cost optimization strategy that takes SLAs into account is critical. Deploying a variety of tactics—especially warehouse optimization and query routing—is key to getting the best of both worlds.
Learn more about Keebo’s deep bench of Snowflake cost optimization tools—schedule a live demo here.