Monitoring Snowflake Performance: 5 Key Metrics & Tools
“Snowflake performance” is an umbrella term covering everything from query speed to warehouse utilization to overall spend. As such, monitoring Snowflake performance is far from simple: it involves triangulating multiple dynamic, complex metrics.
In this article, we’re going to demystify several key Snowflake performance metrics you should monitor. These will help you gauge whether your resources are being deployed effectively and identify areas for continuous improvement.
What is Snowflake performance monitoring?
Snowflake performance monitoring involves tracking the performance, speed, and health of your queries, warehouses, and other components of your Snowflake infrastructure. The goal is to provide the insights necessary to make the platform run faster, meet client expectations, and avoid unnecessary spend.
Why is Snowflake performance monitoring necessary?
Performance monitoring is critical for all cloud data warehouses, not just Snowflake. Cloud platforms are notorious for their near-infinite scalability. This can be a benefit, as you won’t have the same resource restrictions as an on-prem solution.
But there’s a drawback: near-infinite scalability means that you can provision as many resources as necessary, and thus spend as much as you need. However, if you don’t keep an eye on your performance, there are no hard limits on how much you could spend.
Of course, you could always place hard caps on cloud spend through a Snowflake resource monitor (sketched after this list), but there are serious drawbacks to this approach:
- It’s near-impossible to predict what your actual Snowflake usage will be in advance
- Workloads change in real time, often from one minute to the next, and you need to be prepared to service them
- Hard restrictions on compute resources will end up hurting your performance, causing customer dissatisfaction and potentially violating contracts and service-level agreements
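For context, here is what that hard-cap approach looks like in practice: a minimal sketch of a resource monitor that suspends a warehouse once a monthly credit quota is hit. The monitor name, warehouse name, and quota below are placeholders, not recommendations.

```sql
-- Minimal sketch: cap a warehouse at 100 credits per month and suspend it
-- once the quota is reached. Names and the quota are placeholders.
CREATE RESOURCE MONITOR monthly_cap
  WITH CREDIT_QUOTA = 100
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 90 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

-- Attach the monitor to a (hypothetical) warehouse
ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_cap;
```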
The only way to avoid these challenges is to monitor your Snowflake performance. By doing this, you’ll end up realizing three primary benefits:
- By measuring dips and spikes in performance, you can predict when workloads will require additional resources and provision them accordingly
- You can get a handle on average costs and more accurately forecast future spend
- With the right tools, you can drill down into specific performance fluctuations and figure out what caused them and how to address them if they recur
What are the challenges with Snowflake performance monitoring?
Snowflake performance monitoring is easier said than done. While Snowflake does offer some built-in monitoring tools through Snowsight, their functionality is limited. Specifically, while Snowsight gives you a 30,000-foot view of your performance, it lacks the detail necessary to adequately address problems arising from heavy-hitter queries, users, or queues.
Another challenge with Snowflake performance monitoring is the need for real-time insights, not monthly, weekly, or even daily summaries. For example, you could have a query spike at 3:07am that lasts until 3:21am. With a 24-hour reporting period, you may not see it until the next day, and by then you can't do anything about it.
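To catch a window like that while it's still open, you need fresher data than the account-level usage views provide (they can lag by up to 45 minutes). One option is the INFORMATION_SCHEMA.QUERY_HISTORY table function, which reflects recent activity with far less delay. A rough sketch, with an arbitrary 15-minute window:

```sql
-- Rough sketch: list queries that finished in the last 15 minutes,
-- slowest first, so a 3am spike is visible while it is still happening.
SELECT query_id,
       user_name,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_seconds,
       start_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
       END_TIME_RANGE_START => DATEADD('minute', -15, CURRENT_TIMESTAMP()),
       RESULT_LIMIT => 1000))
ORDER BY total_elapsed_time DESC;
```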
Finally, no one metric gives you a perfect window into whether your Snowflake performance is “good” or “bad.” In fact, you need to measure multiple KPIs to get a handle on the quality of your performance.
Ready to start monitoring your Snowflake performance today? Snowflake Workload Intelligence is a free plugin that can help you get started.
5 key metrics to monitor Snowflake performance
So what are the table-stakes metrics for monitoring (and optimizing) your Snowflake performance? We've surfaced the five that, if you start measuring them today, will give you the best insight into how well Snowflake is operating for you.
Additionally, we’ll walk through how Keebo incorporates these metrics into our ongoing mission of providing the best possible Snowflake optimization tools.
1. Query latency
Query latency (also called query execution time) in Snowflake is the time it takes for a query to fetch, compute, and return data. In the vast majority of cases, it’s determined by two factors: inefficient query design and under-provisioned compute resources.
Monitoring query latency before and after execution can help you better adapt resources to service those queries. If a particular query shows signs of inefficiency, you can provision more resources to that warehouse and prevent undue latency.
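Before layering on any tooling, you can baseline latency straight from Snowflake's own query history. A minimal sketch against the ACCOUNT_USAGE.QUERY_HISTORY view (the 7-day window is arbitrary, and the view's data can lag by up to 45 minutes):

```sql
-- Minimal sketch: average and worst-case latency per warehouse over the
-- last 7 days, for successful queries only.
SELECT warehouse_name,
       COUNT(*)                       AS query_count,
       AVG(total_elapsed_time) / 1000 AS avg_latency_seconds,
       MAX(total_elapsed_time) / 1000 AS max_latency_seconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND execution_status = 'SUCCESS'
GROUP BY warehouse_name
ORDER BY avg_latency_seconds DESC;
```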
Keebo uses two machine learning (ML) processes to respond and adapt query execution strategies in real time:
- Predictive adjustments. The ML algorithm analyzes and anticipates resource needs, then adjusts the warehouse configuration in real time before the query is executed.
- Pattern recognition. By analyzing previous query execution history, Keebo can identify common patterns and apply pre-learned optimizations (reducing the need for overhead in query planning and execution).
For these strategies to work, you need to have a dynamic approach to resource allocation. This approach looks at historical and current workloads and analyzes them to predict future states to avoid both over- and under-utilization.
Both of these examples require real-time execution in order to be effective. Human engineers simply can’t move quickly enough to avoid performance issues and cost increases. The best solution is to use an AI to make these changes in real time so they can respond to sudden spikes in demand or query complexity.
2. Resource utilization (CPU, memory, & storage)
Warehouse performance is also determined by how well it utilizes particular resources. Failure to provision resources properly can result in bottlenecks, higher operational costs, and delayed processes.
Generally speaking, when we talk about cloud warehouse resources, we’re referring to three main categories:
- Central Processing Unit (CPU). Virtual CPUs do the computational work: fetching instructions from memory, performing the required operations, and writing output back to memory.
- Memory. Memory is the virtual holding place for system instructions—the place where information is stored for immediate use. More memory means that cloud functions can operate faster and more efficiently.
- Storage. This is where your system holds data over the long term. To access data from storage, you have to query it.
If you want to maintain the stability of your Snowflake infrastructure (and any other cloud resources you use), it’s critical that you effectively manage these resources.
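Snowflake doesn't expose raw CPU or memory counters, but the query history view offers useful proxies: time spent queued points to an overloaded warehouse, and bytes spilled to local or remote storage indicate queries that didn't fit in memory. A rough sketch that surfaces spilling by warehouse (the 7-day window is arbitrary):

```sql
-- Rough sketch: queries from the last 7 days that spilled out of memory,
-- a common sign that the warehouse is under-provisioned for the workload.
SELECT warehouse_name,
       COUNT(*)                             AS spilling_queries,
       SUM(bytes_spilled_to_local_storage)  AS bytes_spilled_local,
       SUM(bytes_spilled_to_remote_storage) AS bytes_spilled_remote
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND (bytes_spilled_to_local_storage > 0
       OR bytes_spilled_to_remote_storage > 0)
GROUP BY warehouse_name
ORDER BY bytes_spilled_remote DESC;
```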
Like with query latency, Keebo uses both real-time analysis and predictive resource management (based on historical data) to identify trends and patterns in resource needs and make sure adequate CPU, memory, and storage are provisioned.
Keebo uses reinforcement learning (RL) ML techniques to learn from new data streams without the constant need for retraining. Instead of using a static dataset, Keebo learns in near real-time from scenarios it encounters in actual Snowflake warehouses.
In terms of tactics to improve resource utilization, Keebo leverages load balancing to maximize usage of CPU, memory, and storage; and smart caching strategies to optimize memory usage and reduce read/write cycles.
3. Query compilation time
Query compilation is the process of transforming a query from its original language into a format that the system can understand and execute. This is generally done via an optimized query plan, which outlines the order in which data operations should be executed.
High query compilation time correlates to high query latency. By optimizing and accelerating query compilation, you can not only achieve faster performance, but also increase the throughput of queries that your system can handle. This will significantly boost your operational efficiency.
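You can measure this directly: ACCOUNT_USAGE.QUERY_HISTORY splits each query's elapsed time into compilation and execution, so it's easy to spot queries where planning, not execution, dominates. A sketch, with arbitrary thresholds:

```sql
-- Sketch: recent queries where compilation took longer than execution,
-- i.e. planning overhead dominates. The 1-second floor filters out noise.
SELECT query_id,
       warehouse_name,
       compilation_time / 1000 AS compile_seconds,
       execution_time / 1000   AS execute_seconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND compilation_time > execution_time
  AND compilation_time > 1000
ORDER BY compilation_time DESC
LIMIT 50;
```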
Keebo’s approach to improving query compilation time has two elements: automating query pattern recognition and caching optimized query plans. Let’s look at each in detail.
Optimization sharing
In optimization sharing, Keebo uses ML techniques to identify similarities among different queries and surface common patterns and trends. In essence, Keebo can apply previously successful optimization strategies to similar queries.
The primary benefit of optimization sharing is that you don't have to constantly reinvent the wheel. If the only difference between two queries is a slight change in conditions or filters, Keebo can apply the same optimization to both. This reduces redundant processing and significantly enhances overall performance.
Adaptive query optimization
This approach leverages ongoing performance feedback to continually refine query optimization strategies. As data distributions change or new types of queries are introduced, the system will remain optimized.
Adaptive query optimization is particularly important in dynamic cloud environments, where workloads and query efficiency are constantly in flux.
4. Network latency and throughput
Another critical component of cloud computing performance is network latency and throughput. Network latency is the time it takes for data to travel from Point A to Point B across a network. Throughput is the rate at which a cloud computing system can process queries.
Network latency can seriously slow down query execution time. As such, keeping it low is a significant factor in maintaining acceptable performance.
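Snowflake doesn't report network latency as a standalone metric, but throughput and queueing are straightforward to pull from query history. A rough sketch of hourly throughput and queue time per warehouse (the 1-day window is arbitrary):

```sql
-- Sketch: hourly throughput (completed queries) and time spent queued per
-- warehouse over the last day. Rising queue time with flat throughput
-- suggests the warehouse, not the network, is the bottleneck.
SELECT warehouse_name,
       DATE_TRUNC('hour', start_time)   AS hour_bucket,
       COUNT(*)                         AS queries_completed,
       SUM(queued_overload_time) / 1000 AS seconds_queued
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, DATE_TRUNC('hour', start_time)
ORDER BY warehouse_name, hour_bucket;
```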
Keebo addresses this issue through real-time network performance analysis, adjusting data distribution and query execution in response. Intelligent data placement means choosing where cloud data is stored so that potential delays in data retrieval are minimized.
Additionally, Keebo uses query routing to distribute queries to warehouses that are right-sized to handle them. Large queries go to large warehouses, medium to medium, and small to small. This approach helps ensure proper load balancing so no warehouse is over- or under-utilized.
5. Heavy hitter users & queries
While having insight into high-level Snowflake metrics is a good place to start, all they really do is tell you that you have a problem. They don’t tell you how to fix it.
Yes, Keebo can adjust your warehouses to optimize spend and performance in real time. But wouldn’t it be great to find out where those resource drags are coming from so you can keep them from happening in the first place?
That’s why you need to monitor heavy hitters: particular users or queries that consume significantly more resources than average. If a particular BI tool or engineer is constantly running inefficient queries, you can address that root cause directly and bring your overall costs down and performance up.
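If you want a quick first pass before reaching for a tool, total execution time by user is a reasonable proxy for heavy hitters. A minimal sketch (the 7-day window and top-10 cutoff are arbitrary):

```sql
-- Sketch: the ten users consuming the most total execution time over the
-- last 7 days, a quick way to surface heavy hitters before drilling into
-- their individual queries.
SELECT user_name,
       COUNT(*)                        AS query_count,
       SUM(execution_time) / 1000 / 60 AS total_execution_minutes
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY user_name
ORDER BY total_execution_minutes DESC
LIMIT 10;
```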
Wondering who your heaviest hitters are in your own Snowflake performance? Snowflake Workload Intelligence is a free plugin on the Snowflake Marketplace that will tell you within minutes.
Snowflake performance monitoring FAQs
How do I monitor user activity in Snowflake?
You can use the Query History page in Snowsight to monitor user activity in Snowflake. However, if you have a large number of queries and users, it can be hard to get usable insights from it. The free Snowflake Workload Intelligence plugin gives you better visibility into heavy-hitting users and their queries.
What is a KPI in Snowflake?
KPI stands for Key Performance Indicator: a metric that indicates whether Snowflake is running efficiently. Query latency, heavy hitters, resource utilization, and network latency and throughput are all examples of Snowflake KPIs.
How do you measure warehouse performance in Snowflake?
A number of metrics can be used in measuring Snowflake warehouse performance, including load time, query time, utilization, availability, and more.
Monitoring Snowflake is good. Optimizing it is better.
Let’s return to that example from earlier of the query spike at 3am. You can’t wake up a human engineer or data analyst in the middle of the night to make the necessary changes. By the time they’re up and at their desk, the queries may have finished running!
That’s one of the reasons why you need an AI-powered solution to automate not only Snowflake performance monitoring, but also real-time optimization. Keebo is the only tool that transforms your data warehouse operations in real time by improving speed, efficiency, and cost effectiveness.
See for yourself how Keebo came to the rescue of Costco, Komodo Health, AllBirds and many more, so they can get the most out of their Snowflake environment. Schedule a demo today.