Keebo | Snowflake Gen 2 Warehouses: What’s Really New?

Every time Snowflake unveils a major upgrade, my team at Keebo begins independent due‑diligence testing as soon as the feature hits our accounts. Our customers count on us for an unbiased take—and our platform must be ready to optimize whatever Snowflake ships next. Generation 2 Standard Warehouses (Gen 2) were no exception. We’ve already replayed production workloads side‑by‑side with Gen 1, reproduced Snowflake’s headline benchmark, and uncovered nuances that matter for your bill. Spoiler: Gen 2 can enable savings, but it won’t automatically shrink every invoice. Read on to see when it helps, when it hurts, and how to decide.

1 Why We Put Gen 2 Under the Microscope

  • Independent fact‑checking. Vendor benchmarks rarely reflect messy real‑world workloads. Our job is to validate—or debunk—the marketing slides so you don’t have to.
  • Platform readiness. Keebo’s optimization engine needs to recognize Gen 2 warehouses, predict their cost/perf curves, and steer workloads accordingly the day you flip the switch.
  • Customer guidance. Numerous Snowflake customers have reached out to us asking, “Should I switch to Gen2? Will it save me money?”—hence this deep dive.

2 A Hardware Refresh: Now on AWS Graviton 3

Snowflake’s release notes mention only “faster hardware.” Digging under the hood shows Gen 2 nodes on AWS run C7g instances powered by the Arm‑based Graviton 3 CPU (Medium deep‑dive).

| Gen 1 (C6g / Graviton 2) | Gen 2 (C7g / Graviton 3) |
| --- | --- |
| Arm Neoverse N1 | Arm Neoverse V1 (~50 % higher IPC) |
| NEON 128-bit ×2 SIMD | SVE 256-bit ×2 SIMD |
| DDR4-3200 | DDR5-4800 (~50 % more bandwidth) |
| 1 MiB L2/core | 2 MiB L2/core |

What that jargon means in plain English

  • SIMD (Single Instruction, Multiple Data) lets one CPU instruction crunch many numbers at once—picture scanning eight spreadsheet cells in parallel instead of one‑by‑one. Doubling the lane width means columnar scans and hash aggregations chew through twice as many values per tick.
  • More memory bandwidth & cache widen the on‑ramp to the CPU: data arrives faster, joins spill less, and skewed group‑bys stall less often.
  • Higher IPC (instructions per cycle) means each core does more useful work every nanosecond.
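To make the lane-width point concrete, here is a back-of-the-envelope model (not actual hardware behavior; the function and numbers are illustrative only) of how many SIMD iterations a columnar scan needs:

```python
def simd_iterations(n_values: int, lane_width_bits: int, value_bits: int = 32) -> int:
    """Model how many SIMD iterations a columnar scan needs.

    One iteration crunches lane_width_bits // value_bits values at once,
    so doubling the lane width halves the iteration count.
    """
    lanes = lane_width_bits // value_bits
    return -(-n_values // lanes)  # ceiling division

# Scanning one million 32-bit values:
print(simd_iterations(1_000_000, 128))  # Gen 1 NEON lanes: 250000 iterations
print(simd_iterations(1_000_000, 256))  # Gen 2 SVE lanes:  125000 iterations
```

Real-world gains are smaller than this ideal halving, of course, because scans also wait on memory, which is where the DDR5 and cache upgrades come in.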

AWS backs this up with micro‑benchmarks showing up to 60 % higher integer throughput and 2× analytics perf versus Graviton 2 (AWS Graviton 3 overview).

Fun note: I’ve heard (word of mouth) that Graviton 3 was architected by some University of Michigan alums. Go Blue!

(Azure regions haven’t published the backing SKU yet, but expect a similar jump to AMD Genoa or Intel Sapphire Rapids when Gen 2 lands there.)

My own expertise is in databases and machine learning, so I also reached out to a colleague at the University of Michigan, Prof. Nishil Talati, an expert in modern hardware architecture, and sought his opinion. Here’s what he told me: “As modern hardware continues to advance with each generation, delivering greater compute density, memory bandwidth, and specialized accelerators, effectively mapping complex workloads in software such as data analytics and AI is essential. Realizing the full potential of hardware capabilities requires not only raw performance scaling, but also intelligent orchestration of compute, memory, and communication to match the increasing heterogeneity and specialization in today’s systems.” That’s exactly what Snowflake has done here: alongside the new hardware, they have made software changes to take better advantage of it, which I’ll cover later.

3 The Benchmark That Sparked the Buzz

Snowflake’s launch blog cites a TPC‑DS 1 TB “power” run showing ≈ 25 – 40 % faster execution (May 5 release notes). Independent engineers quickly validated it:

  • Jason Holt measured a 25 % speed‑up on XS warehouses with only 1 % extra cost (LinkedIn).
  • Masato Takada saw 30 – 40 % median gains (up to 70 %) across 22 TPC‑H queries (Medium).
  • Our researchers at Keebo replayed the Snowflake‑hosted TPC‑DS sample schema and hit the same ballpark.

Great—but benchmarks are controlled, cache‑flushed, perfectly parallel ideal worlds. Real workloads are uneven.

4 Why “Faster” Doesn’t Always Mean “Cheaper”

Gen 2 credits cost ~35 % more per hour on AWS and GCP, and ~25 % more on Azure, than Gen 1 warehouses of the same size (see credit table here). That changes the math:

Simple case: a warehouse running a single query

If your warehouse runs a single query, and that query runs 25 % faster on Gen 2 while the hourly rate is 35 % higher, you actually pay ~1.25 % more (0.75 × 1.35 = 1.0125).

Example:

  • One query, wall‑clock 60 min on Gen 1, credit burn = 60 min × 1.00 = 60 credit‑minutes.
  • Same query, 25 % faster on Gen 2 → 45 min, but rate = 1.35 credit/min → 45 × 1.35 = 60.75 credit‑minutes. No savings.

The break-even point in this case is when your query speed-up exceeds 1 − 1/(cost multiplier). For instance, if your cost per credit is up by only 25 %, you need at least a 20 % speed-up (1 − 1/1.25 = 0.2) to break even, since 0.8 × 1.25 = 1.
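The single-query arithmetic above can be packaged into two small helpers; the 1.35 and 1.25 multipliers are the AWS/GCP and Azure premiums quoted earlier:

```python
def gen2_cost_ratio(speedup: float, price_multiplier: float) -> float:
    """Gen 2 cost relative to Gen 1 for a single query.

    speedup: fraction of runtime shaved off (0.25 = finishes 25% faster).
    price_multiplier: Gen 2 credit rate / Gen 1 rate (e.g. 1.35 on AWS).
    """
    return (1 - speedup) * price_multiplier

def breakeven_speedup(price_multiplier: float) -> float:
    """Minimum runtime reduction needed for Gen 2 to break even on cost."""
    return 1 - 1 / price_multiplier

print(gen2_cost_ratio(0.25, 1.35))  # ~1.0125: ~1.25% more expensive, no savings
print(breakeven_speedup(1.35))      # ~0.26: need a ~26% speed-up on AWS/GCP
print(breakeven_speedup(1.25))      # ~0.20: need a ~20% speed-up on Azure
```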

General case: warehouse running multiple queries concurrently

Now imagine a mixed batch where 19 queries speed up 50 %, but one monster query runs the whole time and speeds up by only 5 %. The warehouse stays up until the slowest query finishes, so total uptime shrinks only 5 %, yet you are paying 25 % more per minute. Net cost rises by ~19 % (0.95 × 1.25 = 1.1875), even though one could technically claim that your 20 queries sped up by (19 × 0.5 + 1 × 0.05)/20 = 47.8 % on average!

Conversely, if every query in a batch shrinks proportionally, or if you suspend the warehouse the second the last query ends, you can save money. The break-even point is when the total running time of your warehouse shrinks by at least 1 − 1/(cost multiplier).
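For the multi-query case, the key fact is that the warehouse bills until its slowest query finishes. A minimal sketch of the monster-query example above:

```python
def batch_cost(runtimes_min, price_per_min):
    """Credit cost of a warehouse that stays up until its slowest query ends."""
    return max(runtimes_min) * price_per_min

gen1 = [30.0] * 19 + [60.0]                          # 19 fast queries + one monster
gen2 = [r * 0.5 for r in gen1[:19]] + [60.0 * 0.95]  # 50% faster, monster only 5%

cost1 = batch_cost(gen1, 1.00)  # 60 credit-minutes
cost2 = batch_cost(gen2, 1.25)  # 57 min * 1.25 ≈ 71.25 credit-minutes (~ +19%)
```

Only the monster query's runtime matters for the bill; the 19 impressive speed-ups are invisible to the meter.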

5 When Gen 2 Does Make Sense

Let’s have a look at the workload characteristics and strategic considerations when switching from a Gen 1 to a Gen 2 warehouse.

5.1 What will benefit: Compute-intensive Workloads

Many database operators benefit from Gen 2 warehouses due to their higher main memory bandwidth, larger caches, wider SIMD units, and generally increased IPC. Among them are:

  • Joins 
  • Aggregations
  • Sorting
  • Deduplication
  • Window functions 
  • Function evaluation (e.g., REGEXP_LIKE, math UDFs)
  • Compression / Decompression
  • Encoding / Decoding

These operators are the workhorses of any analytical database system. However, a workload must be dominated by executing these operators to result in significant performance improvements and potential cost savings.

5.2 What won’t benefit: I/O-bound workloads

Computation is by no means the only limiting factor in a database system. Any operation that requires I/O and reaches beyond scarce main memory, or the even scarcer CPU cache, won’t benefit from the improvements in a Gen 2 warehouse. Most notably, accessing the following resources won’t get any faster:

  • Remote storage (loading and storing uncached data, remote spillage)
  • Local storage (accessing data cached in local SSD storage, local spillage)
  • External resources (remote services, external functions, accessing external stages)

Accessing transient and persistent storage is an integral part of an analytical database system – at least as important as performing computations over the data. The more time a workload spends on I/O (e.g., due to cold caches or heavy use of working memory), the less it will benefit from the enhanced memory and compute resources of a Gen 2 warehouse. In a nutshell: the best compute resources won’t help if they sit idle waiting for data to arrive instead of executing operators.
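This is essentially Amdahl’s law: only the compute fraction of a query gets faster. A tiny model (the I/O fractions and the 1.4× speed-up factor are made-up illustrative numbers) shows how quickly I/O waits dilute the gain:

```python
def effective_speedup(io_fraction: float, compute_speedup: float) -> float:
    """Amdahl-style estimate: only the compute portion of a query gets faster.

    io_fraction: share of wall-clock time waiting on I/O (unchanged on Gen 2).
    compute_speedup: acceleration factor for the compute portion (e.g. 1.4).
    """
    return 1 / (io_fraction + (1 - io_fraction) / compute_speedup)

print(effective_speedup(0.0, 1.4))  # fully compute-bound: the full 1.4x
print(effective_speedup(0.5, 1.4))  # 50% I/O wait: only ~1.17x overall
print(effective_speedup(0.9, 1.4))  # 90% I/O wait: ~1.03x, premium not recouped
```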

5.3 Strategic Considerations: Graceful scaling instead of doubling up

Increasing warehouse sizes is rarely cost-neutral – especially due to I/O bound portions in the workload. Choosing between a Gen 1 and a Gen 2 warehouse introduces a new dimension to the cost-performance trade-off: If a Medium Gen 2 costs ~25–35 % more than Medium Gen 1 but can deliver 20–40 % more throughput, that gentler slope may be more attractive than doubling both compute resources and cost by moving to a Large Gen 1.
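As a sketch of that trade-off, assuming standard Snowflake sizing of 4 credits/hour for a Medium and 8 for a Large, a 1.35× Gen 2 premium, and purely illustrative throughput gains (the 1.30 and 1.80 factors are assumptions, not measurements):

```python
def credits_per_unit_work(credits_per_hour: float, relative_throughput: float) -> float:
    """Credits spent per unit of work: lower is better."""
    return credits_per_hour / relative_throughput

medium_gen1 = credits_per_unit_work(4.0, 1.0)          # baseline: 4.0
medium_gen2 = credits_per_unit_work(4.0 * 1.35, 1.30)  # ~4.15, assuming +30% throughput
large_gen1 = credits_per_unit_work(8.0, 1.80)          # ~4.44: doubling size rarely doubles throughput
```

Under these assumptions the Medium Gen 2 beats the Large Gen 1 on cost per unit of work; with a more I/O-bound workload the numbers can easily flip.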

5.4 Summary

Compute-intensive portions of a workload benefit from the new Gen 2 warehouses, while I/O-bound portions reduce the return on the increased cost. Cache is king: the more of your data that resides in the warehouse’s main memory and CPU caches, the less time will be spent on I/O. Making the switch comes with an increase in cost and predicting whether it pays off requires a complex analysis of the overall workload on a warehouse. 

However, caching isn’t the only factor that matters with Gen 2 warehouses: the payoff also depends on concurrency and data-arrival patterns, as you’ll see next.

6 Concurrency, Arrival Patterns & Real‑World Savings

Even if every query individually speeds up more than the Gen 2 price premium, concurrency patterns decide the final bill. Picture two scenarios:

| Pattern | Gen 1 uptime | Gen 2 speed-up | Gen 2 uptime | Net cost vs. Gen 1 |
| --- | --- | --- | --- | --- |
| Synchronous batch: 100 heavy queries launch together at 01:00, finish together | 60 min | 30 % | 42 min | ≈ −5 % (42 × 1.35 = 56.7 credit-min vs 60) |
| Staggered arrivals: same 100 queries trickle in over one hour | 60 min (warehouse busy the entire hour) | 30 % per query | ≈ 60 min (tails overlap) | ≈ +35 %, worse than Gen 1 |

Unless all concurrent queries shrink enough to reduce total uptime, Gen 2’s higher per‑second price can erase gains. This is why simply multiplying “average speed‑up” by “average cost” is misleading—arrival curves and tail‑latency matter.
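A toy simulation of these two patterns (the query runtimes and arrival times are invented, and real warehouses also queue and interfere, which this ignores) makes the uptime effect visible:

```python
def warehouse_uptime(arrivals_min, runtimes_min):
    """Uptime of a warehouse that suspends as soon as its last query finishes.

    Simplifications: queries run independently (no queuing or interference)
    and the warehouse starts billing at the first arrival.
    """
    start = min(arrivals_min)
    end = max(a + r for a, r in zip(arrivals_min, runtimes_min))
    return end - start

PREMIUM = 1.35  # Gen 2 credit-rate multiplier on AWS

# Synchronous batch: 100 heavy queries launch together, 30% faster on Gen 2.
sync = (warehouse_uptime([0.0] * 100, [42.0] * 100) * PREMIUM
        / warehouse_uptime([0.0] * 100, [60.0] * 100))  # ~0.945: slight saving

# Staggered: 100 short queries trickle in over the hour, same 30% per-query gain.
arrivals = [i * 0.6 for i in range(100)]
stag = (warehouse_uptime(arrivals, [3.5] * 100) * PREMIUM
        / warehouse_uptime(arrivals, [5.0] * 100))      # ~1.32: uptime barely shrinks

print(sync, stag)
```

The same per-query speed-up yields a small saving in one pattern and a ~30 % cost increase in the other, purely because of when the queries arrive.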

7 Keebo’s Reinforcement Learning to the Rescue

Keebo’s AI engine already models arrival patterns, CPU vs. I/O mix, and tail‑latency risk. Because it’s built on reinforcement learning (RL) rather than static heuristics, it adapts whenever Snowflake releases new hardware or pricing tiers: expose the new warehouse type as an action, let RL explore, and the policy re‑optimizes itself.

We’re rolling out Gen 2 awareness in private preview. If you’d like early access—Keebo customer or not—drop us a line.

8 Bottom Line

Snowflake’s Gen 2 warehouses bring serious silicon upgrades, but cost savings appear only when system‑wide runtime drops more than the per‑credit premium. For CPU‑bound batches that suspend promptly, Gen 2 shines. For bursty, staggered dashboards—or workloads dominated by I/O waits—the premium can bite.

Keebo’s RL‑driven engine already crunches these variables so you don’t have to. Reach out for a personalized Gen 1 vs. Gen 2 break‑even study—or to join the early‑access rollout.

Questions or war‑stories? Comment below or ping me directly—always happy to compare notes.

Author

Barzan Mozafari