Storage Cost Optimization for Snowflake — How to Reduce Your TB Spend

  1. Snowflake Storage 101: Understanding Storage Unit Rates
  2. Hidden Drivers: Time-Travel & Fail-Safe Storage
  3. Strategy #1: Squeeze Every Byte with Compression & Smart Partitioning
  4. Strategy #2: Lifecycle Management & Object Tagging
  5. Strategy #3: Cut Bills by Leveraging AWS S3 Storage Cost with External Tables
  6. Strategy #4: Zero-Copy Cloning & Data Sharing
  7. Strategy #5: Monitor, Alert & Iterate
  8. Future Outlook: Operationalising Iceberg & Tiered “Platinum / Gold / Bronze” Storage
  9. Conclusion & Next Steps

Snowflake bills you for storage every single day—whether you query that data or not. One thing I see a lot: because storage feels “cheap” compared to compute, it’s easy for micro-partitions to balloon quietly until the bill stings. In this post, I’ll walk through the mechanics of Snowflake storage unit rates, expose the hidden culprits (hi, Time Travel), and show you a few high-impact tactics—compression, smart partitioning, lifecycle rules, and more—that can shrink your footprint before month-end close.

Snowflake Storage 101: Understanding Storage Unit Rates

At its simplest, Snowflake charges a flat storage unit rate—currently $23 per terabyte per month in AWS us-east-1 for on-demand accounts. A few nuances to keep in mind:

Capacity vs. On-Demand contracts

If you make an upfront capacity commitment, rates can drop by 10–30% depending on the size of that commitment (and your negotiation skills, of course!).

What counts as storage?

Compressed table data, external-stage files left hanging around, Time Travel copies, and Fail-safe snapshots all accrue at the same $/TB.

Daily averaging

Snowflake snapshots storage once a day and averages the month—deleting data on the 30th won’t save you 29 days of charges.
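If you want to see what feeds that daily average (and roughly what it costs), the ACCOUNT_USAGE.STORAGE_USAGE view exposes the same daily snapshots. Here is a quick sketch; it assumes the $23/TB on-demand rate quoted above, so swap in your contracted rate:

-- Average daily footprint per month, plus a rough bill estimate at $23/TB-month
SELECT DATE_TRUNC('month', usage_date)                    AS usage_month,
       AVG(storage_bytes + stage_bytes + failsafe_bytes)
         / POWER(1024, 4)                                 AS avg_tb,
       ROUND(avg_tb * 23, 2)                              AS est_monthly_usd
FROM snowflake.account_usage.storage_usage
GROUP BY 1
ORDER BY 1;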

Need the full breakdown of every line item? My previous 2025 Snowflake pricing guide walks through cost classes and regional uplifts.

Hidden Drivers: Time-Travel & Fail-Safe Storage

Snowflake’s point-in-time restore features are fantastic for “oops” moments—but they silently duplicate micro-partitions:

Feature | Default Retention | Typical Storage Overhead
--- | --- | ---
Time Travel | 1 day (can be raised to 90) | ~3–5 % per extra day
Fail-safe | Fixed 7 days | ~7–10 % of table size

According to Snowflake’s official docs, transient and temporary tables skip Fail-safe entirely and cap Time Travel at one day, so they largely escape these fees.
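Before reaching for a tool, a quick query against ACCOUNT_USAGE.TABLE_STORAGE_METRICS (which lags real time by an hour or two) shows which tables carry the most Time Travel and Fail-safe bytes. A rough sketch you can adapt:

-- Top 20 tables by Time Travel + Fail-safe footprint
-- (recently dropped tables still show up here while their retention runs out)
SELECT table_catalog, table_schema, table_name,
       ROUND(time_travel_bytes / POWER(1024, 3), 2) AS time_travel_gb,
       ROUND(failsafe_bytes    / POWER(1024, 3), 2) AS failsafe_gb,
       ROUND(active_bytes      / POWER(1024, 3), 2) AS active_gb
FROM snowflake.account_usage.table_storage_metrics
ORDER BY time_travel_bytes + failsafe_bytes DESC
LIMIT 20;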

You can build your own dashboard in your BI tool to watch Time Travel’s share of the bill. You can also use the Storage Health functionality of Keebo’s Workload Intelligence to surface the exact dollar impact automatically and flag tables whose retention windows have crept up. See screenshot below:

[Screenshot: Keebo vs. build-your-own FinOps dashboard comparison]

Quick win: set unused dev databases (and their schemas) to a 1-day Time Travel retention, then run
ALTER DATABASE … SET DATA_RETENTION_TIME_IN_DAYS = 1; (the same parameter can also be set per schema or table).

Next, I am going to share five effective strategies for reducing your overall storage cost with Snowflake. Depending on your business requirements, some may not be feasible for you, but my hope is that you will find at least a couple that you can benefit from.

Strategy #1: Squeeze Every Byte with Compression & Smart Partitioning

Snowflake’s columnar engine already does a lot of the work for you—typical compression runs 3–4× smaller than raw data. You can push that even further with a few habits:

Use the right data types

Replace padded VARCHAR(1000) columns with trimmed VARCHAR or numeric types. A lot of the time, data teams do not want to carefully think through the optimal data type for every column when they design their database, but that’s an effort worth putting in. Not only does it reduce your storage footprint, it can also drastically improve your query performance (e.g., using INT versus VARCHAR).
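For instance, a numeric ID that landed as text can be converted in place; the table and column names below are purely illustrative:

-- Hypothetical example: customer_id arrived as a string but only holds digits
ALTER TABLE sales ADD COLUMN customer_id_num NUMBER(38,0);
UPDATE sales SET customer_id_num = TRY_TO_NUMBER(customer_id);
ALTER TABLE sales DROP COLUMN customer_id;
ALTER TABLE sales RENAME COLUMN customer_id_num TO customer_id;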

Flatten JSON into VARIANT

Snowflake’s VARIANT type stores semi-structured data (JSON, Avro, Parquet) in a binary-columnar format instead of plain text:

  1. Parsing & typing once
    When you run COPY INTO table … from a JSON stage, Snowflake tokenizes each record, converts leaf values to native types (NUMBER, STRING, BOOLEAN), and stores them in mini sub-columns.
  2. Columnar compression kicks in
    Because every sub-column now holds uniform, highly repetitive data, Snowflake’s dictionary + LZ4 compression achieves 3–5× size reduction compared with raw JSON bytes.
  3. Query performance improves
    Dot-notation paths (select data:user_id::number …) can filter on individual attributes without scanning the entire JSON blob.

Quick demo

-- Landing raw events
CREATE OR REPLACE TABLE raw_events (data VARIANT);

COPY INTO raw_events
  FROM @my_stage/events/ FILE_FORMAT = (TYPE = 'JSON');

-- Compressed size check (the BYTES column shows compressed storage)
SHOW TABLES LIKE 'RAW_EVENTS';

Typical results on click-stream data: 120 GB raw → ~30 GB in VARIANT. 

For maximum savings, avoid storing the same JSON in a VARCHAR column—let Snowflake do the flattening for you.
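To make the dot-notation point concrete, here is a sketch of how you might query the raw_events table from the demo above; user_id, event_type, and items are assumed attribute names for illustration:

-- Filter on one attribute, then explode a nested array
SELECT e.data:user_id::NUMBER  AS user_id,
       i.value:sku::STRING     AS sku
FROM raw_events e,
     LATERAL FLATTEN(INPUT => e.data:items) i
WHERE e.data:event_type::STRING = 'purchase';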

Terabyte to gigabyte conversion refresher: Snowflake (like AWS) uses binary math—1 TB = 1024 GB.

Let micro-partition pruning work

Snowflake automatically groups data into micro-partitions (50–500 MB uncompressed). During a query it scans only the partitions whose metadata range matches your predicates—this is called partition pruning.

Why “less than ~20 %” is the magic number

  • If a filter touches >80 % of partitions, clustering won’t help; the engine must scan almost everything anyway.
  • If a filter touches ≤20 %, you’re already getting near-optimal I/O elimination, so extra clustering may waste compute and inflate storage.
  • The 20–80 % middle ground is where an explicit clustering key tends to pay for itself (an example follows below).

How to measure

-- For a representative query
EXPLAIN USING TEXT
SELECT * FROM sales WHERE order_dt >= '2025-01-01';

-- In the output, compare partitionsAssigned with partitionsTotal

For a broader view:

SELECT SYSTEM$CLUSTERING_INFORMATION('SALES', '(ORDER_DT, CUSTOMER_ID)');

If the reported average_depth is under ~1.5 and your representative queries scan fewer than 20 % of partitions, skip ALTER TABLE … CLUSTER BY; let natural ordering do the job.
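Conversely, if representative queries routinely hit that 20–80 % middle ground on a stable filter column, an explicit clustering key may pay off. A minimal sketch, assuming order_dt is the dominant predicate:

-- Define a clustering key; Automatic Clustering then maintains it in the background
ALTER TABLE sales CLUSTER BY (order_dt);

-- Automatic Clustering consumes credits, so pause it if the table goes cold
ALTER TABLE sales SUSPEND RECLUSTER;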

Archive wide history tables

Fact tables with years of history may be 98 % read-only—yet you still pay hot-storage rates plus Time-Travel and Fail-safe overhead. Shift aged partitions to cheap object storage:

  1. Copy old slices to Parquet in S3

COPY INTO @archive_stage/sales/<=2023/
  FROM (SELECT * FROM sales WHERE order_dt < '2024-01-01')
  FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY);

  2. Create an external Iceberg/Parquet table

CREATE OR REPLACE EXTERNAL TABLE sales_archive
  WITH LOCATION = @archive_stage/sales/<=2023/
  FILE_FORMAT = (TYPE = PARQUET);

  3. Drop or truncate the hot copy

DELETE FROM sales WHERE order_dt < '2024-01-01';

  4. Union-all in views so dashboards stay unchanged

CREATE OR REPLACE VIEW sales_all AS
SELECT * FROM sales
UNION ALL
SELECT * FROM sales_archive;

Result: cold partitions now cost ≤ $1/TB-mo (Glacier Deep Archive) instead of $23, yet analysts still query a single logical table. Amazon S3 Glacier Deep Archive is AWS’s lowest-cost object-storage class, designed for data you hardly ever need to read but must keep for years. (Terabyte-to-gigabyte refresher: 1 TB equals 1024 GB when you forecast archive size.)

Strategy #2: Lifecycle Management & Object Tagging

Snowflake never deletes a byte unless you ask it to. A simple tag-and-lifecycle loop prevents “forgotten” objects from bloating storage forever.

Why lifecycle rules matter

  • Temp upload files, one-off staging schemas, and abandoned sandboxes linger at full price.
  • Time-Travel and Fail-safe snapshots compound the waste if those objects stick around.

Tag data tiers up front

-- One-time setup: CREATE TAG data_tier;

-- Tag prod objects as 'hot' (30-day retention)
ALTER TABLE finance.transactions
  SET TAG data_tier = 'hot';

-- Tag staging schema as 'cold' (7-day retention)
ALTER SCHEMA staging
  SET TAG data_tier = 'cold';

Tags propagate to clones, so governance follows the data wherever it goes.
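To audit where the tag has landed (and with what value), ACCOUNT_USAGE.TAG_REFERENCES gives you the full picture; note that ACCOUNT_USAGE views lag real time by an hour or two:

-- Every object carrying the data_tier tag, with its assigned value
SELECT object_database, object_schema, object_name, domain, tag_value
FROM snowflake.account_usage.tag_references
WHERE tag_name = 'DATA_TIER'
ORDER BY object_database, object_schema, object_name;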

Automate drops & alerts

  1. Create a nightly task like the following:

CREATE TASK drop_cold_objects
  WAREHOUSE = admin_wh
  SCHEDULE  = 'USING CRON 0 3 * * * America/Detroit'
AS
  CALL admin.drop_cold_objects();

-- Tasks are created suspended; start it with:
ALTER TASK drop_cold_objects RESUME;

  2. The stored procedure checks whether data_tier = 'cold' and the object is older than 7 days, then issues a DROP (a minimal sketch of such a procedure appears after the alert example below).
  3. Alert when the cold tier creeps up. Use a simple daily check against TABLE_STORAGE_METRICS to spot growth in data tagged cold. If the total exceeds your safe threshold (say > 5 TB, or grows > 10 % week-over-week), send a Slack or email alert.

-- TABLE_STORAGE_METRICS is a point-in-time view with no history or tag columns,
-- so persist a nightly snapshot into a small helper table of your own
-- (admin.cold_tier_snapshots here is illustrative), e.g.:
--
--   INSERT INTO admin.cold_tier_snapshots (snapshot_date, bytes_cold)
--   SELECT CURRENT_DATE(),
--          SUM(m.active_bytes + m.time_travel_bytes + m.failsafe_bytes)
--   FROM snowflake.account_usage.table_storage_metrics m
--   JOIN snowflake.account_usage.tag_references tr
--     ON tr.object_id = m.id AND tr.domain = 'TABLE'
--   WHERE tr.tag_name = 'DATA_TIER'
--     AND tr.tag_value = 'cold';

-- Measure today's cold-tier footprint
WITH today AS (
  SELECT bytes_cold
  FROM admin.cold_tier_snapshots
  WHERE snapshot_date = CURRENT_DATE() - 1          -- yesterday's snapshot
),
-- Same metric seven days ago
week_ago AS (
  SELECT bytes_cold
  FROM admin.cold_tier_snapshots
  WHERE snapshot_date = CURRENT_DATE() - 8
)
SELECT
    ROUND(today.bytes_cold / POWER(1024, 4), 2)      AS tb_cold_today,
    ROUND(week_ago.bytes_cold / POWER(1024, 4), 2)   AS tb_cold_week_ago,
    ROUND((today.bytes_cold - week_ago.bytes_cold)
          / NULLIF(week_ago.bytes_cold, 0) * 100, 1) AS pct_change
FROM today, week_ago;

If pct_change > 10 or tb_cold_today > 5, raise an alert:

-- Example: notify via email (requires an email notification integration)
CALL SYSTEM$SEND_EMAIL(
      'storage_alerts',                         -- notification integration name
      'data-team@example.com',                  -- recipient list (placeholder)
      'Cold-tier storage exceeded threshold',
      'Cold data grew more than 10 % this week—review lifecycle rules.');
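For completeness, here is a minimal sketch of what admin.drop_cold_objects() (called by the task in step 1) might look like. It is illustrative only: it reads tags from ACCOUNT_USAGE.TAG_REFERENCES (which has ingestion latency), assumes standard unquoted identifiers, and hard-codes the 7-day cutoff:

CREATE OR REPLACE PROCEDURE admin.drop_cold_objects()
  RETURNS STRING
  LANGUAGE SQL
AS
$$
DECLARE
  dropped INTEGER DEFAULT 0;
  c CURSOR FOR
    SELECT t.table_catalog AS db, t.table_schema AS sch, t.table_name AS tbl
    FROM snowflake.account_usage.tag_references tr
    JOIN snowflake.account_usage.tables t
      ON  tr.object_id = t.table_id
      AND tr.domain    = 'TABLE'
    WHERE tr.tag_name  = 'DATA_TIER'
      AND tr.tag_value = 'cold'
      AND t.deleted IS NULL
      AND t.created < DATEADD('day', -7, CURRENT_TIMESTAMP());
BEGIN
  FOR rec IN c DO
    EXECUTE IMMEDIATE
      'DROP TABLE IF EXISTS ' || rec.db || '.' || rec.sch || '.' || rec.tbl;
    dropped := dropped + 1;
  END FOR;
  RETURN dropped || ' cold table(s) dropped';
END;
$$;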

Schedule the cold-tier check above as a daily task and you’ll know the moment your cold-data footprint starts to drift. The Data Health feature of Keebo’s Workload Intelligence goes further by flagging tables that have been written to but not read in 90 days (prime candidates for archival or deletion), along with tables that have not been written to or read from at all. Filter by schema to spot forgotten sandboxes, then drop or offload those tables to cut both compute and storage waste. See a screenshot from Keebo’s Data Health feature below:

[Screenshot: the Data Health view in Keebo’s Workload Intelligence]

Strategy #3: Cut Bills by Leveraging AWS S3 Storage Cost with External Tables

Snowflake compressed storage runs ≈ $23/TB-month in AWS us-east-1, so understanding AWS S3 storage cost tiers is critical before you offload. Moving archival partitions to S3 lets you step costs down tier by tier:

AWS tier | Typical use case | Cost ($/GB-mo)*
--- | --- | ---
S3 Standard | hot/active | 0.023
S3 Intelligent-Tiering | auto hot/cold | 0.013 → 0.023
S3 Glacier Deep Archive | compliance copies | 0.00099

*This is based on the published AWS price, June 2025.

Offload workflow

  1. Export to Parquet

COPY INTO @cold_stage/2025_q1/
  FROM analytics.clickstream
  FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY);

  2. Create an external table

CREATE OR REPLACE EXTERNAL TABLE clickstream_ext
  WITH LOCATION = @cold_stage/2025_q1/
  FILE_FORMAT  = (TYPE = PARQUET);

  3. Add S3 lifecycle rules
    • 0–30 days → Standard
    • 30 days → Intelligent-Tiering
    • 90 days → Glacier Deep Archive (this is the cheapest tier)

Queries can UNION native and external tables transparently, so analysts don’t need to know where the bytes live.

Keep queries predictable

  • Partition Parquet files by date (dt=YYYY-MM-DD/part-00001.parquet) so pruning still works.
  • Prefetch large scans with COPY INTO a temporary table if latency matters.
  • Track S3 request fees—cheap storage can turn pricey if you scan terabytes daily.

With lifecycle tagging guarding hot data (Strategy #2) and S3 offloading sheltering the coldest bytes, you’ll cut storage spend without sacrificing recoverability—or rewriting a single dashboard.

Strategy #4: Zero-Copy Cloning & Data Sharing

Snowflake’s zero-copy clone feature lets you fork a complete database, schema, or table in seconds—without copying a single byte. The new object merely references the source micro-partitions; storage is consumed only when you write to the clone (copy-on-write).

Why clones can still explode

  • Each DML against the clone materializes new micro-partitions, so an “inactive” dev sandbox can silently grow to terabytes if testers run UPDATE/DELETE statements.
  • Nested clones amplify the problem—cloning a clone duplicates any already-materialized partitions.
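One way to keep an eye on this is TABLE_STORAGE_METRICS: a table whose ID differs from its CLONE_GROUP_ID is a clone, and its ACTIVE_BYTES reflect only the partitions it owns itself. A rough sketch:

-- Clones ranked by how many bytes they have materialized on their own
SELECT table_catalog, table_schema, table_name,
       ROUND(active_bytes / POWER(1024, 3), 2) AS materialized_gb
FROM snowflake.account_usage.table_storage_metrics
WHERE id <> clone_group_id
ORDER BY active_bytes DESC
LIMIT 20;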

Best practices for dev/test sandboxes

  1. Track clones with tags

ALTER DATABASE myapp_dev_clone
  SET TAG env = 'dev', ttl_days = '14';

  2. A scheduled task can drop anything where ttl_days is exceeded.
  3. Prefer transient clones for scratch work to avoid Fail-safe fees.

CREATE TRANSIENT DATABASE scratch_clone
  CLONE prod_db AT (TIMESTAMP => TO_TIMESTAMP('2025-05-01 00:00:00'));

  4. Avoid DML on large fact tables inside clones—use CREATE TABLE … AS SELECT on a filtered subset instead.
  5. Drop early, drop often

DROP DATABASE IF EXISTS myapp_dev_clone;

Use Data Sharing instead of cloning

When another team (or an external partner) only needs read access, use Secure Data Sharing. Shares expose the same micro-partitions in place—no extra storage, no copy-on-write risk—while keeping access control at the object level.
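A minimal share setup looks like the following sketch (the object names are illustrative, and the consumer is referenced by organization and account name):

-- Create a reader-only share and grant just what the consumer needs
CREATE SHARE sales_share;
GRANT USAGE  ON DATABASE prod_db                 TO SHARE sales_share;
GRANT USAGE  ON SCHEMA   prod_db.analytics       TO SHARE sales_share;
GRANT SELECT ON TABLE    prod_db.analytics.sales TO SHARE sales_share;

-- The consumer account then creates a database from the share,
-- reading your micro-partitions in place with zero extra storage
ALTER SHARE sales_share ADD ACCOUNTS = partner_org.partner_account;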


Strategy #5: Monitor, Alert & Iterate

Storage optimization is not a one-and-done project; it’s a feedback loop (unfortunately, like most things in life!).

Build a lightweight cost-governance dashboard

  • Query STORAGE_DAILY_HISTORY for total bytes, broken down by active, Time Travel, and Fail-safe.
  • Add TOP 20 objects by storage delta week-over-week to spot sudden spikes.
  • Publish a time-series chart of $ per TB to your BI tool or a simple Streamlit app.

Key metrics to track

Metric | Why it matters | Alert threshold
--- | --- | ---
Active storage Δ 7-day | Detects uncompressed dumps or rogue imports | > 10 %
Time Travel % of total | Reveals retention mis-configurations | > 25 %
Fail-safe % of total | Flags need for transient tables | > 15 %

DIY Dashboards vs. Keebo Workload Intelligence

DIY dashboards work—but they require upkeep and domain expertise. My post on Lessons from Instacart and Snowflake outlines the trade-offs of buy versus build in detail, but below is a summary table.

What you need | Keebo | Build Your Own
--- | --- | ---
Time-to-Value | ✅ Instant visibility with pre-built, FinOps-ready dashboards | ❌ Hours to spin up a dashboard; months to harden end-to-end tooling
Accuracy of Attribution | ✅ Granular $ attribution by user, team, workload & query | ❌ Delayed, error-prone reports with limited context
Pro-active Cost Control | ✅ Real-time alerts + AI-prioritised fixes | ❌ Static thresholds; manual monitoring; delayed reaction
Engineering Overhead | ✅ Fully managed—zero maintenance | ❌ High ongoing cost & engineering distraction
Clarity on Fixes | ✅ AI shows top issues and how to resolve them | ❌ You spot the problem → now hunt for a fix yourself
Governance & Accountability | ✅ Built-in attribution drives ownership across teams | ❌ Hard to enforce; manual scaffolding needed

If you’d rather skip maintenance, Keebo’s battle-tested Workload Intelligence surfaces storage anomalies, sends proactive alerts, and even recommends the right retention policy—all without writing a single line of SQL.

Pro tip: Set a weekly calendar reminder to review the dashboard—or let Workload Intelligence ping you in Slack when any metric crosses its threshold.

With continuous monitoring in place, every optimization from Strategies #1-4 stays locked in, and new issues surface before they balloon into next month’s bill.

Future Outlook: Operationalising Iceberg & Tiered “Platinum / Gold / Bronze” Storage

Snowflake’s May 2024 GA for Iceberg Tables decoupled where you store bytes from how you query them. The 2025 Summit upgrades—Catalog-Linked Databases (CLDB) and Dynamic Iceberg Tables—have pushed that decoupling even further.

Why this matters

  1. Cost leverage – Store 80-90 % of cold data in S3 (¢/GB) while still querying it through Snowflake’s engine.
  2. Vendor flexibility – A single Iceberg catalog can serve Snowflake, Databricks, Presto & Hive—no more copy-and-paste data lakes.
  3. Governance continuity – CLDB syncs object-storage ACLs and Iceberg metadata back into Snowflake RBAC, so security teams keep a single control plane.

What teams should do now

Priority | Action | Pay-off
--- | --- | ---
High | Pilot an Iceberg landing zone. Export a non-critical fact table ≥ 1 TB to Iceberg (Parquet) and create an external database via CLDB. | Validate query latency & cost; build muscle memory for lifecycle rules.
Medium | Update lifecycle policies. Route RAW_* tables → Gold after 30 days → Bronze (Iceberg + Glacier) after 90–180 days. | Automatic cost drops with no ETL.
Medium | Add Iceberg support to CI/CD. Extend db-change scripts so every new table has a heat-tier tag (platinum/gold/bronze). | Keeps tiering consistent as new data lands.
Low (watchlist) | Preview Search Optimization & QAS on Iceberg. Private preview now; GA expected 2026. | Brings sub-second queries to Bronze data without re-ingesting. Disclaimer: I expect both Search Optimization & QAS to increase compute costs for most customers, but the latency improvements may justify that.
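If you want to start on the “High” row today, a Snowflake-managed Iceberg table can be created straight from a CTAS along these lines. This is a sketch only: s3_iceberg_vol and event_ts are illustrative names, and your external volume and catalog setup may differ:

-- Pilot: land cold history in an Iceberg table on your own S3 bucket
CREATE ICEBERG TABLE analytics.clickstream_bronze
  CATALOG         = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 's3_iceberg_vol'
  BASE_LOCATION   = 'clickstream_bronze/'
AS
SELECT *
FROM analytics.clickstream
WHERE event_ts < DATEADD('day', -180, CURRENT_DATE());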

Roadmap signals to watch

  • Auto-tiering – Snowflake hinted at policies that move native tables to Iceberg once they age beyond an SLA. Storing only the hottest 5–10 % of bytes in platinum storage keeps premium spend under control.
  • Cross-cloud catalogs – Polaris (open-source) aims to unify Iceberg metadata across AWS, Azure, GCP.
  • Column-cache acceleration – Expect a vendor-managed cache layer that keeps the hottest Iceberg columns in SSD/RAM for sub-second BI.

In short, the platinum / gold / bronze playbook is becoming real: keep only the fastest-needed 5-10 % in premium storage, warm the next 20 % in native Snowflake, and shove the rest into Iceberg on S3 plus Glacier Deep Archive. Start small now so you’re ready when auto-tiering goes GA.

Conclusion & Next Steps

Storage often feels cheap—until it quietly doubles your bill. Applying the five levers I covered in this guide can typically trim 20–40% of that spend:

  1. Right-size data types & compression
  2. Prune Time Travel and Fail-safe windows
  3. Enforce lifecycle tags & drops
  4. Offload archives to S3 / Iceberg
  5. Govern with continuous monitoring

Ready to pinpoint your biggest storage-savings wins? Spin up a free trial of Keebo Workload Intelligence—no commitment, no card required. Connect Snowflake once, let the platform run its automated scan, and within 30 minutes you’ll have a prioritized, actionable savings plan. Start your free Keebo trial →

Happy optimizing!

Author

Barzan Mozafari