Creating a new table in Snowflake might seem straightforward—after all, it’s just defining columns and data types, right? But if you’ve ever dealt with slow queries, excessive storage costs, or hard-to-maintain schemas, you know that poor table design can lead to long-term headaches.
A well-structured table is more than just a place to store data. It affects query performance, storage efficiency, data governance, and even security. Making informed choices upfront can save you hours of troubleshooting later and ensure that your analytics workflows run smoothly.
In this guide, I’ll walk through best practices for designing tables in Snowflake, covering everything from schema definition and query optimization to data loading strategies and access control. Whether you’re setting up a fresh database or improving an existing one, these principles will help you build scalable, high-performance tables that stand the test of time.
Define the Schema Thoughtfully
The schema defines how data is structured in the table. Choosing the right column types and constraints upfront is crucial for efficiency, query performance, and long-term usability.
Use appropriate data types
Snowflake supports various data types such as INTEGER, NUMBER, VARCHAR, DATE, TIMESTAMP, and VARIANT. Choosing the wrong type can lead to excessive storage usage and slow queries.
Example: If a column only stores small whole numbers (like ages), NUMBER(3,0) expresses that intent far better than NUMBER(38,0), which accepts values up to 38 digits. (Note that in Snowflake, INTEGER is simply an alias for NUMBER(38,0), so it offers no savings by itself.)
You should also be deliberate with VARCHAR. Snowflake stores only the actual characters, so an oversized declaration carries no storage penalty, and a VARCHAR with no length defaults to the 16 MB maximum. Defining a reasonable size, such as VARCHAR(255), helps with documentation and clarity, and the declared length is still enforced on insert.
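Putting those choices together, here's a minimal sketch of a right-sized schema (the table and column names are illustrative):
CREATE TABLE customers (
  id          NUMBER(10,0),   -- surrogate key; the full 38 digits of precision isn't needed
  age         NUMBER(3,0),    -- small whole numbers only
  email       VARCHAR(255),   -- documented, enforced length
  signup_date DATE,           -- no time component required
  profile     VARIANT         -- semi-structured attributes
);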
Default to TIMESTAMP_NTZ unless time zones are needed
Snowflake has three timestamp types:
- TIMESTAMP_NTZ (No Time Zone) – Stores wall-clock time with no time zone; best when time zones don't matter.
- TIMESTAMP_TZ (With Time Zone) – Stores a time zone offset alongside each value.
- TIMESTAMP_LTZ (Local Time Zone) – Stores UTC internally and displays values in the session time zone.
TIMESTAMP_NTZ is generally preferred unless you specifically need time zones, as it simplifies calculations and avoids conversion issues.
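A quick way to see the difference between NTZ and LTZ (the session time zone here is just an example):
ALTER SESSION SET TIMEZONE = 'America/Los_Angeles';
SELECT
  '2024-06-01 12:00:00'::TIMESTAMP_NTZ AS ntz,  -- stored and displayed exactly as written
  '2024-06-01 12:00:00'::TIMESTAMP_LTZ AS ltz;  -- stored as UTC, displayed in the session time zone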
Optimize for Query Performance
Snowflake automatically optimizes storage with micro-partitioning, but you can further enhance performance with good table design.
Use clustering keys for large tables
Snowflake automatically partitions data into micro-partitions, but queries on large tables can still slow down if they have to scan too many partitions.
If queries frequently filter on a certain column (like ORDER_DATE in an ORDERS table), setting it as a clustering key improves query speed.
Example:
ALTER TABLE orders CLUSTER BY (order_date);
There are times to avoid clustering keys, however. If a table is small (<10M rows), clustering won’t make much difference. If queries don’t consistently filter on a single column, clustering might not help.
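If you're unsure whether clustering is worth it, SYSTEM$CLUSTERING_INFORMATION reports depth and overlap statistics for a candidate key, so you can check before and after:
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');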
Store semi-structured data efficiently with VARIANT
Snowflake supports semi-structured data (JSON, Avro, Parquet) in VARIANT columns.
This is useful when dealing with dynamic schemas (e.g., data from APIs), but structured tables perform better for reporting.
If a JSON field is frequently queried, consider extracting it into separate columns for performance.
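As a sketch, a nested JSON field can be queried with colon-path notation and, if it's hot, promoted to a typed column (the events table and its raw column are assumed names):
-- Query a nested field directly from the VARIANT column
SELECT raw:customer:email::STRING AS customer_email FROM events;
-- Promote the hot field to a typed column for faster reporting
ALTER TABLE events ADD COLUMN customer_email VARCHAR(255);
UPDATE events SET customer_email = raw:customer:email::STRING;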
Avoid unnecessary row-level locks
Snowflake doesn’t use traditional row-level locking, but excessive updates/deletes can lead to fragmented micro-partitions.
Will your table have frequent updates? Consider using MERGE instead of DELETE + INSERT; more specifics on this recommendation are below.
Consider Table Type & Retention
Snowflake offers different table types that impact storage, cost, and fail-safe recovery.
Transient vs. Permanent vs. Temporary Tables
- Permanent tables (default): Data is fully recoverable with Time Travel (up to 90 days) and Fail-Safe (7 days).
- Transient tables: No Fail-Safe recovery, making them cheaper (good for intermediate processing).
- Temporary tables: Exist only for the session, automatically dropped when it ends.
Example use cases for table types:
- Permanent → Business-critical data (ORDERS, CUSTOMERS).
- Transient → ETL processing tables.
- Temporary → Ad-hoc analysis tables.
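The table type is chosen with a keyword at creation time; a minimal sketch of each:
CREATE TABLE orders (id INT, order_date DATE);                -- permanent (default)
CREATE TRANSIENT TABLE etl_staging (id INT, payload VARIANT); -- no Fail-Safe, cheaper
CREATE TEMPORARY TABLE scratch (id INT);                      -- dropped when the session ends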
Set appropriate Time Travel retention
Snowflake retains changed and deleted data for a configurable period: up to 1 day on Standard Edition, and up to 90 days for permanent tables on Enterprise Edition and above.
Remember: Longer retention = higher storage costs.
Example:
CREATE TABLE orders
(id INT, order_date DATE)
DATA_RETENTION_TIME_IN_DAYS = 30;
We can’t all use Mr. Fusion, so time travel responsibly! If you don’t need long Time Travel, set retention to 1 day (or 0 to disable it entirely) to reduce costs.
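Within the retention window, historical data is reachable through the AT | BEFORE clauses and UNDROP:
-- Query the table as it looked one hour ago
SELECT * FROM orders AT(OFFSET => -3600);
-- Recover an accidentally dropped table
UNDROP TABLE orders;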
Be Intentional with Data Loading
Efficient data loading ensures better performance and reduces storage fragmentation.
Use bulk inserts over row-by-row operations
Snowflake is optimized for bulk inserts (batching), and inserting data row by row is slow.
Good approach:
INSERT INTO customers (id, name, email)
SELECT id, name, email FROM staging_customers;
Bad approach:
INSERT INTO customers VALUES (1, 'Alice', 'alice@email.com');
When loading large files, use COPY INTO instead:
COPY INTO orders FROM @my_stage FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
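The named file format referenced above has to exist first; a minimal sketch for CSV (tune the options to your files):
CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';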
Use MERGE instead of DELETE + INSERT
When updating existing records, DELETE + INSERT creates unnecessary storage overhead.
Better alternative:
MERGE INTO customers AS target
USING new_customers AS source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET target.name = source.name
WHEN NOT MATCHED THEN INSERT (id, name) VALUES (source.id, source.name);
Govern Access & Security from the Start
Security should be a foundational priority, not an afterthought.
Apply RBAC (Role-Based Access Control)
Avoid granting privileges directly to users. Instead, use roles.
Example:
GRANT SELECT ON orders TO ROLE analyst_role;
GRANT ROLE analyst_role TO USER analyst_user;
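To audit the result, SHOW GRANTS works at both the object and the role level:
SHOW GRANTS ON TABLE orders;
SHOW GRANTS TO ROLE analyst_role;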
Use row-level security for sensitive data
Snowflake supports row access policies to restrict data visibility based on user roles.
Example:
CREATE OR REPLACE ROW ACCESS POLICY eu_only AS (country STRING) RETURNS BOOLEAN ->
CASE
WHEN CURRENT_ROLE() = 'EU_ROLE' AND country = 'EU' THEN TRUE
ELSE FALSE
END;
ALTER TABLE customers ADD ROW ACCESS POLICY eu_only ON (country);
Leverage dynamic data masking for Personally Identifiable Information (PII)
Snowflake can mask PII fields (e.g., email addresses, IP addresses) so only authorized users see the full data.
Example:
CREATE MASKING POLICY mask_email
AS (email STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() = 'ADMIN' THEN email
ELSE '********'
END;
ALTER TABLE users MODIFY COLUMN email SET MASKING POLICY mask_email;
Final Thoughts
Creating tables in Snowflake is more than just defining columns. It requires performance planning, storage optimization, and security governance. By following these best practices, you can:
✅ Improve query speed
✅ Reduce storage costs
✅ Ensure data integrity & security
Do you have any other non-negotiable best practices for Snowflake table design? Let me know!