The single most common performance problem I see in Delta Lake tables is bad partitioning. Either too many partitions (small files problem) or too few (full table scans on filtered queries). Both are easy to create and surprisingly expensive to fix once a table is in production.
This article covers how to think about partitioning correctly before writing your first row, and what to do when you need to fix a table that was designed wrong.
What partitioning does
When you partition a Delta table, Databricks stores rows with the same partition column value in the same directory on ADLS. A query with a filter on the partition column only reads those directories — Databricks never opens files in other partitions.
# Without partitioning: every query reads all files
df.write.format("delta").saveAsTable("silver.events")
# With date partitioning: queries filtered by date read only relevant files
df.write \
.format("delta") \
.partitionBy("event_date") \
.saveAsTable("silver.events")
For a table with 3 years of events, a query WHERE event_date = '2025-03-01' on a date-partitioned table reads 1/1095 of the data. On an unpartitioned table, it reads everything.
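You can confirm pruning is happening by inspecting the physical plan, which lists the pushed-down partition predicate. A minimal sketch, with a helper that just restates the 1/1095 arithmetic (the function names are illustrative, not an API):

```python
def pruned_read(spark, day: str):
    """Read one day from the date-partitioned table. Only that day's
    directory is scanned; df.explain() shows a PartitionFilters entry
    containing the event_date predicate."""
    return spark.table("silver.events").where(f"event_date = '{day}'")

def scanned_fraction(total_days: int, selected_days: int) -> float:
    """Fraction of a daily-partitioned table a date filter touches."""
    return selected_days / total_days
```

`scanned_fraction(1095, 1)` reproduces the 1/1095 figure for three years of daily partitions.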
The small files trap
The most common partitioning mistake is over-partitioning — splitting data into so many partitions that each holds only a handful of small files (under ~100MB each).
Every Parquet file has fixed overhead: metadata reads, file open operations, S3/ADLS API calls. When a query needs to read 10,000 files of 5MB each, that overhead dominates execution time. The same data in 100 files of 500MB each is dramatically faster.
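The effect is easy to see with a toy cost model. The per-file overhead and throughput constants below are illustrative guesses, not measured numbers — the point is the shape of the formula, not the values:

```python
def scan_seconds(num_files: int, file_mb: float,
                 per_file_overhead_s: float = 0.05,
                 throughput_mb_s: float = 500.0) -> float:
    """Toy cost model: a fixed per-file cost (metadata read, file open,
    object-store API call) plus the time to stream the bytes."""
    fixed = num_files * per_file_overhead_s
    streaming = num_files * file_mb / throughput_mb_s
    return fixed + streaming

# Same 50GB of data, very different scan times:
many_small = scan_seconds(10_000, 5)   # overhead-dominated, ~600s
few_large = scan_seconds(100, 500)     # throughput-dominated, ~105s
```

With these made-up constants, the small-file layout spends five sixths of its time on per-file overhead rather than reading data.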
How over-partitioning happens:
# BAD: partitioning by a high-cardinality column
# If you have 1M customers, this creates 1M partition directories
df.write.partitionBy("customer_id").saveAsTable("silver.customers")
# BAD: partitioning by timestamp (each unique second = one partition)
df.write.partitionBy("event_timestamp").saveAsTable("silver.events")
# GOOD: partition by date (reasonable cardinality)
df.withColumn("event_date", F.to_date("event_timestamp")) \
.write.partitionBy("event_date").saveAsTable("silver.events")
Rule of thumb: Each partition should contain at least 128MB of data. If your daily partition is 5MB, you're over-partitioned — consider weekly or monthly partitioning, or no partitioning at all.
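One way to check where an existing table stands is DESCRIBE DETAIL, which reports numFiles and sizeInBytes for a Delta table. A sketch applying the 128MB rule of thumb (helper names are mine, not an API):

```python
def avg_file_mb(size_bytes: int, num_files: int) -> float:
    """Average Parquet file size in MB."""
    return size_bytes / max(num_files, 1) / (1024 * 1024)

def is_over_partitioned(size_bytes: int, num_files: int,
                        threshold_mb: float = 128.0) -> bool:
    """Flag tables whose average file falls below the ~128MB rule of thumb."""
    return avg_file_mb(size_bytes, num_files) < threshold_mb

def check_table(spark, table: str) -> bool:
    # DESCRIBE DETAIL on a Delta table returns numFiles and sizeInBytes
    d = spark.sql(f"DESCRIBE DETAIL {table}").collect()[0]
    return is_over_partitioned(d["sizeInBytes"], d["numFiles"])
```

Average file size is a blunt metric — a table can hide a few huge files among thousands of tiny ones — but it catches the common case cheaply.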
Z-Ordering: the alternative for high-cardinality columns
For columns you filter on frequently but that have too many distinct values to partition by (customer_id, product_id, user_id), use Z-Ordering instead of partitioning:
spark.sql("""
OPTIMIZE silver.events
ZORDER BY (customer_id, event_type)
""")
Z-Ordering co-locates rows with similar values in the same files using a space-filling curve. It doesn't partition the data — all files remain in a single directory — but Delta Lake's data skipping can skip large portions of the table when you filter on Z-ordered columns.
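The classic space-filling curve behind this is the Morton (Z-order) curve, built by interleaving the bits of the clustered columns. This two-column sketch is purely to illustrate the idea — Delta's actual implementation maps column values to range IDs before interleaving rather than using raw bits:

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into one Z-order key. Sorting rows
    by this key keeps rows with similar (x, y) close together, so each
    file ends up covering a small range of BOTH columns."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x occupies even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y occupies odd bit positions
    return z
```

Sorting by `morton_key` instead of by one column alone gives every file tight min/max ranges on both columns, which is exactly what file-level data skipping needs.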
Z-Ordering works best when:
- The column has high cardinality (thousands to millions of distinct values)
- Queries frequently filter on this column in combination with partition filters
- Files are large enough (100MB+) that skipping whole files via their min/max statistics saves meaningful I/O
Combining partition and Z-Order is the most powerful pattern:
# Partition by date (coarse filter), Z-Order by customer_id (fine filter)
spark.sql("""
OPTIMIZE silver.events
WHERE event_date >= '2025-01-01'
ZORDER BY (customer_id, event_type)
""")
A query like WHERE event_date = '2025-03-01' AND customer_id = 12345 first skips 99.9% of the table via partition pruning, then skips most of the remaining files via Z-Order data skipping.
Liquid Clustering: the modern alternative
Databricks introduced Liquid Clustering as a replacement for static partitioning. Instead of pre-declaring partition columns at table creation, you define clustering columns and Databricks manages file layout automatically.
# Create table with liquid clustering
spark.sql("""
CREATE TABLE silver.events (
event_id BIGINT,
customer_id BIGINT,
event_type STRING,
event_date DATE,
payload STRING
)
USING DELTA
CLUSTER BY (event_date, customer_id)
""")
The key advantage: you can change clustering columns without rewriting the table.
# Change clustering columns on an existing table
spark.sql("ALTER TABLE silver.events CLUSTER BY (customer_id, event_type)")
With static partitioning, changing the partition column requires a full table rewrite. With Liquid Clustering, the next OPTIMIZE run applies the new clustering incrementally.
Prefer Liquid Clustering over static partitioning when your query patterns change over time, when you're creating new tables and want operational flexibility, or when your partition column has irregular distribution (some dates have 10x more data than others).
How to fix a badly partitioned table
You have a production table partitioned wrong. Here are your options, from least to most disruptive.
Option 1 — Add Z-Ordering (non-destructive). If the table has no partitioning, or is partitioned on the right column but queries also filter on a high-cardinality column, Z-Ordering can compensate without changing the table structure:
spark.sql("OPTIMIZE silver.events ZORDER BY (customer_id)")
Option 2 — Change to Liquid Clustering. If you're on Databricks Runtime 13.3+, you can migrate an unpartitioned table to Liquid Clustering in place; a table with existing partition columns cannot be converted directly and still needs the rewrite in Option 3:
spark.sql("ALTER TABLE silver.events CLUSTER BY (event_date, customer_id)")
# Next OPTIMIZE will start applying the new clustering
spark.sql("OPTIMIZE silver.events")
Option 3 — Full rewrite (most disruptive but cleanest result). Create a new table with the correct partitioning, backfill it, and swap:
# Create new table with correct partitioning
spark.sql("""
CREATE TABLE silver.events_v2
USING DELTA
PARTITIONED BY (event_date)
LOCATION 'abfss://silver@storage.dfs.core.windows.net/events_v2'
AS SELECT *, TO_DATE(event_timestamp) AS event_date
FROM silver.events
""")
# After validation, drop old and rename
spark.sql("DROP TABLE silver.events")
spark.sql("ALTER TABLE silver.events_v2 RENAME TO silver.events")
The rewrite requires downtime or a cutover window. Optimize it by writing the historical data first, then processing only the delta since the backfill started.
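A sketch of that two-phase backfill, assuming an event_timestamp column, an empty silver.events_v2 already created with the new partitioning, and a hypothetical cutoff timestamp:

```python
def split_predicates(cutoff_ts: str):
    """Split the source at a fixed timestamp: everything before the
    cutoff is the bulk historical load; everything at or after it is
    the small catch-up job run during the cutover window."""
    return (f"event_timestamp <  '{cutoff_ts}'",
            f"event_timestamp >= '{cutoff_ts}'")

def backfill(spark, cutoff_ts: str):
    historical, catch_up = split_predicates(cutoff_ts)
    # Phase 1: bulk-copy history while silver.events stays live
    spark.sql(f"""
        INSERT INTO silver.events_v2
        SELECT *, TO_DATE(event_timestamp) AS event_date
        FROM silver.events WHERE {historical}
    """)
    # Phase 2: during the cutover window, copy only the recent rows —
    # a much smaller, faster job than the full history
    spark.sql(f"""
        INSERT INTO silver.events_v2
        SELECT *, TO_DATE(event_timestamp) AS event_date
        FROM silver.events WHERE {catch_up}
    """)
```

Splitting on ingestion timestamp only works cleanly if rows are never updated after the cutoff; with upserts, phase 2 should be a MERGE instead of an INSERT.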
A decision framework
When deciding on partitioning strategy for a new table, answer these questions in order:
1. What columns do queries filter on most frequently? Those are your partition/cluster candidates.
2. What is the cardinality of each candidate column?
   - Low cardinality (fewer than 1000 distinct values, e.g., status, region, date): good for static partitioning
   - High cardinality (more than 10,000 distinct values, e.g., customer_id): use Z-Order, not partitioning
3. How much data is in each partition?
   - Target: 100MB to 1GB per partition file
   - Less than 100MB per partition: too granular, consider coarser granularity or no partitioning
   - More than 10GB per partition: fine for batch, might be slow for point queries — add Z-Ordering
4. Will query patterns change?
   - Yes: prefer Liquid Clustering
   - No: static partitioning is simpler and sufficient
5. Are write patterns append-only or do you have frequent upserts?
   - Frequent upserts into specific partitions cause data skew — Liquid Clustering handles this better
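The framework above can be condensed into a small decision helper. The thresholds mirror the article's rules of thumb and should be treated as starting points, not hard limits:

```python
def choose_layout(distinct_values: int, partition_mb: float,
                  patterns_may_change: bool, frequent_upserts: bool) -> str:
    """Condensed decision framework for a new table's layout."""
    if distinct_values > 10_000:
        # too many distinct values to partition by
        return "z-order or liquid clustering, not partitioning"
    if patterns_may_change or frequent_upserts:
        # flexibility and skew handling favor liquid clustering
        return "liquid clustering"
    if partition_mb < 100:
        # partitions too small to justify the file overhead
        return "coarser partitions or no partitioning"
    return "static partitioning"
```

For example, a stable, append-only table with daily partitions of several hundred MB lands on static partitioning; swap in a customer_id-sized cardinality and it lands on Z-Order or Liquid Clustering instead.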
Following this framework doesn't guarantee perfect partitioning on the first try. But it reduces the chance of needing a painful full table rewrite six months into production.