What is Unity Catalog?
Unity Catalog is Databricks' data governance layer. It unifies access control, auditing, and lineage for the data and AI assets in the Lakehouse: Delta tables, files in ADLS, volumes, and ML models, along with the SQL queries that read and write them.
Before Unity Catalog, each Databricks workspace had its own isolated Hive Metastore. Unity Catalog replaces that with a single metastore per region, shared across multiple workspaces.
Three-level hierarchy
Metastore (1 per region)
└── Catalog
    └── Schema (Database)
        └── Table / View / Volume / Function
Three-part naming
Fully qualified object references use catalog.schema.table:
-- Without Unity Catalog (Hive)
SELECT * FROM silver.orders;
-- With Unity Catalog
SELECT * FROM catalog_prod.silver.orders;
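Names with fewer than three parts are resolved against the session's current catalog and schema (set with `USE CATALOG` and `USE SCHEMA`). The Python sketch below mimics that resolution for illustration only; `resolve_name` is a hypothetical helper, not a Databricks API, and it ignores backtick-quoted names that contain dots.

```python
def resolve_name(name: str, current_catalog: str, current_schema: str) -> str:
    """Expand a one-, two-, or three-part name to catalog.schema.table."""
    parts = name.split(".")
    if len(parts) == 1:  # table -> current catalog and current schema
        parts = [current_catalog, current_schema, *parts]
    elif len(parts) == 2:  # schema.table -> current catalog
        parts = [current_catalog, *parts]
    elif len(parts) != 3:
        raise ValueError(f"expected 1 to 3 name parts, got {len(parts)}: {name!r}")
    return ".".join(parts)


# Session state as if after: USE CATALOG catalog_prod; USE SCHEMA silver;
print(resolve_name("orders", "catalog_prod", "silver"))  # catalog_prod.silver.orders
print(resolve_name("gold.revenue_daily", "catalog_prod", "silver"))  # catalog_prod.gold.revenue_daily
print(resolve_name("catalog_dev.silver.orders", "catalog_prod", "silver"))  # catalog_dev.silver.orders
```

This is why relying on the current catalog is fragile across environments: the same two-part query can hit `catalog_dev` or `catalog_prod` depending on session state.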
Example Lakehouse structure
catalog_prod/
├── bronze/
│   ├── orders_raw
│   └── customers_raw
├── silver/
│   ├── orders
│   └── customers
└── gold/
    ├── revenue_daily
    └── customer_ltv
catalog_dev/
├── bronze/
├── silver/
└── gold/
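Because every environment gets the same layer schemas, the skeleton can be generated instead of typed by hand. A minimal sketch, assuming the `catalog_<env>` naming shown above; `bootstrap_statements` is a hypothetical helper that only builds SQL strings.

```python
ENVIRONMENTS = ["dev", "prod"]  # add "staging" if you use it
LAYERS = ["bronze", "silver", "gold"]


def bootstrap_statements(environments: list[str], layers: list[str]) -> list[str]:
    """Return CREATE statements: one catalog per environment, one schema per layer."""
    statements = []
    for env in environments:
        catalog = f"catalog_{env}"
        statements.append(f"CREATE CATALOG IF NOT EXISTS {catalog}")
        for layer in layers:
            statements.append(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{layer}")
    return statements


for stmt in bootstrap_statements(ENVIRONMENTS, LAYERS):
    print(stmt + ";")
```

In a Databricks notebook each string could be passed to `spark.sql(...)`. If the metastore has no root storage location, `CREATE CATALOG` also needs a `MANAGED LOCATION` clause.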
Granular permissions
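Unity Catalog uses a cascading permission model: privileges granted on a catalog are inherited by every current and future schema and table in it, and privileges granted on a schema are inherited by its tables. To read a table, a principal needs USE CATALOG on the catalog, USE SCHEMA on the schema, and SELECT on the table, each granted directly or inherited from a parent.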
-- Catalog level
GRANT USE CATALOG ON CATALOG catalog_prod TO `data-engineers`;
GRANT USE CATALOG ON CATALOG catalog_prod TO `data-analysts`;
-- Schema level
GRANT USE SCHEMA ON SCHEMA catalog_prod.gold TO `data-analysts`;
GRANT USE SCHEMA ON SCHEMA catalog_prod.silver TO `data-engineers`;
-- Table level
GRANT SELECT ON TABLE catalog_prod.gold.revenue_daily TO `data-analysts`;
GRANT SELECT, MODIFY ON TABLE catalog_prod.silver.orders TO `data-engineers`;
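To make the inheritance rule concrete, here is a deliberately simplified Python model (not how Unity Catalog is implemented): a privilege granted on an object applies to everything beneath it, and a read succeeds only if USE CATALOG, USE SCHEMA, and SELECT all resolve. Ownership, ALL PRIVILEGES, nested groups, and admins are ignored; the `grants` dictionary mirrors the GRANT statements above.

```python
# (securable, principal) -> privileges; mirrors the GRANT statements above
grants = {
    ("catalog_prod", "data-analysts"): {"USE CATALOG"},
    ("catalog_prod.gold", "data-analysts"): {"USE SCHEMA"},
    ("catalog_prod.gold.revenue_daily", "data-analysts"): {"SELECT"},
    ("catalog_prod", "data-engineers"): {"USE CATALOG"},
    ("catalog_prod.silver", "data-engineers"): {"USE SCHEMA"},
    ("catalog_prod.silver.orders", "data-engineers"): {"SELECT", "MODIFY"},
}


def effective_privileges(securable: str, principal: str) -> set[str]:
    """Union of privileges granted on the securable and on each of its ancestors."""
    parts = securable.split(".")
    privileges: set[str] = set()
    for depth in range(1, len(parts) + 1):
        ancestor = ".".join(parts[:depth])
        privileges |= grants.get((ancestor, principal), set())
    return privileges


def can_select(table: str, principal: str) -> bool:
    """Reading needs USE CATALOG, USE SCHEMA, and SELECT, each possibly inherited."""
    catalog, schema, _ = table.split(".")
    return (
        "USE CATALOG" in effective_privileges(catalog, principal)
        and "USE SCHEMA" in effective_privileges(f"{catalog}.{schema}", principal)
        and "SELECT" in effective_privileges(table, principal)
    )


print(can_select("catalog_prod.gold.revenue_daily", "data-analysts"))  # True
print(can_select("catalog_prod.silver.orders", "data-analysts"))  # False: no USE SCHEMA or SELECT on silver

# Inheritance: SELECT on the gold schema covers every table in it, including customer_ltv
grants[("catalog_prod.gold", "data-analysts")].add("SELECT")
print(can_select("catalog_prod.gold.customer_ltv", "data-analysts"))  # True
```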
-- Masked view: grant SELECT on this view, not on the base table, to restricted users
CREATE VIEW catalog_prod.silver.orders_masked AS
SELECT order_id, customer_id,
REGEXP_REPLACE(email, '(.)(.*)(@.*)', '$1***$3') AS email,
total_amount, status
FROM catalog_prod.silver.orders;
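Databricks' `regexp_replace` uses Java regex syntax, where `$1` and `$3` in the replacement refer to capture groups. The same pattern in Python's `re` module (which writes back-references as `\1` and `\3`) is a quick way to check the masking locally before creating the view; `mask_email` is a hypothetical helper.

```python
import re

# Same pattern as the view: keep the first character and the domain, hide the rest
EMAIL_PATTERN = re.compile(r"(.)(.*)(@.*)")


def mask_email(email: str) -> str:
    return EMAIL_PATTERN.sub(r"\1***\3", email)


print(mask_email("analyst@company.com"))  # a***@company.com
print(mask_email("not-an-email"))  # not-an-email (no "@", so nothing is masked)
```

The second case is worth noting: values that do not match the pattern pass through unchanged.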
External locations
Unity Catalog manages access to ADLS Gen2 via External Locations — instead of passing the account key directly, you register the path and associate it with a Storage Credential.
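-- Create Storage Credential (references a managed identity or service principal).
-- If your workspace does not accept this statement, create the credential in Catalog Explorer or with the Databricks CLI instead.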
CREATE STORAGE CREDENTIAL sc_adls_prod
WITH AZURE_MANAGED_IDENTITY = (
CREDENTIAL = '/subscriptions/.../resourceGroups/.../providers/...'
);
-- Register External Location
CREATE EXTERNAL LOCATION el_bronze
URL 'abfss://bronze@datalakeprod.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL sc_adls_prod);
-- Grant access
GRANT READ FILES ON EXTERNAL LOCATION el_bronze TO `data-engineers`;
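With one External Location per layer, the three statements differ only in the container name, so they can be generated. This sketch assumes one ADLS container per layer named after the layer, in the `datalakeprod` account from the example, and reuses the `sc_adls_prod` credential; the `el_<layer>` naming and the helper functions are illustrative.

```python
STORAGE_ACCOUNT = "datalakeprod"  # from the example above
CREDENTIAL = "sc_adls_prod"


def abfss_url(container: str, account: str, path: str = "") -> str:
    """ADLS Gen2 URL: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"


def external_location_ddl(layer: str) -> str:
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS el_{layer}\n"
        f"URL '{abfss_url(layer, STORAGE_ACCOUNT)}'\n"
        f"WITH (STORAGE CREDENTIAL {CREDENTIAL});"
    )


for layer in ["bronze", "silver", "gold"]:
    print(external_location_ddl(layer))
```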
Data lineage
Unity Catalog captures data lineage automatically: which tables feed which (down to individual columns), which notebooks, jobs, and queries read each table, and who ran them.
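# View lineage via the REST API (host and token read from environment variables)
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # personal access token or service principal token

response = requests.get(
    f"{host}/api/2.0/lineage-tracking/table-lineage",
    headers={"Authorization": f"Bearer {token}"},
    params={"table_name": "catalog_prod.gold.revenue_daily"},
)
response.raise_for_status()
lineage = response.json()  # upstream and downstream tables for revenue_daily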
In the Databricks UI, open the table in Catalog Explorer and select the Lineage tab to see the visual dependency graph.
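Lineage is a directed graph, so a question like "which raw tables ultimately feed this report?" is a graph traversal. The sketch below walks a hand-built upstream map breadth-first; the map is hypothetical data modeled on the example structure, not the format the lineage API returns.

```python
from collections import deque

# table -> tables it reads from directly (hypothetical lineage for illustration)
UPSTREAM = {
    "catalog_prod.gold.revenue_daily": ["catalog_prod.silver.orders"],
    "catalog_prod.gold.customer_ltv": ["catalog_prod.silver.orders", "catalog_prod.silver.customers"],
    "catalog_prod.silver.orders": ["catalog_prod.bronze.orders_raw"],
    "catalog_prod.silver.customers": ["catalog_prod.bronze.customers_raw"],
}


def all_upstream(table: str, graph: dict[str, list[str]]) -> list[str]:
    """Every table that feeds `table`, directly or transitively, in BFS order."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(graph.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:  # a table can be reached through several paths
            continue
        seen.add(current)
        order.append(current)
        queue.extend(graph.get(current, []))
    return order


print(all_upstream("catalog_prod.gold.customer_ltv", UPSTREAM))
```

Reversing the edges gives the opposite question, "what breaks downstream if this bronze table changes?".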
Column-level security and row filters
Column masks
-- Mask SSN for unauthorized users
CREATE FUNCTION catalog_prod.security.mask_ssn(ssn STRING)
RETURN CASE
WHEN is_member('pii-access') THEN ssn
ELSE '***-**-****'  -- fixed mask: also hides values not in NNN-NN-NNNN format
END;
ALTER TABLE catalog_prod.silver.customers
ALTER COLUMN ssn SET MASK catalog_prod.security.mask_ssn;
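A column mask is evaluated per row at query time, with the caller's identity in scope. This local Python simulation passes group membership in as a plain set (standing in for `is_member`) and compares a fixed mask with a pattern-based replacement, which only hides values that match the pattern; the sample SSNs are made up.

```python
import re


def mask_ssn(ssn: str, caller_groups: set[str]) -> str:
    # Stand-in for is_member('pii-access'): membership is passed in explicitly
    if "pii-access" in caller_groups:
        return ssn
    # Fixed mask for everyone else, whatever the value's format
    return "***-**-****"


def regex_mask(ssn: str) -> str:
    # Pattern-based alternative: only well-formed NNN-NN-NNNN values are replaced
    return re.sub(r"\d{3}-\d{2}-\d{4}", "***-**-****", ssn)


rows = ["123-45-6789", "123456789"]
print([mask_ssn(s, {"pii-access"}) for s in rows])  # ['123-45-6789', '123456789']
print([mask_ssn(s, {"data-analysts"}) for s in rows])  # ['***-**-****', '***-**-****']
print([regex_mask(s) for s in rows])  # ['***-**-****', '123456789'] (second value leaks)
```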
Row filters
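-- Analysts only see rows for their own department.
-- user_departments(user_email, department) is a hypothetical mapping table you maintain.
CREATE FUNCTION catalog_prod.security.filter_by_department(dept STRING)
RETURN is_member('admin') OR EXISTS (
  SELECT 1 FROM catalog_prod.security.user_departments m
  WHERE m.user_email = current_user() AND m.department = dept
);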
ALTER TABLE catalog_prod.gold.revenue_daily
SET ROW FILTER catalog_prod.security.filter_by_department ON (department);
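A row filter is a boolean predicate evaluated for every row on behalf of the calling user. The Python sketch below simulates a filter that keeps a row when the caller is an admin or when a user-to-department mapping allows that department; the mapping, rows, and e-mail addresses are made-up illustration data.

```python
# Hypothetical user -> departments mapping (the role a mapping table plays in SQL)
USER_DEPARTMENTS = {
    "ana@company.com": {"sales"},
    "bruno@company.com": {"sales", "marketing"},
}

ROWS = [
    {"date": "2026-04-01", "department": "sales", "revenue": 1200.0},
    {"date": "2026-04-01", "department": "marketing", "revenue": 300.0},
    {"date": "2026-04-01", "department": "finance", "revenue": 950.0},
]


def row_visible(department: str, user: str, user_groups: set[str]) -> bool:
    """Admins see everything; everyone else sees only their mapped departments."""
    return "admin" in user_groups or department in USER_DEPARTMENTS.get(user, set())


def query_as(user: str, user_groups: set[str]) -> list[dict]:
    return [r for r in ROWS if row_visible(r["department"], user, user_groups)]


print(len(query_as("ana@company.com", set())))  # 1
print(len(query_as("bruno@company.com", set())))  # 2
print(len(query_as("ops@company.com", {"admin"})))  # 3
```

Keeping the mapping in data rather than in the function body means onboarding a user is an INSERT, not a function redeploy.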
Audit logs
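Every access to a Unity Catalog object generates an audit event. Events are available in the system.access.audit system table and, on Azure, can be routed through diagnostic settings to Azure Monitor (Log Analytics), a storage account, or Event Hubs: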
{
  "serviceName": "unityCatalog",
  "actionName": "getTable",
  "userIdentity": { "email": "analyst@company.com" },
  "requestParams": { "full_name": "catalog_prod.gold.revenue_daily" },
  "timestamp": "2026-04-01T10:30:00Z"
}
Export to ADLS and query via Databricks SQL for compliance and auditing.
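A typical first compliance question is "who read this table?". This Python sketch counts `getTable` events per user for one table, assuming one JSON event per line with the fields shown in the sample above; the in-memory events (and `make_event`, a hypothetical helper) stand in for an exported log file.

```python
import json
from collections import Counter


def make_event(email: str, table: str) -> str:
    # Builds one log line shaped like the sample event above (made-up data)
    return json.dumps({
        "serviceName": "unityCatalog",
        "actionName": "getTable",
        "userIdentity": {"email": email},
        "requestParams": {"full_name": table},
        "timestamp": "2026-04-01T10:30:00Z",
    })


LOG_LINES = [
    make_event("analyst@company.com", "catalog_prod.gold.revenue_daily"),
    make_event("analyst@company.com", "catalog_prod.gold.revenue_daily"),
    make_event("engineer@company.com", "catalog_prod.gold.revenue_daily"),
    make_event("engineer@company.com", "catalog_prod.silver.orders"),
]


def readers_of(table: str, lines: list[str]) -> Counter:
    """Count getTable events per user email for the given table."""
    counts: Counter = Counter()
    for line in lines:
        event = json.loads(line)
        if (
            event.get("serviceName") == "unityCatalog"
            and event.get("actionName") == "getTable"
            and event.get("requestParams", {}).get("full_name") == table
        ):
            counts[event["userIdentity"]["email"]] += 1
    return counts


print(readers_of("catalog_prod.gold.revenue_daily", LOG_LINES))
```

In Databricks SQL the same question is a WHERE clause on these fields followed by a GROUP BY on the user.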
Lessons from the field
- Plan the hierarchy before creating anything — migrating tables between schemas later has a real cost
- One catalog per environment (dev/staging/prod), not one schema per environment inside the same catalog
- Use Service Principals, not personal user accounts, for programmatic access
- Create External Locations per layer (bronze, silver, gold) to simplify permission management
- Row filters and column masks are worth the investment for PII data — they eliminate the need for N separate masking views