Unity Catalog·2026-03-19·3 min read·

Unity Catalog: centralized data governance in Databricks

How to structure Unity Catalog for real production governance — hierarchy, granular permissions, lineage and external locations.

What is Unity Catalog?

Unity Catalog is Databricks' data governance layer that unifies access control, auditing, and lineage for all data in the Lakehouse — Delta tables, ADLS files, volumes, ML models, and SQL queries.

Before Unity Catalog, each Databricks workspace had its own isolated Hive Metastore. Unity Catalog replaces that with a single metastore per region, shared across multiple workspaces.

Three-level hierarchy

Metastore (1 per region)
└── Catalog
    └── Schema (Database)
        └── Table / View / Volume / Function
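Standing up this hierarchy is a few DDL statements. A minimal sketch, reusing the example names from this article (the column list is illustrative):

```sql
-- One-time setup; requires metastore admin or the CREATE CATALOG privilege
CREATE CATALOG IF NOT EXISTS catalog_prod;
CREATE SCHEMA IF NOT EXISTS catalog_prod.silver;

CREATE TABLE IF NOT EXISTS catalog_prod.silver.orders (
  order_id     BIGINT,
  customer_id  BIGINT,
  email        STRING,
  total_amount DECIMAL(18, 2),
  status       STRING
);
```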

Three-part naming

All object references use catalog.schema.table:

-- Without Unity Catalog (Hive)
SELECT * FROM silver.orders;

-- With Unity Catalog
SELECT * FROM catalog_prod.silver.orders;
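When spelling out all three parts gets verbose, you can set session defaults and fall back to short names:

```sql
USE CATALOG catalog_prod;
USE SCHEMA silver;

-- Now resolves to catalog_prod.silver.orders
SELECT * FROM orders;
```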

Example structure for Lakehouse

catalog_prod/
├── bronze/
│   ├── orders_raw
│   └── customers_raw
├── silver/
│   ├── orders
│   └── customers
└── gold/
    ├── revenue_daily
    └── customer_ltv

catalog_dev/
├── bronze/
├── silver/
└── gold/

Granular permissions

Unity Catalog uses a cascading permission model — permissions at the catalog level inherit down to schemas and tables.

-- Catalog level
GRANT USE CATALOG ON CATALOG catalog_prod TO `data-engineers`;
GRANT USE CATALOG ON CATALOG catalog_prod TO `data-analysts`;

-- Schema level
GRANT USE SCHEMA ON SCHEMA catalog_prod.gold TO `data-analysts`;
GRANT USE SCHEMA ON SCHEMA catalog_prod.silver TO `data-engineers`;

-- Table level
GRANT SELECT ON TABLE catalog_prod.gold.revenue_daily TO `data-analysts`;
GRANT SELECT, MODIFY ON TABLE catalog_prod.silver.orders TO `data-engineers`;

-- Mask sensitive column for restricted users
CREATE VIEW catalog_prod.silver.orders_masked AS
SELECT order_id, customer_id,
       REGEXP_REPLACE(email, '(.)(.*)(@.*)', '$1***$3') AS email,
       total_amount, status
FROM catalog_prod.silver.orders;
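The masked view only protects anything if restricted users get the view and not the underlying table. A sketch, using the groups from the grants above:

```sql
-- Restricted users read the masked view only
GRANT SELECT ON TABLE catalog_prod.silver.orders_masked TO `data-analysts`;

-- Verify who can actually reach the base table
SHOW GRANTS ON TABLE catalog_prod.silver.orders;
```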

External locations

Unity Catalog manages access to ADLS Gen2 via External Locations — instead of passing the account key directly, you register the path and associate it with a Storage Credential.

-- Storage Credentials that reference an Azure managed identity are
-- typically created in Catalog Explorer, via the account API, or with
-- Terraform; the SQL below is an illustrative sketch of the same step.
CREATE STORAGE CREDENTIAL sc_adls_prod
WITH AZURE_MANAGED_IDENTITY = (
    CREDENTIAL = '/subscriptions/.../resourceGroups/.../providers/...'
);

-- Register External Location
CREATE EXTERNAL LOCATION el_bronze
URL 'abfss://bronze@datalakeprod.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL sc_adls_prod);

-- Grant access
GRANT READ FILES ON EXTERNAL LOCATION el_bronze TO `data-engineers`;
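Once the location is registered, a team with the CREATE EXTERNAL TABLE privilege on it can create tables directly against those paths (table and path names are this article's examples):

```sql
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION el_bronze TO `data-engineers`;

CREATE TABLE catalog_prod.bronze.orders_raw
USING DELTA
LOCATION 'abfss://bronze@datalakeprod.dfs.core.windows.net/orders_raw/';
```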

Data lineage

Unity Catalog automatically tracks data lineage — which tables feed which, which queries read which tables, and who ran what.

# View lineage via the lineage-tracking REST API
import os
import requests

# Assumes a PAT with access to the table is available in DATABRICKS_TOKEN
token = os.environ["DATABRICKS_TOKEN"]

response = requests.get(
    "https://<workspace>.azuredatabricks.net/api/2.0/lineage-tracking/table-lineage",
    headers={"Authorization": f"Bearer {token}"},
    params={"table_name": "catalog_prod.gold.revenue_daily"},
)
upstreams = response.json().get("upstreams", [])

In the Databricks UI: Data > [table] > Lineage displays the visual dependency graph.
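Lineage is also queryable in plain SQL through the system tables, assuming the system.access schema is enabled on your metastore:

```sql
-- Everything that feeds revenue_daily, most recent first
SELECT source_table_full_name, event_time
FROM system.access.table_lineage
WHERE target_table_full_name = 'catalog_prod.gold.revenue_daily'
ORDER BY event_time DESC;
```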

Column-level security and row filters

Column masks

-- Mask SSN for unauthorized users
CREATE FUNCTION catalog_prod.security.mask_ssn(ssn STRING)
RETURNS STRING
RETURN CASE
  WHEN is_member('pii-access') THEN ssn
  ELSE REGEXP_REPLACE(ssn, '\\d{3}-\\d{2}-\\d{4}', '***-**-****')
END;

ALTER TABLE catalog_prod.silver.customers
ALTER COLUMN ssn SET MASK catalog_prod.security.mask_ssn;

Row filters

-- Analysts only see rows for departments they belong to
-- (assumes one group per department, e.g. `dept-finance`)
CREATE FUNCTION catalog_prod.security.filter_by_department(dept STRING)
RETURNS BOOLEAN
RETURN is_member('admin') OR is_member(CONCAT('dept-', dept));

ALTER TABLE catalog_prod.gold.revenue_daily
SET ROW FILTER catalog_prod.security.filter_by_department ON (department);

Audit logs

All access to Unity Catalog generates audit events in Azure Monitor / Databricks audit logs:

{
  "serviceName": "unityCatalog",
  "actionName": "getTable",
  "userIdentity": { "email": "analyst@company.com" },
  "requestParams": { "full_name": "catalog_prod.gold.revenue_daily" },
  "timestamp": "2026-04-01T10:30:00Z"
}

Export these logs to ADLS and query them with Databricks SQL for compliance and audit reviews.
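The same events are also exposed as a system table, so a compliance check can be a single query (assuming the system.access schema is enabled; the parameter key follows the audit-log schema for getTable events):

```sql
-- Who read the revenue table in the last 30 days?
SELECT event_time, user_identity.email, action_name
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name = 'getTable'
  AND request_params.full_name_arg = 'catalog_prod.gold.revenue_daily'
  AND event_time > current_timestamp() - INTERVAL 30 DAYS;
```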

Lessons from the field

  1. Plan the hierarchy before creating anything — migrating tables between schemas later has a real cost
  2. One catalog per environment (dev/staging/prod), not one schema per environment inside the same catalog
  3. Use Service Principals, not personal user accounts, for programmatic access
  4. Create External Locations per layer (bronze, silver, gold) to simplify permission management
  5. Row filters and column masks are worth the investment for PII data — they eliminate the need for N separate masking views