Databricks Hardening Guide
Data platform security for workspace access, Unity Catalog, and secrets management
Overview
Databricks serves more than 10,000 customers, with Unity Catalog governing access to the data lake. OAuth federation with external systems such as Snowflake, service principal credentials, and cluster access tokens all create attack vectors. Databricks workspaces contain raw enterprise data, ML models, and training datasets, making them high-value targets for data exfiltration and intellectual property theft.
Intended Audience
- Security engineers hardening data platforms
- Data engineers configuring Databricks
- GRC professionals assessing data governance
- Third-party risk managers evaluating analytics integrations
How to Use This Guide
- L1 (Baseline): Essential controls for all organizations
- L2 (Hardened): Enhanced controls for security-sensitive environments
- L3 (Maximum Security): Strictest controls for regulated industries
Scope
This guide covers Databricks security configurations including authentication, Unity Catalog governance, cluster security, and secrets management.
Table of Contents
- Authentication & Access Controls
- Unity Catalog Security
- Cluster Security
- Secrets Management
- Monitoring & Detection
- Compliance Quick Reference
1. Authentication & Access Controls
1.1 Enforce SSO with MFA
Profile Level: L1 (Baseline) NIST 800-53: IA-2(1)
Description
Require SAML SSO with MFA for all Databricks access.
ClickOps Implementation
Step 1: Configure SAML SSO
- Navigate to: Admin Settings → Identity and Access → Single Sign-On
- Configure:
- IdP Entity ID: From your identity provider
- SSO URL: IdP login endpoint
- Certificate: Upload IdP certificate
Step 2: Enforce SSO
- Enable: Require users to log in with SSO
- Disable: Allow local password login
Step 3: Configure SCIM Provisioning
- Navigate to: Admin Settings → Identity and Access → SCIM Provisioning
- Configure connector with your IdP
- Enable: Automatic user provisioning
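SCIM provisioning can also be driven programmatically against the workspace SCIM API. A minimal Python sketch of the user-creation payload, assuming the standard SCIM 2.0 core user schema; the endpoint path, user names, and token handling are illustrative, not taken from this guide:

```python
# Build a SCIM 2.0 user payload for automated provisioning.
# The schema URN is standard SCIM; endpoint and auth details are illustrative.

def scim_user_payload(user_name: str, display_name: str, active: bool = True) -> dict:
    """Return a SCIM 2.0 user-creation payload."""
    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": user_name,
        "displayName": display_name,
        "active": active,
    }

payload = scim_user_payload("jane@example.com", "Jane Doe")
# POST this payload to /api/2.0/preview/scim/v2/Users with a bearer token.
```

Setting `active: false` via the same API is the deprovisioning path, which is why SCIM should remain the single source of truth for workspace identities.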
1.2 Implement Service Principal Security
Profile Level: L1 (Baseline) NIST 800-53: IA-5
Description
Secure service principals used for automation and integrations.
Rationale
Why This Matters:
- Service principals enable programmatic access
- OAuth tokens for service principals can have long validity
- Compromised service principal = bulk data access
Attack Scenario: A compromised service principal token grants lakehouse access; a malicious notebook then exfiltrates data in bulk under that principal's identity.
ClickOps Implementation
Step 1: Create Purpose-Specific Service Principals
- Navigate to: Admin Settings → Identity and Access → Service Principals
- Create principals for each integration:
- svc-etl-pipeline (ETL jobs)
- svc-ml-training (ML workloads)
- svc-reporting (BI tools)
Step 2: Assign Minimal Permissions
- Navigate to: Unity Catalog → Grants
- For each service principal:
- Grant only required catalogs
- Grant only required schemas
- Prefer SELECT over ALL PRIVILEGES
Step 3: Configure OAuth Tokens
- Generate OAuth tokens for service principals
- Set appropriate token lifetime
- Store tokens in secrets manager
- Rotate tokens quarterly
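The quarterly rotation rule above can be enforced with a small age check in whatever pipeline provisions tokens. A sketch assuming a 90-day window; the token creation timestamp would come from your secrets manager's metadata (the dates below are placeholders):

```python
from datetime import datetime, timedelta, timezone

# Flag service principal tokens older than the rotation window.
# The 90-day threshold mirrors the "rotate quarterly" guidance above.
ROTATION_WINDOW = timedelta(days=90)

def needs_rotation(created_at: datetime, now=None) -> bool:
    """True when a token has exceeded the rotation window."""
    now = now or datetime.now(timezone.utc)
    return now - created_at > ROTATION_WINDOW

issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(needs_rotation(issued, now=datetime(2025, 5, 1, tzinfo=timezone.utc)))  # True (120 days old)
```

Running this as a scheduled job that pages the owning team turns the rotation policy into something auditable rather than a manual reminder.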
1.3 Configure IP Access Lists
Profile Level: L2 (Hardened) NIST 800-53: AC-3(7)
Description
Restrict Databricks access to known IP ranges.
ClickOps Implementation
Step 1: Configure IP Access Lists
- Navigate to: Admin Settings → Security → IP Access Lists
- Add allowed IP ranges:
- Corporate network
- VPN egress
- Approved integration IPs
- Enable: Block public access (L2)
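The allowlist semantics are simple: a request is admitted only if its source address falls inside one of the listed CIDR ranges. A Python sketch of that check using the standard library; the ranges are documentation placeholders, not real corporate addresses:

```python
import ipaddress

# Evaluate a client IP the same way an IP access list does: allowed only
# if the source address falls inside a listed CIDR range.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.0/24"]  # corporate network, VPN egress

def ip_allowed(client_ip: str, ranges=ALLOWED_RANGES) -> bool:
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)

print(ip_allowed("203.0.113.42"))  # True  (inside corporate range)
print(ip_allowed("192.0.2.10"))    # False (not on the allowlist)
```

This is also a useful pre-deployment check: validate a proposed rule set against known-good client IPs before enabling enforcement, so you do not lock administrators out.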
2. Unity Catalog Security
2.1 Implement Data Governance
Profile Level: L1 (Baseline) NIST 800-53: AC-3
Description
Configure Unity Catalog for centralized data governance.
ClickOps Implementation
Step 1: Create Catalog Structure
-- Create catalogs by environment
CREATE CATALOG IF NOT EXISTS production;
CREATE CATALOG IF NOT EXISTS staging;
CREATE CATALOG IF NOT EXISTS development;
-- Create schemas by domain
CREATE SCHEMA IF NOT EXISTS production.finance;
CREATE SCHEMA IF NOT EXISTS production.customer_data;
CREATE SCHEMA IF NOT EXISTS production.ml_features;
Step 2: Configure Granular Permissions
-- Grant specific permissions
GRANT USE CATALOG ON CATALOG production TO `data_analysts`;
GRANT USE SCHEMA ON SCHEMA production.finance TO `finance_team`;
GRANT SELECT ON TABLE production.finance.transactions TO `finance_team`;
-- Restrict sensitive tables
-- Restrict sensitive tables (Unity Catalog permissions are additive;
-- there is no DENY statement, so revoke any inherited grant instead)
REVOKE SELECT ON TABLE production.customer_data.pii FROM `general_users`;
Step 3: Enable Row-Level Security
-- Create a row filter function; it must accept the filtered column
-- as a parameter and return a BOOLEAN
CREATE FUNCTION production.filters.region_filter(region STRING)
RETURN CASE
WHEN is_account_group_member('us_team') THEN region = 'US'
WHEN is_account_group_member('eu_team') THEN region = 'EU'
ELSE FALSE
END;
-- Apply to table
ALTER TABLE production.customer_data.orders
SET ROW FILTER production.filters.region_filter ON (region);
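The filter's decision logic is worth unit-testing before it guards production data. A Python mirror of the group-to-region mapping, using the same group names as the SQL above (the function itself is illustrative, not part of any Databricks API):

```python
# Mirror of the region row filter: a row is visible only when the caller's
# group matches the row's region. Group names follow the SQL example above.

def region_visible(member_groups: set, row_region: str) -> bool:
    if "us_team" in member_groups:
        return row_region == "US"
    if "eu_team" in member_groups:
        return row_region == "EU"
    return False  # default deny: no matching group sees no rows

print(region_visible({"us_team"}, "US"))  # True
print(region_visible({"eu_team"}, "US"))  # False
print(region_visible(set(), "US"))        # False
```

Note the default-deny branch: a user in neither group sees zero rows, which is the safe failure mode for a row filter.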
2.2 Configure Data Masking
Profile Level: L2 (Hardened) NIST 800-53: SC-28
Description
Implement dynamic data masking for sensitive columns.
-- Create masking function
CREATE FUNCTION production.masks.mask_ssn(ssn STRING)
RETURNS STRING
RETURN CASE
WHEN is_account_group_member('pii_admin') THEN ssn
ELSE CONCAT('XXX-XX-', RIGHT(ssn, 4))
END;
-- Apply mask to column
ALTER TABLE production.customer_data.customers
ALTER COLUMN ssn SET MASK production.masks.mask_ssn;
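The masking behavior is easy to verify outside SQL. A Python mirror of `mask_ssn`, with the group-membership check reduced to a boolean flag for testing (the sample SSN is fabricated):

```python
# Python mirror of the SQL mask: PII admins see the full SSN,
# everyone else sees only the last four digits.

def mask_ssn(ssn: str, is_pii_admin: bool) -> str:
    return ssn if is_pii_admin else "XXX-XX-" + ssn[-4:]

print(mask_ssn("123-45-6789", is_pii_admin=False))  # XXX-XX-6789
print(mask_ssn("123-45-6789", is_pii_admin=True))   # 123-45-6789
```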
2.3 Audit Logging for Data Access
Profile Level: L1 (Baseline) NIST 800-53: AU-2, AU-3
Description
Enable comprehensive audit logging for data access.
ClickOps Implementation
Step 1: Enable System Tables
- Navigate to: Admin Settings → System Tables
- Enable: Access audit logs
- Configure retention period
Step 2: Query Audit Logs
-- Query data access audit logs
SELECT
event_time,
user_identity.email as user_email,
action_name,
request_params.full_name_arg as table_accessed,
source_ip_address
FROM system.access.audit
WHERE action_name IN ('getTable', 'commandSubmit')
AND event_time > current_timestamp() - INTERVAL 24 HOURS
ORDER BY event_time DESC;
3. Cluster Security
3.1 Configure Cluster Policies
Profile Level: L1 (Baseline) NIST 800-53: CM-7
Description
Implement cluster policies to enforce security configurations.
ClickOps Implementation
Step 1: Create Secure Cluster Policy
- Navigate to: Compute → Policies → Create Policy
- Configure:
{
"spark_version": {
"type": "allowlist",
"values": ["13.3.x-scala2.12", "14.0.x-scala2.12"]
},
"node_type_id": {
"type": "allowlist",
"values": ["Standard_DS3_v2", "Standard_DS4_v2"]
},
"autotermination_minutes": {
"type": "range",
"minValue": 10,
"maxValue": 120,
"defaultValue": 30
},
"custom_tags.Environment": {
"type": "fixed",
"value": "production"
},
"spark_conf.spark.databricks.cluster.profile": {
"type": "fixed",
"value": "serverless"
},
"init_scripts": {
"type": "fixed",
"value": []
}
}
Step 2: Assign Policy to Users
- Navigate to: Admin Settings → Workspace → Cluster Policies
- Assign policy to appropriate groups
- Set as default for users
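The policy types used above each constrain a cluster attribute differently: `allowlist` restricts values to a set, `range` bounds a numeric, and `fixed` pins a value. A simplified Python sketch of that evaluation, useful for pre-validating configs in CI; this is an illustration of the semantics, not the actual Databricks policy engine:

```python
# Validate a proposed cluster config against simplified policy semantics:
# "allowlist" restricts values, "range" bounds a numeric, "fixed" pins a value.

def check_policy(policy: dict, config: dict) -> list:
    """Return human-readable policy violations (empty list = compliant)."""
    violations = []
    for key, rule in policy.items():
        value = config.get(key)
        if value is None:
            violations.append(f"{key}: missing from config")
        elif rule["type"] == "allowlist" and value not in rule["values"]:
            violations.append(f"{key}: {value!r} not in allowlist")
        elif rule["type"] == "range" and not (rule["minValue"] <= value <= rule["maxValue"]):
            violations.append(f"{key}: {value} outside [{rule['minValue']}, {rule['maxValue']}]")
        elif rule["type"] == "fixed" and value != rule["value"]:
            violations.append(f"{key}: must be {rule['value']!r}")
    return violations

policy = {
    "spark_version": {"type": "allowlist", "values": ["13.3.x-scala2.12", "14.0.x-scala2.12"]},
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 120},
    "custom_tags.Environment": {"type": "fixed", "value": "production"},
}
print(check_policy(policy, {
    "spark_version": "13.3.x-scala2.12",
    "autotermination_minutes": 240,
    "custom_tags.Environment": "production",
}))  # ['autotermination_minutes: 240 outside [10, 120]']
```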
3.2 Network Isolation
Profile Level: L2 (Hardened) NIST 800-53: SC-7
Description
Deploy Databricks with network isolation.
Implementation
VPC/VNet Deployment:
- Deploy workspace in customer-managed VPC
- Configure private endpoints
- Disable public IP addresses for clusters
# Terraform example - Private workspace
resource "databricks_mws_workspaces" "this" {
account_id = var.databricks_account_id
workspace_name = "secure-workspace"
deployment_name = "secure"
aws_region = var.region
network_id = databricks_mws_networks.this.network_id
# Private configuration
private_access_settings_id = databricks_mws_private_access_settings.this.private_access_settings_id
}
resource "databricks_mws_private_access_settings" "this" {
private_access_settings_name = "secure-pas"
region = var.region
public_access_enabled = false
}
4. Secrets Management
4.1 Use Databricks Secret Scopes
Profile Level: L1 (Baseline) NIST 800-53: SC-28
Description
Store credentials in Databricks secret scopes rather than notebooks.
ClickOps Implementation
Step 1: Create Secret Scope
# Create secret scope backed by Databricks
databricks secrets create-scope --scope production-secrets
# Add secrets
databricks secrets put --scope production-secrets --key db-password
databricks secrets put --scope production-secrets --key api-key
Step 2: Configure Access Controls
# Grant read access to specific group
databricks secrets put-acl \
--scope production-secrets \
--principal data_engineers \
--permission READ
Step 3: Use Secrets in Notebooks
# Access secrets in notebook
db_password = dbutils.secrets.get(scope="production-secrets", key="db-password")
# Secret is redacted in logs
print(db_password) # Shows [REDACTED]
4.2 External Secret Store Integration
Profile Level: L2 (Hardened) NIST 800-53: SC-28
Description
Integrate with external secrets managers.
Azure Key Vault Integration
# Create Key Vault-backed secret scope
databricks secrets create-scope \
--scope azure-kv-scope \
--scope-backend-type AZURE_KEYVAULT \
--resource-id /subscriptions/.../resourceGroups/.../providers/Microsoft.KeyVault/vaults/my-vault \
--dns-name https://my-vault.vault.azure.net/
5. Monitoring & Detection
5.1 Security Monitoring
Profile Level: L1 (Baseline) NIST 800-53: SI-4
Detection Queries
-- Detect bulk data access
SELECT
user_identity.email,
request_params.full_name_arg as table_name,
COUNT(*) as access_count
FROM system.access.audit
WHERE action_name = 'commandSubmit'
AND event_time > current_timestamp() - INTERVAL 1 HOUR
GROUP BY user_identity.email, request_params.full_name_arg
HAVING COUNT(*) > 100;
-- Detect unusual export operations
SELECT *
FROM system.access.audit
WHERE action_name IN ('downloadResults', 'exportResults')
AND event_time > current_timestamp() - INTERVAL 24 HOURS
ORDER BY event_time DESC;
-- Detect service principal anomalies
SELECT
user_identity.email,
source_ip_address,
COUNT(*) as request_count
FROM system.access.audit
WHERE user_identity.email LIKE 'svc-%'
AND source_ip_address NOT IN (SELECT ip FROM trusted_ips)
AND event_time > current_timestamp() - INTERVAL 1 HOUR
GROUP BY user_identity.email, source_ip_address;
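The bulk-access query above flags any (user, table) pair with more than 100 accesses in an hour. The same threshold logic, applied to in-memory audit events, is handy for unit-testing the detection rule before wiring it to an alerting pipeline; the email addresses and table names below are placeholders:

```python
from collections import Counter

# Flag (user, table) pairs whose access count exceeds the threshold,
# mirroring the HAVING COUNT(*) > 100 clause in the SQL above.
BULK_THRESHOLD = 100

def bulk_access_offenders(events, threshold=BULK_THRESHOLD):
    """events: (user_email, table_name) pairs from one hour of audit logs."""
    counts = Counter(events)
    return {pair: n for pair, n in counts.items() if n > threshold}

hour_of_events = [("svc-etl@corp.example", "production.finance.transactions")] * 150
hour_of_events += [("analyst@corp.example", "production.finance.transactions")] * 5
print(bulk_access_offenders(hour_of_events))
# {('svc-etl@corp.example', 'production.finance.transactions'): 150}
```

Tuning the threshold per workload class (ETL principals legitimately read far more than analysts) keeps the alert actionable rather than noisy.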
6. Compliance Quick Reference
SOC 2 Mapping
| Control ID | Databricks Control | Guide Section |
|---|---|---|
| CC6.1 | SSO enforcement | 1.1 |
| CC6.2 | Unity Catalog permissions | 2.1 |
| CC6.7 | Data masking | 2.2 |
Appendix A: Edition Compatibility
| Control | Standard | Premium | Enterprise |
|---|---|---|---|
| SSO (SAML) | ❌ | ✅ | ✅ |
| Unity Catalog | ✅ | ✅ | ✅ |
| IP Access Lists | ❌ | ✅ | ✅ |
| Customer-Managed VPC | ❌ | ✅ | ✅ |
| Private Link | ❌ | ❌ | ✅ |
Changelog
| Date | Version | Maturity | Changes | Author |
|---|---|---|---|---|
| 2025-12-14 | 0.1.0 | draft | Initial Databricks hardening guide | Claude Code (Opus 4.5) |