v0.1.0-draft · AI Drafted

Databricks Hardening Guide

Data Platform · Last updated: 2025-12-14

Data platform security for workspace access, Unity Catalog, and secrets management

Overview

Databricks serves more than 10,000 customers, with Unity Catalog governing access to the data lakehouse. OAuth federation with external platforms such as Snowflake, service principal credentials, and cluster access tokens all create attack vectors. Because workspaces hold raw enterprise data, ML models, and training datasets, they are high-value targets for data exfiltration and IP theft.

Intended Audience

  • Security engineers hardening data platforms
  • Data engineers configuring Databricks
  • GRC professionals assessing data governance
  • Third-party risk managers evaluating analytics integrations

How to Use This Guide

  • L1 (Baseline): Essential controls for all organizations
  • L2 (Hardened): Enhanced controls for security-sensitive environments
  • L3 (Maximum Security): Strictest controls for regulated industries

Scope

This guide covers Databricks security configurations including authentication, Unity Catalog governance, cluster security, and secrets management.


Table of Contents

  1. Authentication & Access Controls
  2. Unity Catalog Security
  3. Cluster Security
  4. Secrets Management
  5. Monitoring & Detection
  6. Compliance Quick Reference

1. Authentication & Access Controls

1.1 Enforce SSO with MFA

Profile Level: L1 (Baseline) · NIST 800-53: IA-2(1)

Description

Require SAML SSO with MFA for all Databricks access.

ClickOps Implementation

Step 1: Configure SAML SSO

  1. Navigate to: Admin Settings → Identity and Access → Single Sign-On
  2. Configure:
    • IdP Entity ID: From your identity provider
    • SSO URL: IdP login endpoint
    • Certificate: Upload IdP certificate

Step 2: Enforce SSO

  1. Enable: Require users to log in with SSO
  2. Disable: Allow local password login

Step 3: Configure SCIM Provisioning

  1. Navigate to: Admin Settings → Identity and Access → SCIM Provisioning
  2. Configure connector with your IdP
  3. Enable: Automatic user provisioning
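
To confirm that provisioning is working, a quick check with the Databricks SDK for Python can list workspace users and flag accounts that did not come from SCIM. A minimal sketch, assuming the databricks-sdk package is installed and authentication is already configured; treating a missing external_id as "manually created" is an assumption about your IdP's SCIM mapping.

# Sketch: flag accounts that may have been created outside SCIM
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads host and credentials from the environment

for user in w.users.list():
    # SCIM-provisioned users normally carry an external_id set by the IdP
    if not user.external_id:
        print(f"Review manually created account: {user.user_name}")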

1.2 Implement Service Principal Security

Profile Level: L1 (Baseline) · NIST 800-53: IA-5

Description

Secure service principals used for automation and integrations.

Rationale

Why This Matters:

  • Service principals enable programmatic access
  • OAuth tokens for service principals can have long validity
  • Compromised service principal = bulk data access

Attack Scenario: A compromised service principal gains access to the data lakehouse, and a malicious notebook running under its identity exfiltrates data in bulk.

ClickOps Implementation

Step 1: Create Purpose-Specific Service Principals

  1. Navigate to: Admin Settings → Identity and Access → Service Principals
  2. Create principals for each integration:
    • svc-etl-pipeline (ETL jobs)
    • svc-ml-training (ML workloads)
    • svc-reporting (BI tools)
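
Where many integrations exist, the same principals can be created programmatically instead of one at a time in the UI. A minimal sketch using the Databricks SDK for Python; the names match the examples above.

# Sketch: create purpose-specific service principals
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

for name in ["svc-etl-pipeline", "svc-ml-training", "svc-reporting"]:
    sp = w.service_principals.create(display_name=name)
    print(f"Created {sp.display_name} (application_id={sp.application_id})")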

Step 2: Assign Minimal Permissions

  1. Navigate to: Unity Catalog → Grants
  2. For each service principal:
    • Grant only required catalogs
    • Grant only required schemas
    • Prefer SELECT over ALL PRIVILEGES

Step 3: Configure OAuth Tokens

  1. Generate OAuth tokens for service principals
  2. Set appropriate token lifetime
  3. Store tokens in secrets manager
  4. Rotate tokens quarterly
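
Short-lived tokens limit the blast radius if a credential leaks. The sketch below obtains a workspace access token with the OAuth client-credentials flow documented for machine-to-machine access; the host, client ID, and secret are placeholders that should come from your secrets manager.

# Sketch: fetch a short-lived OAuth token for a service principal
import requests

host = "https://<workspace-url>"       # placeholder
client_id = "<service-principal-id>"   # placeholder: load from secrets manager
client_secret = "<oauth-secret>"       # placeholder: load from secrets manager

resp = requests.post(
    f"{host}/oidc/v1/token",
    auth=(client_id, client_secret),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
    timeout=30,
)
resp.raise_for_status()
access_token = resp.json()["access_token"]  # short-lived; re-request as needed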

1.3 Configure IP Access Lists

Profile Level: L2 (Hardened) · NIST 800-53: AC-3(7)

Description

Restrict Databricks access to known IP ranges.

ClickOps Implementation

Step 1: Configure IP Access Lists

  1. Navigate to: Admin Settings → Security → IP Access Lists
  2. Add allowed IP ranges:
    • Corporate network
    • VPN egress
    • Approved integration IPs
  3. Enable: Block public access (L2)
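
The same allowlist can be managed as code through the REST API, which keeps approved ranges in version control. A minimal sketch against the documented /api/2.0/ip-access-lists and /api/2.0/workspace-conf endpoints; the CIDR ranges and token are placeholders.

# Sketch: create an allowlist and enable enforcement
import requests

host = "https://<workspace-url>"                      # placeholder
headers = {"Authorization": "Bearer <admin-token>"}   # placeholder

# Create the allow list
requests.post(
    f"{host}/api/2.0/ip-access-lists",
    headers=headers,
    json={
        "label": "corp-and-vpn",
        "list_type": "ALLOW",
        "ip_addresses": ["203.0.113.0/24", "198.51.100.0/24"],  # placeholders
    },
    timeout=30,
).raise_for_status()

# Lists are not enforced until the workspace flag is enabled
requests.patch(
    f"{host}/api/2.0/workspace-conf",
    headers=headers,
    json={"enableIpAccessLists": "true"},
    timeout=30,
).raise_for_status()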

2. Unity Catalog Security

2.1 Implement Data Governance

Profile Level: L1 (Baseline) · NIST 800-53: AC-3

Description

Configure Unity Catalog for centralized data governance.

ClickOps Implementation

Step 1: Create Catalog Structure

-- Create catalogs by environment
CREATE CATALOG IF NOT EXISTS production;
CREATE CATALOG IF NOT EXISTS staging;
CREATE CATALOG IF NOT EXISTS development;

-- Create schemas by domain
CREATE SCHEMA IF NOT EXISTS production.finance;
CREATE SCHEMA IF NOT EXISTS production.customer_data;
CREATE SCHEMA IF NOT EXISTS production.ml_features;

Step 2: Configure Granular Permissions

-- Grant specific permissions
GRANT USE CATALOG ON CATALOG production TO `data_analysts`;
GRANT USE SCHEMA ON SCHEMA production.finance TO `finance_team`;
GRANT SELECT ON TABLE production.finance.transactions TO `finance_team`;

-- Restrict sensitive tables
-- Unity Catalog is allow-only (it does not support DENY);
-- remove access by revoking grants instead
REVOKE SELECT ON TABLE production.customer_data.pii FROM `general_users`;

Step 3: Enable Row-Level Security

-- Create a row filter function: it takes the filtered column as an
-- argument and must return a BOOLEAN
CREATE FUNCTION production.filters.region_filter(region STRING)
RETURNS BOOLEAN
RETURN CASE
    WHEN is_account_group_member('us_team') THEN region = 'US'
    WHEN is_account_group_member('eu_team') THEN region = 'EU'
    ELSE FALSE
END;

-- Apply to the table, binding the region column to the function argument
ALTER TABLE production.customer_data.orders
SET ROW FILTER production.filters.region_filter ON (region);

2.2 Configure Data Masking

Profile Level: L2 (Hardened) · NIST 800-53: SC-28

Description

Implement dynamic data masking for sensitive columns.

-- Create masking function
CREATE FUNCTION production.masks.mask_ssn(ssn STRING)
RETURNS STRING
RETURN CASE
    WHEN is_account_group_member('pii_admin') THEN ssn
    ELSE CONCAT('XXX-XX-', RIGHT(ssn, 4))
END;

-- Apply mask to column
ALTER TABLE production.customer_data.customers
ALTER COLUMN ssn SET MASK production.masks.mask_ssn;
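
Because the mask is evaluated at query time, no stored data changes and verification is a simple query. A minimal notebook sketch, assuming the table and group from the example above: members of pii_admin see full values, everyone else sees the masked form.

# Sketch: verify the mask from a notebook session
masked = spark.sql("SELECT ssn FROM production.customer_data.customers LIMIT 5")
masked.show()  # non-members of pii_admin should see XXX-XX-nnnn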

2.3 Audit Logging for Data Access

Profile Level: L1 (Baseline) · NIST 800-53: AU-2, AU-3

Description

Enable comprehensive audit logging for data access.

ClickOps Implementation

Step 1: Enable System Tables

  1. Navigate to: Admin Settings → System Tables
  2. Enable: Access audit logs
  3. Configure retention period
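
System schemas can also be enabled programmatically, which helps when standardizing many workspaces. A minimal sketch using the Databricks SDK for Python's system-schemas API; the metastore ID is a placeholder, and the exact SDK surface should be confirmed against your installed version.

# Sketch: enable the access audit system schema for a metastore
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
metastore_id = "<metastore-uuid>"  # placeholder

# Makes system.access tables (including system.access.audit) queryable
w.system_schemas.enable(metastore_id=metastore_id, schema_name="access")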

Step 2: Query Audit Logs

-- Query data access audit logs
SELECT
    event_time,
    user_identity.email as user_email,
    action_name,
    request_params.full_name_arg as table_accessed,
    source_ip_address
FROM system.access.audit
WHERE action_name IN ('getTable', 'commandSubmit')
    AND event_time > current_timestamp() - INTERVAL 24 HOURS
ORDER BY event_time DESC;

3. Cluster Security

3.1 Configure Cluster Policies

Profile Level: L1 (Baseline) · NIST 800-53: CM-7

Description

Implement cluster policies to enforce security configurations.

ClickOps Implementation

Step 1: Create Secure Cluster Policy

  1. Navigate to: Compute → Policies → Create Policy
  2. Configure:
{
  "spark_version": {
    "type": "allowlist",
    "values": ["13.3.x-scala2.12", "14.0.x-scala2.12"]
  },
  "node_type_id": {
    "type": "allowlist",
    "values": ["Standard_DS3_v2", "Standard_DS4_v2"]
  },
  "autotermination_minutes": {
    "type": "range",
    "minValue": 10,
    "maxValue": 120,
    "defaultValue": 30
  },
  "custom_tags.Environment": {
    "type": "fixed",
    "value": "production"
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "serverless"
  },
  "init_scripts": {
    "type": "fixed",
    "value": []
  }
}

Step 2: Assign Policy to Users

  1. Navigate to: Admin Settings → Workspace → Cluster Policies
  2. Assign policy to appropriate groups
  3. Set as default for users
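
Defining policies as code makes them reviewable and repeatable across workspaces. A minimal sketch with the Databricks SDK for Python, using a trimmed version of the definition above; the policy name is an assumption.

# Sketch: create the cluster policy from code
import json
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

definition = {
    "autotermination_minutes": {
        "type": "range", "minValue": 10, "maxValue": 120, "defaultValue": 30
    },
    "custom_tags.Environment": {"type": "fixed", "value": "production"},
}

policy = w.cluster_policies.create(
    name="secure-production-policy",  # assumed name
    definition=json.dumps(definition),
)
print(f"Created policy {policy.policy_id}")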

3.2 Network Isolation

Profile Level: L2 (Hardened) · NIST 800-53: SC-7

Description

Deploy Databricks with network isolation.

Implementation

VPC/VNet Deployment:

  1. Deploy workspace in customer-managed VPC
  2. Configure private endpoints
  3. Disable public IP addresses for clusters
# Terraform example - Private workspace
resource "databricks_mws_workspaces" "this" {
  account_id     = var.databricks_account_id
  workspace_name = "secure-workspace"
  deployment_name = "secure"

  aws_region = var.region

  network_id = databricks_mws_networks.this.network_id

  # Private configuration
  private_access_settings_id = databricks_mws_private_access_settings.this.private_access_settings_id
}

resource "databricks_mws_private_access_settings" "this" {
  private_access_settings_name = "secure-pas"
  region                       = var.region
  public_access_enabled        = false
}

4. Secrets Management

4.1 Use Databricks Secret Scopes

Profile Level: L1 (Baseline) · NIST 800-53: SC-28

Description

Store credentials in Databricks secret scopes rather than notebooks.

ClickOps Implementation

Step 1: Create Secret Scope

# Create a secret scope backed by Databricks
# (legacy CLI syntax; CLI v0.200+ takes the scope name as a positional argument)
databricks secrets create-scope --scope production-secrets

# Add secrets
databricks secrets put --scope production-secrets --key db-password
databricks secrets put --scope production-secrets --key api-key

Step 2: Configure Access Controls

# Grant read access to specific group
databricks secrets put-acl \
  --scope production-secrets \
  --principal data_engineers \
  --permission READ

Step 3: Use Secrets in Notebooks

# Access secrets in notebook
db_password = dbutils.secrets.get(scope="production-secrets", key="db-password")

# Secret is redacted in logs
print(db_password)  # Shows [REDACTED]

4.2 External Secret Store Integration

Profile Level: L2 (Hardened) · NIST 800-53: SC-28

Description

Integrate with external secrets managers.

Azure Key Vault Integration

# Create Key Vault-backed secret scope
databricks secrets create-scope \
  --scope azure-kv-scope \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id /subscriptions/.../resourceGroups/.../providers/Microsoft.KeyVault/vaults/my-vault \
  --dns-name https://my-vault.vault.azure.net/
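
Once the scope exists, notebooks read Key Vault secrets through the same dbutils interface as native scopes, so code does not change if the backend moves. A short usage sketch, assuming a db-password secret already exists in the vault:

# Sketch: read a Key Vault-backed secret from a notebook
print(dbutils.secrets.list("azure-kv-scope"))  # enumerate available keys
db_password = dbutils.secrets.get(scope="azure-kv-scope", key="db-password")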

5. Monitoring & Detection

5.1 Security Monitoring

Profile Level: L1 (Baseline) · NIST 800-53: SI-4

Detection Queries

-- Detect bulk data access
SELECT
    user_identity.email,
    request_params.full_name_arg as table_name,
    COUNT(*) as access_count
FROM system.access.audit
WHERE action_name = 'commandSubmit'
    AND event_time > current_timestamp() - INTERVAL 1 HOUR
GROUP BY user_identity.email, request_params.full_name_arg
HAVING COUNT(*) > 100;

-- Detect unusual export operations
SELECT *
FROM system.access.audit
WHERE action_name IN ('downloadResults', 'exportResults')
    AND event_time > current_timestamp() - INTERVAL 24 HOURS
ORDER BY event_time DESC;

-- Detect service principal anomalies
-- (trusted_ips is a reference table you maintain with approved egress IPs)
SELECT
    user_identity.email,
    source_ip_address,
    COUNT(*) as request_count
FROM system.access.audit
WHERE user_identity.email LIKE 'svc-%'
    AND source_ip_address NOT IN (SELECT ip FROM trusted_ips)
    AND event_time > current_timestamp() - INTERVAL 1 HOUR
GROUP BY user_identity.email, source_ip_address;
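
These queries are most useful when run on a schedule with results pushed into an alerting pipeline. A minimal notebook sketch that wraps the bulk-access query and posts offenders to a webhook; the webhook URL and the 100-command threshold are assumptions to tune for your environment.

# Sketch: scheduled job that alerts on bulk data access
import requests

WEBHOOK_URL = "https://alerts.example.com/hook"  # placeholder

offenders = spark.sql("""
    SELECT user_identity.email AS email, COUNT(*) AS access_count
    FROM system.access.audit
    WHERE action_name = 'commandSubmit'
      AND event_time > current_timestamp() - INTERVAL 1 HOUR
    GROUP BY user_identity.email
    HAVING COUNT(*) > 100
""").collect()

for row in offenders:
    requests.post(
        WEBHOOK_URL,
        json={"alert": "bulk_data_access", "user": row.email, "count": row.access_count},
        timeout=30,
    )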

6. Compliance Quick Reference

SOC 2 Mapping

| Control ID | Databricks Control        | Guide Section |
|------------|---------------------------|---------------|
| CC6.1      | SSO enforcement           | 1.1           |
| CC6.2      | Unity Catalog permissions | 2.1           |
| CC6.7      | Data masking              | 2.2           |

Appendix A: Edition Compatibility

| Control              | Standard | Premium | Enterprise |
|----------------------|----------|---------|------------|
| SSO (SAML)           |          |         |            |
| Unity Catalog        |          |         |            |
| IP Access Lists      |          |         |            |
| Customer-Managed VPC |          |         |            |
| Private Link         |          |         |            |

Changelog

| Date       | Version | Maturity | Changes                            | Author                 |
|------------|---------|----------|------------------------------------|------------------------|
| 2025-12-14 | 0.1.0   | draft    | Initial Databricks hardening guide | Claude Code (Opus 4.5) |