v0.2.0-draft AI Drafted

Databricks Hardening Guide

Last updated: 2026-02-19

Data platform security for workspace access, Unity Catalog, and secrets management

Overview

Databricks serves 10,000+ customers, with Unity Catalog governing data lake access. OAuth federation with Snowflake, service principal credentials, and cluster access tokens all create attack vectors. Databricks workspaces contain raw enterprise data, ML models, and training datasets, making them high-value targets for data exfiltration and IP theft.

Intended Audience

  • Security engineers hardening data platforms
  • Data engineers configuring Databricks
  • GRC professionals assessing data governance
  • Third-party risk managers evaluating analytics integrations

How to Use This Guide

  • L1 (Baseline): Essential controls for all organizations
  • L2 (Hardened): Enhanced controls for security-sensitive environments
  • L3 (Maximum Security): Strictest controls for regulated industries

Scope

This guide covers Databricks security configurations including authentication, Unity Catalog governance, cluster security, and secrets management.


Table of Contents

  1. Authentication & Access Controls
  2. Unity Catalog Security
  3. Cluster Security
  4. Secrets Management
  5. Monitoring & Detection
  6. Compliance Quick Reference

1. Authentication & Access Controls

1.1 Enforce SSO with MFA

Profile Level: L1 (Baseline) NIST 800-53: IA-2(1)

Description

Require SAML SSO with MFA for all Databricks access.

ClickOps Implementation

Step 1: Configure SAML SSO

  1. Navigate to: Admin Settings → Identity and Access → Single Sign-On
  2. Configure:
    • IdP Entity ID: From your identity provider
    • SSO URL: IdP login endpoint
    • Certificate: Upload IdP certificate

Step 2: Enforce SSO

  1. Enable: Require users to log in with SSO
  2. Disable: Allow local password login

Step 3: Configure SCIM Provisioning

  1. Navigate to: Admin Settings → Identity and Access → SCIM Provisioning
  2. Configure connector with your IdP
  3. Enable: Automatic user provisioning
Code Pack: Terraform
hth-databricks-1.01-enforce-sso-with-mfa.tf View source on GitHub ↗
# Harden authentication-adjacent workspace settings. Note: SSO enforcement
# itself (disabling password login) is configured per Step 2 above; this
# resource disables personal access tokens and enables IP access lists at L2+.
resource "databricks_workspace_conf" "sso_enforcement" {
  custom_config = {
    "enableTokensConfig"  = false
    "enableIpAccessLists" = var.profile_level >= 2
  }
}

1.2 Implement Service Principal Security

Profile Level: L1 (Baseline) NIST 800-53: IA-5

Description

Secure service principals used for automation and integrations.

Rationale

Why This Matters:

  • Service principals enable programmatic access
  • OAuth tokens for service principals can have long validity
  • Compromised service principal = bulk data access

Attack Scenario: A compromised service principal authenticates to the lakehouse, and a malicious notebook running under its identity exfiltrates data in bulk.

ClickOps Implementation

Step 1: Create Purpose-Specific Service Principals

  1. Navigate to: Admin Settings → Identity and Access → Service Principals
  2. Create principals for each integration:
    • svc-etl-pipeline (ETL jobs)
    • svc-ml-training (ML workloads)
    • svc-reporting (BI tools)

Step 2: Assign Minimal Permissions

  1. Navigate to: Unity Catalog → Grants
  2. For each service principal:
    • Grant only required catalogs
    • Grant only required schemas
    • Prefer SELECT over ALL PRIVILEGES

Step 3: Configure OAuth Tokens

  1. Generate OAuth tokens for service principals
  2. Set appropriate token lifetime
  3. Store tokens in secrets manager
  4. Rotate tokens quarterly
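The quarterly rotation in step 4 is easy to let slip; a small scheduled check can flag overdue tokens. A minimal sketch, where the inventory, principal names, and the 90-day period are illustrative assumptions rather than Databricks APIs:

```python
from datetime import date, timedelta

# Hypothetical token inventory: service principal -> date its OAuth token was issued
TOKEN_ISSUED = {
    "svc-etl-pipeline": date(2026, 1, 10),
    "svc-ml-training": date(2025, 10, 1),
    "svc-reporting": date(2025, 11, 20),
}

ROTATION_PERIOD = timedelta(days=90)  # quarterly rotation policy

def tokens_due_for_rotation(inventory, today):
    """Return principals whose token age meets or exceeds the rotation period."""
    return sorted(
        name
        for name, issued in inventory.items()
        if today - issued >= ROTATION_PERIOD
    )

print(tokens_due_for_rotation(TOKEN_ISSUED, date(2026, 2, 19)))
# ['svc-ml-training', 'svc-reporting']
```

In practice the inventory would be populated from wherever token issuance dates are recorded (e.g., your secrets manager's metadata), and the check scheduled as a job that pages the owning team.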
Code Pack: Terraform
hth-databricks-1.02-service-principal-security.tf View source on GitHub ↗
# Create purpose-specific service principals for automation
resource "databricks_service_principal" "automation" {
  for_each = var.service_principals

  display_name = each.value.display_name
  active       = true
}

# Grant each service principal CAN_USE on the hardened cluster policy,
# restricting it to clusters created under that policy
resource "databricks_permissions" "service_principal_cluster_usage" {
  for_each = var.service_principals

  cluster_policy_id = databricks_cluster_policy.hardened.id

  access_control {
    service_principal_name = databricks_service_principal.automation[each.key].application_id
    permission_level       = "CAN_USE"
  }
}

1.3 Configure IP Access Lists

Profile Level: L2 (Hardened) NIST 800-53: AC-3(7)

Description

Restrict Databricks access to known IP ranges.

ClickOps Implementation

Step 1: Configure IP Access Lists

  1. Navigate to: Admin Settings → Security → IP Access Lists
  2. Add allowed IP ranges:
    • Corporate network
    • VPN egress
    • Approved integration IPs
  3. Enable: Block public access (L2)
Code Pack: Terraform
hth-databricks-1.03-ip-access-lists.tf View source on GitHub ↗
# Allowlist: Restrict workspace access to known corporate IP ranges (L2+)
resource "databricks_ip_access_list" "allow_corporate" {
  count = var.profile_level >= 2 && length(var.allowed_ip_cidrs) > 0 ? 1 : 0

  label        = "HTH - Corporate Network Allow List"
  list_type    = "ALLOW"
  ip_addresses = var.allowed_ip_cidrs

  depends_on = [databricks_workspace_conf.sso_enforcement]
}

# Blocklist: Deny access from known-bad IP ranges (L2+)
resource "databricks_ip_access_list" "block_bad_ips" {
  count = var.profile_level >= 2 && length(var.blocked_ip_cidrs) > 0 ? 1 : 0

  label        = "HTH - Blocked IP Ranges"
  list_type    = "BLOCK"
  ip_addresses = var.blocked_ip_cidrs

  depends_on = [databricks_workspace_conf.sso_enforcement]
}
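Before applying an allowlist, it is worth validating every CIDR locally: a malformed entry can fail the update, and an overly narrow list can lock administrators out (always include your own current egress IP). A minimal pre-flight check using Python's standard ipaddress module; the sample ranges are illustrative:

```python
import ipaddress

def validate_cidrs(cidrs):
    """Split candidate entries into well-formed CIDR blocks and rejects,
    so a typo never reaches the workspace IP access list."""
    valid, invalid = [], []
    for entry in cidrs:
        try:
            ipaddress.ip_network(entry)  # raises ValueError on malformed input
            valid.append(entry)
        except ValueError:
            invalid.append(entry)
    return valid, invalid

corporate = ["203.0.113.0/24", "198.51.100.17/32", "10.0.0.0/33"]
ok, bad = validate_cidrs(corporate)
print(ok)   # ['203.0.113.0/24', '198.51.100.17/32']
print(bad)  # ['10.0.0.0/33']
```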

2. Unity Catalog Security

2.1 Implement Data Governance

Profile Level: L1 (Baseline) NIST 800-53: AC-3

Description

Configure Unity Catalog for centralized data governance.

ClickOps Implementation

Step 1: Create Catalog Structure

Create catalogs by environment (production, staging, development) and schemas by domain. See the DB Query Code Pack below for the full SQL.

Step 2: Configure Granular Permissions

Grant specific catalog, schema, and table permissions to functional roles. See the DB Query Code Pack below for permission examples.

Step 3: Enable Row-Level Security

Create row filter functions to restrict data visibility by group membership and apply them to tables. See the DB Query Code Pack below.

Code Pack: Terraform
hth-databricks-2.01-data-governance.tf View source on GitHub ↗
# Unity Catalog grants are managed via SQL statements.
# This resource configures workspace-level Unity Catalog settings.
resource "databricks_workspace_conf" "unity_catalog_governance" {
  custom_config = {
    "enableUnityCatalog" = true
  }
}

# Restrict catalog creation to admins only via SQL global config
resource "databricks_sql_global_config" "governance" {
  security_policy = "DATA_ACCESS_CONTROL"

  data_access_config = {
    "spark.databricks.unityCatalog.enabled" = "true"
  }
}
Code Pack: DB Query
hth-databricks-2.01-data-governance.sql View source on GitHub ↗
-- Create catalogs by environment
CREATE CATALOG IF NOT EXISTS production;
CREATE CATALOG IF NOT EXISTS staging;
CREATE CATALOG IF NOT EXISTS development;

-- Create schemas by domain
CREATE SCHEMA IF NOT EXISTS production.finance;
CREATE SCHEMA IF NOT EXISTS production.customer_data;
CREATE SCHEMA IF NOT EXISTS production.ml_features;
-- Grant specific permissions
GRANT USE CATALOG ON CATALOG production TO `data_analysts`;
GRANT USE SCHEMA ON SCHEMA production.finance TO `finance_team`;
GRANT SELECT ON TABLE production.finance.transactions TO `finance_team`;

-- Restrict sensitive tables (Unity Catalog has no DENY statement;
-- revoke the grant and avoid granting broad groups access to PII)
REVOKE SELECT ON TABLE production.customer_data.pii FROM `general_users`;

-- Create row filter function (takes the filtered column as a parameter
-- and returns BOOLEAN)
CREATE FUNCTION production.filters.region_filter(region STRING)
RETURNS BOOLEAN
RETURN CASE
    WHEN is_account_group_member('us_team') THEN region = 'US'
    WHEN is_account_group_member('eu_team') THEN region = 'EU'
    ELSE FALSE
END;

-- Apply to table
ALTER TABLE production.customer_data.orders
SET ROW FILTER production.filters.region_filter ON (region);
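The row filter's intended behavior can be unit-tested outside the workspace with a plain-Python mirror of the same CASE logic. This is illustrative only (the group and region names follow the SQL above); actual enforcement happens in Unity Catalog:

```python
def region_filter(region, groups):
    """Plain-Python mirror of the SQL row filter: may a caller belonging
    to `groups` see a row whose region column holds `region`?"""
    if "us_team" in groups:
        return region == "US"
    if "eu_team" in groups:
        return region == "EU"
    return False

assert region_filter("US", {"us_team"})
assert not region_filter("EU", {"us_team"})
assert not region_filter("US", {"contractors"})  # unknown groups see nothing
```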

2.2 Configure Data Masking

Profile Level: L2 (Hardened) NIST 800-53: SC-28

Description

Implement dynamic data masking for sensitive columns. Create masking functions that return the full value for privileged roles and masked values for all others, then apply them to sensitive columns.

Code Pack: Terraform
hth-databricks-2.02-data-masking.tf View source on GitHub ↗
# Note: Dynamic data masking in Unity Catalog is configured via SQL statements
# (CREATE FUNCTION + ALTER TABLE ... SET MASK). This control enforces the
# workspace-level configuration that enables column masking support.
#
# The SQL masking functions below should be applied via databricks_sql_query
# or a separate SQL migration pipeline:
#
#   CREATE FUNCTION production.masks.mask_ssn(ssn STRING)
#   RETURNS STRING
#   RETURN CASE
#       WHEN is_account_group_member('pii_admin') THEN ssn
#       ELSE CONCAT('XXX-XX-', RIGHT(ssn, 4))
#   END;
#
#   ALTER TABLE production.customer_data.customers
#   ALTER COLUMN ssn SET MASK production.masks.mask_ssn;

# Enable table access control to support column-level masking (L2+)
resource "databricks_workspace_conf" "data_masking" {
  count = var.profile_level >= 2 ? 1 : 0

  custom_config = {
    "enableTableAccessControl" = true
  }
}
Code Pack: DB Query
hth-databricks-2.02-data-masking.sql View source on GitHub ↗
-- Create masking function for SSN
CREATE FUNCTION production.masks.mask_ssn(ssn STRING)
RETURNS STRING
RETURN CASE
    WHEN is_account_group_member('pii_admin') THEN ssn
    ELSE CONCAT('XXX-XX-', RIGHT(ssn, 4))
END;

-- Apply mask to column
ALTER TABLE production.customer_data.customers
ALTER COLUMN ssn SET MASK production.masks.mask_ssn;
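As with row filters, the masking logic can be sanity-checked locally before deployment. A Python mirror of mask_ssn (illustrative sketch; the authoritative mask is the SQL function above):

```python
def mask_ssn(ssn, groups):
    """Mirror of the SQL mask: pii_admin members see the full value,
    everyone else sees only the last four digits."""
    if "pii_admin" in groups:
        return ssn
    return "XXX-XX-" + ssn[-4:]

print(mask_ssn("123-45-6789", {"pii_admin"}))  # 123-45-6789
print(mask_ssn("123-45-6789", {"analysts"}))   # XXX-XX-6789
```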

2.3 Audit Logging for Data Access

Profile Level: L1 (Baseline) NIST 800-53: AU-2, AU-3

Description

Enable comprehensive audit logging for data access.

ClickOps Implementation

Step 1: Enable System Tables

  1. Navigate to: Admin Settings → System Tables
  2. Enable: Access audit logs
  3. Configure retention period

Step 2: Query Audit Logs

Query the system.access.audit table to review data access events. See the DB Query Code Pack below for the full audit log query.

Code Pack: Terraform
hth-databricks-2.03-audit-logging.tf View source on GitHub ↗
# Enable system tables for audit log access
# System tables (system.access.audit) provide comprehensive audit logging
# for all workspace events including data access, cluster operations, and
# administrative changes.
resource "databricks_workspace_conf" "audit_logging" {
  custom_config = {
    "enableDbfsFileBrowser" = false
    "enableExportNotebook"  = var.profile_level >= 3 ? false : true
  }
}

# Note: Verbose audit log queries should be scheduled via Databricks SQL:
#
#   SELECT event_time, user_identity.email as user_email,
#          action_name, request_params.full_name_arg as table_accessed,
#          source_ip_address
#   FROM system.access.audit
#   WHERE action_name IN ('getTable', 'commandSubmit')
#     AND event_time > current_timestamp() - INTERVAL 24 HOURS
#   ORDER BY event_time DESC;
Code Pack: DB Query
hth-databricks-2.03-audit-logging.sql View source on GitHub ↗
-- Query data access audit logs
SELECT
    event_time,
    user_identity.email as user_email,
    action_name,
    request_params.full_name_arg as table_accessed,
    source_ip_address
FROM system.access.audit
WHERE action_name IN ('getTable', 'commandSubmit')
    AND event_time > current_timestamp() - INTERVAL 24 HOURS
ORDER BY event_time DESC;

3. Cluster Security

3.1 Configure Cluster Policies

Profile Level: L1 (Baseline) NIST 800-53: CM-7

Description

Implement cluster policies to enforce security configurations.

ClickOps Implementation

Step 1: Create Secure Cluster Policy

  1. Navigate to: Compute → Policies → Create Policy
  2. Configure the cluster policy JSON to restrict allowed Spark versions, node types, auto-termination, and init scripts. See the Code Pack below for the full policy definition.

Step 2: Assign Policy to Users

  1. Navigate to: Admin Settings → Workspace → Cluster Policies
  2. Assign policy to appropriate groups
  3. Set as default for users
Code Pack: Terraform
hth-databricks-3.01-cluster-policies.tf View source on GitHub ↗
# Hardened cluster policy enforcing approved runtimes, node types,
# auto-termination, and init script restrictions
resource "databricks_cluster_policy" "hardened" {
  name = "HTH - Hardened Cluster Policy"

  definition = jsonencode({
    "spark_version" = {
      "type"   = "allowlist"
      "values" = var.allowed_spark_versions
    }
    "node_type_id" = {
      "type"   = "allowlist"
      "values" = var.allowed_node_types
    }
    "autotermination_minutes" = {
      "type"         = "range"
      "minValue"     = 10
      "maxValue"     = var.autotermination_minutes_max
      "defaultValue" = var.autotermination_minutes_default
    }
    "custom_tags.Environment" = {
      "type"  = "fixed"
      "value" = "production"
    }
    "custom_tags.ManagedBy" = {
      "type"  = "fixed"
      "value" = "howtoharden"
    }
    "init_scripts" = {
      "type"  = "fixed"
      "value" = []
    }
    "enable_elastic_disk" = {
      "type"         = "fixed"
      "value"        = true
      "hidden"       = true
    }
  })
}

# Grant CAN_USE on the hardened policy to all users
resource "databricks_permissions" "cluster_policy_usage" {
  cluster_policy_id = databricks_cluster_policy.hardened.id

  access_control {
    group_name       = "users"
    permission_level = "CAN_USE"
  }
}
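The three rule types used in the policy above (allowlist, range, fixed) can be illustrated with a small local evaluator. This is a sketch of how such constraints reject a non-conforming cluster spec, not Databricks' actual policy engine; the policy and spec values are hypothetical:

```python
def violations(policy, spec):
    """Return the attribute paths in `spec` that break a policy rule."""
    problems = []
    for path, rule in policy.items():
        value = spec.get(path)
        kind = rule["type"]
        if kind == "allowlist" and value not in rule["values"]:
            problems.append(path)
        elif kind == "range" and not (rule["minValue"] <= value <= rule["maxValue"]):
            problems.append(path)
        elif kind == "fixed" and value != rule["value"]:
            problems.append(path)
    return problems

policy = {
    "spark_version": {"type": "allowlist", "values": ["15.4.x-scala2.12"]},
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 120},
    "custom_tags.Environment": {"type": "fixed", "value": "production"},
}
spec = {
    "spark_version": "15.4.x-scala2.12",
    "autotermination_minutes": 0,  # disables auto-termination: violates the range rule
    "custom_tags.Environment": "production",
}
print(violations(policy, spec))  # ['autotermination_minutes']
```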

3.2 Network Isolation

Profile Level: L2 (Hardened) NIST 800-53: SC-7

Description

Deploy Databricks with network isolation.

Implementation

VPC/VNet Deployment:

  1. Deploy workspace in customer-managed VPC
  2. Configure private endpoints
  3. Disable public IP addresses for clusters

The account-level Terraform example for private workspace deployment with VPC isolation is included in the Code Pack below.

Code Pack: Terraform
hth-databricks-3.02-network-isolation.tf View source on GitHub ↗
# Enforce no public IP addresses on cluster nodes (L2+)
# This adds a cluster policy overlay that prevents public IP assignment
resource "databricks_cluster_policy" "network_isolation" {
  count = var.profile_level >= 2 ? 1 : 0

  name = "HTH - Network Isolation Policy"

  definition = jsonencode({
    "enable_local_disk_encryption" = {
      "type"  = "fixed"
      "value" = true
    }
    "azure_attributes.availability" = {
      "type"         = "allowlist"
      "values"       = ["ON_DEMAND_AZURE"]
      "defaultValue" = "ON_DEMAND_AZURE"
    }
    "custom_tags.NetworkIsolation" = {
      "type"  = "fixed"
      "value" = "enabled"
    }
  })
}

# Grant CAN_USE on the network isolation policy (L2+)
resource "databricks_permissions" "network_isolation_usage" {
  count = var.profile_level >= 2 ? 1 : 0

  cluster_policy_id = databricks_cluster_policy.network_isolation[0].id

  access_control {
    group_name       = "users"
    permission_level = "CAN_USE"
  }
}
# Account-level: Private workspace deployment with VPC isolation
resource "databricks_mws_workspaces" "this" {
  account_id      = var.databricks_account_id
  workspace_name  = "secure-workspace"
  deployment_name = "secure"

  aws_region = var.region

  network_id = databricks_mws_networks.this.network_id

  # Private configuration
  private_access_settings_id = databricks_mws_private_access_settings.this.private_access_settings_id
}

resource "databricks_mws_private_access_settings" "this" {
  private_access_settings_name = "secure-pas"
  region                       = var.region
  public_access_enabled        = false
}

4. Secrets Management

4.1 Use Databricks Secret Scopes

Profile Level: L1 (Baseline) NIST 800-53: SC-28

Description

Store credentials in Databricks secret scopes rather than notebooks.

ClickOps Implementation

Step 1: Create Secret Scope

  1. Navigate to: Databricks CLI or Admin Settings → Secrets
  2. Create a Databricks-backed secret scope for your environment
  3. Add required secrets (database passwords, API keys)

Step 2: Configure Access Controls

  1. Set ACLs on the secret scope
  2. Grant READ access to groups that need credential access
  3. Restrict MANAGE access to administrators only

Step 3: Use Secrets in Notebooks

Access secrets via dbutils.secrets.get() in notebooks. Secret values are automatically redacted in logs. See the SDK Code Pack below for an example.

Code Pack: Terraform
hth-databricks-4.01-secret-scopes.tf View source on GitHub ↗
# Create Databricks-backed secret scopes for credential storage
resource "databricks_secret_scope" "managed" {
  for_each = var.secret_scopes

  name                     = each.key
  initial_manage_principal = each.value.initial_manage_principal
}

# Grant READ access to the data_engineers group on each secret scope
resource "databricks_secret_acl" "data_engineers_read" {
  for_each = var.secret_scopes

  scope      = databricks_secret_scope.managed[each.key].name
  principal  = "data_engineers"
  permission = "READ"
}
Code Pack: CLI Script
hth-databricks-4.01-manage-secret-scopes.sh View source on GitHub ↗
# Create secret scope backed by Databricks
databricks secrets create-scope --scope production-secrets

# Add secrets
databricks secrets put --scope production-secrets --key db-password
databricks secrets put --scope production-secrets --key api-key

# Grant read access to specific group
databricks secrets put-acl \
  --scope production-secrets \
  --principal data_engineers \
  --permission READ
Code Pack: SDK Script
hth-databricks-4.01-use-secrets-in-notebook.py View source on GitHub ↗
# Access secrets in notebook
db_password = dbutils.secrets.get(scope="production-secrets", key="db-password")

# Secret is redacted in logs
print(db_password)  # Shows [REDACTED]

4.2 External Secret Store Integration

Profile Level: L2 (Hardened) NIST 800-53: SC-28

Description

Integrate with external secrets managers.

Azure Key Vault Integration

Create an Azure Key Vault-backed secret scope so secrets are fetched directly from Key Vault at runtime rather than stored in Databricks. This provides centralized secret lifecycle management and audit logging through Azure.

Code Pack: Terraform
hth-databricks-4.02-external-secret-store.tf View source on GitHub ↗
# Azure Key Vault-backed secret scope (L2+, Azure only)
# Secrets are fetched directly from Key Vault at runtime rather than
# stored in Databricks, providing centralized secret lifecycle management.
resource "databricks_secret_scope" "azure_keyvault" {
  count = var.profile_level >= 2 && var.azure_keyvault_resource_id != "" ? 1 : 0

  name = "azure-kv-scope"

  keyvault_metadata {
    resource_id = var.azure_keyvault_resource_id
    dns_name    = var.azure_keyvault_dns_name
  }
}

# Note: For AWS deployments, use AWS Secrets Manager or Parameter Store
# integration via instance profiles and IAM roles on the cluster. Configure
# the cluster policy to enforce the required instance profile ARN.
#
# For GCP deployments, use Google Secret Manager via workload identity
# federation configured on the cluster service account.
Code Pack: CLI Script
hth-databricks-4.02-azure-keyvault-scope.sh View source on GitHub ↗
# Create Key Vault-backed secret scope
databricks secrets create-scope \
  --scope azure-kv-scope \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id /subscriptions/.../resourceGroups/.../providers/Microsoft.KeyVault/vaults/my-vault \
  --dns-name https://my-vault.vault.azure.net/

5. Monitoring & Detection

5.1 Security Monitoring

Profile Level: L1 (Baseline) NIST 800-53: SI-4

Detection Queries

Detection queries for bulk data access, unusual exports, and service principal anomalies are provided in the DB Query Code Pack below.

Code Pack: Terraform
hth-databricks-5.01-security-monitoring.tf View source on GitHub ↗
# Workspace configuration for security monitoring
# Disables risky features that complicate audit trails
resource "databricks_workspace_conf" "security_monitoring" {
  custom_config = {
    # Disable DBFS file browser to prevent unaudited file access
    "enableDbfsFileBrowser" = false

    # Restrict notebook export to prevent data exfiltration (L3)
    "enableExportNotebook" = var.profile_level >= 3 ? false : true

    # Disable results download for non-admin users (L2+)
    "enableResultsDownloading" = var.profile_level >= 2 ? false : true
  }
}

# Note: Detection queries should be scheduled as Databricks SQL alerts:
#
# Bulk data access detection:
#   SELECT user_identity.email, request_params.full_name_arg as table_name,
#          COUNT(*) as access_count
#   FROM system.access.audit
#   WHERE action_name = 'commandSubmit'
#     AND event_time > current_timestamp() - INTERVAL 1 HOUR
#   GROUP BY user_identity.email, request_params.full_name_arg
#   HAVING COUNT(*) > 100;
#
# Unusual export detection:
#   SELECT * FROM system.access.audit
#   WHERE action_name IN ('downloadResults', 'exportResults')
#     AND event_time > current_timestamp() - INTERVAL 24 HOURS;
#
# Service principal anomaly detection:
#   SELECT user_identity.email, source_ip_address, COUNT(*) as request_count
#   FROM system.access.audit
#   WHERE user_identity.email LIKE 'svc-%'
#     AND event_time > current_timestamp() - INTERVAL 1 HOUR
#   GROUP BY user_identity.email, source_ip_address;
Code Pack: DB Query
hth-databricks-5.01-security-monitoring.sql View source on GitHub ↗
-- Detect bulk data access (>100 queries/hour to a single table)
SELECT
    user_identity.email,
    request_params.full_name_arg as table_name,
    COUNT(*) as access_count
FROM system.access.audit
WHERE action_name = 'commandSubmit'
    AND event_time > current_timestamp() - INTERVAL 1 HOUR
GROUP BY user_identity.email, request_params.full_name_arg
HAVING COUNT(*) > 100;
-- Detect unusual export operations (last 24 hours)
SELECT *
FROM system.access.audit
WHERE action_name IN ('downloadResults', 'exportResults')
    AND event_time > current_timestamp() - INTERVAL 24 HOURS
ORDER BY event_time DESC;
-- Detect service principal anomalies (access from untrusted IPs)
SELECT
    user_identity.email,
    source_ip_address,
    COUNT(*) as request_count
FROM system.access.audit
WHERE user_identity.email LIKE 'svc-%'
    AND source_ip_address NOT IN (SELECT ip FROM trusted_ips)
    AND event_time > current_timestamp() - INTERVAL 1 HOUR
GROUP BY user_identity.email, source_ip_address;
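The same anomaly logic can be prototyped on exported audit rows before wiring it into a scheduled SQL alert. A minimal sketch, where the rows, IPs, and field names are simplified stand-ins for system.access.audit columns:

```python
from collections import Counter

TRUSTED_IPS = {"203.0.113.10", "203.0.113.11"}  # hypothetical approved egress IPs

# Simplified audit rows standing in for system.access.audit records
rows = [
    {"email": "svc-etl-pipeline", "ip": "203.0.113.10"},
    {"email": "svc-etl-pipeline", "ip": "198.51.100.99"},
    {"email": "svc-etl-pipeline", "ip": "198.51.100.99"},
    {"email": "alice@example.com", "ip": "198.51.100.99"},
]

def suspicious_principal_activity(rows, trusted):
    """Count requests per (service principal, source IP) for IPs outside the
    trusted set, mirroring the SQL anomaly query above."""
    return Counter(
        (r["email"], r["ip"])
        for r in rows
        if r["email"].startswith("svc-") and r["ip"] not in trusted
    )

print(suspicious_principal_activity(rows, TRUSTED_IPS))
# Counter({('svc-etl-pipeline', '198.51.100.99'): 2})
```

Non-empty results would feed an alert; human users (like alice above) and trusted egress IPs are excluded by construction.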

6. Compliance Quick Reference

SOC 2 Mapping

Control ID   Databricks Control          Guide Section
CC6.1        SSO enforcement             1.1
CC6.2        Unity Catalog permissions   2.1
CC6.7        Data masking                2.2

Appendix A: Edition Compatibility

Control                Standard   Premium   Enterprise
SSO (SAML)
Unity Catalog
IP Access Lists
Customer-Managed VPC
Private Link

Appendix B: References

Compliance Frameworks:

  • SOC 2 Type II, ISO 27001:2022, HIPAA, PCI DSS, FedRAMP Moderate (AWS SQL Serverless), HITRUST CSF (Azure) — via Databricks Compliance

Security Incidents:

  • No major public data breaches affecting Databricks customers have been identified. A platform vulnerability discovered by Orca Security in April 2023 was promptly remediated. Databricks maintains annual third-party penetration testing and a documented security incident response program.

Changelog

Date         Version   Maturity   Changes                                                          Author
2025-12-14   0.1.0     draft      Initial Databricks hardening guide                               Claude Code (Opus 4.5)
2026-02-19   0.1.1     draft      Migrate inline CLI code in sections 4.1, 4.2 to Code Pack files  Claude Code (Opus 4.6)
2026-02-19   0.2.0     draft      Migrate all remaining inline code to Code Packs (sections 2.1, 2.2, 2.3, 3.1, 3.2, 4.1, 5.1); zero inline code blocks remain  Claude Code (Opus 4.6)