
Building a Production-Ready Data Pipeline with Azure (Part 4): Migrating Mount Points to Unity Catalog

Jun 27, 2025 • 10 min read
  • Unity Catalog
  • Azure Storage
  • Security
  • Data Governance


Welcome to Part 4 of my comprehensive series on building production-ready data pipelines with Azure! In this installment, I’ll tackle one of the most critical migrations in modern data engineering: transitioning from traditional mount points to Unity Catalog’s External Locations.

Series Overview

This article is part of my ongoing series:

Introduction

In Parts 1–3, I built a robust data pipeline using Azure Data Factory, Databricks, and Unity Catalog. However, my implementation still relied on mount points for storage access - a legacy approach that doesn’t fully leverage Unity Catalog’s capabilities.

In this article, I’ll complete the modernization journey by migrating from mount points to Unity Catalog’s External Locations, achieving true cloud-native data governance.

For official documentation, refer to:

Why This Migration Matters

If you’ve been following my series, you’ve seen how I progressively enhanced the data pipeline:

  • In Part 1, I established a solid medallion architecture with Bronze, Silver, and Gold layers
  • In Part 2, I integrated Unity Catalog for governance
  • In Part 3, I optimized table management with external and managed tables

However, I was still using mount points - a practice that limits Unity Catalog’s security and governance benefits. This final migration removes that limitation, giving me:

  • True fine-grained access control
  • Centralized credential management
  • Better audit trails
  • Simplified multi-workspace collaboration

The Problem with Mount Points

For years, mount points have been the go-to method for accessing Azure Data Lake Storage (ADLS) in Databricks. While functional, this approach has several limitations:

Security Concerns

  • Cluster-level authentication: Credentials are configured at the cluster level, giving all users the same access rights
  • No fine-grained access control: Difficult to implement row-level or column-level security
  • Credential management: Service principal credentials need to be managed and rotated manually

Operational Challenges

  • Manual setup: Each cluster requires mount point configuration
  • Lack of audit trails: Limited visibility into who accessed what data and when
  • No centralized governance: Each workspace manages its own mounts independently

Scalability Issues

  • Cross-workspace sharing: Difficult to share data securely across multiple workspaces
  • Maintenance overhead: As the number of clusters grows, mount management becomes cumbersome
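
For reference, a typical legacy mount setup looked something like the sketch below. Every workspace (and often every environment) had to run something like this once per container, with the service principal secret pulled from a secret scope; the scope and key names here are placeholders:

# Legacy pattern: mounting ADLS Gen2 with a service principal (placeholder scope/key names)
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("my-scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("my-scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://container@storageaccount.dfs.core.windows.net/bronze",
    mount_point="/mnt/bronze",
    extra_configs=configs,
)

Everyone attached to a cluster that can see /mnt/bronze inherits the service principal's access rights, which is exactly the cluster-level authentication problem described above.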

Enter Unity Catalog

Unity Catalog addresses these limitations by providing:

  1. Centralized Governance: Single source of truth for all data assets
  2. Fine-grained Access Control: Permissions at catalog, schema, table, and even row/column level
  3. Built-in Audit Logging: Complete audit trail of all data access
  4. Credential-free Access: Managed identities and storage credentials handled automatically
  5. Cross-workspace Collaboration: Seamless data sharing across workspaces

Architecture Overview

Traditional Mount Point Architecture

┌─────────────────┐     ┌──────────────────┐
│  Databricks     │     │   ADLS Gen2      │
│  Cluster        │────▶│                  │
│  (Mount Points) │     │  /mnt/bronze     │
└─────────────────┘     │  /mnt/silver     │
                        │  /mnt/gold       │
                        └──────────────────┘

Unity Catalog Architecture

┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Databricks     │────▶│  Unity Catalog   │────▶│   ADLS Gen2      │
│  Cluster        │     │                  │     │                  │
│  (UC Enabled)   │     │ Storage Creds    │     │  External Locs   │
└─────────────────┘     │ External Locs    │     └──────────────────┘
                        │ Permissions      │
                        └──────────────────┘

Migration Strategy

Building on the foundation I established in previous parts, my migration follows a systematic approach that preserves the existing medallion architecture while modernizing the storage access layer.

Phase 1: Assessment and Planning

  1. Inventory Current Mount Points
  • Document all existing mount points
  • Map mount paths to their ADLS locations
  • Identify access patterns and dependencies
  • List all notebooks and jobs using mount points
  2. Design Unity Catalog Structure
  • Define catalog hierarchy
  • Plan external location structure
  • Design permission model
  • Map mount points to Unity Catalog paths
  3. Check Prerequisites
  • Ensure Unity Catalog is enabled in your workspace
  • Verify Azure permissions for creating storage credentials
  • Confirm cluster compatibility with Unity Catalog
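
Before moving on to Phase 2, a quick sanity check from a notebook confirms that the attached cluster can actually see Unity Catalog (a minimal sketch; both statements should succeed on a UC-enabled cluster):

# Should list catalogs (including 'main') if Unity Catalog is reachable
spark.sql("SHOW CATALOGS").show()

# Returns the metastore attached to the workspace
spark.sql("SELECT current_metastore()").show(truncate=False)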

Phase 2: Unity Catalog Setup

  • Step 1: Create Storage Credentials

Storage credentials securely store authentication information for accessing cloud storage.

-- Create storage credential using Azure Managed Identity (Recommended)
-- The access connector ID is the resource ID of an Azure Databricks Access Connector
CREATE STORAGE CREDENTIAL IF NOT EXISTS adls_storage_credential
WITH (
  AZURE_MANAGED_IDENTITY (
    ACCESS_CONNECTOR_ID = '<access-connector-id>'
  )
);
-- Alternative: Using Service Principal
CREATE STORAGE CREDENTIAL IF NOT EXISTS adls_storage_credential
WITH (
  AZURE_SERVICE_PRINCIPAL (
    TENANT_ID = '<tenant-id>',
    CLIENT_ID = '<client-id>',
    CLIENT_SECRET = '<client-secret>'
  )
);
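
Optionally, confirm the credential is registered before moving on (a quick check from a notebook):

# Verify the storage credential exists and is configured as expected
spark.sql("SHOW STORAGE CREDENTIALS").show()
spark.sql("DESCRIBE STORAGE CREDENTIAL adls_storage_credential").show(truncate=False)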
  • Step 2: Create External Locations
-- Create external locations for each data layer
CREATE EXTERNAL LOCATION IF NOT EXISTS bronze_location
URL 'abfss://container@storageaccount.dfs.core.windows.net/bronze'
WITH (CREDENTIAL adls_storage_credential);
CREATE EXTERNAL LOCATION IF NOT EXISTS silver_location
URL 'abfss://container@storageaccount.dfs.core.windows.net/silver'
WITH (CREDENTIAL adls_storage_credential);
CREATE EXTERNAL LOCATION IF NOT EXISTS gold_location
URL 'abfss://container@storageaccount.dfs.core.windows.net/gold'
WITH (CREDENTIAL adls_storage_credential);
  • Step 3: Grant Permissions
-- Grant appropriate permissions
GRANT READ FILES ON EXTERNAL LOCATION bronze_location TO `data_engineers`;
GRANT WRITE FILES ON EXTERNAL LOCATION silver_location TO `data_engineers`;
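
With the grants in place, a simple smoke test run as a member of data_engineers verifies that the external location is reachable (a sketch using the example URL from above):

# Should succeed with READ FILES on bronze_location; fails with a permission error otherwise
spark.sql(
    "LIST 'abfss://container@storageaccount.dfs.core.windows.net/bronze'"
).show(truncate=False)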

Phase 3: Pipeline Migration

This phase builds directly on the work I did in Parts 1–3. My existing control tables and pipeline structure made the migration straightforward.

    1. Update Control Tables

I maintained a control table that stored paths for my data pipeline:

-- Before: Mount-based paths
-- /mnt/bronze/dataset/table
-- /mnt/silver/dataset/table
-- After: Direct ABFSS paths
-- abfss://container@storageaccount.dfs.core.windows.net/bronze/dataset/table
-- abfss://container@storageaccount.dfs.core.windows.net/silver/dataset/table
UPDATE control.tables
SET 
    bronze_path = REPLACE(bronze_path, '/mnt/', 'abfss://container@storageaccount.dfs.core.windows.net/'),
    silver_path = REPLACE(silver_path, '/mnt/', 'abfss://container@storageaccount.dfs.core.windows.net/')
WHERE bronze_path LIKE '/mnt/%' OR silver_path LIKE '/mnt/%';
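
A quick follow-up check confirms that no legacy paths remain in the control table (a sketch against the same control.tables structure used earlier in the series):

# Any non-zero count means some rows still reference legacy mount paths
spark.sql("""
    SELECT COUNT(*) AS legacy_rows
    FROM control.tables
    WHERE bronze_path LIKE '/mnt/%' OR silver_path LIKE '/mnt/%'
""").show()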
    2. Update Notebook Code

Building on the Unity Catalog integration from Part 2:

Original mount-based code:

# Old approach from Part 1
df = spark.read.parquet("/mnt/bronze/sales/orders")
df.write.mode("overwrite").parquet("/mnt/silver/sales/orders")

Updated Unity Catalog approach:

# New approach - Unity Catalog handles authentication
bronze_path = "abfss://container@storageaccount.dfs.core.windows.net/bronze/sales/orders"
silver_path = "abfss://container@storageaccount.dfs.core.windows.net/silver/sales/orders"
df = spark.read.parquet(bronze_path)
df.write.mode("overwrite").parquet(silver_path)

# Register as Unity Catalog table (as we learned in Part 3)
df.write.mode("overwrite").saveAsTable("main.silver.sales_orders")
    3. Create Unity Catalog Tables
# Create external table pointing to existing data
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.silver.sales_orders
    USING DELTA
    LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/silver/sales/orders'
""")

Phase 4: Testing and Validation

I implemented comprehensive testing to ensure data integrity:

def validate_migration(mount_path, unity_path):
    """Validate data consistency between mount and Unity Catalog paths"""
    
    # Read from both sources
    mount_df = spark.read.parquet(mount_path)
    unity_df = spark.read.parquet(unity_path)
    
    # Compare counts
    mount_count = mount_df.count()
    unity_count = unity_df.count()
    assert mount_count == unity_count, f"Count mismatch: {mount_count} vs {unity_count}"
    
    # Compare schemas
    assert mount_df.schema == unity_df.schema, "Schema mismatch"
    
    # Sample data comparison (order by all columns so both sides return the same rows)
    sort_cols = mount_df.columns
    mount_sample = mount_df.orderBy(sort_cols).limit(1000).toPandas()
    unity_sample = unity_df.orderBy(sort_cols).limit(1000).toPandas()
    assert mount_sample.equals(unity_sample), "Data mismatch"
    
    return True
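
For example, this can be called once per migrated table during the parallel-run period; the paths below are placeholders for illustration:

# Placeholder paths - substitute the real mount and ABFSS locations for each table
assert validate_migration(
    "/mnt/silver/sales/orders",
    "abfss://container@storageaccount.dfs.core.windows.net/silver/sales/orders",
)
print("sales/orders validated")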

Phase 5: Cutover and Cleanup

After successful validation, I performed the final cutover. Important: Only remove mount points after confirming all pipelines are working with Unity Catalog.

  • Step 1: Final Validation
# Ensure all pipelines have been migrated
def validate_no_mount_usage():
    """Check if any active code still uses mount points"""
    notebooks_to_check = [
        "/path/to/notebook1",
        "/path/to/notebook2"
    ]
    
    mount_usage = []
    for notebook in notebooks_to_check:
        # dbutils.notebook.run returns whatever the target notebook passes to dbutils.notebook.exit(),
        # so this assumes each notebook reports its resolved paths when run with dry_run
        content = dbutils.notebook.run(notebook, 0, {"dry_run": "true"})
        if "/mnt/" in content:
            mount_usage.append(notebook)
    if mount_usage:
        print(f"Warning: {len(mount_usage)} notebooks still reference mount points")
        return False
    return True
  • Step 2: Remove Mount Points
# Remove mount points only after all validations pass
def cleanup_mounts():
    """Remove legacy mount points"""
    mounts_to_remove = ['/mnt/bronze', '/mnt/silver', '/mnt/gold']
    
    # First, list current mounts
    current_mounts = dbutils.fs.mounts()
    print(f"Current mounts: {len(current_mounts)}")

    for mount in mounts_to_remove:
        try:
            # Check if mount exists
            if any(m.mountPoint == mount for m in current_mounts):
                dbutils.fs.unmount(mount)
                print(f"Successfully unmounted: {mount}")
            else:
                print(f"Mount not found: {mount}")
        except Exception as e:
            print(f"Error unmounting {mount}: {str(e)}")
    
    # Verify removal
    remaining_mounts = [m.mountPoint for m in dbutils.fs.mounts() if '/mnt/' in m.mountPoint]
    if remaining_mounts:
        print(f"Warning: Some mounts still exist: {remaining_mounts}")
    else:
        print("All mount points successfully removed")
# Execute only after confirmation
if validate_no_mount_usage():
    cleanup_mounts()
else:
    print("Cannot remove mounts - some notebooks still use them")

For more details on mount point management, see Databricks documentation.

Key Learnings and Best Practices

1. Handle Case Sensitivity

One critical issue I encountered was case sensitivity in paths. ADLS is case-sensitive, so ensure your paths match exactly:

  • /Bronze/Sales/Orders and /bronze/sales/orders resolve to different locations
  • Standardize on one casing convention (the examples in this series use all lowercase) and verify paths against the actual folder names in ADLS

2. Use Unity Catalog Native Commands

Replace dbutils.fs.ls() with Unity Catalog SQL commands:

# Old way
files = dbutils.fs.ls("/mnt/bronze/sales")
# New way
files = spark.sql("LIST 'abfss://container@storage.dfs.core.windows.net/bronze/sales'")

3. Implement Proper Error Handling

Unity Catalog provides better error messages, but proper handling is still crucial:

try:
    df = spark.read.parquet(unity_path)
except Exception as e:
    if "PERMISSION_DENIED" in str(e):
        print("Check Unity Catalog permissions")
    elif "PATH_NOT_FOUND" in str(e):
        print("Verify the external location exists")
    else:
        raise e

4. Leverage Managed Tables Where Appropriate

As I explored in Part 3, the choice between external and managed tables is crucial:

# External table - you control the location (from Part 3)
spark.sql("""
    CREATE TABLE catalog.schema.external_table
    USING DELTA
    LOCATION 'abfss://path/to/data'
""")
# Managed table - Unity Catalog manages storage (from Part 3)
df.write.saveAsTable("catalog.schema.managed_table")

5. Monitor Performance

I observed performance improvements after migration:

  • Faster metadata operations: Unity Catalog caches metadata
  • Improved query planning: Better statistics and optimization
  • Reduced authentication overhead: No mount point initialization

Benefits Realized

1. Enhanced Security

  • Fine-grained access control: Different teams have appropriate access levels
  • Audit compliance: Complete audit trail for regulatory requirements
  • Simplified credential management: Azure Managed Identity eliminates credential rotation

2. Operational Excellence

  • Centralized governance: Single place to manage all data assets
  • Better monitoring: Built-in metrics and logging
  • Simplified troubleshooting: Clear error messages and permission denials

3. Improved Collaboration

  • Cross-workspace sharing: Data easily shared across different environments
  • Consistent data discovery: Users can find and understand available datasets
  • Version control: Delta Lake integration provides time travel capabilities

4. Cost Optimization

  • Reduced cluster startup time: No mount point initialization overhead
  • Better resource utilization: Improved query optimization
  • Simplified infrastructure: Fewer components to manage

Common Pitfalls and How to Avoid Them

1. Incomplete Permission Setup

Problem: Users get permission denied errors
Solution: Verify permissions at all levels:

-- Check external location permissions
SHOW GRANTS ON EXTERNAL LOCATION bronze_location;
-- Check catalog permissions
SHOW GRANTS ON CATALOG main;
-- Check schema permissions
SHOW GRANTS ON SCHEMA main.bronze;

2. Path Format Issues

Problem: Invalid path formats cause failures
Solution: Always use the correct ABFSS format:

abfss://[container]@[storage_account].dfs.core.windows.net/[path]
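
To reduce hand-editing mistakes when rewriting paths, a small helper can translate legacy mount paths into the ABFSS format. This is a hypothetical utility, assuming the layer name directly follows /mnt/ and that the container and storage account names are known:

# Hypothetical helper: convert a legacy /mnt/<layer>/... path into an ABFSS URL
def mount_to_abfss(mount_path: str,
                   container: str = "container",
                   account: str = "storageaccount") -> str:
    if not mount_path.startswith("/mnt/"):
        raise ValueError(f"Not a mount path: {mount_path}")
    relative = mount_path[len("/mnt/"):]  # e.g. 'bronze/sales/orders'
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative}"

print(mount_to_abfss("/mnt/bronze/sales/orders"))
# abfss://container@storageaccount.dfs.core.windows.net/bronze/sales/orders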

3. Cluster Configuration

Problem: Cluster is not Unity Catalog enabled
Solution: Use a cluster access mode that supports Unity Catalog (single user or shared) so the cluster can access Unity Catalog objects

4. Mixed Authentication

Problem: Conflicts between mount points and Unity Catalog
Solution: Complete the migration before removing mounts and avoid mixed usage during the transition

Migration Checklist

Pre-Migration

  • Document all existing mount points and their usage
  • Inventory all notebooks, jobs, and pipelines using mount points
  • Design Unity Catalog structure (catalogs, schemas, locations)
  • Verify Unity Catalog is enabled
  • Ensure proper Azure permissions for Managed Identity or Service Principal

Unity Catalog Setup

  • Create storage credentials
  • Create external locations for all paths
  • Grant appropriate permissions
  • Test access with a simple query

Code Migration

  • Update control tables with new ABFSS paths
  • Modify notebook code to use Unity Catalog paths
  • Replace dbutils.fs commands with Spark SQL where appropriate
  • Create Unity Catalog tables (external or managed)
  • Update job configurations

Validation

  • Validate data consistency between old and new paths
  • Test all critical pipelines in parallel mode
  • Verify permissions work as expected
  • Check audit logs are being generated

Mount Point Removal

  • Confirm no active code references mount points
  • Create backup of mount point configuration
  • Remove mount points using dbutils.fs.unmount()
  • Verify mount points are removed
  • Monitor for any errors post-removal

Post-Migration

  • Update documentation and runbooks
  • Train team on Unity Catalog concepts
  • Set up monitoring and alerting
  • Document lessons learned

Conclusion

Migrating from mount points to Unity Catalog represents a significant step forward in data governance and security. While the migration requires careful planning and execution, the benefits far outweigh the effort. Unity Catalog provides a foundation for secure, scalable, and governable data platforms that can grow with your organization’s needs.

The key to successful migration is taking a systematic approach, thoroughly testing each step, and ensuring all stakeholders are aligned. With proper planning and execution, you can achieve a seamless transition that enhances your data platform’s capabilities while maintaining business continuity.

Next Steps

  1. Explore Advanced Features: Leverage row-level security and column masking
  2. Implement Data Quality: Use Unity Catalog’s data quality features
  3. Optimize Performance: Fine-tune external locations and table properties
  4. Expand Governance: Implement comprehensive data classification and lineage

Remember, Unity Catalog is not just a replacement for mount points - it’s a comprehensive governance solution that enables new possibilities for your data platform. Embrace its full potential to build a truly modern lakehouse architecture.

Additional Resources

Official Documentation

Have you migrated to Unity Catalog? Share your experiences and lessons learned in the comments below!