Building Reliable Storage Infrastructure: Filesystems, SANs, and Real-World Tactics (Part 1)

I. Introduction

Storage infrastructure underpins virtually every significant system in the information age: enterprise databases, virtualization platforms, cloud services, and personal backups. Yet storage technology too often remains a thicket of buzzwords, stale ideas, and short-lived tools that can baffle even veteran system administrators.

This article separates hype from reality. It is a no-nonsense guide to today’s storage systems, covering the fundamentals of SANs, NAS, and DAS; the filesystems that run on top of them; and the practical realities of deploying and supporting resilient, high-performance storage.

We’ll consider the strengths and weaknesses of traditional RAID, the revolutionary potential of ZFS and Btrfs, and the often messy licensing and community dynamics that determine which tools sysadmins are actually able to use. By the end, you’ll have pragmatic guidelines for building storage systems that balance data integrity, performance, scalability, and manageability in day-to-day operations.

Whether you’re running a homelab, managing a datacenter, or architecting cloud infrastructure, you need to understand these storage fundamentals, and the trade-offs involved, to be effective. This is storage technology without the hype or the hard sell.

II. Storage System Fundamentals

Before diving deep into filesystems and redundancy schemes, it’s critical to understand the foundational building blocks of storage technology. Storage isn’t just “disks”—it’s a whole ecosystem of devices, protocols, and architectures designed to deliver data reliably, quickly, and at scale.

A. Definitions and Concepts

  1. Storage Area Network (SAN)

A SAN is a dedicated, high-speed network that provides block-level storage to servers. Unlike traditional file sharing, SANs present raw storage devices (called LUNs) over the network, making remote storage appear as if it were a local hard drive. SANs typically use Fibre Channel or iSCSI protocols and are favored in enterprise environments for their speed, flexibility, and centralized management.
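
To make this concrete, here is a minimal sketch of how a Linux host might attach an iSCSI LUN using the open-iscsi initiator. The portal address and target IQN below are placeholders, and a production setup would add CHAP authentication and multipathing.

```
# Discover targets advertised by the storage portal (address is a placeholder)
iscsiadm -m discovery -t sendtargets -p 192.0.2.10

# Log in to a discovered target (IQN is a placeholder); the LUN then appears
# to the host as an ordinary block device such as /dev/sdb
iscsiadm -m node -T iqn.2024-01.com.example:storage.lun1 -p 192.0.2.10 --login

# Confirm the new block device is visible
lsblk
```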

  2. Network Attached Storage (NAS)

NAS devices share storage over the network at the file level, using protocols like NFS or SMB/CIFS. They are simpler to deploy and manage than SANs, making them popular for smaller environments or where file sharing is the primary need. NAS devices abstract storage as files and folders, which clients mount as network drives.
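
As a quick illustration, mounting a NAS share on Linux looks roughly like the sketch below; the hostname, export path, share name, and username are all placeholders.

```
# NFS: mount a file-level export from a hypothetical NAS
sudo mount -t nfs nas01.example.com:/export/projects /mnt/projects

# SMB/CIFS: mount a share from the same hypothetical NAS (requires cifs-utils)
sudo mount -t cifs //nas01.example.com/projects /mnt/projects-smb \
    -o username=svc_backup,vers=3.0
```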

  3. Direct Attached Storage (DAS)

DAS refers to storage devices physically connected directly to a server or workstation via SATA, SAS, or USB. It offers high performance and low latency but lacks the flexibility and scalability of networked storage solutions.

B. Physical Components

Storage infrastructure is built from hardware layers that often look deceptively simple but are anything but:

Storage Arrays: Enclosures housing multiple hard drives or SSDs, often configured with RAID for redundancy.

Disk Shelves: Modular units that hold disks and connect to controllers; multiple shelves can be linked to scale capacity.

Controllers: The “brains” managing disk I/O, RAID calculations, caching, and replication.

Networking Hardware: Fibre Channel switches, iSCSI targets, or Ethernet gear that facilitate connectivity between storage and servers.

In datacenters, these components are typically racked, cooled, and powered with enterprise-grade redundancy. Understanding these physical layers helps you appreciate why storage performance and reliability depend on more than just the disks themselves.

III. Block Storage and Volumes

Once you understand the physical makeup of storage, it’s essential to grasp how storage is presented and consumed by servers—especially in modern SAN environments.

A. What Is a LUN?

A LUN (Logical Unit Number) is essentially a virtual block device carved out of a larger SAN or storage pool. Think of it as a slice of a gigantic storage pie that’s presented to a server as a raw disk.

The server sees the LUN as a block device (/dev/sdX, for example), not as files or folders.

You can partition, format, and mount a LUN just like a physical local disk (see the sketch after this list).

LUNs enable fine-grained allocation of storage to different servers or VMs without exposing the entire backend storage.
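
A minimal sketch of that workflow on Linux, assuming the LUN shows up as /dev/sdb (verify the device name on your own system before running anything destructive):

```
# Identify the newly presented LUN
lsblk

# Label the device and create a single partition spanning it
sudo parted --script /dev/sdb mklabel gpt mkpart primary 0% 100%

# Format and mount it like any local disk
sudo mkfs.xfs /dev/sdb1
sudo mkdir -p /mnt/data
sudo mount /dev/sdb1 /mnt/data
```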

B. Client Access: Block Devices vs File Shares

SANs expose block-level storage—raw disks—that hosts can use to run filesystems or virtual disks.

NAS exposes file-level storage—network shares that clients mount as directories.

Block storage gives you lower latency, higher throughput, and greater flexibility, especially for databases and VM disks.

File shares simplify sharing among many users or systems but come with higher protocol overhead.

C. Cloud Block Storage in Practice

Providers like Linode, AWS, and Azure offer block storage volumes backed by massive SAN or distributed storage arrays.

When you add a block storage volume to your VM, you’re essentially getting a remote LUN.

This enables features like live migration, where your VM’s disk can be accessed from another datacenter seamlessly.

Behind the scenes, data is replicated or accessible across multiple locations to support failover and scaling.

D. Live Migration and Failover Implications

Because the storage backend is networked and shared:

You can move VMs between hosts or datacenters without copying disks manually.

Storage replication ensures data is available even if a datacenter or disk fails.

This architecture forms the foundation of cloud resilience and disaster recovery.

IV. Enterprise SAN Features

A. Redundancy Mechanisms

  1. RAID and Erasure Coding

    RAID (Redundant Array of Independent Disks):
    Traditional RAID levels (0, 1, 5, 6, 10) are still foundational. Enterprise SANs typically go beyond simple RAID:

     RAID 5 & 6 provide parity-based protection for 1 or 2 disk failures, respectively.
    
     RAID 10 (striped mirrors) gives better write performance and faster rebuilds but at higher raw capacity cost.
    

    Erasure Coding:
    An evolution beyond RAID parity, erasure coding splits data into chunks with additional parity chunks spread across drives or nodes.

     Provides equivalent or better fault tolerance than RAID 6 but with better space efficiency.
    
     Common in software-defined storage systems (like Ceph) and some advanced SAN arrays.
    
  2. Hardware Redundancy

    Controllers: Usually two or more controllers operate in active-passive or active-active mode. If one controller fails, the other immediately takes over without downtime.

    Power Supplies & Fans: Hot-swappable, redundant units to prevent outages due to hardware failure.

    Cache Modules: Battery-backed cache (BBU) or supercapacitors protect cached writes in power loss events.

    Network Paths: Multiple Fibre Channel or Ethernet paths ensure no single point of failure in connectivity.

B. Replication Types

  1. Synchronous Replication

    How it works: Every write operation is sent to the primary and secondary storage arrays simultaneously. The primary array waits for acknowledgment from the secondary before confirming to the host.

    Pros:

     Zero data loss (RPO=0)
    
     Immediate failover possible
    

    Cons:

     Requires very low-latency, high-bandwidth links (often metro distances only)
    
     Higher write latency due to waiting on replication
    
  2. Asynchronous Replication

    How it works: Writes are committed locally first and then replicated to the secondary site in batches or streams with some delay.

    Pros:

     Can work over long distances with limited bandwidth
    
     Lower write latency at the primary site
    

    Cons:

     Potential data loss for writes that haven’t replicated (RPO > 0)
    
     Failover requires careful handling of last data batches
    

C. Multi-site Disaster Recovery and Failover

Active-Passive: One site handles all workloads; the other is standby and takes over on failure. Easier to manage but may have failover delays.

Active-Active: Both sites actively handle workloads, often with load balancing and automatic failover. More complex but better resource utilization and availability.

Failover Orchestration: Coordinated by software (like VMware SRM, NetApp SnapMirror management, or custom scripts) to switch DNS, IPs, and storage targets seamlessly.

Testing: Regular disaster recovery drills simulate outages to validate failover plans and timing.

D. Performance Guarantees and Scalability

Guaranteed IOPS: Enterprise SANs reserve or throttle IOPS per volume to meet SLAs for latency-sensitive apps (e.g., databases).

Tiered Storage: Data moves automatically between SSDs (hot tier) and HDDs (cold tier) to balance cost and performance.

Scaling: SANs scale out (by adding nodes) or up (by adding shelves, disks, or controller capacity) to support petabytes and thousands of hosts.

Quality of Service (QoS): Controls bandwidth and IOPS per host or application to prevent noisy neighbor effects.

E. IBM Z Mainframe Storage Overview

FICON (Fibre Connection): A specialized Fibre Channel protocol optimized for mainframe workloads.

DS8000 Series: High-performance storage arrays designed for IBM Z—offering multi-controller redundancy, encryption, and advanced caching.

Metro/Global Mirror: IBM’s versions of synchronous/asynchronous replication for mainframe disaster recovery.

Coupling Facility (CF): Hardware for multi-system data sharing and cache coherence in parallel sysplex setups.

Nonstop Operation: Designed for zero downtime, with multiple redundant paths, components, and proactive fault detection.

V. Filesystems on SAN

Understanding how filesystems interact with SAN storage is crucial. While SAN provides block-level access, what happens on top of that block device can make or break your storage strategy.

A. Typical Filesystems Used on SAN LUNs

EXT4 and XFS (Linux):
Widely used, stable, and performant for general-purpose block devices on SAN LUNs.
Pros: Mature, low overhead.
Cons: No built-in checksumming or snapshots.

NTFS (Windows):
Default for Windows servers on SAN volumes.
Supports journaling and permissions.

VMFS (VMware):
Clustered filesystem designed for multiple hosts accessing shared SAN storage simultaneously.
Enables live VM migration and distributed locking.

Clustered Filesystems (GFS2, OCFS2):
Used when multiple hosts need concurrent access to the same SAN LUN at the filesystem level.
Requires complex coordination and fencing mechanisms.

B. ZFS: Features and Position

Filesystem + Volume Manager:
ZFS combines volume management and the filesystem in one layer, allowing it to manage disks directly rather than relying on an underlying RAID layer.

Data Integrity:
Checksums all data and metadata to detect and self-correct corruption (bit rot), a big advantage over traditional RAID+filesystem combos.

Snapshots and Clones:
Efficient, instantaneous snapshots enable easy backups, rollbacks, and replication.

RAID-Z:
ZFS's software RAID provides parity-based protection with more reliable rebuilds than traditional RAID.

Limitations on SAN:
ZFS isn’t inherently cluster-aware, so exposing the same ZFS pool to multiple hosts concurrently (like a SAN would) risks data corruption.

Typical Use:
ZFS runs on local or DAS storage, or as the backend on storage appliances, not usually on SAN LUNs shared by many hosts.
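
For readers who haven’t touched ZFS, the sketch below shows the basic pool/dataset/snapshot workflow on a single host with local disks. The pool name, dataset names, and device names are placeholders.

```
# Create a mirrored pool from two whole disks
sudo zpool create tank mirror /dev/sdb /dev/sdc

# Create a dataset with compression enabled
sudo zfs create -o compression=lz4 tank/projects

# Take an instantaneous snapshot, then clone it for testing
sudo zfs snapshot tank/projects@before-upgrade
sudo zfs clone tank/projects@before-upgrade tank/projects-test

# Scrub periodically to detect (and, with redundancy, repair) silent corruption
sudo zpool scrub tank
sudo zpool status tank
```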

C. Btrfs: The Linux Alternative

Filesystem + Volume Manager:
Like ZFS, Btrfs integrates RAID and filesystem management.

Native RAID Support:
Supports RAID 0, 1, 5, 6, and 10 at the filesystem level; devices can be added or removed dynamically.

Snapshots and Compression:
Supports snapshots, compression, and checksumming.

Licensing:
GPL licensed, making it easier to integrate into Linux kernels.

Maturity:
Less mature than ZFS; RAID5/6 still experimental. Sometimes seen as a “hobbyist” or “training wheels” ZFS.

Use Cases:
Popular for Linux desktops, small NAS devices, and some container storage.
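
The equivalent Btrfs workflow looks something like this sketch (device names, mount point, and subvolume names are placeholders; avoid the raid5/raid6 profiles noted above):

```
# Create a two-device Btrfs filesystem with mirrored data and metadata
sudo mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
sudo mount /dev/sdb /mnt/pool

# Create a subvolume and take a read-only snapshot of it
sudo btrfs subvolume create /mnt/pool/projects
sudo btrfs subvolume snapshot -r /mnt/pool/projects /mnt/pool/projects-snap

# Add a device later and rebalance data across all members
sudo btrfs device add /dev/sdd /mnt/pool
sudo btrfs balance start /mnt/pool
```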

VI. Filesystem Behavior in Real-World SAN and Cloud Environments

A. Traditional Filesystems (EXT4, XFS, NTFS)

Strengths

Proven stability and performance: Battle-tested over decades on SAN and local storage alike.

Low overhead: Minimal CPU and RAM demands, allowing more resources for applications.

Broad compatibility: Supported by virtually every OS and hypervisor.

Challenges

No native data integrity: No checksumming means silent data corruption (bit rot) can go unnoticed.

Limited snapshot/clone support: Snapshots rely on external storage or hypervisor tools.

Scaling in multi-host environments: Requires clustered filesystems or shared storage solutions to safely support multiple simultaneous clients.

In Cloud Environments

Typically run on block volumes backed by SAN or software-defined storage.

Cloud vendors rely on underlying storage redundancy; the filesystem is mostly “dumb” and trusts the block layer.

Live migration of VMs is seamless due to block-level abstraction.

B. ZFS in Practice

Strengths

Data integrity first: Checksumming and self-healing make it unbeatable for critical data.

Snapshots and replication: Easily manage backups and DR with built-in features.

Flexible storage management: Pools, datasets, quotas, and compression give admins fine-grained control.

Challenges

Resource hungry: ZFS needs ample RAM and CPU, especially with dedup or compression.

Not cluster-aware: Sharing a ZFS pool over a SAN to multiple hosts risks data corruption.

Complex setup: Especially on root or boot devices, ZFS requires planning and expertise.

In Cloud Environments

Mostly used on local or direct-attached storage within VMs or hypervisors.

Cloud providers often run ZFS behind the scenes on their storage appliances but don’t expose it directly to customers.

Some clouds offer managed ZFS services, but it’s niche.

C. Btrfs in Practice

Strengths

Kernel integration: Btrfs ships with Linux kernels, making deployment easy.

RAID and snapshots: Flexible storage pools with snapshot capabilities.

Growing ecosystem: Used in Fedora Silverblue, openSUSE, and container storage.

Challenges

Maturity concerns: RAID5/6 are still unstable; some bugs persist.

Performance variability: Can be slower than EXT4/XFS in some workloads.

Less tooling: Compared to ZFS, Btrfs has fewer mature management tools.

In Cloud Environments

Popular for container storage backends (e.g., Docker, Kubernetes) due to snapshots and subvolumes.

Used in some NAS solutions and Linux servers that want advanced FS features without external dependencies.

Not commonly used for boot or root on major cloud platforms.

VII. Practical Filesystem Recommendations by Workload

A. General-Purpose Servers (File Servers, Web Servers)

Recommended Filesystems:

EXT4 or XFS (Linux)

NTFS (Windows)

Why:

Stable, mature, and fast for typical read/write patterns.

Low overhead means better CPU availability for applications.

Extensive tooling and community support.

Snapshots and backups usually handled by external tools or application layers.

Caveats:

No built-in corruption detection.

Rely on SAN or storage backend for redundancy.

B. Database Servers and Transactional Workloads

Recommended Filesystems:

ZFS (when available and resource budgets allow)

XFS or EXT4 with proper tuning (Linux)

NTFS (Windows)

Why:

Databases benefit from ZFS’s data integrity guarantees.

ZFS snapshots can assist in consistent backups.

XFS/EXT4 perform well under high IOPS and concurrent writes if tuned correctly.

Caveats:

ZFS requires careful resource planning (RAM, CPU).

Avoid Btrfs RAID5/6 due to stability issues.

C. Virtualization Hosts and VM Storage

Recommended Filesystems:

VMFS (VMware)

ZFS (especially in BSD-based or self-managed environments)

EXT4/XFS on SAN LUNs

Btrfs (in Linux container-focused setups)

Why:

VMFS supports multiple hosts with clustered locking.

ZFS offers excellent snapshots and cloning for VMs.

EXT4/XFS on SAN LUNs offers compatibility and performance.

Btrfs snapshots and subvolumes support container storage flexibility.

Caveats:

Avoid using ZFS pools shared between hosts over SAN.

Use cluster-aware filesystems or storage solutions for multi-host access.

D. Backup and Archival Storage

Recommended Filesystems:

ZFS (due to integrity and snapshots)

Btrfs (for Linux environments with budget constraints)

EXT4/XFS (for simple, large-volume storage)

Why:

Data integrity is paramount—ZFS excels here.

Snapshots enable quick restores.

Btrfs provides flexibility if ZFS isn’t an option.

Caveats:

Monitor resource usage and schedule regular scrubs.

VIII. Filesystem Selection for SAN and Cloud Environments

A. Considerations When Choosing Filesystems

  1. Data Integrity Needs: High for databases and critical storage; prefer ZFS.

  2. Resource Availability: ZFS requires ample RAM and CPU; simpler filesystems are preferred on constrained systems.

  3. Multi-host Access: Use clustered filesystems or SAN features; avoid shared ZFS pools.

  4. Licensing and Support: Btrfs is easier to integrate on Linux; ZFS licensing restricts in-kernel integration.

  5. Snapshots and Backup: Native snapshots are preferred for ease and speed.

  6. Performance Characteristics: RAID 10-style mirroring for write-heavy workloads; RAIDZ for capacity-efficient redundancy.

  7. Cloud Provider Support: Most cloud VMs use EXT4/XFS on block storage; managed ZFS is less common.

B. Filesystem Recommendations for SAN

  1. Single-host SAN LUN = EXT4/XFS for general use; ZFS if local control

  2. Multi-host SAN LUN = Clustered FS (GFS2, VMFS) or NAS-style storage

  3. Performance-sensitive workloads = ZFS with RAID10 or RAIDZ2 local; hardware RAID SAN

  4. Legacy environments = Hardware RAID + EXT4/NTFS

C. Filesystem Recommendations for Cloud

  1. General VM storage = EXT4/XFS on block volumes (default)

  2. Container storage = Btrfs or overlayFS for snapshots and layering

  3. Self-managed VMs with ZFS support = ZFS on local or attached storage (manual setup)

  4. Managed Storage Services Vendor-managed solutions = (EBS, Azure Disks, etc.)

D. Practical Tips for Deployment

Test performance and stability before production rollout.

Take backups and snapshots regularly, regardless of filesystem choice.

Understand your workload’s I/O patterns to select optimal RAID and filesystem combos.

Use monitoring tools to watch filesystem health and storage latency.

Document configurations and recovery procedures.

IX. Performance Tuning for Storage and Filesystems

A. Storage Hardware Level

  1. Disk Selection and Configuration

    SSD vs HDD:

    SSDs offer vastly superior IOPS and latency, essential for transactional workloads.
    
    HDDs are still cost-effective for large capacity and sequential workloads.
    

    Drive Speed and Interface:

    Faster spindle speeds (15K RPM) and interfaces (SAS vs SATA) improve performance.
    
    NVMe drives deliver the best latency and throughput.
    

    RAID Levels:

    RAID 10 preferred for write-heavy or mixed workloads for performance and resilience.
    
    RAID-Z2 (ZFS) or RAID 6 (traditional) optimize capacity but have slower writes.
    
  2. Controller and Cache Tuning

    Write-back vs Write-through Cache:
    Write-back improves performance but requires battery-backed cache for safety.

    Cache Size and Policies:
    Larger cache buffers reduce disk I/O; tuning cache algorithms can impact latency.

    Queue Depth:
    Increasing queue depth on controllers and hosts can improve throughput but risks latency spikes.

B. Filesystem Level

  1. Mount Options and Parameters

    EXT4:

     noatime disables access time updates, reducing write overhead.
    
     data=writeback or data=ordered impact journaling behavior and performance.
    

    XFS:

     Use logbufs and logbsize tuning to optimize journaling.
    
     inode64 for large filesystems with many files.
    

    ZFS:

     Tune record size based on workload (e.g., 8K for databases, 128K for media).
    
     Adjust ARC cache size via zfs_arc_max.
    
     Use L2ARC (secondary read cache) and SLOG (separate intent log) for improved read and sync write performance.
    

    Btrfs:

     Mount options like compress=zstd improve space and sometimes I/O.
    
     Balancing data and metadata profiles according to workload.
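
A few of the options above, expressed as commands; treat this as a sketch rather than a recipe, since the right values depend on your workload, and the mount points, pool, dataset, and device names are placeholders.

```
# EXT4/XFS: remount with noatime to cut access-time writes
# (make it permanent via /etc/fstab, e.g. "... xfs noatime,inode64 0 2")
sudo mount -o remount,noatime /srv/data

# ZFS: match recordsize to the workload and enable cheap compression
sudo zfs set recordsize=8K tank/db
sudo zfs set compression=lz4 tank/media

# ZFS on Linux: cap the ARC at 8 GiB (value in bytes; persist via /etc/modprobe.d)
echo 8589934592 | sudo tee /sys/module/zfs/parameters/zfs_arc_max

# Btrfs: mount with transparent zstd compression
sudo mount -o compress=zstd /dev/sdb /mnt/pool
```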
    
  2. RAID and Pool Layout

    ZFS Pool Design:

     Use mirrors for lower latency and faster resilver.
    
     RAIDZ2 for capacity with decent protection but slower writes.
    

    Stripe Width and Vdev Size:

     Larger stripes improve sequential throughput.
    
     Smaller vdevs reduce rebuild times.
    

    Avoid mixing vdev types in one pool to maintain predictable performance.
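
The two layouts above, sketched as zpool commands with placeholder pool and device names:

```
# Mirrored pairs (RAID 10-style): lowest latency, fastest resilver
sudo zpool create fastpool \
    mirror /dev/sdb /dev/sdc \
    mirror /dev/sdd /dev/sde

# RAIDZ2: better capacity efficiency, two-disk fault tolerance, slower writes
sudo zpool create bulkpool raidz2 /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk

# Review layout and per-vdev I/O distribution
sudo zpool status
sudo zpool iostat -v 5
```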

C. Host and OS Tuning

I/O Scheduler:

    Use deadline or noop (mq-deadline or none on modern multiqueue kernels) for SSD-backed storage to reduce latency.

Multipath I/O:

    Configure multipathing for redundant SAN paths, balancing load and increasing fault tolerance.

Network Settings (for iSCSI SANs):

    Jumbo frames, TCP window scaling, and offloading features optimize throughput.
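
A sketch of those host-level knobs on a Linux initiator; device and interface names are placeholders, and the multipath example assumes the device-mapper-multipath package is installed.

```
# I/O scheduler: modern multiqueue kernels expose "none" and "mq-deadline"
cat /sys/block/sdb/queue/scheduler
echo none | sudo tee /sys/block/sdb/queue/scheduler

# Multipathing across redundant SAN paths
sudo systemctl enable --now multipathd
sudo multipath -ll

# Jumbo frames on a dedicated iSCSI interface (the switch and storage ports
# must be configured for the same MTU)
sudo ip link set dev eth1 mtu 9000
```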

D. Monitoring and Benchmarking

Tools:

    fio for synthetic I/O benchmarking.

    iostat, dstat, and zpool iostat for real-time monitoring.

    zfs-stats and Btrfs tools for filesystem-specific metrics.

Analyze latency, throughput, and CPU usage to identify bottlenecks.

Benchmark with real workloads for best tuning results.
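
For example, a 70/30 random read/write fio run against a test file, watched with iostat and zpool iostat (paths, pool name, and sizes are placeholders; never benchmark against a raw device that holds live data):

```
# Synthetic 4K random I/O, 70% reads, 2-minute run
fio --name=randrw --filename=/mnt/data/fio.test --size=4G \
    --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 \
    --ioengine=libaio --direct=1 --runtime=120 --time_based --group_reporting

# Device-level latency and utilization, refreshed every 5 seconds
iostat -x 5

# Per-vdev view of a ZFS pool
zpool iostat -v tank 5
```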

E. Practical Example: ZFS Tuning for Database Workload

Set recordsize=8K (or whatever matches your database’s page size) to align ZFS records with database blocks.

Limit ARC cache size to avoid starving other processes.

Use dedicated SLOG device (fast SSD with power loss protection) to accelerate sync writes.

Schedule regular scrubs during off-hours.
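
Put together, those steps might look like the following sketch for a hypothetical tank/pgdata dataset; the SLOG device name is a placeholder, and the exact recordsize should match your database’s page size.

```
# Align records with the database page size and favor low-latency sync writes
sudo zfs set recordsize=8K tank/pgdata
sudo zfs set logbias=latency tank/pgdata
sudo zfs set compression=lz4 tank/pgdata

# Add a dedicated SLOG device (fast SSD with power-loss protection)
sudo zpool add tank log /dev/nvme0n1

# Cap the ARC (see the earlier tuning sketch) so the database keeps its own cache

# Monthly scrub during off-hours, e.g. as a root crontab entry:
# 0 3 1 * * /usr/sbin/zpool scrub tank
```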

X. Storage on Root Devices: Challenges and Best Practices

A. Why Root Storage Is Special

The root filesystem is critical: if it’s not reliable or performant, your entire system’s stability is at risk.

Bootloaders and initramfs must understand your filesystem and RAID setup.

Recovery from root failures is trickier than data partitions—less room for error.

B. Filesystem Options for Root

GEOM mirror (FreeBSD):
Simple mirrored boot drives with minimal overhead and straightforward bootloader support.

ZFS on Root:
Full data integrity and advanced features but requires careful setup, enough RAM, and compatible bootloader.

Btrfs on Root (Linux):
Provides snapshots and compression; bootloader and kernel support have improved but still some complexity.

Traditional RAID + EXT4/XFS:
Simple, reliable, and widely supported, though lacking advanced integrity features.

C. Bootloader and Recovery Considerations

Must install bootloader on all mirrored or RAID devices to avoid single points of failure.

GRUB and the FreeBSD boot loader have differing levels of support for advanced filesystems.

Recovery tools vary by filesystem; ZFS and Btrfs offer specialized commands.
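
On a Linux software mirror booted with GRUB, that typically means installing the bootloader onto every member disk, roughly as sketched below (device names are placeholders; the FreeBSD equivalent differs).

```
# Install GRUB on BOTH members of the mirror so either disk can boot the system
sudo grub-install /dev/sda
sudo grub-install /dev/sdb
sudo update-grub   # Debian/Ubuntu; use grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL-family

# Sanity-check mirror health before and after bootloader changes (mdadm example)
cat /proc/mdstat
```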

D. Hybrid Setups

Common to use GEOM mirror or hardware RAID for boot, and ZFS or Btrfs for data storage.

Keeps boot process simple and robust, while allowing advanced features on data volumes.

E. Testing and Validation

Practice simulated drive failure and recovery on root disks.

Ensure backups and snapshots before making root filesystem changes.

Validate bootloader configuration after changes.

WOW, This is a lot of info!!!


Thank you. Home users have trouble keeping up with all the hype about filesystems.
It is clear that there is no one-size-fits-all solution.
I wonder if there are too many possibilities.


Very well written - will take me a while to digest…

Horses for courses…

I’ve worked on a bunch of different sites, migrated UNIX systems (Solaris) from one SAN vendor to another - i.e. usually involves installing or utilising a secondary HBA (Host Bus Adaptor) - usually fibre optic… The thing I kinda hate about SAN stuff is fiddling with the switching “fabric”…

At one site - 2008-2011 - I did THREE SAN migrations! 100% Solaris UNIX…

I’ve also done a fair bit of SAN and NAS admin directly - e.g. EMC Clariion (did their week long training) and EMC Symmetrix, Hitachi HDS, Sun/Oracle/StorageTek SAN, and also NetApp and Oracle ZFS NAS, and also Hitachi’s NAS solution, and Nutanix AFS…

I much prefer the simplicity of NAS solutions…

Some NAS vendors also provide block level, e.g. you can do iSCSI on NetApp…

Block storage - in my experience - is much more labour intensive and makes capacity planning harder - e.g. it’s not easy to increase the size of a LUN being used by a SAN client - but it’s trivial to grow a network share filesystem - and usually dynamic…

And micromanaging the zoning on SAN fibre switches is 'scuse my French a headf–k! Relatively easy when you’ve got 5 SAN clients - nightmare fuel when you’ve got 30 or 40! Some platforms take some of the heavy lifting out of it - e.g. growing a VMware datastore on a SAN is usually pretty straightforward…


Whoa, this is so cool! Thank you very much!



My humble experience says that if anything requires ‘management’ then it was not designed properly in the first place.
How is it that these large data farms require so much input? I don’t see that there is any more to data management than writing the stuff somewhere and leaving it there. What am I missing?

2 Likes