Building Reliable Storage Infrastructure: Filesystems, SANs, and Real-World Tactics
Table of Contents - Part 1
- I. Introduction
- II. Storage System Fundamentals
- III. Block Storage and Volumes
- IV. Enterprise SAN Features
- V. Filesystems on SAN
- VI. Filesystem Behavior in Real-World SAN and Cloud Environments
- VII. Practical Filesystem Recommendations by Workload
- VIII. Filesystem Selection for SAN and Cloud Environments
- IX. Performance Tuning for Storage and Filesystems
- X. Storage on Root Devices: Challenges and Best Practices
I. Introduction
In the information age, storage infrastructure sits behind virtually every significant system: enterprise databases and virtual machines, cloud services and personal backup stores. Yet storage technology remains all too often a thicket of buzzwords, stale ideas, and short-lived tools that baffle even veteran system administrators.
This article separates hype from reality. It is a no-nonsense guide to today's storage systems, covering the fundamentals of SANs, NAS, and DAS; the filesystems that run on top of them; and the grim reality of deploying and supporting resilient, high-performance storage in the real world.
We'll consider the strengths and weaknesses of traditional RAID, the potential of ZFS and Btrfs, and the often messy licensing and community politics that determine which tools sysadmins are actually able to use. By the end, you'll have pragmatic guidelines for building storage systems that balance data integrity, performance, scalability, and usability in day-to-day operations.
Whether you're running a homelab, managing a datacenter, or architecting cloud infrastructure, you need to understand these storage fundamentals, and the trade-offs involved, to be effective. That's storage technology without the hype or the hard sell.
II. Storage System Fundamentals
Before diving deep into filesystems and redundancy schemes, it's critical to understand the foundational building blocks of storage technology. Storage isn't just "disks"; it's a whole ecosystem of devices, protocols, and architectures designed to deliver data reliably, quickly, and at scale.
A. Definitions and Concepts
- Storage Area Network (SAN)
A SAN is a dedicated, high-speed network that provides block-level storage to servers. Unlike traditional file sharing, SANs present raw storage devices (called LUNs) over the network, making remote storage appear as if it were a local hard drive. SANs typically use Fibre Channel or iSCSI protocols and are favored in enterprise environments for their speed, flexibility, and centralized management.
- Network Attached Storage (NAS)
NAS devices share storage over the network at the file level, using protocols like NFS or SMB/CIFS. They are simpler to deploy and manage than SANs, making them popular for smaller environments or where file sharing is the primary need. NAS devices abstract storage as files and folders, which clients mount as network drives.
- Direct Attached Storage (DAS)
DAS refers to storage devices physically connected directly to a server or workstation via SATA, SAS, or USB. It offers high performance and low latency but lacks the flexibility and scalability of networked storage solutions.
B. Physical Components
Storage infrastructure is built from hardware layers that often look deceptively simple but are anything but:
Storage Arrays: Enclosures housing multiple hard drives or SSDs, often configured with RAID for redundancy.
Disk Shelves: Modular units that hold disks and connect to controllers; multiple shelves can be linked to scale capacity.
Controllers: The "brains" managing disk I/O, RAID calculations, caching, and replication.
Networking Hardware: Fibre Channel switches, iSCSI targets, or Ethernet gear that facilitate connectivity between storage and servers.
In datacenters, these components are typically racked, cooled, and powered with enterprise-grade redundancy. Understanding these physical layers helps you appreciate why storage performance and reliability depend on more than just the disks themselves.
III. Block Storage and Volumes
Once you understand the physical makeup of storage, it's essential to grasp how storage is presented and consumed by servers, especially in modern SAN environments.
A. What Is a LUN?
A LUN (Logical Unit Number) is essentially a virtual block device carved out of a larger SAN or storage pool. Think of it as a slice of a gigantic storage pie that's presented to a server as a raw disk.
The server sees the LUN as a block device (/dev/sdX, for example), not as files or folders.
You can partition, format, and mount a LUN just like a physical local disk.
LUNs enable fine-grained allocation of storage to different servers or VMs without exposing the entire backend storage.
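To make the "LUN is just a raw disk" point concrete, here is a minimal Python sketch that lists block devices and then formats and mounts a newly presented LUN exactly as you would a local drive. The device path and mountpoint are hypothetical placeholders; adapt them to what your SAN actually presents.
```python
import os
import subprocess

# Show how the LUN appears to the host: just another block device.
subprocess.run(["lsblk", "-o", "NAME,SIZE,TYPE,MOUNTPOINT"], check=True)

lun_device = "/dev/sdb"   # hypothetical device node for the new LUN
mountpoint = "/mnt/lun0"  # hypothetical mountpoint

# Format and mount it the same way you would a local disk.
subprocess.run(["mkfs.xfs", lun_device], check=True)
os.makedirs(mountpoint, exist_ok=True)
subprocess.run(["mount", lun_device, mountpoint], check=True)
```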
B. Client Access: Block Devices vs File Shares
SANs expose block-level storage (raw disks) that hosts can format with filesystems or use as virtual disks.
NAS exposes file-level storage: network shares that clients mount as directories.
Block storage gives you lower latency, higher throughput, and greater flexibility, especially for databases and VM disks.
File shares simplify sharing among many users or systems but come with higher protocol overhead.
C. Cloud Block Storage in Practice
Providers like Linode, AWS, and Azure offer block storage volumes backed by massive SAN or distributed storage arrays.
When you add a block storage volume to your VM, you're essentially getting a remote LUN.
This enables features like live migration, where your VM's disk can be accessed from another host, or even another datacenter, seamlessly.
Behind the scenes, data is replicated or accessible across multiple locations to support failover and scaling.
D. Live Migration and Failover Implications
Because the storage backend is networked and shared:
You can move VMs between hosts or datacenters without copying disks manually.
Storage replication ensures data is available even if a datacenter or disk fails.
This architecture forms the foundation of cloud resilience and disaster recovery.
IV. Enterprise SAN Features
A. Redundancy Mechanisms
- RAID and Erasure Coding
RAID (Redundant Array of Independent Disks):
Traditional RAID levels (0, 1, 5, 6, 10) are still foundational, but enterprise SANs typically go beyond simple RAID. RAID 5 and RAID 6 provide parity-based protection against one or two disk failures, respectively. RAID 10 (striped mirrors) gives better write performance and faster rebuilds, at a higher raw-capacity cost.
Erasure Coding:
An evolution beyond RAID parity, erasure coding splits data into chunks and computes additional parity chunks that are spread across drives or nodes. It provides fault tolerance equivalent to or better than RAID 6 with better space efficiency, and is common in software-defined storage systems (like Ceph) and some advanced SAN arrays. A simplified parity sketch follows this list.
- Hardware Redundancy
Controllers: Typically two or more controllers operate in active-passive or active-active mode. If one controller fails, the other immediately takes over without downtime.
Power Supplies & Fans: Hot-swappable, redundant units to prevent outages due to hardware failure.
Cache Modules: Battery-backed cache (BBU) or supercapacitors protect cached writes in power loss events.
Network Paths: Multiple Fibre Channel or Ethernet paths ensure no single point of failure in connectivity.
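As promised above, here is a toy Python sketch of the parity idea that underlies RAID 5 and, in generalized form, erasure coding. Real arrays and erasure-coded systems use Reed-Solomon-style codes across many devices and nodes; this only shows why a single lost chunk is recoverable from the survivors plus parity.
```python
# The parity chunk is the XOR of the data chunks, so any single missing
# chunk can be rebuilt from the remaining chunks plus the parity.

def xor_parity(chunks: list[bytes]) -> bytes:
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three "disks" worth of data
parity = xor_parity(data)            # stored on a fourth "disk"

# Simulate losing disk 1 and rebuilding it from the survivors + parity.
rebuilt = xor_parity([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt chunk:", rebuilt)
```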
B. Replication Types
- Synchronous Replication
How it works: Every write operation is sent to the primary and secondary storage arrays simultaneously. The primary array waits for acknowledgment from the secondary before confirming to the host.
Pros:
Zero data loss (RPO = 0).
Immediate failover is possible.
Cons:
Requires very low-latency, high-bandwidth links (often metro distances only).
Higher write latency, since the primary waits on replication.
- Asynchronous Replication
How it works: Writes are committed locally first and then replicated to the secondary site in batches or streams with some delay.
Pros:
Can work over long distances with limited bandwidth.
Lower write latency at the primary site.
Cons:
Potential data loss for writes that haven't replicated (RPO > 0).
Failover requires careful handling of the last unreplicated batches.
C. Multi-site Disaster Recovery and Failover
Active-Passive: One site handles all workloads; the other is standby and takes over on failure. Easier to manage but may have failover delays.
Active-Active: Both sites actively handle workloads, often with load balancing and automatic failover. More complex but better resource utilization and availability.
Failover Orchestration: Coordinated by software (like VMware SRM, NetApp SnapMirror management, or custom scripts) to switch DNS, IPs, and storage targets seamlessly.
Testing: Regular disaster recovery drills simulate outages to validate failover plans and timing.
D. Performance Guarantees and Scalability
Guaranteed IOPS: Enterprise SANs reserve or throttle IOPS per volume to meet SLAs for latency-sensitive apps (e.g., databases).
Tiered Storage: Data moves automatically between SSDs (hot tier) and HDDs (cold tier) to balance cost and performance.
Scaling Up: SANs scale horizontally (adding shelves or nodes) or vertically (more disks/controller power) to support petabytes and thousands of hosts.
Quality of Service (QoS): Controls bandwidth and IOPS per host or application to prevent noisy neighbor effects.
E. IBM Z Mainframe Storage Overview
FICON (Fibre Connection): A specialized Fibre Channel protocol optimized for mainframe workloads.
DS8000 Series: High-performance storage arrays designed for IBM Z, offering multi-controller redundancy, encryption, and advanced caching.
Metro/Global Mirror: IBM's versions of synchronous/asynchronous replication for mainframe disaster recovery.
Coupling Facility (CF): Hardware for multi-system data sharing and cache coherence in parallel sysplex setups.
Nonstop Operation: Designed for zero downtime, with multiple redundant paths, components, and proactive fault detection.
V. Filesystems on SAN
Understanding how filesystems interact with SAN storage is crucial. While SAN provides block-level access, what happens on top of that block device can make or break your storage strategy.
A. Typical Filesystems Used on SAN LUNs
EXT4 and XFS (Linux):
Widely used, stable, and performant for general-purpose block devices on SAN LUNs.
Pros: Mature, low overhead.
Cons: No built-in data checksumming or native snapshots.
NTFS (Windows):
Default for Windows servers on SAN volumes.
Supports journaling and permissions.
VMFS (VMware):
Clustered filesystem designed for multiple hosts accessing shared SAN storage simultaneously.
Enables live VM migration and distributed locking.
Clustered Filesystems (GFS2, OCFS2):
Used when multiple hosts need concurrent access to the same SAN LUN at the filesystem level.
They require complex coordination and fencing mechanisms.
B. ZFS: Features and Position
Filesystem + Volume Manager:
ZFS merges volume management with filesystem, allowing it to manage disks directly rather than relying on an underlying RAID layer.
Data Integrity:
Checksums all data and metadata to detect corruption (bit rot) and, where redundancy exists, self-heal it; a big advantage over traditional RAID+filesystem combinations.
Snapshots and Clones:
Efficient, instantaneous snapshots enable easy backups, rollbacks, and replication.
RAID-Z:
ZFS's software RAID provides parity-based protection with more reliable rebuilds than traditional RAID.
Limitations on SAN:
ZFS isn't inherently cluster-aware, so exposing the same ZFS pool to multiple hosts concurrently (as a SAN would) risks data corruption.
Typical Use:
ZFS runs on local or DAS storage, or as the backend on storage appliances, not usually on SAN LUNs shared by many hosts.
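As a quick illustration of the snapshot and clone workflow described above, here is a minimal Python sketch that shells out to the standard zfs command. The pool and dataset names are hypothetical.
```python
import subprocess

def zfs(*args: str) -> None:
    """Run a zfs subcommand, raising if it fails."""
    subprocess.run(["zfs", *args], check=True)

dataset = "tank/data"                     # hypothetical pool/dataset
snapshot = f"{dataset}@pre-upgrade"

zfs("snapshot", snapshot)                 # instantaneous, space-efficient snapshot
zfs("clone", snapshot, "tank/data-test")  # writable clone for testing
# zfs("rollback", snapshot)               # revert the live dataset if needed
```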
C. Btrfs: The Linux Alternative
Filesystem + Volume Manager:
Like ZFS, Btrfs integrates RAID and filesystem management.
Native RAID Support:
Supports RAID 0,1,5,6,10 at the filesystem level; devices can be added/removed dynamically.
Snapshots and Compression:
Supports snapshots, compression, and checksumming.
Licensing:
GPL licensed, making it easier to integrate into Linux kernels.
Maturity:
Less mature than ZFS; RAID 5/6 support is still considered experimental. Sometimes seen as a "hobbyist" or "training wheels" ZFS.
Use Cases:
Popular for Linux desktops, small NAS devices, and some container storage.
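For comparison, here is a similarly minimal sketch of a Btrfs read-only snapshot, again shelling out to the stock btrfs tool; the paths are hypothetical.
```python
import subprocess

subvolume = "/srv/containers"                       # hypothetical Btrfs subvolume
snapshot = "/srv/.snapshots/containers-pre-update"  # hypothetical destination

# Read-only snapshot: instantaneous, and it shares unchanged blocks with the source.
subprocess.run(
    ["btrfs", "subvolume", "snapshot", "-r", subvolume, snapshot],
    check=True,
)
```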
VI. Filesystem Behavior in Real-World SAN and Cloud Environments
A. Traditional Filesystems (EXT4, XFS, NTFS)
Strengths
Proven stability and performance: Battle-tested over decades on SAN and local storage alike.
Low overhead: Minimal CPU and RAM demands, allowing more resources for applications.
Broad compatibility: Supported by virtually every OS and hypervisor.
Challenges
No native data integrity: without data checksumming, silent corruption (bit rot) can go unnoticed.
Limited snapshot/clone support: Snapshots rely on external storage or hypervisor tools.
Scaling in multi-host environments: Requires clustered filesystems or shared-storage solutions to safely support multiple simultaneous clients.
In Cloud Environments
Typically run on block volumes backed by SAN or software-defined storage.
Cloud vendors rely on underlying storage redundancy; the filesystem is mostly "dumb" and trusts the block layer.
Live migration of VMs is seamless due to block-level abstraction.
B. ZFS in Practice
Strengths
Data integrity first: Checksumming and self-healing make it a standout choice for critical data.
Snapshots and replication: Easily manage backups and DR with built-in features.
Flexible storage management: Pools, datasets, quotas, and compression give admins fine-grained control.
Challenges
Resource hungry: ZFS needs ample RAM and CPU, especially with dedup or compression.
Not cluster-aware: Sharing a ZFS pool over a SAN to multiple hosts risks data corruption.
Complex setup: Especially on root or boot devices, ZFS requires planning and expertise.
In Cloud Environments
Mostly used on local or direct-attached storage within VMs or hypervisors.
Cloud providers often run ZFS behind the scenes on their storage appliances but don't expose it directly to customers.
Some clouds offer managed ZFS services, but it's a niche offering.
C. Btrfs in Practice
Strengths
Kernel integration: Btrfs ships with Linux kernels, making deployment easy.
RAID and snapshots: Flexible storage pools with snapshot capabilities.
Growing ecosystem: Used in Fedora Silverblue, openSUSE, and container storage.
Challenges
Maturity concerns: RAID5/6 are still unstable; some bugs persist.
Performance variability: Can be slower than EXT4/XFS in some workloads.
Less tooling: Compared to ZFS, Btrfs has fewer mature management tools.
In Cloud Environments
Popular for container storage backends (e.g., Docker, Kubernetes) due to snapshots and subvolumes.
Used in some NAS solutions and Linux servers that want advanced FS features without external dependencies.
Not commonly used for boot or root on major cloud platforms.
VII. Practical Filesystem Recommendations by Workload
A. General-Purpose Servers (File Servers, Web Servers)
Recommended Filesystems:
EXT4 or XFS (Linux)
NTFS (Windows)
Why:
Stable, mature, and fast for typical read/write patterns.
Low overhead means better CPU availability for applications.
Extensive tooling and community support.
Snapshots and backups usually handled by external tools or application layers.
Caveats:
No built-in corruption detection.
Rely on SAN or storage backend for redundancy.
B. Database Servers and Transactional Workloads
Recommended Filesystems:
ZFS (when available and resource budgets allow)
XFS or EXT4 with proper tuning (Linux)
NTFS (Windows)
Why:
Databases benefit from ZFS's data integrity guarantees.
ZFS snapshots can assist in consistent backups.
XFS/EXT4 perform well under high IOPS and concurrent writes if tuned correctly.
Caveats:
ZFS requires careful resource planning (RAM, CPU).
Avoid Btrfs RAID5/6 due to stability issues.
C. Virtualization Hosts and VM Storage
Recommended Filesystems:
VMFS (VMware)
ZFS (especially in BSD-based or self-managed environments)
EXT4/XFS on SAN LUNs
Btrfs (in Linux container-focused setups)
Why:
VMFS supports multiple hosts with clustered locking.
ZFS offers excellent snapshots and cloning for VMs.
EXT4/XFS on SAN LUNs offers compatibility and performance.
Btrfs snapshots and subvolumes support container storage flexibility.
Caveats:
Avoid using ZFS pools shared between hosts over SAN.
Use cluster-aware filesystems or storage solutions for multi-host access.
D. Backup and Archival Storage
Recommended Filesystems:
ZFS (due to integrity and snapshots)
Btrfs (for Linux environments with budget constraints)
EXT4/XFS (for simple, large-volume storage)
Why:
Data integrity is paramount, and ZFS excels here.
Snapshots enable quick restores.
Btrfs provides flexibility if ZFS isn't an option.
Caveats:
Monitor resource usage and schedule regular scrubs.
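As a minimal sketch of the "schedule regular scrubs" caveat for a ZFS-based backup pool (pool name is hypothetical), the snippet below starts a scrub and checks pool health; in practice you would trigger this from cron or a systemd timer during off-hours.
```python
import subprocess

pool = "backuppool"  # hypothetical pool name

# Start a scrub; ZFS walks every block and repairs whatever redundancy allows.
subprocess.run(["zpool", "scrub", pool], check=True)

# Later, check the outcome; -x reports only unhealthy pools.
result = subprocess.run(
    ["zpool", "status", "-x", pool],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```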
VIII. Filesystem Selection for SAN and Cloud Environments
A. Considerations When Choosing Filesystems
- Data integrity needs: high for databases and critical storage; prefer ZFS.
- Resource availability: ZFS needs ample RAM and CPU; simpler filesystems are preferred on constrained systems.
- Multi-host access: use clustered filesystems or SAN features; avoid shared ZFS pools.
- Licensing and support: Btrfs is easier on Linux; ZFS licensing restricts in-kernel integration.
- Snapshots and backup: native snapshots are preferred for ease and speed.
- Performance characteristics: RAID10-style mirroring for write-heavy workloads; RAIDZ for capacity-efficient redundancy.
- Cloud provider support: many cloud VMs use EXT4/XFS on block storage; managed ZFS is less common.
B. Filesystem Recommendations for SAN
- Single-host SAN LUN: EXT4/XFS for general use; ZFS if the host has sole control of the LUN.
- Multi-host SAN LUN: clustered filesystems (GFS2, VMFS) or NAS-style storage.
- Performance-sensitive workloads: ZFS with RAID10-style mirrors or RAIDZ2 on local storage; hardware RAID on the SAN.
- Legacy environments: hardware RAID + EXT4/NTFS.
C. Filesystem Recommendations for Cloud
- General VM storage: EXT4/XFS on block volumes (the default).
- Container storage: Btrfs or OverlayFS for snapshots and layering.
- Self-managed VMs with ZFS support: ZFS on local or attached storage (manual setup).
- Managed storage services: vendor-managed solutions (EBS, Azure Disks, etc.).
D. Practical Tips for Deployment
Test performance and stability before production rollout.
Keep backups and snapshots regular regardless of FS choice.
Understand your workload's I/O patterns to select optimal RAID and filesystem combos.
Use monitoring tools to watch filesystem health and storage latency.
Document configurations and recovery procedures.
IX. Performance Tuning for Storage and Filesystems
A. Storage Hardware Level
- Disk Selection and Configuration
SSD vs HDD:
SSDs offer vastly superior IOPS and latency, essential for transactional workloads. HDDs are still cost-effective for large capacity and sequential workloads.
Drive Speed and Interface:
Faster spindle speeds (15K RPM) and interfaces (SAS vs SATA) improve performance. NVMe drives deliver the best latency and throughput.
RAID Levels:
RAID 10 is preferred for write-heavy or mixed workloads, balancing performance and resilience. RAID-Z2 (ZFS) or RAID 6 (traditional) optimize capacity but have slower writes.
- Controller and Cache Tuning
Write-back vs Write-through Cache:
Write-back improves performance but requires a battery-backed cache for safety.
Cache Size and Policies:
Larger cache buffers reduce disk I/O; tuning cache algorithms can impact latency.
Queue Depth:
Increasing queue depth on controllers and hosts can improve throughput but risks latency spikes.
B. Filesystem Level
- Mount Options and Parameters
EXT4:
noatime disables access-time updates, reducing write overhead. data=writeback and data=ordered change journaling behavior and the performance/safety trade-off.
XFS:
Tune logbufs and logbsize to optimize journaling. Use inode64 for large filesystems with many files.
ZFS:
Tune recordsize based on workload (e.g., 8K for databases, 128K for media). Adjust ARC cache size via zfs_arc_max. Use an L2ARC (secondary read cache) and a SLOG (separate intent log) to improve read and synchronous-write performance.
Btrfs:
Mount options like compress=zstd improve space efficiency and sometimes I/O. Balance data and metadata profiles according to workload.
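As a small, hedged example of applying mount options, the sketch below remounts hypothetical EXT4/XFS and Btrfs volumes with the options discussed above; in production you would persist these in /etc/fstab rather than remounting by hand.
```python
import subprocess

# Remount an EXT4/XFS data volume with noatime to cut per-read metadata writes.
subprocess.run(["mount", "-o", "remount,noatime", "/data"], check=True)

# Enable zstd compression on a Btrfs mount; it affects newly written data.
subprocess.run(["mount", "-o", "remount,compress=zstd", "/srv/btrfs"], check=True)
```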
- RAID and Pool Layout
ZFS Pool Design:
Use mirrors for lower latency and faster resilver. RAIDZ2 for capacity with decent protection but slower writes.
Stripe Width and Vdev Size:
Larger stripes improve sequential throughput. Smaller vdevs reduce rebuild times.
Avoid mixing vdev types in one pool to maintain predictable performance.
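To make the mirror-versus-RAIDZ2 trade-off concrete, here is a sketch of the two pool layouts using the standard zpool command; pool and device names are placeholders, and running these commands is destructive to the listed disks.
```python
import subprocess

# Mirrored pool: two 2-way mirror vdevs striped together (RAID10-like layout,
# lower latency and faster resilvers).
subprocess.run(
    ["zpool", "create", "fastpool",
     "mirror", "/dev/sda", "/dev/sdb",
     "mirror", "/dev/sdc", "/dev/sdd"],
    check=True,
)

# Capacity-oriented pool: one 6-disk RAIDZ2 vdev (survives two disk failures,
# slower writes than mirrors).
subprocess.run(
    ["zpool", "create", "bulkpool", "raidz2",
     "/dev/sde", "/dev/sdf", "/dev/sdg", "/dev/sdh", "/dev/sdi", "/dev/sdj"],
    check=True,
)
```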
C. Host and OS Tuning
I/O Scheduler:
Use deadline (or mq-deadline on multi-queue kernels) or noop/none for SSD-backed storage to reduce scheduling overhead and latency.
Multipath I/O:
Configure multipathing for redundant SAN paths, balancing load and increasing fault tolerance.
Network Settings (for iSCSI SANs):
Jumbo frames, TCP window scaling, and offloading features optimize throughput.
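Here is a minimal sketch of the I/O scheduler tuning mentioned above, writing directly to sysfs; the device name is a placeholder, and a udev rule is the usual way to make the choice persistent across reboots.
```python
from pathlib import Path

# Select a low-overhead I/O scheduler for an SSD-backed device. Modern
# multi-queue kernels expose "none" and "mq-deadline"; requires root.
device = "sdb"  # hypothetical device name
sched = Path(f"/sys/block/{device}/queue/scheduler")

print("available:", sched.read_text().strip())  # e.g. "[mq-deadline] none"
sched.write_text("none")                        # switch to the no-op scheduler
```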
D. Monitoring and Benchmarking
Tools:
fio for synthetic I/O benchmarking.
iostat, dstat, and zpool iostat for real-time monitoring.
zfs-stats and Btrfs tools for filesystem-specific metrics.
Analyze latency, throughput, and CPU usage to identify bottlenecks.
Benchmark with real workloads for best tuning results.
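Here is a hedged example of a synthetic fio run along the lines described above; the target file, size, and runtime are illustrative only, and you should never point a destructive benchmark at a device holding live data.
```python
import subprocess

# 4K random-read benchmark against a test file (not a raw device, so no data
# is destroyed). Tune block size, queue depth, and runtime to your workload.
subprocess.run(
    ["fio",
     "--name=randread",
     "--filename=/mnt/test/fio.dat",
     "--size=1G",
     "--rw=randread",
     "--bs=4k",
     "--iodepth=32",
     "--ioengine=libaio",
     "--direct=1",
     "--runtime=60",
     "--time_based"],
    check=True,
)
```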
E. Practical Example: ZFS Tuning for Database Workload
Set recordsize=8K for optimal database block alignment.
Limit ARC cache size to avoid starving other processes.
Use dedicated SLOG device (fast SSD with power loss protection) to accelerate sync writes.
Schedule regular scrubs during off-hours.
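Pulling that checklist together, here is a minimal sketch that applies these settings to a hypothetical dataset. The compression and logbias settings are common companions to recordsize tuning rather than part of the checklist above, so treat them as assumptions to validate against your own workload.
```python
import subprocess

def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)

dataset = "tank/postgres"  # hypothetical dataset holding the database files

run(["zfs", "set", "recordsize=8K", dataset])    # match the database block size
run(["zfs", "set", "compression=lz4", dataset])  # cheap and usually a net win
run(["zfs", "set", "logbias=latency", dataset])  # favor the SLOG for sync writes

# The ARC cap is a kernel module parameter on Linux, e.g. in
# /etc/modprobe.d/zfs.conf:  options zfs zfs_arc_max=8589934592
```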
X. Storage on Root Devices: Challenges and Best Practices
A. Why Root Storage Is Special
The root filesystem is critical: if it's not reliable or performant, your entire system's stability is at risk.
Bootloaders and initramfs must understand your filesystem and RAID setup.
Recovery from root failures is trickier than from data partitions; there is less room for error.
B. Filesystem Options for Root
GEOM mirror (FreeBSD):
Simple mirrored boot drives with minimal overhead and straightforward bootloader support.
ZFS on Root:
Full data integrity and advanced features but requires careful setup, enough RAM, and compatible bootloader.
Btrfs on Root (Linux):
Provides snapshots and compression; bootloader and kernel support have improved but still some complexity.
Traditional RAID + EXT4/XFS:
Simple, reliable, and widely supported, though lacking advanced integrity features.
C. Bootloader and Recovery Considerations
Must install bootloader on all mirrored or RAID devices to avoid single points of failure.
GRUB and FreeBSD bootloader have differing levels of support for advanced filesystems.
Recovery tools vary by filesystem; ZFS and Btrfs offer specialized commands.
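Here is a minimal sketch of installing the bootloader on every member of a mirrored boot pair (BIOS/GRUB case, hypothetical device names); on UEFI systems the equivalent task is keeping an EFI system partition on each disk populated and in sync.
```python
import subprocess

# Install GRUB onto both members of the boot mirror so the machine can still
# boot if either disk dies.
for disk in ("/dev/sda", "/dev/sdb"):
    subprocess.run(["grub-install", disk], check=True)
```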
D. Hybrid Setups
Common to use GEOM mirror or hardware RAID for boot, and ZFS or Btrfs for data storage.
Keeps boot process simple and robust, while allowing advanced features on data volumes.
E. Testing and Validation
Practice simulated drive failure and recovery on root disks.
Ensure backups and snapshots before making root filesystem changes.
Validate bootloader configuration after changes.