Lustre (file system)

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The latest maintenance release is 2.15.5.[3]

Lustre's scalability and high aggregate throughput make it a popular choice for businesses with large data centers, including those in industries such as meteorology,[16][17] simulation, artificial intelligence and machine learning,[18][19] oil and gas,[20] life science,[21][22] rich media, and finance.

The Lustre file system architecture was started as a research project in 1999 by Peter J. Braam, who was on the staff of Carnegie Mellon University (CMU) at the time.

Lustre was developed under the Accelerated Strategic Computing Initiative Path Forward project funded by the United States Department of Energy, which included Hewlett-Packard and Intel.

Braam and several associates joined the hardware-oriented Xyratex when it acquired the assets of ClusterStor,[34][35] while Barton, Dilger, and others formed software startup Whamcloud, where they continued to work on Lustre.

In February 2013, Xyratex Ltd. announced that it had acquired the original Lustre trademark, logo, website and associated intellectual property from Oracle.

The Lustre file system was first installed for production use in March 2003 on the MCR Linux Cluster at Lawrence Livermore National Laboratory,[48] the third-largest supercomputer in the Top500 list at the time.

Lustre 1.2.0, released in March 2004, worked on Linux kernel 2.6, added a "size glimpse" feature to avoid lock revocation on files undergoing write, and introduced client-side data write-back cache accounting (grant).

Lustre 1.4.0, released in November 2004, provided protocol compatibility between versions, could use InfiniBand networks, and could exploit extents/mballoc in the ldiskfs on-disk filesystem.

Lustre 2.0, released in August 2010, was based on significantly restructured internal code to prepare for major architectural advancements.

Lustre 2.3, released in October 2012, continued to improve the metadata server code to remove internal locking bottlenecks on nodes with many CPU cores (over 16).

The LFSCK feature added the ability to scan and verify the internal consistency of the MDT FID and LinkEA attributes.

Lustre 2.7, released in March 2015,[64] added LFSCK functionality to verify DNE consistency of remote and striped directories between multiple MDTs.

An evaluation feature for UID/GID mapping was added for clients in different administrative domains, along with improvements to the DNE striped directory functionality.

The LNet Multi-Rail (LMR) feature allows bonding multiple network interfaces (InfiniBand, Omni-Path, and/or Ethernet) on a client and server to increase aggregate I/O bandwidth.

Lustre 2.13 was released on December 5, 2019,[72] and added two new performance-related features: Persistent Client Cache[73] (PCC), which allows direct use of NVMe and NVRAM storage on client nodes while keeping the files part of the global filesystem namespace, and OST Overstriping,[74] which allows files to store multiple stripes on a single OST to better utilize fast OSS hardware.

Lustre will take advantage of remote direct memory access (RDMA) transfers, when available, to improve throughput and reduce CPU usage.

Since Lustre 2.4, the MDT and OST can also use ZFS for the backing filesystem in addition to ext4, allowing them to effectively use JBOD storage instead of hardware RAID devices.

This allows Lustre to take advantage of improvements and features in the underlying filesystem, such as compression and data checksums in ZFS.

An OST is a dedicated filesystem that exports an interface to byte ranges of file objects for read/write operations, with extent locks to protect data consistency.

MDTs and OSTs currently use either an enhanced version of ext4 called ldiskfs or ZFS/DMU for back-end storage of files and objects,[85] the latter using the open-source ZFS-on-Linux port.

With this approach, bottlenecks for client-to-OSS communications are eliminated, so the total bandwidth available for the clients to read and write data scales almost linearly with the number of OSTs in the filesystem.
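As a rough illustration (hypothetical figures, not a benchmark): a filesystem built from 32 OSTs whose servers each sustain about 3 GB/s would, in the ideal case, offer on the order of 32 × 3 GB/s ≈ 96 GB/s of aggregate bandwidth to its clients, less network and contention overhead.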

Using liblustre, the computational processors could access a Lustre file system even if the service node on which the job was launched was not a Linux client.

This allows the client(s) to perform I/O in parallel across all of the OST objects in the file without further communication with the MDS, avoiding contention from centralized block and lock management.
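As a minimal sketch of how an application can fix such a layout before writing, the following C program uses llapi_file_create() from liblustreapi to create a file striped over 8 OSTs with a 4 MiB stripe size, then writes to it with ordinary POSIX calls. The mountpoint /mnt/lustre, the stripe parameters, and the data sizes are illustrative assumptions, not values taken from this article.

/* Hedged sketch: create a file striped over 8 OSTs with a 4 MiB stripe size,
 * then write to it with ordinary POSIX calls.  Assumes a Lustre client mount
 * at /mnt/lustre (hypothetical) and liblustreapi (link with -llustreapi). */
#include <lustre/lustreapi.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/mnt/lustre/striped_file";  /* hypothetical path */
        char buf[1 << 20];                              /* 1 MiB write buffer */
        int rc, fd;

        /* 4 MiB stripe size, default starting OST (-1), 8 stripes,
         * default striping pattern (0). */
        rc = llapi_file_create(path, 4ULL << 20, -1, 8, 0);
        if (rc != 0) {
                fprintf(stderr, "llapi_file_create: %s\n", strerror(-rc));
                return EXIT_FAILURE;
        }

        /* Once the layout is set, I/O is plain POSIX; successive stripes go
         * to different OST objects without further MDS involvement. */
        fd = open(path, O_WRONLY);
        if (fd < 0) {
                perror("open");
                return EXIT_FAILURE;
        }
        memset(buf, 0xab, sizeof(buf));
        for (int i = 0; i < 32; i++)                    /* 32 MiB in total */
                if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                        perror("write");
                        break;
                }
        close(fd);
        return EXIT_SUCCESS;
}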

The Lustre 2.11 release also added the Data-on-Metadata (DoM) feature, which allows the first component of a PFL file to be stored directly on the MDT with the inode.

When a client initially mounts a filesystem, it is provided the 128-bit Lustre File Identifier (FID, composed of the 64-bit Sequence number, 32-bit Object ID, and 32-bit Version) of the root directory for the mountpoint.
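As an illustration of that 128-bit layout, the following sketch declares an equivalent C structure (field names follow the struct lu_fid used in the Lustre sources, but the definition here is assumed for illustration) and prints a made-up FID in the conventional [seq:oid:ver] form reported by tools such as lfs path2fid.

/* Minimal sketch of the 128-bit FID described above, using standard C types.
 * This is an illustration, not the authoritative Lustre definition. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct lu_fid_sketch {
        uint64_t f_seq;  /* 64-bit sequence number */
        uint32_t f_oid;  /* 32-bit object ID within the sequence */
        uint32_t f_ver;  /* 32-bit version */
};

int main(void)
{
        /* Example values are made up for illustration. */
        struct lu_fid_sketch fid = { .f_seq = 0x200000401ULL, .f_oid = 0x1, .f_ver = 0 };

        printf("[0x%" PRIx64 ":0x%" PRIx32 ":0x%" PRIx32 "]\n",
               fid.f_seq, fid.f_oid, fid.f_ver);
        return 0;
}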

The Lustre distributed lock manager (LDLM), implemented in the OpenVMS style, protects the integrity of each file's data and metadata.

Since Lustre 2.10 the LNet Multi-Rail (MR) feature[91] allows link aggregation of two or more network interfaces between a client and server to improve bandwidth.

Lustre file system high availability features include a robust failover and recovery mechanism, making server failures and reboots transparent.

Vendors selling storage hardware with bundled Lustre support include Hitachi Data Systems (2012),[107] DataDirect Networks (DDN),[108] Aeon Computing, and others.