Data set (IBM mainframe)

A data set is typically stored on a direct access storage device (DASD) or magnetic tape;[1] however, unit record devices, such as punch card readers, card punches, line printers and page printers, can provide input/output (I/O) for a data set (file).

These parameters are specified at the time of the data set allocation (creation), for example with Job Control Language DD statements.

Programmers utilize various access methods (such as QSAM or VSAM) in programs for reading and writing data sets.

Thus data can be of any type, including binary integers, floating-point, or characters, without introducing a false end-of-record condition.

The data set is an abstraction of a collection of records, in contrast to files as unstructured streams of bytes.
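The difference between a collection of records and a delimited byte stream can be illustrated with a small sketch. It assumes a simplified 2-byte length prefix per record, chosen here only for illustration and not the actual on-disk format; the function names `pack_records` and `unpack_records` are hypothetical. Because record boundaries come from the recorded lengths, a binary integer that happens to contain the newline byte does not split a record, whereas a newline-delimited stream does split it.

```python
import struct


def pack_records(records):
    """Concatenate records, each prefixed with a 2-byte big-endian length.

    The 2-byte prefix is an illustrative assumption, not a real mainframe format.
    """
    out = bytearray()
    for rec in records:
        out += struct.pack(">H", len(rec)) + rec
    return bytes(out)


def unpack_records(blob):
    """Recover the records by walking the length prefixes."""
    records, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">H", blob, pos)
        pos += 2
        records.append(blob[pos:pos + length])
        pos += length
    return records


# The first record is the binary integer 10, whose last byte is 0x0A (newline).
records = [struct.pack(">i", 10), b"HELLO", struct.pack(">d", 2.5)]

# Length-based boundaries return exactly the records that were written.
length_prefixed = unpack_records(pack_records(records))

# Delimiter-based boundaries see a false end-of-record inside the integer.
newline_split = b"\n".join(records).split(b"\n")

print(length_prefixed == records)   # True
print(len(newline_split))           # 4 pieces from 3 records
```

In the sketch, three records go in and three come back out through the length-prefixed path, while splitting on newlines yields four fragments because the integer's final byte is mistaken for a delimiter.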

This type of data set is often used to hold load modules (old-format bound executable programs), source program libraries (especially Assembler macro definitions), ISPF screen definitions, and Job Control Language.

A Partitioned Data Set can only be allocated on a single volume and has a maximum size of 65,535 tracks.
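To put the 65,535-track limit in more familiar units, the following sketch works through the arithmetic. The device geometry is an assumption not stated in the text: it uses nominal 3390 figures of 15 tracks per cylinder and 56,664 bytes per track. Usable capacity in practice is lower, since block sizes and inter-block gaps reduce what fits on a track.

```python
# Assumed 3390 geometry (an assumption for illustration, not from the text).
TRACKS_PER_CYLINDER = 15
BYTES_PER_TRACK = 56_664        # nominal maximum, before per-block overhead

MAX_PDS_TRACKS = 65_535         # limit quoted in the text; equals 2**16 - 1

max_cylinders = MAX_PDS_TRACKS // TRACKS_PER_CYLINDER
max_bytes = MAX_PDS_TRACKS * BYTES_PER_TRACK

print(f"{max_cylinders:,} cylinders")   # 4,369 cylinders
print(f"{max_bytes:,} bytes")           # 3,713,475,240 bytes
```

Under these assumptions the limit corresponds to 4,369 cylinders, or roughly 3.7 billion bytes of nominal track capacity.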

Likewise, if a member is re-written, it is stored in a new spot at the back of the PDS and leaves wasted “dead” space in the middle.

(Note that in modern parlance, this kind of operation might be called defragmentation or garbage collection; data compression nowadays refers to a different, more complicated concept.)
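The way rewritten members leave dead space, and the way compression reclaims it, can be sketched with a toy model. This is a simplification under stated assumptions, not the real PDS layout: the class `ToyPDS` and its methods are hypothetical, storage is a flat byte array rather than tracks and blocks, and the directory is an in-memory dictionary rather than a fixed-size area within the data set.

```python
class ToyPDS:
    """Toy model of a partitioned data set: a directory plus append-only storage."""

    def __init__(self):
        self.storage = bytearray()   # member data, written front to back
        self.directory = {}          # member name -> (offset, length)

    def write_member(self, name, data):
        # New and rewritten members alike go at the end of the used space;
        # an older copy stays where it was, now unreferenced ("dead").
        self.directory[name] = (len(self.storage), len(data))
        self.storage += data

    def read_member(self, name):
        offset, length = self.directory[name]
        return bytes(self.storage[offset:offset + length])

    def dead_space(self):
        live = sum(length for _, length in self.directory.values())
        return len(self.storage) - live

    def compress(self):
        # Copy live members forward in storage order and rebuild the directory.
        packed, new_directory = bytearray(), {}
        by_offset = sorted(self.directory.items(), key=lambda item: item[1][0])
        for name, (offset, length) in by_offset:
            new_directory[name] = (len(packed), length)
            packed += self.storage[offset:offset + length]
        self.storage, self.directory = packed, new_directory


pds = ToyPDS()
pds.write_member("ALPHA", b"A" * 100)
pds.write_member("BETA", b"B" * 50)
pds.write_member("ALPHA", b"a" * 120)   # rewrite: the old 100 bytes become dead

dead_before = pds.dead_space()
pds.compress()
dead_after = pds.dead_space()

print(dead_before, dead_after)          # 100 0
```

After the rewrite, the original 100 bytes of ALPHA sit unreferenced between live data; compressing moves BETA to the front, places the new ALPHA after it, and leaves no dead space.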

Because individual members are accessed through the directory structure, PDS files can reside only on DASD, not on magnetic tape.

An improvement on this scheme is the Partitioned Data Set Extended (PDSE or PDS/E, sometimes just called a library), introduced with DFSMSdfp for MVS/XA and MVS/ESA systems.