File format

File formats often have a published specification that describes the encoding method and makes it possible to test whether a program works as intended.

Both strategies require significant time, money, or both; therefore, file formats with publicly available specifications tend to be supported by more programs.

This has resulted in a significant decrease in the use of GIFs, and is partly responsible for the development of the alternative PNG format.

Most modern operating systems and individual applications need to use all of the following approaches to read "foreign" file formats, if not work with them completely.

One popular method used by many operating systems, including Windows, macOS, CP/M, DOS, VMS, and VM/CMS, is to determine the format of a file based on the end of its name, more specifically the letters following the final period.
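Here is a minimal Python sketch of extension-based detection. It looks only at the letters after the final period and compares them case-insensitively. The lookup table and the function name `format_from_name` are hypothetical, and the table is far smaller than any real registry.

```python
import os.path

# Illustrative mapping only; real systems consult a much larger registry.
KNOWN_EXTENSIONS = {
    "jpg": "JPEG image",
    "jpeg": "JPEG image",
    "png": "PNG image",
    "gif": "GIF image",
    "exe": "Windows executable",
    "txt": "plain text",
}

def format_from_name(filename: str) -> str:
    """Guess a file's format from the letters after its final period."""
    # splitext splits at the last period; a lone leading dot (".profile")
    # is not treated as an extension.
    _, ext = os.path.splitext(filename)
    ext = ext[1:].lower()  # drop the period and compare case-insensitively
    return KNOWN_EXTENSIONS.get(ext, "unknown")

print(format_from_name("Holiday photo.JPG"))      # JPEG image
print(format_from_name("Holiday photo.jpg.exe"))  # Windows executable
print(format_from_name("README"))                 # unknown
```

Only the final suffix is consulted. A name with two extensions is therefore classified by its last one, no matter what the earlier one suggests.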

Many formats still use three-character extensions even though modern operating systems and application programs no longer have this limitation.

This led most versions of Windows and Mac OS to hide the extension when listing files.

For a file named "Holiday photo.jpg.exe", the ".exe" would be hidden, and an unsuspecting user would see "Holiday photo.jpg". That name suggests a JPEG image, a type of file that is usually unable to harm the machine.
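The following Python sketch mimics a file manager that hides the final extension. It also adds a simple heuristic that flags a data-file extension followed by an executable one. The names `displayed_name` and `looks_deceptive` and the set of executable extensions are illustrative assumptions. This is not how Windows itself decides what to show.

```python
import os.path

# Illustrative, not exhaustive: extensions associated with runnable programs on Windows.
EXECUTABLE_EXTENSIONS = {".exe", ".com", ".bat", ".scr"}

def displayed_name(filename: str, hide_extensions: bool = True) -> str:
    """Return the name a file manager shows when it hides the final extension."""
    if not hide_extensions:
        return filename
    stem, _ = os.path.splitext(filename)
    return stem

def looks_deceptive(filename: str) -> bool:
    """Flag an executable extension hidden behind another extension."""
    stem, ext = os.path.splitext(filename)
    inner_ext = os.path.splitext(stem)[1]
    return ext.lower() in EXECUTABLE_EXTENSIONS and inner_ext != ""

name = "Holiday photo.jpg.exe"
print(displayed_name(name))   # Holiday photo.jpg
print(looks_deceptive(name))  # True
```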

Since Word generally ignores extensions and looks at the format of the file, these would open as templates, execute, and spread the virus.[citation needed]

This represents a practical problem for Windows systems where extension-hiding is turned on by default.

A folder containing many files with complex metadata such as thumbnail information may require considerable time before it can be displayed.

This can result in corrupt metadata which, in extremely bad cases, might even render the file unreadable.

The magic number approach offers better guarantees that the format will be identified correctly, and can often determine more precise information about the file.
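Below is a minimal Python sketch of magic-number detection. It compares the first bytes of a file against a small table of well-known signatures. The table is illustrative and incomplete. Very short signatures such as `MZ` can also produce false positives on ordinary text. Production tools, such as the Unix `file` command, use far larger rule sets.

```python
import os
import tempfile

# Illustrative subset of well-known signatures (first bytes of the file).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image (version 87a)"),
    (b"GIF89a", "GIF image (version 89a)"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x7fELF", "ELF executable"),
    (b"MZ", "DOS/Windows executable"),
]

def sniff(header: bytes) -> str:
    """Identify a format from the leading bytes of its data."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"

def sniff_file(path: str) -> str:
    """Read just enough of the file to compare against every signature."""
    with open(path, "rb") as f:
        return sniff(f.read(16))

# A file whose name claims JPEG but whose content starts like an executable.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Holiday photo.jpg")
    with open(path, "wb") as f:
        f.write(b"MZ" + b"\x00" * 62)
    disguised = sniff_file(path)

print(disguised)  # DOS/Windows executable
```

The two GIF entries show that a signature can reveal more than the format alone; here it also gives the version. The usage example shows the content overriding a misleading name.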

Also, the data must be read from the file itself, which increases latency compared with metadata stored in the directory.

Where file types do not lend themselves to recognition in this way, the system must fall back to metadata.

On the other hand, a valid magic number does not guarantee that the file is not corrupt or is of a correct type.
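This Python sketch shows why a matching signature is not enough. It accepts data that starts with the PNG signature and then walks PNG's chunk framing: a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over the type and data. It checks only this framing and the CRCs. A real decoder enforces further rules, such as IHDR being the first chunk, so "chunks intact" is a weaker result than "valid PNG". The helper names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def check_png_chunks(data: bytes) -> str:
    """Check PNG signature, chunk lengths and CRCs; not a full PNG validator."""
    if not data.startswith(PNG_SIGNATURE):
        return "not PNG"
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        if pos + 8 > len(data):
            return "corrupt: truncated chunk header"
        length, ctype = struct.unpack_from(">I4s", data, pos)
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            return "corrupt: truncated chunk"
        (stored_crc,) = struct.unpack_from(">I", data, body_end)
        # The CRC-32 covers the chunk type and data, not the length field.
        if zlib.crc32(data[pos + 4:body_end]) != stored_crc:
            return "corrupt: bad CRC in " + ctype.decode("latin-1")
        if ctype == b"IEND":
            return "chunks intact"
        pos = body_end + 4
    return "corrupt: missing IEND"

def make_chunk(ctype: bytes, body: bytes = b"") -> bytes:
    """Build one chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + body)
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

good = PNG_SIGNATURE + make_chunk(b"IEND")
bad = good[:-1] + bytes([good[-1] ^ 0x01])  # flip one bit in the final CRC
truncated = PNG_SIGNATURE + b"\x00\x00\x00"

for sample in (good, bad, truncated):
    print(check_png_chunks(sample))
```

All three samples begin with a correct signature, yet only the first survives the structural check.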

So-called shebang lines in script files are a special case of magic numbers.

There, the magic number consists of human-readable text within the file that identifies a specific command interpreter and options to be passed to it.
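A shebang line starts with the two bytes `#!`, followed by the interpreter path and an optional argument string. The Python sketch below parses such a line using Linux's convention, in which everything after the interpreter path is passed as a single argument. Other Unix-like systems split or truncate that remainder differently, so treat this behavior as an assumption. `parse_shebang` is a hypothetical helper name.

```python
def parse_shebang(first_line: bytes):
    """Return (interpreter, argument_or_None) for a '#!' line, else None."""
    if not first_line.startswith(b"#!"):
        return None  # no "#!" magic number: not a shebang line
    rest = first_line[2:].strip()
    if not rest:
        return None
    parts = rest.split(None, 1)
    interpreter = parts[0].decode()
    # Following Linux, keep everything after the interpreter as one argument.
    argument = parts[1].decode() if len(parts) > 1 else None
    return interpreter, argument

for line in (b"#!/bin/sh\n", b"#!/usr/bin/env python3\n", b"print('hi')\n"):
    print(parse_shebang(line))
```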

Another operating system that uses magic numbers is AmigaOS, where they were called "Magic Cookies". They were adopted as a standard way to recognize executables in the Hunk executable file format. They also let individual programs, tools, and utilities handle their own saved data files, or any other kind of file, automatically when saving and loading data.

This is also true to an extent with filename extensions, for instance for compatibility with MS-DOS's three-character limit. Even so, most forms of storage have a roughly equivalent definition of a file's data and name, but they may have varying or no representation of further metadata.

The BBEdit text editor has a creator code of R*ch, referring to its original programmer, Rich Siegel.

On Unix and Unix-like systems, the ext2, ext3, ext4, ReiserFS version 3, XFS, JFS, FFS, and HFS+ filesystems allow the storage of extended attributes with files.
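On Linux, Python exposes extended attributes through `os.setxattr` and `os.getxattr`. This minimal sketch records a file's format as a MIME type string in a user-namespace attribute. The attribute name `user.mime_type` and the helper names are illustrative assumptions. On platforms or filesystems without user xattr support, the functions return a fallback value instead of raising.

```python
import os
import tempfile

# Illustrative attribute name; on Linux, ordinary processes store their own
# attributes on regular files under the "user." namespace prefix.
FORMAT_ATTR = "user.mime_type"

def tag_format(path: str, mime_type: str) -> bool:
    """Store the format in an extended attribute; False if unsupported."""
    if not hasattr(os, "setxattr"):
        return False  # os.setxattr is only available on Linux
    try:
        os.setxattr(path, FORMAT_ATTR, mime_type.encode("ascii"))
    except OSError:
        return False  # e.g. the filesystem does not support user xattrs
    return True

def read_format(path: str):
    """Return the stored format string, or None if absent or unsupported."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, FORMAT_ATTR).decode("ascii")
    except OSError:
        return None

with tempfile.NamedTemporaryFile(suffix=".png") as f:
    stored = tag_format(f.name, "image/png")
    found = read_format(f.name)

print(stored, found)
```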

The PRONOM Persistent Unique Identifier (PUID) is an extensible scheme of persistent, unique, and unambiguous identifiers for file formats, which has been developed by The National Archives of the UK as part of its PRONOM technical registry service.

MIME types were originally intended as a way of identifying what type of file was attached to an e-mail, independent of the source and target operating systems.
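Here is a small Python sketch of the e-mail use case. The standard `mimetypes` module guesses a MIME type from a file name, and `email.message.EmailMessage` labels an attachment with that type. Note that `mimetypes` works from the extension only, not from the file's contents. `attach_file` is a hypothetical helper. Falling back to `application/octet-stream` for unknown types is a common convention rather than a requirement.

```python
import mimetypes
from email.message import EmailMessage

def attach_file(msg: EmailMessage, filename: str, data: bytes) -> None:
    """Attach data, labelled with a MIME type guessed from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"  # generic fallback for unknown types
    maintype, subtype = mime.split("/", 1)
    msg.add_attachment(data, maintype=maintype, subtype=subtype, filename=filename)

msg = EmailMessage()
msg["Subject"] = "Holiday photos"
msg.set_content("See attached.")
attach_file(msg, "photo.png", b"\x89PNG\r\n\x1a\n")
attach_file(msg, "data.xyz123", b"\x00\x01")

types = [part.get_content_type() for part in msg.iter_attachments()]
print(types)  # ['image/png', 'application/octet-stream']
```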

Unless the memory images also have reserved spaces for future extensions, extending and improving this type of structured file is very difficult.

It also creates files that might be specific to one platform or programming language (for example, a structure containing a Pascal string is not recognized as such in C).
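The Python `struct` module can imitate a memory-image record. The layout in this sketch is hypothetical: a 16-bit id, a 16-byte Pascal string, a 32-bit size, and 8 reserved bytes for future fields. It shows a reserved area. It also shows how a C reader that expects a NUL-terminated `char[16]` misreads the Pascal string's length byte as part of the text.

```python
import struct

# Hypothetical record, laid out as it would sit in memory:
#   uint16 id | Pascal string (1 length byte + up to 15 chars) | uint32 size | 8 reserved bytes
RECORD = struct.Struct("<H16pI8x")

raw = RECORD.pack(7, b"report", 1024)

# A reader that knows the layout recovers the fields exactly.
print(RECORD.unpack(raw))  # (7, b'report', 1024)

# A C reader treating bytes 2..17 as a NUL-terminated char[16]
# sees the length byte (0x06) as the first character.
name_field = raw[2:18]
c_view = name_field.split(b"\x00", 1)[0]
print(c_view)  # b'\x06report'
```

The explicit `<` prefix fixes the byte order and disables padding. With native layout (`@`), the offsets could differ between platforms, which is one source of the portability problems described above.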

The container's scope can be identified by start- and end-markers of some kind, by an explicit length field somewhere, or by fixed requirements of the file format's definition.

The identifiers are often human-readable, and classify parts of the data: for example, as a "surname", "address", "rectangle", "font name", etc.
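Below is a minimal Python sketch of a tagged, length-prefixed container in the style of IFF. Each chunk is a 4-byte ASCII identifier, a 4-byte length, and a payload padded to an even length. IFF stores the length big-endian and RIFF stores it little-endian, hence the `big_endian` switch. The chunk identifiers `NAME`, `ADDR`, and `XTRA` are made up for the example.

```python
import struct

def iter_chunks(data: bytes, big_endian: bool = True):
    """Yield (chunk_id, payload) pairs from an IFF-style byte stream."""
    fmt = ">4sI" if big_endian else "<4sI"
    pos = 0
    while pos + 8 <= len(data):
        chunk_id, length = struct.unpack_from(fmt, data, pos)
        start = pos + 8
        end = start + length
        if end > len(data):
            raise ValueError(f"chunk {chunk_id!r} runs past end of data")
        yield chunk_id.decode("ascii"), data[start:end]
        pos = end + (length & 1)  # skip the pad byte after odd-length payloads

def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one big-endian chunk, padded to an even length."""
    pad = b"\x00" if len(payload) % 2 else b""
    return struct.pack(">4sI", chunk_id, len(payload)) + payload + pad

stream = (make_chunk(b"NAME", b"Ada")
          + make_chunk(b"XTRA", b"\x01\x02")
          + make_chunk(b"ADDR", b"London"))

KNOWN = {"NAME", "ADDR"}
record = {cid: payload for cid, payload in iter_chunks(stream) if cid in KNOWN}
print(record)  # {'NAME': b'Ada', 'ADDR': b'London'}
```

Every chunk carries its own length, so the reader can skip an identifier it does not recognize (`XTRA` here) and still find the next chunk. This is what lets such formats grow without breaking older readers.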

This concept has been used repeatedly by RIFF (the Microsoft-IBM equivalent of IFF), PNG, JPEG storage, DER (Distinguished Encoding Rules) encoded streams and files (which were originally described in CCITT X.409:1984 and therefore predate IFF), and the Structured Data Exchange Format (SDXF).

Good examples of these types of file structures are disk images, executables, OLE documents, TIFF files, and libraries.

wav-file: 2.1 megabytes.

ogg-file: 154 kilobytes.