PDF

[2][3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images and other information needed to display it.

[6] PDF files may contain a variety of content besides flat text and graphics, including logical structuring elements, interactive elements such as annotations and form-fields, layers, rich media (including video content), three-dimensional objects using U3D or PRC, and various other data formats.

The PDF specification also provides for encryption and digital signatures, file attachments, and metadata to enable workflows requiring these features.

[7] Unlike traditional PostScript, which was tightly focused on rendering print jobs to output devices, Interchange PostScript (IPS), the simplified PostScript variant proposed in Adobe's Camelot project, would be optimized for displaying pages on any screen and any platform.

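PDF combines three technologies: a subset of the PostScript page description language in declarative form, used to generate layout and graphics; a font-embedding and replacement system that allows fonts to travel with documents; and a structured storage system that bundles these elements and any associated content into a single file, with data compression where appropriate. PostScript itself is a page description language run in an interpreter to generate an image.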

[7] PDF is a subset of PostScript, simplified to remove control flow features such as loops and conditionals, while graphics commands such as lineto remain.

Generating PDF from PostScript removes this control flow by applying standard compiler techniques such as loop unrolling, inlining, and removal of unused branches, resulting in code that is purely declarative and static.
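As a minimal sketch of loop unrolling, assuming a PostScript-style loop that strokes a row of vertical tick marks, the hypothetical function `unroll_tick_marks` below runs the loop once at conversion time and keeps only its output: static PDF path operators (`m` for moveto, `l` for lineto, `S` for stroke) with no control flow left in the content stream.

```python
def unroll_tick_marks(start_x: float, step: float, count: int,
                      y0: float, y1: float) -> str:
    """Flatten a PostScript-style loop that strokes `count` vertical ticks.

    In PostScript this could be a `for` loop executed by the interpreter.
    Here the loop runs in the converter, and only its output (static path
    operators) is kept for the PDF content stream.
    """
    ops = []
    for i in range(count):                  # evaluated at conversion time
        x = start_x + i * step
        ops.append(f"{x:g} {y0:g} m")       # m = moveto: start a subpath
        ops.append(f"{x:g} {y1:g} l")       # l = lineto: straight segment
    ops.append("S")                         # S = stroke every subpath
    return "\n".join(ops)


print(unroll_tick_marks(start_x=72, step=10, count=3, y0=100, y1=110))
# 72 100 m
# 72 110 l
# 82 100 m
# 82 110 l
# 92 100 m
# 92 110 l
# S
```

A single `S` after several `m`/`l` pairs is enough because one PDF path may contain multiple subpaths.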

[18] The result is then packaged into a container format, together with all necessary dependencies for correct rendering (external files, graphics, or fonts to which the document refers), and compressed.
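The compression step can be sketched with Python's `zlib` module, since PDF's `/FlateDecode` filter uses zlib/deflate data. The helper `make_stream_object` is hypothetical; it wraps bytes in an indirect stream object whose `/Length` counts the bytes between the `stream` line and the end-of-line before `endstream`.

```python
import zlib


def make_stream_object(obj_num: int, data: bytes, compress: bool = True) -> bytes:
    """Wrap raw content in a PDF indirect stream object.

    When `compress` is true, the data is deflated with zlib and the stream
    dictionary declares /Filter /FlateDecode so a reader knows to inflate it.
    """
    if compress:
        payload = zlib.compress(data)
        header = f"<< /Length {len(payload)} /Filter /FlateDecode >>"
    else:
        payload = data
        header = f"<< /Length {len(payload)} >>"
    return (f"{obj_num} 0 obj\n{header}\nstream\n".encode("ascii")
            + payload
            + b"\nendstream\nendobj\n")


content = b"0 0 m 100 100 l S\n" * 100     # repetitive drawing commands
obj = make_stream_object(5, content)
print(len(content), len(obj))             # the compressed object is much smaller
```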

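[24] A COS tree file consists primarily of objects, of which there are nine types: Boolean values, integers, real numbers, strings, names, arrays, dictionaries, streams, and the null object.[17] Comments using 8-bit characters prefixed with the percent sign (%) may be inserted.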
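The nine basic object types map naturally onto Python values. The sketch below is a simplified serializer under stated assumptions: `Name`, `Stream` and `serialize` are hypothetical helpers, strings are assumed to be plain ASCII, and indirect references (`N 0 R`) are not covered.

```python
from typing import Optional


class Name(str):
    """A PDF name object, serialized as /Name."""


class Stream:
    """A PDF stream: a dictionary followed by raw bytes."""

    def __init__(self, data: bytes, extra: Optional[dict] = None):
        self.data = data
        self.extra = extra or {}


_NAME_DELIMS = set(b"()<>[]{}/%#")


def _name(n: str) -> str:
    # Regular printable ASCII passes through; delimiters, '#', whitespace
    # and non-ASCII bytes are written as #xx hex escapes.
    return "/" + "".join(
        chr(b) if 0x21 <= b <= 0x7E and b not in _NAME_DELIMS else f"#{b:02X}"
        for b in n.encode("utf-8")
    )


def serialize(obj) -> str:
    if obj is None:                       # null object
        return "null"
    if isinstance(obj, bool):             # Boolean (checked before int)
        return "true" if obj else "false"
    if isinstance(obj, int):              # integer
        return str(obj)
    if isinstance(obj, float):            # real: no exponent notation allowed
        s = f"{obj:.6f}".rstrip("0")
        return s + "0" if s.endswith(".") else s
    if isinstance(obj, Name):             # name (checked before str)
        return _name(obj)
    if isinstance(obj, str):              # literal string, ASCII assumed
        esc = obj.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({esc})"
    if isinstance(obj, list):             # array
        return "[" + " ".join(serialize(x) for x in obj) + "]"
    if isinstance(obj, dict):             # dictionary: keys are names
        body = " ".join(f"{_name(k)} {serialize(v)}" for k, v in obj.items())
        return f"<< {body} >>" if body else "<< >>"
    if isinstance(obj, Stream):           # stream: dictionary + bytes
        d = dict(obj.extra, Length=len(obj.data))
        return (serialize(d) + "\nstream\n"
                + obj.data.decode("latin-1") + "\nendstream")
    raise TypeError(f"not a COS object: {type(obj).__name__}")


print(serialize({"Type": Name("Example"), "Flag": True, "Count": 3,
                 "Scale": 0.5, "Title": "A (tiny) test", "Kids": [1, None]}))
# << /Type /Example /Flag true /Count 3 /Scale 0.5 /Title (A \(tiny\) test) /Kids [1 null] >>
```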

Before PDF version 1.5, the cross-reference table was always stored in a special ASCII format, marked with the xref keyword, and placed after the main body composed of indirect objects.
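The classic ASCII cross-reference table can be illustrated by assembling a tiny file by hand. In this sketch, `build_pdf` is a hypothetical helper: it records the byte offset of each `N 0 obj` line, then writes one fixed 20-byte entry per object (10-digit offset, 5-digit generation, `n` or `f`, and a two-byte end-of-line), followed by the trailer and the `startxref` offset of the table.

```python
from typing import List


def build_pdf(objects: List[bytes]) -> bytes:
    """Assemble a PDF whose object i+1 has body objects[i], then append a
    classic ASCII cross-reference table and trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))          # byte offset of "N 0 obj"
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    size = len(objects) + 1               # +1 for object 0, the free-list head
    xref_pos = len(out)
    out += f"xref\n0 {size}\n".encode("ascii")
    out += b"0000000000 65535 f\r\n"      # entry for object 0
    for off in offsets:
        # 10-digit offset, space, 5-digit generation, space, 'n', CR LF = 20 bytes
        out += f"{off:010d} 00000 n\r\n".encode("ascii")
    out += (f"trailer\n<< /Size {size} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("ascii")
    return bytes(out)


objs = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
]
pdf = build_pdf(objs)
```

Because every entry has the same length, a reader can seek directly to the entry for object N instead of parsing the table line by line.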

The cross-reference stream format introduced in PDF 1.5 is more flexible: it allows the width of each integer field to be specified (using the /W array), so that, for example, a document not exceeding 64 KiB in size may dedicate only 2 bytes to object offsets.
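The arithmetic behind the /W array can be sketched as follows, assuming three fields per cross-reference stream entry (entry type, then two type-dependent values such as byte offset and generation for an in-use object). The helpers `min_width` and `pack_xref_stream_rows` are hypothetical; each field is stored big-endian in exactly the number of bytes /W assigns it.

```python
from typing import List, Tuple


def min_width(max_value: int) -> int:
    """Smallest number of bytes able to hold max_value (at least one)."""
    return max(1, (max_value.bit_length() + 7) // 8)


def pack_xref_stream_rows(rows: List[Tuple[int, int, int]],
                          widths: Tuple[int, int, int]) -> bytes:
    """Pack (type, field2, field3) rows using the field widths from /W.

    Type 0 = free entry, 1 = uncompressed object (field2 = byte offset,
    field3 = generation), 2 = object stored inside an object stream.
    """
    out = bytearray()
    for row in rows:
        for value, width in zip(row, widths):
            out += value.to_bytes(width, "big")   # OverflowError if too wide
    return bytes(out)


offset_width = min_width(65535)        # largest possible offset in a 64 KiB file
rows = [(0, 0, 65535), (1, 15, 0), (1, 40000, 0)]
data = pack_xref_stream_rows(rows, (1, offset_width, 2))   # i.e. /W [1 2 2]
print(offset_width, len(data), data.hex())
# 2 15 000000ffff01000f0000019c400000
```

`int.to_bytes` raising `OverflowError` doubles as the check a writer needs: if any offset does not fit, the /W widths must grow.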

Unlike PostScript, PDF does not allow a single path to mix text outlines with lines and curves.

The original imaging model of PDF was, like PostScript's, opaque: each object drawn on the page completely replaced anything previously marked in the same location.
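The opaque model amounts to a painter's algorithm: marks are applied in drawing order and later marks overwrite earlier ones. The toy raster below is a simplified illustration, not how a real renderer works; `paint_opaque` is hypothetical, shapes are axis-aligned rectangles, and rows run top-down rather than using PDF's bottom-up coordinates.

```python
from typing import List, Tuple

Canvas = List[List[str]]


def paint_opaque(width: int, height: int,
                 shapes: List[Tuple[int, int, int, int, str]]) -> Canvas:
    """Paint rectangles (x, y, w, h, colour) in order on a blank canvas.

    Each mark simply replaces whatever is already at that location, so the
    last shape drawn wins wherever shapes overlap.
    """
    canvas = [["." for _ in range(width)] for _ in range(height)]
    for x, y, w, h, colour in shapes:
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                canvas[row][col] = colour    # replace, never blend
    return canvas


page = paint_opaque(6, 3, [(0, 0, 4, 3, "R"), (2, 1, 4, 2, "B")])
print("\n".join("".join(r) for r in page))
# RRRR..
# RRBBBB
# RRBBBB
```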

A tagged PDF (see clause 14.8 in ISO 32000) includes document structure and semantics information to enable reliable text extraction and accessibility.

Tagged PDF defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes.
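Tagging links page content to the structure tree through marked-content identifiers (MCIDs). The sketch below, built around a hypothetical helper `tag_paragraphs`, wraps each paragraph in a `/P <</MCID n>> BDC ... EMC` sequence and emits a matching `/StructElem` of type `/P` whose `/K` points back to that MCID. The object references (`3 0 R`, `10 0 R`) and font resource `/F1` are assumptions, and a complete tagged file also needs a `/StructTreeRoot`, a `/ParentTree`, and `/MarkInfo << /Marked true >>` in the catalog, which are omitted here.

```python
from typing import List, Tuple


def tag_paragraphs(paragraphs: List[str], page_ref: str, parent_ref: str,
                   y_start: float = 720) -> Tuple[str, List[str]]:
    """Return (content stream, structure elements) for a list of paragraphs.

    Each paragraph becomes a marked-content sequence tagged /P with a
    unique MCID, plus a /StructElem that refers to that MCID on the page.
    """
    ops, elems = [], []
    for mcid, text in enumerate(paragraphs):
        esc = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        y = y_start - 14 * mcid                    # one line per paragraph
        ops.append(f"/P <</MCID {mcid}>> BDC")     # begin marked content
        ops.append(f"BT /F1 12 Tf 72 {y:g} Td ({esc}) Tj ET")
        ops.append("EMC")                          # end marked content
        elems.append(f"<< /Type /StructElem /S /P /P {parent_ref} "
                     f"/Pg {page_ref} /K {mcid} >>")
    return "\n".join(ops), elems


stream, elems = tag_paragraphs(["Hello", "World (again)"], "3 0 R", "10 0 R")
print(stream)
print(elems[0])
```

Extraction tools can then walk the structure elements in order and pull the text of each MCID, independent of the order in which the text was painted.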

Optional content is declared in the /OCProperties dictionary of the document catalog. This dictionary contains an array of Optional Content Groups (OCGs), each describing a set of content that may be individually displayed or suppressed, plus a set of Optional Content Configuration Dictionaries, which give the status (Displayed or Suppressed) of the given OCGs.
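How a configuration dictionary resolves to visible and hidden groups can be sketched as follows, assuming its `/BaseState` (`ON`, `OFF` or `Unchanged`) sets the starting state and its `/ON` and `/OFF` arrays override individual groups. Plain strings stand in for the indirect references to OCG dictionaries a real file would use, and `ocg_visibility` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def ocg_visibility(ocgs: List[str], config: Dict,
                   current: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Return {group: displayed?} after applying one configuration dictionary."""
    base = config.get("BaseState", "ON")
    if base == "Unchanged":
        # Keep existing states; assume ON for groups with no prior state.
        state = {g: (current or {}).get(g, True) for g in ocgs}
    else:
        state = {g: base == "ON" for g in ocgs}
    for g in config.get("ON", []):        # explicitly displayed groups
        state[g] = True
    for g in config.get("OFF", []):       # explicitly suppressed groups
        state[g] = False
    return state


oc_properties = {
    "OCGs": ["Base map", "Labels", "Annotations"],
    "D": {"BaseState": "ON", "OFF": ["Annotations"]},               # default
    "Configs": [{"Name": "Print", "BaseState": "OFF", "ON": ["Base map"]}],
}
print(ocg_visibility(oc_properties["OCGs"], oc_properties["D"]))
# {'Base map': True, 'Labels': True, 'Annotations': False}
```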

PDF files may also contain embedded DRM restrictions that provide further controls that limit copying, editing, or printing.
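With the standard security handler, these restrictions are expressed as bit flags in the `/P` integer of the encryption dictionary; honoring them is up to the viewer. The sketch below decodes `/P` using the 1-based bit positions from the specification (bits 9 and above apply to revision 3 and later handlers). `PERMISSION_BITS` and `decode_permissions` are hypothetical names, and `/P` is a signed 32-bit value, so it is often negative.

```python
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations, fill form fields",
    9: "fill existing form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "high-quality print",
}


def decode_permissions(p: int) -> dict:
    """Map a /P value to {permission: allowed}; bit n has value 1 << (n - 1)."""
    return {name: bool(p & (1 << (bit - 1)))
            for bit, name in PERMISSION_BITS.items()}


print(decode_permissions(-3904 | 0b100))   # only "print" is True
```

Python integers behave as infinite two's complement under `&`, so negative `/P` values decode correctly without converting to unsigned first.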

[2] PDF files can contain two types of metadata. The first is the Document Information Dictionary, a set of key/value fields such as author, title, subject, creation and update dates.
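A Document Information Dictionary is an ordinary dictionary referenced from the trailer's `/Info` entry, with dates written as strings of the form `D:YYYYMMDDHHmmSSOHH'mm`. The helpers `pdf_date` and `info_dict` below are hypothetical; the sketch assumes ASCII titles and authors and always writes an explicit UTC offset (some writers also append a trailing apostrophe after the minutes).

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt: datetime) -> str:
    """Format an aware datetime as D:YYYYMMDDHHmmSS followed by +HH'mm or -HH'mm."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    seconds = int(offset.total_seconds())
    sign = "+" if seconds >= 0 else "-"
    hh, mm = divmod(abs(seconds) // 60, 60)
    return dt.strftime("D:%Y%m%d%H%M%S") + f"{sign}{hh:02d}'{mm:02d}"


def info_dict(title: str, author: str, created: datetime) -> str:
    """Serialize a minimal Document Information Dictionary."""
    def lit(s: str) -> str:   # PDF literal string, ASCII text assumed
        return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"
    stamp = lit(pdf_date(created))
    return (f"<< /Title {lit(title)} /Author {lit(author)} "
            f"/CreationDate {stamp} /ModDate {stamp} >>")


created = datetime(2019, 11, 5, 14, 30, 0, tzinfo=timezone(timedelta(hours=-8)))
print(pdf_date(created))   # D:20191105143000-08'00
print(info_dict("Report (draft)", "A. Writer", created))
```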

[44][45][46][47][48] PDF file formats in use as of 2014 can include tags, text equivalents, captions, audio descriptions, and more.

[49][50] Leading screen readers, including JAWS, Window-Eyes, Hal, and Kurzweil 1000 and 3000, can read tagged PDFs.

The tags view is what screen readers and other assistive technologies use to deliver a high-quality navigation and reading experience to users with disabilities.

Alongside the standard PDF action types, interactive forms (AcroForms) support submitting, resetting, and importing data.

The "submit" action transmits the names and values of selected interactive form fields to a specified uniform resource locator (URL).
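A submit action and the data it sends can be sketched as below, assuming the ExportFormat flag (bit 3 of `/Flags`) is set so values go out as HTML form data rather than FDF; the encoding then matches `application/x-www-form-urlencoded`. The URL, field names, and helpers (`submit_form_action`, `html_form_body`) are hypothetical, and the exact body a given viewer produces (field order, character set) may differ.

```python
from urllib.parse import urlencode

# /Flags bits for a SubmitForm action (bit n has value 1 << (n - 1)).
EXPORT_FORMAT = 1 << 2   # bit 3: HTML form format instead of FDF
GET_METHOD = 1 << 3      # bit 4: GET instead of POST (HTML format only)


def _lit(s: str) -> str:
    """PDF literal string with backslash and parentheses escaped."""
    return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def submit_form_action(url: str, field_names: list, flags: int) -> str:
    """SubmitForm action dictionary naming fields by fully qualified name."""
    fields = " ".join(_lit(n) for n in field_names)
    return (f"<< /Type /Action /S /SubmitForm /F << /FS /URL /F {_lit(url)} >> "
            f"/Fields [{fields}] /Flags {flags} >>")


def html_form_body(values: dict) -> str:
    """application/x-www-form-urlencoded body for an HTML-format submission."""
    return urlencode(values)


action = submit_form_action("https://example.com/submit",
                            ["applicant.name", "applicant.email"], EXPORT_FORMAT)
body = html_form_body({"applicant.name": "Ada Lovelace",
                       "applicant.email": "ada@example.com"})
print(action)
print(body)   # applicant.name=Ada+Lovelace&applicant.email=ada%40example.com
```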

[61] In November 2019, researchers from Ruhr University Bochum and Hackmanit GmbH published attacks on digitally signed PDFs.

[64] An overview of security issues in PDFs regarding denial of service, information disclosure, data manipulation, and arbitrary code execution attacks was presented by Jens Müller.

There are many software options for creating PDFs, including the PDF printing capabilities built into macOS, iOS,[72] and most Linux distributions.

In 1993, the Jaws raster image processor from Global Graphics became the first shipping prepress RIP that interpreted PDF natively without conversion to another format.

Many commercial offset printers have accepted the submission of press-ready PDF files as a print source, specifically the PDF/X-1a subset and variations of the same.

Adobe Acrobat is one example of proprietary software that allows the user to annotate, highlight, and add notes to already created PDF files.

The freeware Foxit Reader, available for Microsoft Windows, macOS and Linux, allows annotating documents.

[Figure: The maximum size of an Acrobat PDF page, superimposed on a map of Europe.]