Plain text

In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.).

For example, that could exclude any indication of fonts or layout (such as markup, markdown, or even tabs); characters such as curly quotes, non-breaking spaces, soft hyphens, em dashes, and/or ligatures; or other things.

Thus, representations such as SGML, RTF, HTML, XML, wiki markup, and TeX, as well as nearly all programming language source code files, are considered plain text.

A comment, a ".txt" file, or a TXT Record generally contains only plain text (without formatting) intended for humans to read.

Thus, early text projects such as Roberto Busa's Index Thomisticus, the Brown Corpus, and others had to resort to conventions such as keying an asterisk preceding letters actually intended to be upper-case.

Fred Brooks of IBM argued strongly for going to 8-bit bytes, because someday people might want to process text, and won.

The text-encoding situation became more and more complex, leading to efforts by ISO and by the Unicode Consortium to develop a single, unified character encoding that could cover all known (or at least all currently known) languages.

To properly understand or process it the recipient must know (or be able to figure out) what encoding was used; however, they need not know anything about the computer architecture that was used, or about the binary structures defined by whatever program (if any) created the data.

Another MIME type often used in both email and HTTP is "text/html; charset=UTF-8" -- plain text represented using the UTF-8 character encoding with HTML markup.

ASCII reserves the first 32 codes (numbers 0–31 decimal) for control characters known as the "C0 set": codes originally intended not to represent printable information, but rather to control devices (such as printers) that make use of ASCII, or to provide meta-information about data streams such as those stored on magnetic tape.

They are rarely used directly; when they turn up in documents which are ostensibly in an ISO 8859 encoding, their code positions generally refer instead to the characters at that position in a proprietary, system-specific encoding, such as Windows-1252 or Mac OS Roman, that use the codes to instead provide additional graphic characters.

Text file with portion of The Human Side of Animals by Royal Dixon , displayed by the command cat in an xterm window