String literal

Tcl allows both quotes (for interpolated strings) and braces (for raw strings), as in "The quick brown fox" or {The quick {brown fox}}; this derives from the single quotations in Unix shells and the use of braces in C for compound statements, since blocks of code are, in Tcl, syntactically the same thing as string literals. That the delimiters are paired is essential for making this feasible.

The Unicode character set includes paired (separate opening and closing) versions of both single and double quotation marks. These, however, are rarely used, as many programming languages do not recognize them (one exception is the paired double quotation marks, which can be used in Visual Basic .NET).

Unpaired marks are preferred for compatibility, as they are easier to type on a wide range of keyboards, and so even in languages where paired marks are permitted, many projects forbid their use in source code.

Some programming languages, such as Perl and PHP, allow string literals without any delimiters in some contexts.

For example, in Perl a bareword such as x can, in some contexts, be written in place of the quoted string "x".

In the original FORTRAN programming language, string literals were written in so-called Hollerith notation, where a decimal count of the number of characters was followed by the letter H, and then the characters of the string, as in 5HHELLO. This declarative notation style is contrasted with bracketed delimiter quoting, because it does not require the use of balanced "bracketed" characters on either side of the string.
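The count-prefixed scheme described above can be sketched in a few lines of Python. This is only an illustration of the notation, not a model of any FORTRAN compiler; the function names to_hollerith and from_hollerith are hypothetical.

```python
from typing import Tuple


def to_hollerith(text: str) -> str:
    """Encode text as a decimal count, the letter H, then the characters."""
    return f"{len(text)}H{text}"


def from_hollerith(source: str) -> Tuple[str, str]:
    """Decode one Hollerith constant from the start of source.

    Returns the decoded string and whatever text follows it.
    """
    count_text, separator, rest = source.partition("H")
    if not separator or not count_text.isdigit():
        raise ValueError("expected <count>H<characters>")
    count = int(count_text)
    if len(rest) < count:
        raise ValueError("fewer characters than the count promises")
    return rest[:count], rest[count:]


print(to_hollerith("HELLO, WORLD"))      # 12HHELLO, WORLD
print(to_hollerith('say "hi"'))          # 8Hsay "hi"
print(from_hollerith("5HHELLO,3HABC"))   # ('HELLO', ',3HABC')
```

Because the length is stated up front, the string body may contain any character, including quotation marks, without escaping.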

C++ has two styles of string, one inherited from C (delimited by "), and the safer std::string in the C++ Standard Library.

C++11 raw string literals consist, essentially, of R"end-of-string-id(content)end-of-string-id". That is, after R" the programmer can enter up to 16 characters (excluding whitespace characters, parentheses, and backslash), which form the end-of-string-id (eos id for short; its purpose is to be repeated to signal the end of the string). An opening parenthesis is then required to mark the end of the eos id, and the literal is terminated by a closing parenthesis, the eos id, and a quote.
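To make the role of the end-of-string-id concrete, the following Python sketch recognizes a literal of this shape with a regular expression. It is a simplified approximation under stated assumptions (it ignores encoding prefixes and other details of a real C++ lexer), and the names RAW_LITERAL and parse_raw_literal are hypothetical.

```python
import re

# R" + eos id (at most 16 characters, none of them whitespace, a parenthesis
# or a backslash) + "(" + content + ")" + the same eos id + closing quote.
RAW_LITERAL = re.compile(r'R"([^\s()\\]{0,16})\((.*?)\)\1"', re.DOTALL)


def parse_raw_literal(source: str):
    """Return (eos_id, content) for a raw literal at the start of source."""
    match = RAW_LITERAL.match(source)
    if match is None:
        raise ValueError("not a raw string literal")
    return match.group(1), match.group(2)


# A )" inside the content does not end the literal, because only
# ")" + eos id + '"' does.
print(parse_raw_literal(r'R"xy(a )" b)xy"'))   # ('xy', 'a )" b')
print(parse_raw_literal(r'R"(plain)"'))        # ('', 'plain')
```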

Similar to C++11, D allows here-document-style literals with end-of-string ids. In D, the end-of-string-id must be an identifier (alphanumeric characters).

A further extension is the use of multiple quoting, which allows the author to choose which characters should specify the bounds of a string literal.

Lua (as of 5.1) provides a limited form of multiple quoting, particularly to allow nesting of long comments or embedded strings.

Normally one uses [[ and ]] to delimit literal strings (initial newline stripped, otherwise raw), but the opening brackets can include any number of equal signs, and only closing brackets with the same number of signs close the string.
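The matching rule for Lua's long brackets can be sketched in Python as follows. This is an illustrative approximation rather than Lua's actual lexer, and the names LONG_BRACKET and parse_long_string are hypothetical.

```python
import re

# "[" + N equal signs + "[", an optional initial newline (stripped),
# the raw content, then "]" + the same N equal signs + "]".
LONG_BRACKET = re.compile(r"\[(=*)\[\n?(.*?)\]\1\]", re.DOTALL)


def parse_long_string(source: str):
    """Return (level, content) for a long-bracket string at the start of source."""
    match = LONG_BRACKET.match(source)
    if match is None:
        raise ValueError("not a long-bracket string")
    return len(match.group(1)), match.group(2)


# Closing brackets of a different level are ordinary content.
print(parse_long_string("[==[a ]] b ]=] c]==]"))   # (2, 'a ]] b ]=] c')
print(parse_long_string("[[\nhello]]"))            # (0, 'hello')
```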

Another option, which is rarely used in modern languages, is to use a function to construct a string, rather than representing it via a literal.

For example, early forms of BASIC did not include escape sequences or any other workarounds listed here, and thus one was instead required to use the CHR$ function, which returns a string containing the character corresponding to its argument.

In ASCII the quotation mark has the value 34, so to represent a string with quotes on an ASCII system one would concatenate CHR$(34) with the surrounding text.

In C, a similar facility is available via sprintf and the %c "character" format specifier, though in the presence of other workarounds this is generally not used. These constructor functions can also be used to represent nonprinting characters, though escape sequences are generally used instead.
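The same two approaches can be illustrated in Python, whose chr function and %c format specifier play the roles of CHR$ and sprintf's %c. The sample sentence is made up for illustration.

```python
QUOTE = chr(34)  # character code 34 is the ASCII quotation mark

# Constructor-function style, as with CHR$ in BASIC.
basic_style = "I said, " + QUOTE + "Can you hear me?" + QUOTE

# Format-specifier style, as with sprintf and %c in C.
c_style = "I said, %cCan you hear me?%c" % (34, 34)

# The same technique yields nonprinting characters, here a tab (code 9).
tab_separated = "a" + chr(9) + "b"

print(basic_style)   # I said, "Can you hear me?"
print(c_style)       # I said, "Can you hear me?"
```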

Escape sequences are not always pretty or easy to use, so many compilers also offer other means of solving the common problems.

Metacharacters have varying interpretations depending on the context and language, but are generally a kind of 'processing command' for representing printing or nonprinting characters.

For instance, in a C string literal, if the backslash is followed by a letter such as "b", "n" or "t", then this represents a nonprinting backspace, newline or tab character respectively.
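Python happens to use the same three escapes as C, so the effect can be checked there. The sketch below maps each escape as written to the single character it denotes and prints that character's code.

```python
# Key: the escape as typed in source; value: the one character it denotes.
escapes = {"\\b": "\b", "\\n": "\n", "\\t": "\t"}

for written, char in escapes.items():
    # Each escape is two characters in source but one character in the string.
    print(written, "-> code", ord(char), "length", len(char))

# Prints:
# \b -> code 8 length 1
# \n -> code 10 length 1
# \t -> code 9 length 1
```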

In PHP 2 through 5.3, a feature called magic quotes automatically escaped strings (for convenience and security), but it caused problems and was removed in version 5.4.

Here documents originate in shell scripts and allow a literal to be fed as input to an external command.

String literal concatenation, in which adjacent string literals are joined into a single literal, is a feature of C,[7][8] C++,[9] D,[10] Ruby,[11] and Python,[12] the last of which copied it from C.[13] Notably, this concatenation happens at compile time, during lexical analysis (as a phase following initial tokenization), and is contrasted with both run-time string concatenation (generally with the + operator)[14] and concatenation during constant folding, which occurs at compile time, but in a later phase (after phrase analysis or "parsing").
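The difference between the two compile-time mechanisms can be observed in CPython by inspecting the syntax tree. The sketch below assumes Python 3.8 or later (where literals appear as ast.Constant) and illustrates CPython's behavior only, not that of a C compiler.

```python
import ast

# Adjacent literals: already a single constant by the time the tree is built.
adjacent = ast.parse('"Hello, " "world"', mode="eval").body

# Explicit +: still a binary operation in the tree; any folding happens later.
added = ast.parse('"Hello, " + "world"', mode="eval").body

print(type(adjacent).__name__, repr(adjacent.value))  # Constant 'Hello, world'
print(type(added).__name__)                           # BinOp
```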

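In C, where the concept and term originate, string literal concatenation was introduced for two reasons:[16] to allow long strings to span multiple lines with proper indentation (in contrast to line continuation, which destroys the indentation scheme), and to allow the construction of string literals by macros (via stringizing). In practical terms, this allows string concatenation in early phases of compilation ("translation", specifically as part of lexical analysis), without requiring phrase analysis or constant folding.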

This is particularly important when used in combination with the C preprocessor, to allow strings to be computed following preprocessing, particularly in macros.

For example, in Python, one can comment a regular expression by splitting it into adjacent literals, each followed by its own comment.[21]

Implicit string concatenation is not required by modern compilers, which implement constant folding, and it causes hard-to-spot errors due to unintentional concatenation from omitting a comma, particularly in vertical lists of strings. Accordingly, it is not used in most languages, and it has been proposed for deprecation from D[22] and Python.
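Both the commenting idiom and the missing-comma pitfall can be sketched in Python. The identifier pattern and the list of colors are illustrative choices, not taken from the cited sources.

```python
import re

# Adjacent literals are joined into one pattern, so each piece can carry
# its own comment.
IDENTIFIER = re.compile(
    "[A-Za-z_]"        # first character: letter or underscore
    "[A-Za-z0-9_]*"    # remaining characters: letter, digit or underscore
)

# The pitfall: a forgotten comma silently merges two list items.
colors = [
    "red",
    "green"    # comma omitted here by mistake
    "blue",
]

print(bool(IDENTIFIER.fullmatch("_name1")))   # True
print(colors)                                 # ['red', 'greenblue']
```

The list ends up with two elements instead of three, and no error or warning is raised.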

A subtler issue is that in C and C++,[23] there are different types of string literals, and concatenation of these has implementation-defined behavior, which poses a potential security risk.

Some languages provide more than one kind of string literal, with different behavior. This is particularly used to indicate raw strings (no escaping), or to disable or enable variable interpolation, but has other uses, such as distinguishing character sets.

These include both a usual syntax (fixed delimiters) and a generic syntax, which allows a choice of delimiters.[26]

REXX uses suffix characters to specify characters or strings using their hexadecimal or binary code.

In some languages, string literals may contain placeholders referring to variables or expressions in the current context, which are evaluated (usually at run time).

For example, in Perl a double-quoted string containing variable names prefixed with $ is printed with those names replaced by the variables' values. In this case, the metacharacter ($) (not to be confused with the sigil in the variable assignment statement) is interpreted to indicate variable interpolation, and requires some escaping if it needs to be output literally.

This is contrasted with "raw" strings, whose output reproduces the text exactly as written. Here the $ characters are not metacharacters, and are not interpreted to have any meaning other than plain text.
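The contrast can be sketched in Python. Ordinary Python literals do not interpolate on $, so the standard library's string.Template is used here as a stand-in for Perl's double-quoted strings; the variable names and values are invented for illustration.

```python
from string import Template

values = {"name": "Nancy", "greeting": "Hello World"}
text = "$name said $greeting to the crowd of people."

# Interpolated: $ is a metacharacter that introduces a placeholder.
interpolated = Template(text).substitute(values)

# Raw: the same text used as a plain string, so $ is just a character.
raw = text

# A literal $ in an interpolated template must be escaped, here as $$.
escaped = Template("$name paid $$5").substitute(values)

print(interpolated)   # Nancy said Hello World to the crowd of people.
print(raw)            # $name said $greeting to the crowd of people.
print(escaped)        # Nancy paid $5
```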

Nevertheless, some languages are particularly well-adapted to produce this sort of self-similar output, especially those that support multiple options for avoiding delimiter collision.