Name mangling

For example, to link a function correctly, a linker needs its name, the number of arguments and their types, and so on.

The simple programming languages of the 1970s, like C, only distinguished subroutines by their name, ignoring other information including parameter and return types.

For example, compilers targeted at Microsoft Windows platforms support a variety of calling conventions, which determine the manner in which parameters are sent to subroutines and results are returned.

With 32-bit compilers, the calling convention of a C function is reflected in the symbol name that is emitted. In the stdcall and fastcall mangling schemes, the function is encoded as _name@X and @name@X respectively, where X is the number of bytes, in decimal, of the argument(s) in the parameter list (including those passed in registers, for fastcall).
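The decoration rules for stdcall and fastcall can be sketched as a small function. This is only an illustration: the CallConv enumeration and the decorate helper are hypothetical names, not part of any compiler or SDK, and the cdecl case (a bare leading underscore) is an assumption about 32-bit Microsoft compilers that is not stated in the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names, introduced only for this illustration.
enum class CallConv { Cdecl, Stdcall, Fastcall };

// Builds the decorated symbol a 32-bit Windows compiler would be expected
// to emit for a C function `name` whose parameters occupy `arg_bytes` bytes.
std::string decorate(CallConv conv, const std::string& name, std::size_t arg_bytes) {
    assert(!name.empty());
    switch (conv) {
        case CallConv::Stdcall:
            // _name@X
            return "_" + name + "@" + std::to_string(arg_bytes);
        case CallConv::Fastcall:
            // @name@X (X also counts arguments passed in registers)
            return "@" + name + "@" + std::to_string(arg_bytes);
        case CallConv::Cdecl:
            break;
    }
    // Assumption: cdecl adds only a leading underscore and no byte count.
    return "_" + name;
}
```

Under these assumptions, a stdcall function g taking a single 4-byte int would be decorated as _g@4, and the fastcall equivalent h as @h@4.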

The 64-bit convention on Windows (Microsoft C) has no leading underscore. This difference may in some rare cases lead to unresolved externals when porting such code to 64 bits.

Even later, with the emergence of compilers that produced machine code or assembly directly, the system's linker generally did not support C++ symbols, and mangling was still required.

C++ also has complex language features, such as classes, templates, namespaces, and operator overloading, that alter the meaning of specific symbols based on context or usage.

The mangled symbols shown here are those produced by the GNU GCC 3.x compilers, according to the IA-64 (Itanium) ABI.

All mangled symbols begin with _Z (note that an identifier beginning with an underscore followed by a capital letter is a reserved identifier in C, so conflict with user identifiers is avoided); for nested names (including both namespaces and classes), this is followed by N, then a series of <length, id> pairs (the length being the length of the next identifier), and finally E. For example, wikipedia::article::format becomes:

_ZN9wikipedia7article6formatE

For functions, this is then followed by the type information; as format() takes no arguments (a void parameter list), this is simply v; hence:

_ZN9wikipedia7article6formatEv

For print_to, the standard type std::ostream (which is a typedef for std::basic_ostream<char, std::char_traits<char> >) is used, which has the special alias So; a reference to this type is therefore RSo, with the complete name for the function being:

_ZN9wikipedia7article8print_toERSo

There is no standardized scheme by which even trivial C++ identifiers are mangled, and consequently different compilers (or even different versions of the same compiler, or the same compiler on different platforms) mangle public symbols in radically different (and thus totally incompatible) ways.
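The nested-name rule described above (_Z, then N, then length-prefixed identifiers, then E, then the parameter encoding) can be sketched in a few lines. The mangle_nested helper is a hypothetical name used only for illustration; it covers just the simple case discussed here and takes the parameter encoding (such as v or RSo) as a ready-made string rather than deriving it from types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds an Itanium-style mangled name for a nested
// function name. `components` holds the enclosing scopes followed by the
// function name; `params` is the already-encoded parameter list.
std::string mangle_nested(const std::vector<std::string>& components,
                          const std::string& params) {
    assert(components.size() >= 2);  // nested names have at least two parts
    std::string out = "_ZN";         // _Z prefix, N opens the nested name
    for (const std::string& id : components) {
        out += std::to_string(id.size());  // length of the next identifier
        out += id;                         // the identifier itself
    }
    out += "E";     // E closes the nested name
    out += params;  // parameter type encoding, e.g. "v" or "RSo"
    return out;
}
```

Calling mangle_nested({"wikipedia", "article", "format"}, "v") reproduces the symbol for wikipedia::article::format, and passing "RSo" with print_to as the last component reproduces the symbol for the function taking a std::ostream reference.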

As C language definitions are unmangled, the C++ compiler needs to avoid mangling references to these identifiers.

For example, the standard strings library, <string.h>, usually wraps its declarations in an extern "C" block when it is compiled as C++. Thus, C++ code that calls strcmp or memset uses the correct, unmangled names.
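A minimal sketch of this pattern follows. The guard around the declaration mirrors what C headers commonly do; reset_if_match is a hypothetical function invented for this illustration, while strcmp and memset are the real C library functions declared by <string.h>.

```cpp
#include <cassert>
#include <string.h>  // declares strcmp and memset with C linkage

// Header-style guard: a C compiler sees a plain declaration, while a C++
// compiler sees it inside extern "C" and therefore does not mangle the name.
#ifdef __cplusplus
extern "C" {
#endif

// Hypothetical function: zeroes `buf` if it equals `key`.
// Returns 1 if the buffer was cleared, 0 otherwise.
int reset_if_match(char* buf, size_t size, const char* key);

#ifdef __cplusplus
}
#endif

// The definition inherits C linkage from the declaration above.
int reset_if_match(char* buf, size_t size, const char* key) {
    assert(buf != nullptr && key != nullptr);
    if (strcmp(buf, key) == 0) {  // refers to the unmangled C symbol strcmp
        memset(buf, 0, size);     // refers to the unmangled C symbol memset
        return 1;
    }
    return 0;
}
```

Because the declaration sits inside the extern "C" block, a C translation unit could call reset_if_match by its plain name, in the same way that C++ code reaches strcmp and memset.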

If the extern "C" had not been used, the (SunPro) C++ compiler would produce code that refers to mangled versions of the strcmp and memset symbols. Since those symbols do not exist in the C runtime library (e.g. libc), link errors would result.

It would seem that standardized name mangling in the C++ language would lead to greater interoperability between compiler implementations.

Name mangling is only one of several application binary interface (ABI) details that need to be decided and observed by a C++ implementation.

Other ABI aspects, such as exception handling, virtual table layout, and structure and stack frame padding, also cause differing C++ implementations to be incompatible.

The C++ standard does not attempt to standardize name mangling. On the contrary, the Annotated C++ Reference Manual (also known as ARM, ISBN 0-201-51459-1, section 7.2.1c) actively encourages the use of different mangling schemes to prevent linking when other aspects of the ABI are incompatible.

Nevertheless, as detailed in the section above, on some platforms[4] the full C++ ABI has been standardized, including name mangling.

Because C++ symbols are routinely exported from DLL and shared object files, the name mangling scheme is not merely a compiler-internal matter.

This guarantees that these incompatibilities are detected at the linking phase, not when executing the software (which could lead to obscure bugs and serious stability issues).

There are instances, particularly in large, complex code bases, where it can be difficult or impractical to map the mangled name emitted within a linker error message back to the particular corresponding token/variable-name in the source.

This problem can make identifying the relevant source file(s) very difficult for build or test engineers even if only one compiler and linker are in use.
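One way to ease this mapping is to demangle the symbol reported by the linker. The following is a minimal sketch, assuming a GCC or Clang toolchain that follows the Itanium C++ ABI and provides abi::__cxa_demangle in <cxxabi.h>; it does not apply to compilers with other mangling schemes. The demangle wrapper is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang-specific header (assumption: Itanium C++ ABI)
#include <string>

// Hypothetical wrapper around abi::__cxa_demangle. Returns the readable
// form of `mangled`, or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    assert(!mangled.empty());
    int status = 0;
    // With null buffer arguments the function allocates the result itself.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;  // demangling failed: fall back to the raw symbol
    }
    std::string readable(raw);
    std::free(raw);  // the result buffer is malloc-allocated by the callee
    return readable;
}
```

With such a toolchain, demangle("_ZN9wikipedia7article6formatEv") yields wikipedia::article::format(), which is far easier to locate in the source than the raw symbol.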

Further mangling requirements were imposed later in the evolution of Fortran because of the addition of modules and other features in the Fortran 90 standard.

For example, for a function five contained in a module m, the name of the function will be mangled as __m_MOD_five (e.g., GNU Fortran), m_MP_five_ (e.g., Intel's ifort), m.five_ (e.g., Oracle's sun95), etc.

Rust has used many versions of symbol mangling schemes, which can be selected at compile time with the -Z symbol-mangling-version option.

Since Objective-C does not support namespaces, there is no need for the mangling of class names (which do appear as symbols in generated binaries).