Interprocedural optimization

IPO differs from other compiler optimizations by analyzing the entire program as opposed to a single function or block of code.

IPO seeks to reduce or eliminate duplicate calculations and inefficient use of memory and to simplify iterative sequences such as loops.

The actual IPO process may occur at any step between reading the human-readable source code and producing the finished executable binary program.

By contrast with the compiler, which sees only the code, human programmers start at the other end with a purpose and attempt to produce a program that will achieve it, preferably without expending a lot of thought in the process.

This simple optimization is foiled the moment that the implementation of F(x) becomes impure; that is, when its execution involves references to values other than the explicit argument 6 (values that may have changed between the invocations), or side effects such as printing some message to a log, counting the number of evaluations, accumulating the CPU time consumed, preparing internal tables so that subsequent invocations for related parameters will be facilitated, and so forth.
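
The optimization in question replaces two identical invocations, say z := F(6) + F(6), with a single evaluation, t := F(6); z := t + t. A minimal C sketch of how impurity defeats this, here a side effect of counting evaluations (the function and variable names are illustrative):

    #include <stdio.h>

    static int calls;            /* hidden state: counts evaluations of F */

    int F(int x) {
        calls++;                 /* side effect besides computing the result */
        return x * x;
    }

    int main(void) {
        int z = F(6) + F(6);     /* two evaluations */
        printf("z = %d, calls = %d\n", z, calls);    /* z = 72, calls = 2 */

        calls = 0;
        int t = F(6);            /* the "optimized" form: one evaluation */
        int z2 = t + t;
        printf("z2 = %d, calls = %d\n", z2, calls);  /* z2 = 72, calls = 1 */
        return 0;
    }

The results agree, but the observable number of evaluations differs, so a compiler may perform the rewrite only if it can prove F pure.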

A general approach to optimization would therefore be to reverse this: some or all invocations of a certain procedure are replaced by the respective code, with the parameters appropriately substituted.
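
A sketch of this inline expansion in C, with a hypothetical procedure square: the call is replaced by the procedure's body, with the parameter substituted by the actual argument:

    #include <stdio.h>

    int square(int n) { return n * n; }

    /* Before: the caller invokes the procedure. */
    int caller_before(int v) { return square(v + 1); }

    /* After inline expansion: the call has been replaced by the body of
       square, with the parameter n substituted by the argument v + 1.  */
    int caller_after(int v) { return (v + 1) * (v + 1); }

    int main(void) {
        printf("%d %d\n", caller_before(4), caller_after(4));  /* 25 25 */
        return 0;
    }

The expanded form exposes the computation to further optimizations, such as the constant propagation discussed below, at the cost of duplicating the procedure's code at each call site.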

In practice, LTO does not always optimize the entire program: library functions, especially dynamically linked shared objects, are intentionally kept out to avoid excessive duplication and to allow for updating.[1]

Due to performance concerns, not even the entire unit is always directly used; a program can be partitioned in a divide-and-conquer style of LTO such as GCC's WHOPR.
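
With GCC, for example, LTO is requested with -flto at both compile and link time, and -flto=n runs the partitioned WHOPR link-time stage with n parallel jobs (a typical invocation; file names are illustrative and details vary by GCC version):

    gcc -O2 -flto -c a.c             # each object file also carries GCC's IR
    gcc -O2 -flto -c b.c
    gcc -O2 -flto -o prog a.o b.o    # interprocedural optimization is redone here
    gcc -O2 -flto=4 -o prog a.o b.o  # WHOPR: partitioned, parallel LTO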

This whole-program mode (GCC's -fwhole-program option) makes GCC assume that the module being compiled contains the entry point of the entire program, so that every other function in it can be treated as not externally used and safely optimized away.

It can be combined with LTO in the one-big-module sense, which is useful when the linker is not communicating back to GCC about what entry points or symbols are being used externally.
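
A typical combined invocation under that assumption might look as follows (file names are illustrative; when a linker plugin does report symbol usage, GCC's documentation notes the option is unnecessary):

    gcc -O2 -flto -fwhole-program -c a.c b.c
    gcc -O2 -flto -fwhole-program -o prog a.o b.o   # everything except main
                                                    # is treated as internal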

Suppose that its invocations are expanded in place, with parameters identified by address. The compiler could then, in this rather small example, follow the constants along the logic (such as it is) and find that the predicates of the if-statements are constant. And since the assignments to a, b and x deliver nothing to the outside world (they do not appear in output statements, nor as input to subsequent calculations whose results in turn do lead to output, else they too would be needless), there is no point in that code either, and the whole program reduces to a sequence of writes of known constants. The stages of this reduction are worked through in the sketch below.

A variant method for passing parameters that appear to be "by reference" is copy-in, copy-out, whereby the procedure works on a local copy of the parameters whose values are copied back to the originals on exit from the procedure.
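
As a concrete illustration, here is a C rendering of such a procedure and the reduction just described (the procedure name silly and the concrete values are assumptions for illustration; the global b makes the procedure impure, and the parameters are passed by address):

    #include <stdio.h>

    int b;                         /* "global" to the procedure, not a parameter */

    void silly(int *a, int *x) {   /* parameters identified by address */
        if (*x < 0) *a = *x + b; else *a = -6;
    }

    int main(void) {
        int a, x;
        x = 7; b = 5;
        silly(&a, &x); printf("%d\n", x);  /* a is changed */
        silly(&x, &a); printf("%d\n", x);  /* x is changed via its parameter */
        silly(&b, &b); printf("%d\n", b);  /* b is changed via its parameter */
        return 0;
    }

    /* Expanding the three invocations in place, with parameters substituted:

           x = 7; b = 5;
           if (x < 0) a = x + b; else a = -6;   print x;
           if (a < 0) x = a + b; else x = -6;   print x;
           if (b < 0) b = b + b; else b = -6;   print b;

       Following the constants, every predicate is decided at compile time
       (a becomes -6, then x becomes -6 + 5 = -1, then b becomes -6), and
       since the assignments feed nothing but the print statements, the
       whole program folds to:

           printf("%d\n", 7); printf("%d\n", -1); printf("%d\n", -6);      */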

These details are probably not carefully explained in the compiler manual, and if they are, they will likely be passed over as being not relevant to the immediate task and long forgotten by the time a problem arises.

Put another way, a carefully written test program can report on whether parameters are passed by value or by reference and, if a copy-in, copy-out scheme is used, of what sort.
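
One way such a test program can work is to pass the same variable as two by-reference parameters and observe when a write through one becomes visible through the other. A minimal C sketch, with copy-in, copy-out simulated by hand since C itself passes addresses (the function names are illustrative):

    #include <stdio.h>

    /* True pass-by-reference: a write through p is immediately visible
       through q whenever both refer to the same variable.              */
    void probe_byref(int *p, int *q) {
        *p = 1;
        printf("by reference sees %d\n", *q);        /* prints 1 */
    }

    /* Copy-in, copy-out: the procedure works on local copies that are
       written back on exit, so the aliasing goes unnoticed inside.     */
    void probe_copyinout(int *p, int *q) {
        int pc = *p, qc = *q;                        /* copy in */
        pc = 1;
        printf("copy-in/copy-out sees %d\n", qc);    /* prints 0 */
        *p = pc; *q = qc;                            /* copy out: qc lands last */
    }

    int main(void) {
        int v = 0;
        probe_byref(&v, &v);
        v = 0;
        probe_copyinout(&v, &v);
        printf("after copy-out, v = %d\n", v);       /* 0: write-back order decided */
        return 0;
    }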

However, variation is endless: simple parameters might be passed by copy whereas large aggregates such as arrays might be passed by reference; simple constants such as zero might be generated by special machine codes (such as Clear, or LoadZ) while more complex constants might be stored in memory tagged as read-only, with any attempt at modifying them resulting in immediate program termination, etc.
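
The read-only constant has a direct analogue in C: string literals typically reside in read-only memory, and an attempted write through a pointer to one is undefined behavior that on most systems terminates the program at once. A small sketch (the function name zap is illustrative):

    #include <stdio.h>

    void zap(char *p) {
        *p = '!';              /* writes through whatever address it is handed */
    }

    int main(void) {
        char buf[] = "hello";  /* a writable copy of the constant */
        zap(buf);              /* fine */
        puts(buf);             /* prints "!ello" */

        char *lit = "hello";   /* the literal itself, usually read-only memory */
        zap(lit);              /* undefined behavior: on most systems the write
                                  traps and the program is terminated here     */
        return 0;
    }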

Full specifications would amount to a re-statement of the program's function in another form and, quite aside from the time the compiler would consume in processing them, they would themselves be subject to bugs.

In cases where a program reads no input (as in the example), one could imagine the compiler's analysis being carried forward so that the result will be no more than a series of print statements, or possibly some loops expediently generating such values.
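
Optimizers already do this on a small scale. Given a program that reads no input, constant propagation and loop analysis can reduce its output to a precomputed value, as in this sketch (folding of this kind is typical of GCC or Clang at -O2, though not guaranteed):

    #include <stdio.h>

    int main(void) {
        int s = 0;
        for (int i = 1; i <= 10; i++)   /* no input anywhere in the program */
            s += i;
        printf("%d\n", s);   /* may be compiled as if it were printf("%d\n", 55) */
        return 0;
    }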

In general, arbitrarily complex considerations arise (in essence the Entscheidungsproblem) that preclude this, and there is no option but to run the code with limited improvements only.

IBM's PL/I Optimizing Compiler performed interprocedural analysis to understand the side effects of both procedure calls and exceptions (cast, in PL/I terms, as "ON conditions"),[3] an approach also described in papers by Fran Allen.

However, the degree of optimization is limited when LTO is disabled, as IPO can then happen only within a single translation unit's object file, and non-static functions can never be eliminated.
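
For example, within one translation unit the compiler may inline and then discard a function with internal linkage, but it must keep every externally visible one (a sketch; the file and function names are illustrative, compiled with gcc -O2 -c unit.c):

    /* unit.c: without LTO, interprocedural optimization stops at this
       file's boundary.                                                 */

    static int helper(int x) {   /* internal linkage: once inlined at every
                                    call site, the body may be deleted     */
        return 2 * x;
    }

    int api(int x) {             /* external linkage: another object file
                                    might call it, so it must be kept      */
        return helper(x) + 1;
    }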