The resulting executable is typically fast but, because it is specific to a hardware platform, it isn't portable.
For example, most Data General Nova and IBM 1130 systems, and many of the first microcomputers, had only 4 kB of RAM installed.
Consequently, a lot of time was spent trying to find ways to reduce a program's size, to fit in the available memory.
One solution is to use an interpreter which reads the symbolic language a bit at a time, and calls functions to perform the actions.
A program consisting of many function calls may also contain considerable amounts of repeated code, because each call site duplicates the instruction sequence needed to set up and perform the call.
To address this, threaded-code systems used pseudo-code that represents each function call as a single operator.
At run time, a tiny "interpreter" would scan over the top-level code, extract the subroutine's address in memory, and call it.
During the 1970s, hardware designers spent considerable effort to make subroutine calls faster and simpler.
On the improved designs, only a single instruction is expended to call a subroutine, so the use of a pseudo-instruction saves no room.
The addresses may be direct or indirect, contiguous or non-contiguous (linked by pointers), relative or absolute, resolved at compile time or dynamically built.
Another variable, sp (the stack pointer), contains an address elsewhere in memory that is available to hold a value temporarily.[8]
In 1970, Charles H. Moore invented a more compact arrangement, indirect threaded code (ITC), for his Forth virtual machine.
Moore arrived at this arrangement because Nova minicomputers had an indirection bit in every address, which made ITC easy and fast.
This form is simple, but may have overheads because the thread consists only of machine addresses, so all further parameters must be loaded indirectly from memory.
Where the handler operands include both values and types, the space savings over direct-threaded code may be significant.
Early compilers for ALGOL, Fortran, COBOL, and some Forth systems often produced subroutine-threaded code.
The code in many of these systems operated on a last-in-first-out (LIFO) stack of operands, for which compiler theory was well-developed.
A 1-byte (8-bit) token is the natural choice for ease of programming, but smaller sizes such as 4 bits, or larger ones such as 12 or 16 bits, can be used depending on the number of operations supported.
As long as the index width is chosen to be narrower than a machine pointer, it will naturally be more compact than the other threading types without much special effort by the programmer.
In any case, if the problem being computed involves applying a large number of operations to a small amount of data then using threaded code may be an ideal optimization.
A Huffman-threaded interpreter locates subroutines using an index table or a tree of pointers that can be navigated by the Huffman code.
This was used in Charles H. Moore's earliest Forth implementations and in the University of Illinois's experimental hardware-interpreted computer language.
HP's RPL, first introduced in the HP-18C calculator in 1986, is a type of proprietary hybrid (direct-threaded and indirect-threaded) threaded interpretive language (TIL)[12] that, unlike other TILs, allows embedding of RPL "objects" into the "runstream", i.e. the stream of addresses through which the interpreter pointer advances.
The dual-stack principle originated three times independently: for Burroughs large systems, Forth, and PostScript.