Processor consistency

This example is cache consistent because P2 sees writes to individual memory locations in the order they were issued in P1.
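The per-location ordering condition can be illustrated with a small check. The following is a minimal sketch, not a formal definition: the function name is_cache_consistent and the event lists are hypothetical, and it assumes that every write is eventually observed exactly once.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (memory location, value written)


def per_location(events: List[Event]) -> Dict[str, List[int]]:
    """Group written values by location, preserving their relative order."""
    order: Dict[str, List[int]] = {}
    for location, value in events:
        order.setdefault(location, []).append(value)
    return order


def is_cache_consistent(issued: List[Event], observed: List[Event]) -> bool:
    """True if, for each location, the observer saw the writes in issue order.

    Assumption (illustrative only): the observer sees every write once.
    Writes to different locations may be observed in any relative order.
    """
    return per_location(issued) == per_location(observed)


# Hypothetical write stream issued by P1.
issued_by_p1 = [("x", 1), ("y", 2), ("x", 3)]

# P2 sees y before x, but the two writes to x stay in issue order.
seen_in_order = [("y", 2), ("x", 1), ("x", 3)]

# P2 sees the two writes to x swapped.
seen_swapped = [("x", 3), ("y", 2), ("x", 1)]

print(is_cache_consistent(issued_by_p1, seen_in_order))  # True
print(is_cache_consistent(issued_by_p1, seen_swapped))   # False
```

The check compares the orderings one location at a time, so reordering across different locations does not count as a violation.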

[3] In this regard, PC performs better than SC because recovery techniques for failed speculations are not necessary, which means fewer pipeline flushes.

[3] Prefetching is the act of fetching data in advance for upcoming loads and stores before it is actually needed, to cut down on load/store latency.

[3] For example, in lock synchronization, the only operation whose behavior is not fully defined by PC is the lock-acquire store, where subsequent loads are in the critical section and their order affects the outcome.

This is due to the number of synchronization points inherent to programs that run on multiprocessor systems.

One of the main components of processor consistency is that a write followed by a read is allowed to execute out of program order.

This essentially results in the hiding of write latency when loads are allowed to go ahead of stores.
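The effect of letting a load go ahead of an earlier store can be seen in the classic store-buffering litmus test. The sketch below is a simplified illustration, not a model of any particular hardware: it enumerates every interleaving of two two-instruction programs, first with program order enforced and then with each load allowed to move ahead of the preceding store to a different location. The program layout and the names run and outcomes are assumptions made for this example.

```python
from itertools import permutations

# Each operation is (processor, kind, variable). Both variables start at 0.
P1 = [("P1", "store", "x"), ("P1", "load", "y")]  # x = 1; r1 = y
P2 = [("P2", "store", "y"), ("P2", "load", "x")]  # y = 1; r2 = x


def run(schedule):
    """Execute one global interleaving and return (r1, r2)."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for processor, kind, variable in schedule:
        if kind == "store":
            memory[variable] = 1
        else:
            registers[processor] = memory[variable]
    return registers["P1"], registers["P2"]


def outcomes(allow_load_before_store):
    """Collect every (r1, r2) result reachable under the chosen ordering rule."""
    results = set()
    for schedule in permutations(P1 + P2):
        legal = True
        for program in (P1, P2):
            store_position = schedule.index(program[0])
            load_position = schedule.index(program[1])
            # With program order enforced, the store must come first.
            if load_position < store_position and not allow_load_before_store:
                legal = False
        if legal:
            results.add(run(schedule))
    return results


in_order = outcomes(allow_load_before_store=False)
relaxed = outcomes(allow_load_before_store=True)

print(sorted(in_order))  # [(0, 1), (1, 0), (1, 1)]
print(sorted(relaxed))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The only result added by the relaxation is (0, 0), where both loads complete before either store becomes visible. Programs that do not depend on excluding that result behave the same under both rules.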

Since many applications function correctly with this structure, systems that implement this type of relaxed ordering typically appear sequentially consistent.

Two other models that conform to this specification are the SPARC V8 TSO (Total Store Ordering) and the IBM-370.

With this, it is possible for a load to return the value of a store that is "out of date" in terms of program order.
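One way to picture how a load can observe an out-of-date value is a per-processor FIFO store buffer. The sketch below is a simplified illustration under stated assumptions, not a description of SPARC or IBM hardware: the Core class, its forwarding rule, and the drain method are hypothetical names chosen for this example.

```python
from collections import deque


class Core:
    """A processor with a private FIFO store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = deque()  # pending (address, value) stores, oldest first

    def store(self, address, value):
        # The store is buffered and is not yet visible to other cores.
        self.buffer.append((address, value))

    def load(self, address):
        # Assumed forwarding rule: a core sees its own newest buffered store.
        for buffered_address, value in reversed(self.buffer):
            if buffered_address == address:
                return value
        return self.memory.get(address, 0)

    def drain(self):
        # Retire buffered stores to shared memory in FIFO order.
        while self.buffer:
            address, value = self.buffer.popleft()
            self.memory[address] = value


memory = {"x": 0}
p1, p2 = Core(memory), Core(memory)

p1.store("x", 1)
own_view = p1.load("x")    # 1: P1 reads its own buffered store early
stale_view = p2.load("x")  # 0: P2 still sees the old value in memory
p1.drain()
fresh_view = p2.load("x")  # 1: the store is now globally visible

print(own_view, stale_view, fresh_view)  # 1 0 1
```

Until the buffer drains, the two processors disagree about the current value of x, which is the sense in which a load can return an out-of-date store.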