Memory barriers are necessary because most modern CPUs employ performance optimizations that can result in out-of-order execution.
This reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but can cause unpredictable behavior in concurrent programs and device drivers unless carefully controlled.
Memory barriers are typically used in low-level code that operates on memory shared by multiple devices. Such code includes synchronization primitives and lock-free data structures on multiprocessor systems, and device drivers that communicate with computer hardware.
When a program runs on a single CPU, the hardware ensures that its memory operations appear to execute in program order, so reordering stays invisible. However, when the memory is shared with multiple devices, such as other CPUs in a multiprocessor system or memory-mapped peripherals, out-of-order access may affect program behavior.
Consider, for example, a program in which two threads share variables x and f, both initially zero: thread #1 loops while f is zero and then reads x, while thread #2 writes a value to x and then sets f to 1. A memory barrier must be inserted before thread #2's assignment to f to ensure that the new value of x is visible to other processors at or before the change in the value of f. A memory barrier must also be inserted before thread #1's access to x to ensure that x is not read prior to seeing the change in the value of f. Another example is a device driver that copies data into a hardware module's memory buffer and then writes to a register that triggers the module: if the processor's store operations are executed out of order, the hardware module may be triggered before the data is ready in memory.
Memory ordering can be disturbed both by the hardware at run time and by the compiler's reordering optimizations at compile time. Although the effects on parallel program behavior can be similar in both cases, in general it is necessary to take separate measures to inhibit compiler reordering for data that may be shared by multiple threads of execution.
Memory-mapped I/O generally requires that the reads and writes specified in source code happen in the exact order specified with no omissions.
Omissions or reorderings of reads and writes by the compiler would break the communication between the program and the device accessed by memory-mapped I/O.