On a computer with an unlimited number of processors, this can be reduced to O(n) total time, while still taking only O(log n) memory.
Independently, it performs the same procedure starting at node N and going in reverse order.
The messages coming from both sides are required to calculate the marginal distribution for a node.
When the two message-passing procedures meet in the middle, the algorithm recurses on each half of the chain.
Since every message must be passed again at each level of depth, the algorithm takes O(n log n) time on a single processor.