Load-balanced switches are a subject of research for large routers scaled past the point of practical central arbitration.
Practical systems use imperfect arbitration heuristics (such as iSLIP) that can be computed in reasonable amounts of time.
Each buffer writes the packets it receives from the input line cards into a single buffer-local memory at a combined rate of R. Simultaneously, each buffer sends the packet at the head of each virtual output queue to the corresponding output line card, again at rate R/N to each card.
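The buffer stage described above can be illustrated with a minimal time-slotted sketch. It assumes a fixed round-robin connection pattern (in slot t, input i reaches buffer (i + t) mod N and buffer b reaches output (b + t) mod N), which is one simple way to give every link a rate of R/N; the function name `simulate`, the packet tuple layout and the arrival format are hypothetical choices made for illustration only.

```python
from collections import deque


def simulate(n, arrivals, slots):
    """Time-slotted sketch of a two-stage load-balanced switch.

    n        -- number of input cards, buffers and output cards
    arrivals -- dict: slot -> list of (input, output) pairs
                (assumed: at most one arrival per input per slot)
    slots    -- number of time slots to run
    Returns delivered[output] = list of (input, output, arrival_slot).
    """
    input_queues = [deque() for _ in range(n)]
    # voqs[b][o]: virtual output queue in buffer b for output o
    voqs = [[deque() for _ in range(n)] for _ in range(n)]
    delivered = [[] for _ in range(n)]

    for t in range(slots):
        for inp, out in arrivals.get(t, []):
            input_queues[inp].append((inp, out, t))

        # Second stage: buffer b is connected to exactly one output per
        # slot, so each (buffer, output) link is used once every n slots
        # (rate R/N). Running it first means a packet spends at least
        # one slot in a buffer.
        for b in range(n):
            out = (b + t) % n
            if voqs[b][out]:
                delivered[out].append(voqs[b][out].popleft())

        # First stage: input i is connected to exactly one buffer per
        # slot, so each buffer writes at most one packet per slot
        # (combined rate R).
        for inp in range(n):
            b = (inp + t) % n
            if input_queues[inp]:
                pkt = input_queues[inp].popleft()
                voqs[b][pkt[1]].append(pkt)

    return delivered


if __name__ == "__main__":
    # One flow: input 0 sends a packet to output 1 in each of 8 slots.
    flow = {t: [(0, 1)] for t in range(8)}
    result = simulate(4, flow, 12)
    print([pkt[2] for pkt in result[1]])
```

In this sketch a single flow from input 0 to output 1 is spread over all four buffers, and its packets can reach the output in a different order from the one in which they arrived, because packets in different buffers wait different amounts of time for their connection to the output.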
By adding yet more latency and buffering, the load-balanced switch can maintain packet order within flows using only local information.
FOFF has the additional benefits of removing any vulnerability to pathological traffic patterns and providing a mechanism for implementing priorities.
Upgrading the arbiter to include load-balancing and combining these devices could have reliability, cost and throughput advantages.