TCP was originally designed for unreliable, low-speed networks such as those using early dial-up modems. With the growth of the Internet in backbone transmission speeds (using Optical Carrier, Gigabit Ethernet and 10 Gigabit Ethernet links) and in faster, more reliable access mechanisms (such as DSL and cable modems), TCP is now frequently used in data centers and desktop PC environments at speeds of over 1 gigabit per second.
At these speeds, the TCP software implementations on host systems require significant computing power.
In the early 2000s, full-duplex gigabit TCP communication could consume more than 80% of a 2.4 GHz Pentium 4 processor,[2] leaving few or no processing resources for the applications running on the system.
Moving some or all of this TCP/IP processing to dedicated hardware, a TCP offload engine (TOE), frees the system's main CPU for other tasks.
A generally accepted rule of thumb is that 1 hertz of CPU processing is required to send or receive 1 bit/s of TCP/IP.
This implies that two entire cores of a 2.5 GHz multi-core processor would be required to handle the TCP/IP processing associated with 5 Gbit/s of TCP/IP traffic.
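A back-of-the-envelope calculation makes the rule of thumb concrete. The sketch below uses the illustrative figures from the text (5 Gbit/s of traffic, 2.5 GHz cores); they are example values, not measurements.

    #include <stdio.h>

    /* Back-of-the-envelope estimate from the "1 Hz per 1 bit/s" rule
     * of thumb; figures match the example in the text above and are
     * illustrative, not measured. */
    int main(void)
    {
        const double traffic_bps = 5e9;   /* 5 Gbit/s of TCP/IP traffic */
        const double core_hz     = 2.5e9; /* one 2.5 GHz core */

        printf("Cores consumed by TCP/IP processing: %.1f\n",
               traffic_bps / core_hz);    /* prints 2.0 */
        return 0;
    }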
The TCP protocol also generates a large number of small packets (e.g. acknowledgements). Because these are typically generated on the host CPU and transmitted across the PCI bus and out the network physical interface, they reduce the host computer's I/O throughput.
By 2002, as the emergence of TCP-based storage such as iSCSI spurred interest, it was said that "At least a dozen newcomers, most founded toward the end of the dot-com bubble, are chasing the opportunity for merchant semiconductor accelerators for storage protocols and applications, vying with half a dozen entrenched vendors and in-house ASIC designs."[4]
Because an iSCSI host bus adapter (HBA) appears to the host as a disk controller, it can only be used with iSCSI devices and is not appropriate for general TCP/IP offload.
Almost all TCP offload engines use some type of TCP/IP hardware implementation to perform the data transfer without host CPU intervention.
Large receive offload (LRO) is a technique for increasing the inbound throughput of high-bandwidth network connections by reducing central processing unit (CPU) overhead: multiple incoming packets from a single stream are aggregated into a larger buffer before being passed up the networking stack, reducing the number of packets that have to be processed.
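The following toy model sketches the aggregation step under simplifying assumptions (a fixed-size buffer, a single flow, no TCP header handling); the structures and names are illustrative and do not reflect any kernel's actual LRO implementation.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Toy model of LRO aggregation: if an arriving segment continues
     * the sequence space of the one being aggregated, append its
     * payload so the stack later sees one large segment instead of
     * many small ones. Illustrative only, not kernel code. */
    struct seg {
        uint32_t seq;            /* TCP sequence number of first byte */
        size_t   len;            /* payload length */
        char     payload[4096];
    };

    /* Returns 1 if 'next' was merged into 'agg', 0 if 'agg' must be
     * flushed up the stack first. */
    static int lro_try_merge(struct seg *agg, const struct seg *next)
    {
        if (next->seq != agg->seq + agg->len)             /* not contiguous */
            return 0;
        if (agg->len + next->len > sizeof(agg->payload))  /* would overflow */
            return 0;
        memcpy(agg->payload + agg->len, next->payload, next->len);
        agg->len += next->len;
        return 1;
    }

    int main(void)
    {
        struct seg agg  = { .seq = 1000, .len = 3 };
        struct seg next = { .seq = 1003, .len = 4 };
        memcpy(agg.payload, "abc", 3);
        memcpy(next.payload, "defg", 4);

        if (lro_try_merge(&agg, &next))
            printf("merged: seq=%u len=%zu\n", agg.seq, agg.len);
        return 0;
    }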
Linux implementations generally use LRO in conjunction with the New API (NAPI) to also reduce the number of interrupts.[9][10][11][12]
LRO should not operate on machines acting as routers, as it breaks the end-to-end principle and can significantly impact performance.
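As a rough sketch of how LRO can be checked and, on a router, turned off from user space on Linux, the program below uses the kernel's legacy ethtool flags ioctl (ETHTOOL_GFLAGS and ETHTOOL_SFLAGS with ETH_FLAG_LRO). The interface name eth0 is an assumption, clearing the flag requires CAP_NET_ADMIN, and newer kernels expose the same setting through the feature-based ethtool interface.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    int main(void)
    {
        /* Any socket serves as a handle for the SIOCETHTOOL ioctl. */
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct ifreq ifr;
        struct ethtool_value eval;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1); /* assumed name */
        ifr.ifr_data = (char *)&eval;

        /* Read the device flags and report whether LRO is enabled. */
        eval.cmd = ETHTOOL_GFLAGS;
        if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
            perror("ETHTOOL_GFLAGS");
            close(fd);
            return 1;
        }
        printf("LRO is %s\n", (eval.data & ETH_FLAG_LRO) ? "on" : "off");

        /* On a machine forwarding packets, clear the LRO flag. */
        if (eval.data & ETH_FLAG_LRO) {
            eval.data &= ~ETH_FLAG_LRO;
            eval.cmd = ETHTOOL_SFLAGS;
            if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
                perror("ETHTOOL_SFLAGS"); /* needs CAP_NET_ADMIN */
            else
                printf("LRO disabled\n");
        }

        close(fd);
        return 0;
    }

In practice, administrators usually toggle the same setting with the ethtool utility, e.g. ethtool -K eth0 lro off.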
Unlike other operating systems, such as FreeBSD, the Linux kernel does not include support for TOE (not to be confused with other types of network offload).