The TMU arose from the computational demands of sampling a flat image (the texture map) and transforming it to the angle and perspective it needs to have in 3D space.
The underlying operation is a large matrix multiplication, which CPUs of the time (early Pentiums, for example) could not perform fast enough for acceptable performance.
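As a rough illustration of the arithmetic involved, the following minimal sketch transforms one corner of a textured surface with a 4×4 matrix and then applies a perspective divide. The function names and the projection matrix are hypothetical, chosen only to show the pattern; the sketch does not describe how any particular TMU is implemented.

```python
# Illustrative only: the matrix and helper names are assumptions,
# not taken from any real hardware or graphics API.

def mat_vec4(m, v):
    """Multiply a 4x4 matrix (a list of four rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def project(m, point):
    """Transform a 3D point by m, then apply the perspective divide by w."""
    x, y, z, w = mat_vec4(m, [point[0], point[1], point[2], 1.0])
    return (x / w, y / w)

# Hypothetical projection matrix: the last row copies z into w, so points
# farther from the viewer (larger z) shrink toward the screen centre.
PROJECTION = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

near = project(PROJECTION, (1.0, 1.0, 2.0))
far = project(PROJECTION, (1.0, 1.0, 4.0))
print(near, far)
```

Each call costs sixteen multiplications plus a divide for a single point. Repeating comparable work for every textured pixel of every frame gives a sense of why the load was too heavy for a general-purpose CPU of that period.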
Chip designers do this to couple shaders closely with the texture engines they will be working with.
A texture can be an actual image, a lightmap, or even a normal map used for advanced surface lighting effects.
As a result, the X1600 XT performs worse than other GPUs of the same era and class (such as Nvidia's GeForce 7600 GT) [citation needed].
However, at the high end, the X1900 XTX has the same 3:1 ratio but performs well, because screen resolutions top out and it has more than enough texture mapping power to handle any display.
It is reasonable to assume that the card with more TMUs will be faster at processing texture information.
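This assumption can be made concrete with the usual back-of-the-envelope estimate of peak texel fill rate: the number of TMUs multiplied by the core clock. The sketch below assumes one texel per TMU per clock and uses two hypothetical cards at the same clock speed. The figures are illustrative, not measurements of any real product.

```python
def texel_fill_rate(tmus, core_clock_mhz):
    """Peak texels per second, assuming one texel per TMU per clock cycle."""
    return tmus * core_clock_mhz * 1_000_000

# Two hypothetical cards that differ only in TMU count.
card_a = texel_fill_rate(tmus=4, core_clock_mhz=600)
card_b = texel_fill_rate(tmus=12, core_clock_mhz=600)

print(card_a, card_b, card_b / card_a)
```

Under these assumptions the card with three times the TMUs has three times the peak texture throughput. Real-world results also depend on memory bandwidth, clock speed and how texture-bound the workload is.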
The R300 GPU used in the Radeon 9700 had four global vertex shaders but split the rest of the rendering pipeline in half (it was, so to speak, dual-core). Each half, called a quad, had four pixel shaders, four TMUs and four ROPs.
All rendered data has to travel through the ROP in order to be written to the framebuffer; from there it can be transmitted to the display.
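The layout described above can be modelled as a small data structure to make the unit counts explicit. The class names `Quad` and `Gpu` are hypothetical and serve only to tally the units named in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quad:
    """One half of the split pipeline, as described for the R300."""
    pixel_shaders: int = 4
    tmus: int = 4
    rops: int = 4

@dataclass(frozen=True)
class Gpu:
    vertex_shaders: int  # global, shared by all quads
    quads: tuple         # the independent pipeline halves

    def totals(self):
        """Sum the per-quad units across the whole chip."""
        return {
            "vertex_shaders": self.vertex_shaders,
            "pixel_shaders": sum(q.pixel_shaders for q in self.quads),
            "tmus": sum(q.tmus for q in self.quads),
            "rops": sum(q.rops for q in self.quads),
        }

r300 = Gpu(vertex_shaders=4, quads=(Quad(), Quad()))
print(r300.totals())
```

Summed over both quads, this gives eight pixel shaders, eight TMUs and eight ROPs alongside the four shared vertex shaders.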