CubeHash has a 128-byte state, uses a wide-pipe construction, and is ARX-based.
[2] After clarifications from NIST, the author changed the proposal to CubeHash16/32, which "is approximately 16 times faster than CubeHash8/1, easily catching up to both SHA-256 and SHA-512 on the reference platform" while still maintaining a "comfortable security margin".
The internal state is defined as a five-dimensional array of words (four-byte integers), with each index taking the value 0 or 1 in all five dimensions, which gives 2^5 = 32 words, or 128 bytes.
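The size of the state follows directly from this indexing, as the short sketch below illustrates. The helper name `flat_index` and the choice to store the words in a flat list in lexicographic index order are assumptions made for illustration; the text above does not prescribe a storage layout.

```python
from itertools import product

WORD_BYTES = 4  # each state word is a four-byte integer


def flat_index(i, j, k, l, m):
    """Map a five-bit coordinate to a position in a flat 32-word list.

    Assumed layout: lexicographic order, i.e. [00000], [00001], ..., [11111].
    """
    return 16 * i + 8 * j + 4 * k + 2 * l + m


# Every index is 0 or 1, so there are 2**5 coordinates in total.
coords = list(product((0, 1), repeat=5))
state = [0] * len(coords)
state_bytes = len(state) * WORD_BYTES
```

Under this assumed layout the word [11111] is the last entry of the list, and the total size matches the 128-byte state.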
The IV can be saved and reused for a given combination of h, b, r. The message is padded and split into b-byte blocks.
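The padding and splitting step might look like the following sketch. The text above does not state the padding rule; the sketch assumes the rule from the CubeHash specification (append a single 1 bit, then as few 0 bits as needed to reach a multiple of 8b bits) and handles only messages that are a whole number of bytes. The function name `pad_and_split` is hypothetical.

```python
from typing import List


def pad_and_split(message: bytes, b: int) -> List[bytes]:
    """Pad a byte-aligned message and cut it into b-byte blocks.

    Assumed padding rule: a 1 bit (the byte 0x80 for byte-aligned input)
    followed by the minimum number of zero bytes.
    """
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % b)  # fill up to a multiple of b
    return [padded[i:i + b] for i in range(0, len(padded), b)]
```

Because a padding bit is always appended, a message whose length is already a multiple of b still gains one extra block.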
Each block is input by XORing it into the first b bytes of the state, after which r rounds of transformation are performed.
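A minimal sketch of this block-input step is given below. It assumes the state is held as 32 unsigned 32-bit words that are serialized in little-endian byte order, and it takes the round transformation as a caller-supplied function `round_fn`, since the round itself is not defined in this passage. Both `absorb_block` and `round_fn` are hypothetical names.

```python
import struct


def absorb_block(state, block, r, round_fn):
    """XOR one b-byte block into the state, then apply r rounds.

    state    -- list of 32 unsigned 32-bit words
    block    -- bytes object of length b (b <= 128)
    round_fn -- placeholder for the round transformation (not defined here)
    """
    # Serialize the words so that the block can be XORed byte by byte
    # (little-endian word encoding is an assumption of this sketch).
    raw = bytearray(struct.pack("<32I", *state))
    for pos, byte in enumerate(block):
        raw[pos] ^= byte
    state = list(struct.unpack("<32I", bytes(raw)))
    for _ in range(r):
        state = round_fn(state)
    return state
```

Only the first b bytes of the state are touched directly by the message; the remaining bytes are affected only through the rounds.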
Finally, 1 is XORed into the state word [11111], and then f rounds of transformation are performed.
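The finalization step can be sketched in the same style. The sketch assumes a flat list of 32 words in which [11111] is the last entry, little-endian word serialization, and that the digest is the first h/8 bytes of the final state, as in the CubeHash specification; the round transformation is again passed in as the placeholder `round_fn`, and the name `finalize` is hypothetical.

```python
import struct


def finalize(state, f, h, round_fn):
    """XOR 1 into word [11111], run f rounds, return an h-bit digest.

    round_fn is a placeholder for the round transformation, which is
    not defined in this sketch.
    """
    state = list(state)  # work on a copy
    state[31] ^= 1       # word [11111] under the assumed flat layout
    for _ in range(f):
        state = round_fn(state)
    # Assumed output rule: the first h/8 bytes of the serialized state.
    return struct.pack("<32I", *state)[: h // 8]
```

With h = 512 this returns 64 bytes, that is, half of the 128-byte state.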
The 512-bit hash value is:

A small change in the message, such as flipping a single bit, drastically changes the hash output because of the avalanche effect.