Computing with memory

Reconfigurable computing platforms offer advantages in terms of reduced design cost, early time-to-market, rapid prototyping and easily customizable hardware systems.

[2][3] An effective way of reducing the impact of the PI architecture in FPGA is to place small LUTs in close proximity (referred as clusters) and to allow intra-cluster communication using local interconnects.

[4][5] Investigations have also been made to reduce the overhead due to PI in fine-grained FPGAs by mapping larger multi-input multi-output LUTs to embedded memory blocks.

[8] Each computing element incorporates a two-dimensional memory array for storing LUTs, a small controller for sequencing evaluation of sub-functions and a set of temporary registers to hold the intermediate outputs from individual partitions.

The local time-multiplexed execution inside the computing elements can drastically reduce the requirement of programmable interconnects leading to large improvement in energy-delay product and better scalability of performance across technology generations.