This overview covers how to determine whether heterogeneous computing suits your application, and which factors to weigh when choosing between CPUs, GPUs and FPGAs. The key points are:
Evaluate whether your application can benefit from parallel computing. If there are core computations that can be parallelized and dominate the runtime, heterogeneous computing may be beneficial.
Consider Amdahl's Law, which states that the maximum speed-up of a program is limited by the fraction of the code that cannot be parallelized. Sequential portions of the code can limit the overall speed-up achievable through parallelization.
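To make the limit concrete, here is a minimal Python sketch of Amdahl's Law. The function names and the example values (90% of the runtime parallelizable, a 10x accelerator speed-up) are illustrative assumptions, not figures from any measured application.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speed-up when only part of the runtime is accelerated.

    parallel_fraction: share of the original runtime that can be parallelized (0..1).
    parallel_speedup:  factor by which that share gets faster on the accelerator.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on the speed-up as the parallel part becomes infinitely fast."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Assumed example: 90% of the runtime is parallelizable.
    print(f"{amdahl_speedup(0.9, 10):.2f}")  # 5.26
    print(f"{amdahl_limit(0.9):.2f}")        # 10.00
```

Even with an accelerator that makes the parallel part ten times faster, the overall gain in this example is only about 5x, and no device can push it beyond 10x while 10% of the runtime stays sequential.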
Choose the appropriate device based on factors such as speed, power efficiency, and programmability.
GPUs typically provide the best speed-up for independent, massively parallelizable computations due to their large number of cores and specialized architecture optimized for parallel processing.
FPGAs can match GPUs on parallelizable computations, and they excel at specialized algorithms such as the Fast Fourier Transform (FFT) and sine/cosine calculations. Their hardware flexibility allows for customized implementations optimized for specific algorithms.
GPUs are power-hungry devices and typically consume more energy than FPGAs. FPGAs offer superior performance per watt, making them an attractive choice for energy-efficient computing, especially in applications with stringent power constraints.
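Performance per watt can be compared with a simple ratio of throughput to power draw. The sketch below shows one way to do this in Python. The device figures are hypothetical placeholders chosen only to illustrate the calculation; they are not measurements of any real GPU or FPGA, and you would substitute numbers measured for your own workload.

```python
def perf_per_watt(throughput_gflops: float, power_watts: float) -> float:
    """Throughput delivered per watt of power drawn (GFLOPS/W)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return throughput_gflops / power_watts


# HYPOTHETICAL placeholder values: (throughput in GFLOPS, power in watts).
# Replace with measurements from your own application and hardware.
devices = {
    "gpu": (1000.0, 250.0),
    "fpga": (500.0, 50.0),
}

efficiency = {name: perf_per_watt(gflops, watts)
              for name, (gflops, watts) in devices.items()}
most_efficient = max(efficiency, key=efficiency.get)

if __name__ == "__main__":
    for name, value in efficiency.items():
        print(f"{name}: {value:.1f} GFLOPS/W")
    print("most efficient:", most_efficient)
```

With these placeholder numbers the device with the higher raw throughput is not the one with the better efficiency, which is the trade-off to check when power is a hard constraint.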
GPUs are generally easier to program, with high-level programming frameworks such as CUDA and OpenCL providing abstractions for parallel computing.
FPGAs require hardware design skills to achieve optimal implementations. Programming FPGAs typically involves writing designs in hardware description languages (HDLs) such as Verilog or VHDL and optimizing them for the target FPGA architecture.
In summary, when deciding between CPUs, GPUs and FPGAs for heterogeneous computing, consider factors such as speed, power efficiency, and programmability, as well as the nature of your application's computations and algorithmic requirements. Each device has its strengths and trade-offs, and the optimal choice depends on the specific needs and constraints of your application.