GPU or CPU?

For a couple of years now, general-purpose computation on the GPU (GPGPU) has been all the rage for squeezing the last bit of performance out of many data-intensive algorithms, including computer vision on the GPU. Claims of 100-fold speed-ups are not unheard of, and 10-fold is pretty common. Now a couple of researchers from Intel have published a paper in which they compare optimized CPU code (using Intel's MKL) with optimized GPU code, and they find that, on average, the speed-ups are closer to 2.5x.

My first thought was: of course, these are Intel guys, they don't like GPUs, so they are biased. Indeed, some criticism could be levelled at the paper. For example, they only compare single algorithms, but it is well known that transferring data to and from the GPU carries a fixed overhead. A pipeline of several algorithms that keeps its intermediate data on the GPU pays that cost only once, so it could well achieve higher overall speed-ups.
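To make the amortization argument concrete, here is a minimal back-of-the-envelope sketch. It assumes a simplified model in which the data is copied to the GPU once and back once for the whole pipeline, intermediates stay on the GPU, and transfers do not overlap with computation. The function name and all timings below are hypothetical, chosen only for illustration, not taken from the paper.

```python
def effective_speedup(cpu_ms_per_step, gpu_ms_per_step, transfer_ms, steps):
    """Overall GPU-vs-CPU speed-up for a pipeline of `steps` kernels.

    Simplified model (an assumption): one round-trip transfer of total
    cost `transfer_ms` for the whole pipeline, no overlap of copy and compute.
    """
    cpu_total = cpu_ms_per_step * steps
    gpu_total = gpu_ms_per_step * steps + transfer_ms
    return cpu_total / gpu_total


if __name__ == "__main__":
    # Hypothetical numbers: each kernel is 5x faster on the GPU,
    # but moving the data costs 8 ms in total.
    for steps in (1, 10, 100):
        print(steps, round(effective_speedup(10.0, 2.0, 8.0, steps), 2))
```

With these made-up numbers, a single kernel gains nothing (1.0x), while a ten-step pipeline reaches about 3.57x and a hundred-step pipeline about 4.81x, approaching the raw 5x per-kernel ratio as the transfer cost is spread over more work.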

However, that seems a little beside the point. I'm not surprised that Intel's engineers can optimize numeric algorithms well, and I believe their numbers are sound. These are some of the most knowledgeable experts in the area. By the same token, I'm not surprised that GPU experts can produce amazingly fast algorithms on the GPU.

The real question is: can other people do it, too? For a long time, experts have been able to write code that is far better optimized, and far faster, than what the everyday programmer produces. Why should that be any different with multi-core or highly parallel processors?

Maybe OpenCL is a game changer here, because it limits what you can express and thereby makes the optimizer's task easier. I don't know who put more development effort into their tools, the MKL guys or the CUDA guys, but my impression is that the CUDA approach scales better, because its programming model is better suited to massively parallel hardware. I'm putting my bets on projects such as OpenCL4Java, and I also like ScalaCL very much. Tools like these could put the vast computing power of GPUs into the hands of many more people.