Thanks for the nice blog post with a clear intro to coarse-grained SVM. I like the simplicity of sharing pointers, but I'm less sure about the performance implications in the example presented.
Is the code for your OpenCL 1.2 timing comparisons available? I'm not sure I quite believe them. On an APU I have zero copy, so if my buffers are set up correctly I can map them in OpenCL 1.2 with essentially no penalty. My data structure would then only need a slight tweak to be offset based rather than (true) pointer based, and I think I'd get almost the same performance in OCL 1.2 as in OCL 2.0.
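
To make the offset idea concrete, here is a rough sketch in plain C of the kind of tweak I mean. It is not taken from the post: the `Node` layout, the `NIL` sentinel, and the helper names are all made up for illustration. Each node stores the index of its successor inside one contiguous pool instead of a raw address, so the links stay valid wherever the pool ends up being mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no next node". */
#define NIL UINT32_MAX

/* A list node that links by index into the pool, not by address. */
typedef struct {
    float    value;
    uint32_t next;   /* index of the next node in the pool, or NIL */
} Node;

/* Fill pool[0..n-1] as the chain 0 -> 1 -> ... -> n-1, where node i
 * holds the value i + 1. Returns the index of the head node. */
static uint32_t build_list(Node *pool, uint32_t n)
{
    assert(n > 0);
    for (uint32_t i = 0; i < n; ++i) {
        pool[i].value = (float)(i + 1);
        pool[i].next  = (i + 1 < n) ? i + 1 : NIL;
    }
    return 0;
}

/* Walk the list using only the pool base and indices. The same loop
 * works from any base address the pool is visible at. */
static float sum_list(const Node *pool, uint32_t head)
{
    float total = 0.0f;
    for (uint32_t i = head; i != NIL; i = pool[i].next)
        total += pool[i].value;
    return total;
}
```

In an OpenCL 1.2 version I would expect the pool to live in a buffer created with `CL_MEM_ALLOC_HOST_PTR` and filled through `clEnqueueMapBuffer`, with the kernel running the same index-based traversal over a `__global const Node *`. Whether that map really is zero copy depends on the driver and the device, so I'd treat it as something to measure rather than assume.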