Are you running this on a CPU device or a GPU device?
Can you run CodeXL and check what the launch configuration looks like?
Since 5 is a prime number, I believe the runtime will use a work-group size of (1x1) so that the local size divides the global size in each dimension.
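
To illustrate the divisibility point, here is a rough sketch in plain C (no OpenCL calls) that lists the local sizes that evenly divide a given global size in one dimension. The helper name valid_local_sizes and the 256 work-item limit used below are my own assumptions for illustration, not anything taken from your code or from the runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: writes into `out` every
 * local size that evenly divides `global_size` and does not exceed
 * `max_local` (an assumed device limit). Stores at most `out_cap`
 * entries and returns how many were stored. */
static size_t valid_local_sizes(size_t global_size, size_t max_local,
                                size_t *out, size_t out_cap)
{
    assert(out != NULL);

    size_t count = 0;
    for (size_t local = 1; local <= global_size && local <= max_local; ++local) {
        /* OpenCL 1.x needs the global size to be a multiple of the local size. */
        if (global_size % local == 0 && count < out_cap) {
            out[count++] = local;
        }
    }
    return count;
}
```

Calling it with a global size of 5 and an assumed limit of 256 gives only two candidates per dimension, 1 and 5, so when the local size is left as NULL the runtime has very little to choose from. Which one it actually picks is up to the implementation, which is why the CodeXL output would help.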
Apart from that, I don't understand why you are getting wrong values.
If you confirm the info above, I can download and check your code.
-
Bruha...