I solved the first issue myself:
Because of the weakly consistent memory model, accesses to counter produced unexpected behavior.
It was fixed by using a mem_fence:
counter = (counter + 1) % 10 ;
mem_fence(CLK_LOCAL_MEM_FENCE);
That said, leaving out the mem_fence caused some instability in the driver, which the development team might want to review.
I found no way to solve the second issue. Would it be possible to force the use of C11 atomics in an OpenCL kernel launched on the CPU?
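To make the question concrete, this is roughly the kind of C11 atomic update I have in mind. It is only a sketch written as plain host-side C, not kernel code: counter_next is a made-up helper name, and the wrap at 10 simply mirrors the counter update shown earlier.

```c
#include <stdatomic.h>

/* Hypothetical helper (not part of any OpenCL API): atomically advance
   a counter modulo 10 and return the value it held before the update. */
static int counter_next(atomic_int *counter)
{
    int old = atomic_load_explicit(counter, memory_order_relaxed);

    /* Retry until no other thread modified the counter in between.
       On failure, old is refreshed with the current value. */
    while (!atomic_compare_exchange_weak_explicit(
               counter, &old, (old + 1) % 10,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* nothing to do, just loop with the refreshed value */
    }
    return old;
}
```

The compare-exchange loop makes the read, the modulo, and the write behave as a single step, which a separate fence after a plain assignment does not guarantee.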
Thanks.