Channel: AMD Developer Forums: Message List

Re: VGPRs as intermediate storage


Thanks.

 

The 3/4 discrepancy was my mistake in the message. In the real kernel the loop does 4 steps.

I tried setting the global size to a multiple of 1024, but it had no effect. Originally it was a multiple of the work-group size (256).
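
For anyone following along, padding the global size to a multiple on the host side can be sketched roughly like this. It is plain C, and round_up is a hypothetical helper name used only for illustration, not something from my host code; the kernel would also need a bounds check for the padded work-items.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of `multiple`,
 * e.g. a work-group size of 256 or the 1024 tried above. */
size_t round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}
```

For example, round_up(1000, 256) gives 1024, and a value that is already a multiple is returned unchanged.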

 

P.S.

In the end I managed to decrease VGPR usage to 103 and gain about 20% in speed (but I think this is mostly due to LDS usage).

I also found that these actions lower VGPR usage:

1) packing data into preferred-size vectors (does not always help)

2) using scalar operations even on vectors:

a.x + b.x ... a.z + b.z instead of a + b

3) not using manually unrolled code

Even if inside the loop you have to write something like

for (i = 0; i < 4; i++) {
     do something with D
     A = (i == 0) ? D : A;
     B = (i == 1) ? D : B;
     C = (i == 2) ? D : C;
}

it works a little faster (and uses fewer VGPRs)

than

do something with A

do something with B

do something with C

do something with D
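
To make point 3 concrete, here is a minimal sketch of the two variants written as plain host C, so it only shows that they compute the same values. do_something and the input values are hypothetical stand-ins for the real per-step work, and whether the loop form actually uses fewer VGPRs depends on the kernel and the compiler version.

```c
#include <assert.h>

/* Hypothetical stand-in for "do something": any per-step computation. */
static float do_something(float x, int i)
{
    return x * 2.0f + (float)i;
}

/* Manually unrolled variant: each result has its own statement. */
void unrolled(const float in[4], float out[4])
{
    float A = do_something(in[0], 0);
    float B = do_something(in[1], 1);
    float C = do_something(in[2], 2);
    float D = do_something(in[3], 3);
    out[0] = A; out[1] = B; out[2] = C; out[3] = D;
}

/* Loop variant: one shared body computes D, selects route it to A, B or C. */
void looped(const float in[4], float out[4])
{
    float A = 0.0f, B = 0.0f, C = 0.0f, D = 0.0f;
    for (int i = 0; i < 4; ++i) {
        D = do_something(in[i], i);
        A = (i == 0) ? D : A;
        B = (i == 1) ? D : B;
        C = (i == 2) ? D : C;
        /* on the last step (i == 3) the result simply stays in D */
    }
    out[0] = A; out[1] = B; out[2] = C; out[3] = D;
}

/* Self-check: both variants must produce identical outputs. */
void check_variants_agree(void)
{
    const float in[4] = {1.0f, 2.5f, -3.0f, 4.0f};
    float u[4], l[4];
    unrolled(in, u);
    looped(in, l);
    for (int i = 0; i < 4; ++i)
        assert(u[i] == l[i]);
}
```

In a kernel the selects would operate on the real register variables; the point is only that the loop body exists once and the conditional assignments pick the destination.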

