On GCN the physical vector type is not int4, it's int64: each vector register holds 64 32-bit elements, one per lane of the wavefront.
Scalar instructions don't work on individual vector elements. They have their own separate scalar register space (32-bit registers, usable in pairs as 64-bit values), and they work separately from (and in parallel with) the vector ALU.
There are instructions that extract a specific element from a vector register into a scalar register: v_readlane_b32 and v_readfirstlane_b32. They eat 1 cycle.
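To make the lane extraction concrete, here is a rough Python model of what those two instructions do. It is only a sketch: the function names readlane and readfirstlane, the list standing in for a vector register, and the integer exec_mask are my own illustration, not real GCN code, and timing is not modelled at all.

```python
WAVE_SIZE = 64  # lanes per GCN wavefront

def readlane(vreg, lane):
    """Model of v_readlane_b32: copy one chosen lane into a scalar value."""
    return vreg[lane]

def readfirstlane(vreg, exec_mask):
    """Model of v_readfirstlane_b32: copy the lowest active lane.

    exec_mask is a 64-bit integer, bit N set means lane N is active.
    If no lane is active, this model falls back to lane 0.
    """
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1:
            return vreg[lane]
    return vreg[0]

# A "vector register": one 32-bit value per lane.
v0 = [lane * 10 for lane in range(WAVE_SIZE)]

s0 = readlane(v0, 5)             # value held by lane 5
s1 = readfirstlane(v0, 0b1000)   # only lane 3 is active, so lane 3 is read
```

Either way you end up with a single value in a scalar register, not 64 of them.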
"Does it help if I copy vector elements to scalar variables first?"
Why? The vector ALU does 64x as many operations as the scalar ALU. The scalar unit is there for program control, address calculation, temporary results that are common to all 64 lanes of the wavefront, and some miscellaneous things.
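A small Python sketch of that split, under made-up assumptions (the base address, stride and the vector_add helper are hypothetical, just for illustration): the part of an address that is the same for the whole wavefront is computed once as a scalar, and a single vector operation then applies it to all 64 lanes.

```python
WAVE_SIZE = 64  # lanes per GCN wavefront

def vector_add(vreg, scalar):
    """Model of one vector instruction: every lane adds the same scalar operand."""
    return [x + scalar for x in vreg]

# Hypothetical values, identical for every lane in the wavefront.
base = 0x1000
stride = 4

# Scalar work: computed once per wavefront, not once per lane.
s_base = base + 16 * stride

# Vector work: each lane has its own offset, one instruction covers all 64.
lane_offsets = [lane * stride for lane in range(WAVE_SIZE)]
addresses = vector_add(lane_offsets, s_base)
```

Copying the per-lane values into scalars would turn that one vector instruction into 64 separate scalar ones.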