The attn_processor turns `hidden_states` into channels-last (NHWC) layout, which greatly benefits matmul performance on CPU. But the `.contiguous()` op converts the tensor back into NCHW layout, losing that benefit.
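A minimal PyTorch sketch of this behavior, assuming a 4-D activation (the shape here is illustrative, not taken from the attn_processor itself): a plain `.contiguous()` call defaults to `torch.contiguous_format` (NCHW), so it silently discards the channels-last layout, while passing `memory_format=torch.channels_last` preserves it.

```python
import torch

# Start from a 4-D activation and move it to channels-last (NHWC) layout.
x = torch.randn(2, 64, 32, 32).to(memory_format=torch.channels_last)
assert x.is_contiguous(memory_format=torch.channels_last)

# Plain .contiguous() defaults to torch.contiguous_format, i.e. it copies
# the data back into NCHW order and the channels-last layout is lost.
y = x.contiguous()
assert y.is_contiguous()  # NCHW-contiguous again
assert not y.is_contiguous(memory_format=torch.channels_last)

# Passing the memory format explicitly keeps the NHWC layout intact.
z = x.contiguous(memory_format=torch.channels_last)
assert z.is_contiguous(memory_format=torch.channels_last)
```

So one way to keep the CPU-friendly layout through the attention path would be to either drop the `.contiguous()` call where it is not needed, or make it memory-format aware as above.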