whats faster for repacking mono4 stream?

Posted: Mon Nov 24, 2014 12:34 am
by Nubeat7
quick asm question again,

to optimize my schematics i do a lot of repacking of mono4 streams. since i mostly use just 2 channels (stereo), i often pack 2 stereo signals (from 2 mono4 nodes) into one mono4. instead of unpacking and packing again, i've normally used this:

Code:

fld in1[0];   fstp out1n2[0];
fld in1[1];   fstp out1n2[1];
fld in2[0];   fstp out1n2[2];
fld in2[1];   fstp out1n2[3];

but i also could use this:

Code:

movaps xmm0,in1;
movaps xmm1,in2;
shufps xmm0,xmm1,68;
movaps out,xmm0;

which i think should be faster? am i right that the shufps is faster?
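
For anyone unsure how the immediate 68 selects the lanes, here is a minimal Python sketch that emulates the lane selection of shufps. The helper name `shufps` and the placeholder channel labels are only illustrative assumptions; this models the selector logic, not the timing.

```python
def shufps(dst, src, imm8):
    """Emulate SSE shufps lane selection.

    Result lanes 0 and 1 are picked from dst, lanes 2 and 3 from src.
    Each lane is chosen by a 2-bit field of imm8, lowest bits first.
    """
    sel = [(imm8 >> (2 * i)) & 3 for i in range(4)]
    return [dst[sel[0]], dst[sel[1]], src[sel[2]], src[sel[3]]]


# Two mono4 streams that each carry one stereo pair in channels 0 and 1.
in1 = ["L1", "R1", "unused", "unused"]
in2 = ["L2", "R2", "unused", "unused"]

# 68 = 0b01_00_01_00, i.e. selectors 0, 1 (from in1) and 0, 1 (from in2).
packed = shufps(in1, in2, 68)
print(packed)  # ['L1', 'R1', 'L2', 'R2']
```

So the shufps version produces the same channel layout as the four fld/fstp pairs.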

Re: whats faster for repacking mono4 stream?

Posted: Mon Nov 24, 2014 12:46 am
by KG_is_back
shufps takes only one cycle on most CPUs. In the first example you read from memory four times and write to memory four times, while in the second example you read twice and write once, so it's definitely faster, as far as I can tell.

Have a look at the opcode reference I made recently. You can also use the Code Speed tester to inspect the actual CPU load.

Re: whats faster for repacking mono4 stream?

Posted: Mon Nov 24, 2014 9:57 am
by martinvicanek
Yes, shufps is much faster. Also avoid using the stock Pack and Unpack modules, as they essentially use fld and fstp. The worst example of "Verschlimmbesserung" (sorry about the German term; it means an "improvement" that makes things worse) is the stock Stereo Clipper, where the overhead of the Pack/Unpack modules by far outweighs any potential CPU savings.

Re: whats faster for repacking mono4 stream?

Posted: Mon Nov 24, 2014 4:53 pm
by Nubeat7
thanks martin for the confirmation :)

but how do i do it the other way around without fld / fstp?

so if i have one mono4 input (2 x stereo) and i want to route it into 2 mono4 streams again:

Code:

fld in[0]; fstp out1[0];
fld in[1]; fstp out1[1];
fld in[2]; fstp out2[0];
fld in[3]; fstp out2[1];


i couldn't figure out a way with shufps.

Re: whats faster for repacking mono4 stream?

Posted: Mon Nov 24, 2014 7:52 pm
by martinvicanek
Like this?

Code:

streamin pack;
streamout out0;
streamout out1;
int true=-1;   // binary 11111111111111111111111111111111
float mask01=0;   // all four channels start as 0

stage0;
// set channels 0 and 1 of mask01 to all ones
fld true[0]; fst mask01[0]; fstp mask01[1];

stage2;
movaps xmm0,pack;
movaps xmm1,xmm0;
shufps xmm1,xmm1,78;   // 0123 -> 2301 (23 are first)
andps xmm0,mask01;
movaps out0,xmm0;
andps xmm1,mask01;
movaps out1,xmm1;

Or, depending on what you do with the two outputs further on, you might even drop the masking (channels 2 and 3 of each output then still carry the other stereo pair): ;)

Code:

streamin pack;
streamout out0;
streamout out1;

movaps xmm0,pack;
movaps out0,xmm0;
shufps xmm0,xmm0,78;   // 0123 -> 2301 (23 are first)
movaps out1,xmm0;
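
To check the split on paper, here is a small Python sketch that emulates the lane logic of both variants above. The helpers `shufps` and `andps` and the sample values are illustrative assumptions, and the mask is modelled as booleans rather than as real bit patterns.

```python
def shufps(dst, src, imm8):
    """Emulate SSE shufps: lanes 0,1 come from dst, lanes 2,3 from src."""
    sel = [(imm8 >> (2 * i)) & 3 for i in range(4)]
    return [dst[sel[0]], dst[sel[1]], src[sel[2]], src[sel[3]]]


def andps(values, mask):
    """Model andps with an all-ones / all-zeros mask: keep or zero each lane."""
    return [v if keep else 0.0 for v, keep in zip(values, mask)]


pack = [1.0, 2.0, 3.0, 4.0]           # L1, R1, L2, R2
mask01 = [True, True, False, False]   # all ones in channels 0 and 1 only

# 78 = 0b01_00_11_10, i.e. selectors 2, 3, 0, 1: the two halves swap places.
swapped = shufps(pack, pack, 78)
print(swapped)  # [3.0, 4.0, 1.0, 2.0]

# Masked variant: each output keeps only its own stereo pair.
out0 = andps(pack, mask01)
out1 = andps(swapped, mask01)
print(out0, out1)  # [1.0, 2.0, 0.0, 0.0] [3.0, 4.0, 0.0, 0.0]
```

In the unmasked variant, out0 is simply `pack` and out1 is `swapped`, which is fine as long as nothing downstream reads channels 2 and 3.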