the reason is that the dsp code gets translated to assembler code, and that translation produces a lot of (often unneeded) register moves.
since you only have 8 registers, it breaks when you do more than 8 operations in a row.
connect a text primitive to the "c" output of the dsp code primitive and look at the generated assembler. you will find register numbers
which show you that you are out of the valid registers 0-7
add the parentheses KG suggested and look again: now only the registers xmm0-xmm7 are used and it works
beyond that, you can copy the text code into an assembler code primitive and optimize it by hand. there you can get rid of most of the movaps and end up with this
streamin in1;streamin in2;
streamin in3;streamin in4;
streamin in5;streamin in6;
streamin in7;streamin in8;
streamout out;
float F1=1;
movaps xmm0,in1;
andps xmm0,in2;
andps xmm0,in3;
andps xmm0,in4;
andps xmm0,in5;
andps xmm0,in6;
andps xmm0,in7;
andps xmm0,in8;
cmpps xmm0,F1,0;
movaps xmm1,F1;
andps xmm1,xmm0;
movaps out,xmm1;
with this version of the code you can add as many inputs as you want by just appending another streamin declaration
and another andps line
before the cmpps...
if the chain is very long, it might make sense to use more registers and check up to 7 groups in parallel for faster processing.