optimization question - custom selectors

For general discussion related to FlowStone
tester
Posts: 1786
Joined: Wed Jan 18, 2012 10:52 pm
Location: Poland, internet

optimization question - custom selectors

Post by tester »

I'm looking for an optimized selector-like switcher for streams, since the native selector doesn't work well when there are multiple copies of it. At the moment, I'm using a scheme like this:

Code:

streamin sw;
streamin in1;
streamin in2;
streamin in3;
streamout out1;
float a1,a2,a3,a4;

a1 = in1&(sw==0);
a2 = in2&(sw==1);
a3 = (-1*in2)&(sw==2);
a4 = in3&(sw==3);

out1 = a1+a2+a3+a4;


What would be a faster way?
(mono4 compatible)

Also, I remember from the past that there was some asm hack that allows you to "stop" some inputs from processing, like the selectors do. But I don't remember the details now.
martinvicanek
Posts: 1334
Joined: Sat Jun 22, 2013 8:28 pm

Re: optimization question - custom selectors

Post by martinvicanek »

You can save CPU by not storing a1 through a4 because they are actually not needed further. The following ASM code uses about half the CPU:

Code:

streamin sw;
streamin in1;
streamin in2;
streamin in3;

streamout out1;

float F0=0.0;
float F1=1.0;
float F2=2.0;
float F3=3.0;

movaps xmm0,F0; cmpps xmm0,sw,0; andps xmm0,in1;   // in1&(sw==0)
movaps xmm1,F1; cmpps xmm1,sw,0; andps xmm1,in2;   // sin2&(w==1)
movaps xmm2,F2; cmpps xmm2,sw,0; andps xmm2,in2;   // in2&(sw==2)
movaps xmm3,F3; cmpps xmm3,sw,0; andps xmm3,in3;   // in3&(sw==3)

addps xmm0,xmm1; subps xmm0,xmm2; addps xmm0,xmm3;
movaps out1,xmm0;


Further optimizations might be possible, depending on how often sw changes and if it is Mono4 or has the same value for all 4 channels. In the latter case you might do the (sw==0) comparisons in green. You might also consider hopping, but there is not really much more to gain anyway.
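To see why the masked terms can simply be added and subtracted, here is a rough single-lane model in Python (not FlowStone code). It assumes that a true compare yields an all-ones 32-bit mask and a false one yields zero, which is what cmpps produces; the helper names are made up for illustration.

```python
import struct

ALL_ONES = 0xFFFFFFFF  # assumed cmpps result for a lane where the compare is true


def float_to_bits(x):
    """Reinterpret a float as its 32-bit single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def masked(value, sw, case):
    """One-lane model of `value & (sw == case)` (the andps step)."""
    mask = ALL_ONES if sw == case else 0
    return bits_to_float(float_to_bits(value) & mask)


def select(sw, in1, in2, in3):
    """Model of addps/subps/addps: in1, in2, -in2, in3 for sw = 0..3."""
    return (masked(in1, sw, 0) + masked(in2, sw, 1)
            - masked(in2, sw, 2) + masked(in3, sw, 3))
```

At most one mask is non-zero for any sw, so three of the four terms are always 0.0 and the sum is just the selected input. Any sw outside 0..3 gives silence.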
HughBanton
Posts: 265
Joined: Sat Apr 12, 2008 3:10 pm
Location: Evesham, Worcestershire

Re: optimization question - custom selectors

Post by HughBanton »

Oh, nice one. I was attempting something complicated with stream selectors a couple of weeks back, but gave it up because of some weird behaviour. I'll certainly try again now, U-turn the U-turn.

Typo with that "sin2" obviously (the comment should read in2&(sw==1)) - had me puzzled for a moment!

And presumably .. ' addps xmm0,xmm2 ' ? Or am I as confused as usual :?

H

Re: optimization question - custom selectors

Post by tester »

Thanks Martin,

This is for switching audio signals, full mono4 usage, so hopping or removing channels isn't really an option.

And how would such asm-optimized code look for a multiplexer? (unused outs = 0)
adamszabo
Posts: 667
Joined: Sun Jul 11, 2010 7:21 am

Re: optimization question - custom selectors

Post by adamszabo »

HughBanton wrote:And presumably .. ' addps xmm0,xmm2 ' ? Or am I as confused as usual :?


Normally yes, but Martin made the code behave like the very first example, where in2 is negated for sw==2, so it works the same way.

Re: optimization question - custom selectors

Post by martinvicanek »

A simple multiplexer would go like this:

Code:

// inputs
streamin switch;
streamin in;

// outputs
streamout out0;
streamout out1;
streamout out2;
streamout out3;

// constants
float F0=0;
float F1=1;
float F2=2;
float F3=3;

// code
movaps xmm6,switch;
movaps xmm7,in;

movaps xmm0,F0; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out0,xmm0;
movaps xmm1,F1; cmpps xmm1,xmm6,0; andps xmm1,xmm7; movaps out1,xmm1;
movaps xmm2,F2; cmpps xmm2,xmm6,0; andps xmm2,xmm7; movaps out2,xmm2;
movaps xmm3,F3; cmpps xmm3,xmm6,0; andps xmm3,xmm7; movaps out3,xmm3;


If the switch input does not change very often you could hop the compares; however, the CPU gain is only marginal:

Code:

// inputs
streamin switch;
streamin in;

// outputs
streamout out0;
streamout out1;
streamout out2;
streamout out3;

// constants
float F0=0;
float F1=1;
float F2=2;
float F3=3;

// masks
int mask0=0;
int mask1=0;
int mask2=0;
int mask3=0;

// code (the compares below run only when ecx & 63 == 0)
mov eax,ecx; and eax,63; cmp eax,0; jnz skipCompares;
   movaps xmm0,F0; cmpps xmm0,switch,0; movaps mask0,xmm0;
   movaps xmm1,F1; cmpps xmm1,switch,0; movaps mask1,xmm1;
   movaps xmm2,F2; cmpps xmm2,switch,0; movaps mask2,xmm2;
   movaps xmm3,F3; cmpps xmm3,switch,0; movaps mask3,xmm3;
skipCompares:

movaps xmm7,in;
movaps xmm0,mask0; andps xmm0,xmm7; movaps out0,xmm0;
movaps xmm1,mask1; andps xmm1,xmm7; movaps out1,xmm1;
movaps xmm2,mask2; andps xmm2,xmm7; movaps out2,xmm2;
movaps xmm3,mask3; andps xmm3,xmm7; movaps out3,xmm3;
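What the hopping costs in responsiveness can be sketched with a small Python model (not FlowStone code). It assumes the value in ecx advances once per sample, so that `and eax,63` refreshes the masks on every 64th sample; the function name and the `hop` parameter are made up for illustration.

```python
def hopped_multiplexer(switch, signal, n_outs=4, hop=64):
    """Route `signal` to one of `n_outs` outputs, refreshing the masks every `hop` samples.

    Assumption: the sample index n plays the role of the counter in ecx.
    """
    masks = [False] * n_outs          # like mask0..mask3, initially all zero
    outs = [[] for _ in range(n_outs)]
    for n, (sw, x) in enumerate(zip(switch, signal)):
        if n % hop == 0:              # same test as (ecx & 63) == 0 when hop is 64
            masks = [sw == k for k in range(n_outs)]
        for k in range(n_outs):
            outs[k].append(x if masks[k] else 0.0)
    return outs
```

In this model a switch change that lands between two refreshes only takes effect at the next multiple of 64, so the routing can lag by up to 63 samples. That is the price for skipping the compares on the other samples.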

Re: optimization question - custom selectors

Post by tester »

Thanks again.

I admit, my domain is more wiring green relationships than messing with asm code.

Re: optimization question - custom selectors

Post by HughBanton »

No need to use xmm1, 2 or 3 in the simple multiplexer I think ...

Code:

 // 1-pole, 4-way Multiplexer
streamin switch, in;
streamout out0, out1, out2, out3;

float F0=0, F1=1, F2=2, F3=3;

movaps xmm6,switch;
movaps xmm7,in;

movaps xmm0,F0; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out0,xmm0; //(sw==0)
movaps xmm0,F1; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out1,xmm0; //(sw==1)
movaps xmm0,F2; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out2,xmm0; //(sw==2)
movaps xmm0,F3; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out3,xmm0; //(sw==3)


.. may or may not matter in practice, but it means you could easily turn this into a super-efficient multipole mpx using the spare xmm registers. (Should you ever need such a device!)

Also note that any of these can generally be used in stage0 only, if you only need a one-off note-on lookup of something.

H

Re: optimization question - custom selectors

Post by HughBanton »

Since I just lerrrv messing with asm code, I just came up with this simplification for the Selector
- OR instead of ADD

Code:

// 4-in, 1-out selector
streamin sw, in0, in1, in2, in3;
streamout out;

float F0=0, F1=1, F2=2, F3=3;

movaps xmm7,sw; // xmm7 = switch
movaps xmm0,xmm7; cmpps xmm0,F0,0; andps xmm0,in0; movaps xmm1,xmm0; //(sw==0)
movaps xmm0,xmm7; cmpps xmm0,F1,0; andps xmm0,in1; orps xmm1,xmm0; //(sw==1)
movaps xmm0,xmm7; cmpps xmm0,F2,0; andps xmm0,in2; orps xmm1,xmm0; //(sw==2)
movaps xmm0,xmm7; cmpps xmm0,F3,0; andps xmm0,in3; orps xmm1,xmm0; //(sw==3)
movaps out,xmm1;


... seems to work OK?
H

Re: optimization question - custom selectors

Post by martinvicanek »

HughBanton wrote:No need to use xmm1, 2 or 3 in the simple multiplexer I think ...

Correct, you can spare xmm1 etc. for something else if you need to. On the other hand, I like to use four separate registers if I can afford to. If anything, it might help the processor to do things in parallel. ;)

HughBanton wrote:[...] simplification for the Selector- OR instead of ADD

Yes! For a plain selector OR will be somewhat lighter on CPU than ADD. I used ADD and SUB only to comply with the OP's requirement for sw==2.
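Why OR can stand in for ADD here can be checked with a small single-lane Python model (not FlowStone code). It assumes the compare yields an all-ones mask when true and zero otherwise; the helper names are made up for illustration.

```python
import struct


def float_to_bits(x):
    """Reinterpret a float as its 32-bit single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def select_or(sw, inputs):
    """One-lane model of the cmpps/andps/orps selector."""
    acc = 0
    for case, value in enumerate(inputs):
        mask = 0xFFFFFFFF if sw == case else 0   # assumed compare result
        acc |= float_to_bits(value) & mask       # andps, then orps
    return bits_to_float(acc)
```

Since at most one masked term is non-zero, OR-ing the bit patterns returns the selected input unchanged, exactly as adding zeros would. A negated input cannot be produced by OR alone, though; it would need an explicit sign flip, which is what the SUB takes care of in the add/sub version.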