Optimized Chorus/Flanger/Echo&PingPong Delay/Tarrabia filter

TrojakEW
Posts: 111
Joined: Sat Dec 25, 2010 10:12 am
Location: Slovakia

Optimized Chorus/Flanger/Echo&PingPong Delay/Tarrabia filter

Post by TrojakEW »

I've done some adjustments and asm optimizations to some stock modules in FlowStone. Nothing extraordinary and nothing for advanced users. I also want to thank trogluddite, MyCo, infuzion, and cyto for their hard work on the SynthMaker forums. I learned a lot from their posts. So now it's my turn to help other users who are just beginning, even though I'm still a beginner too.

The stock Echo/Ping Pong delays are not CPU hungry, but I was able to reduce usage even more. I replaced almost all of the math with asm versions and gained 20~25% performance. I also added an additional delay module with stereo input.
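For readers who have not looked inside a ping-pong delay, here is a minimal Python sketch of the idea with stereo input. It is not the FlowStone module or its asm code; the function name `ping_pong_delay` and the parameters `delay_samples`, `feedback`, and `mix` are made up for illustration. Each channel's echo is fed back into the opposite channel's delay line, which is what makes the repeats bounce from side to side.

```python
def ping_pong_delay(left_in, right_in, delay_samples, feedback=0.5, mix=0.5):
    """Stereo delay with cross-feedback (illustrative sketch, not the stock module)."""
    buf_l = [0.0] * delay_samples  # circular buffer for the left delay line
    buf_r = [0.0] * delay_samples  # circular buffer for the right delay line
    pos = 0
    out_l, out_r = [], []
    for x_l, x_r in zip(left_in, right_in):
        # The slot at `pos` holds what was written delay_samples ago.
        d_l = buf_l[pos]
        d_r = buf_r[pos]
        # Cross-feed: the left line receives the right echo and vice versa.
        buf_l[pos] = x_l + feedback * d_r
        buf_r[pos] = x_r + feedback * d_l
        pos = (pos + 1) % delay_samples
        # Blend dry input with the delayed signal.
        out_l.append((1.0 - mix) * x_l + mix * d_l)
        out_r.append((1.0 - mix) * x_r + mix * d_r)
    return out_l, out_r
```

Feeding a single impulse into the left input gives an echo on the left after one delay period, then on the right at half the level, then on the left again, and so on.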

The stereo chorus and chorus/flanger use 40% less CPU. I also replaced the stock delay with an optimized fractional delay (I don't know who optimized it, so credit goes to someone else).
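As a rough illustration of why a chorus needs a fractional delay, here is a small Python sketch. It assumes plain linear interpolation between the two neighbouring samples, which may not be what the optimized FlowStone delay does; the names `fractional_read` and `chorus` and all parameter names are hypothetical. The LFO sweeps the delay time smoothly, so the read position usually falls between two stored samples.

```python
import math

def fractional_read(buf, write_pos, delay):
    """Read `delay` samples (possibly fractional) behind write_pos.

    Assumes linear interpolation; write_pos is the index of the newest sample.
    """
    n = len(buf)
    read = (write_pos - delay) % n
    i0 = int(math.floor(read))
    frac = read - i0          # position between the two neighbouring samples
    i0 %= n                   # guard against read rounding up to exactly n
    i1 = (i0 + 1) % n         # the next (newer) sample
    return buf[i0] * (1.0 - frac) + buf[i1] * frac

def chorus(signal, sample_rate=44100, base_delay_ms=15.0,
           depth_ms=5.0, rate_hz=0.5, mix=0.5):
    """Single-voice chorus: a delay line whose length is swept by a sine LFO."""
    max_delay = int((base_delay_ms + depth_ms) * sample_rate / 1000.0) + 2
    buf = [0.0] * max_delay
    out = []
    pos = 0
    for n, x in enumerate(signal):
        buf[pos] = x
        lfo = math.sin(2.0 * math.pi * rate_hz * n / sample_rate)
        delay = (base_delay_ms + depth_ms * lfo) * sample_rate / 1000.0
        wet = fractional_read(buf, pos, delay)
        out.append((1.0 - mix) * x + mix * wet)
        pos = (pos + 1) % max_delay
    return out
```

A flanger follows the same pattern with a much shorter base delay and some feedback around the delay line.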

Last is the Tarrabia filter from the audiooak module pack. The optimized version is 45~55% faster, depending on the selected filter type.

Read more info/statistics and download here

That's it for now. More to come.
infuzion
Posts: 109
Joined: Tue Jul 13, 2010 11:55 am
Location: Kansas City, USA, Earth, Sol

Re: Optimized Chorus/Flanger/Echo&PingPong Delay/Tarrabia fi

Post by infuzion »

Welcome :) Ah, a [wo]man after my own heart!

Yes, ASM can reduce CPU cycles a lot, but I'm not certain about 40-50% like you are claiming; even the now-old Core2Duo CPUs self-optimize decently. For AMD, I could see around 40%.
There are still a few places you can squeeze out an extra cycle or 2. Or 5. But overall well done :)
TrojakEW
Posts: 111
Joined: Sat Dec 25, 2010 10:12 am
Location: Slovakia

Re: Optimized Chorus/Flanger/Echo&PingPong Delay/Tarrabia fi

Post by TrojakEW »

Thank you.
Well, I have an old Athlon 64 X2 4200 and for me those are real numbers. Of course there is some room for more improvement by combining asm blocks into one in order to reduce movaps, but for me (and I'm sure for some other users) it's easier to navigate through a few blocks rather than one big block of code (so I will sacrifice 2 or 5 cycles for that :mrgreen: ). That's why I chose FlowStone. Of course, if you have any suggestions I'm always ready to learn more.
TrojakEW
Posts: 111
Joined: Sat Dec 25, 2010 10:12 am
Location: Slovakia

Re: Optimized Chorus/Flanger/Echo&PingPong Delay/Tarrabia fi

Post by TrojakEW »

infuzion
Posts: 109
Joined: Tue Jul 13, 2010 11:55 am
Location: Kansas City, USA, Earth, Sol

Re: Optimized Chorus/Flanger/Echo&PingPong Delay/Tarrabia fi

Post by infuzion »

TrojakEW wrote: Well, I have an old Athlon 64 X2 4200 and for me those are real numbers.
I learned to optimize SM-ASM with my Athlon 64 X2 also; the best CPU in its day, but it is like 7 years old now? Surprised people still use it... though I thought about reviving it to ensure my ASM is fastest. Most newer Intel CPUs (like my i7 laptop) do not allow accurate CPU% readings inside SM.

BTW, I would pick up a Core2Duo (even an old used one) if you can; not only will it help you not wear out your Athlon for ASM testing, but it is much faster in many ways, & a few ASM optimizations for the Athlon actually make the Intel lose a few cycles. Not many, but I did notice it when I was helping with the toolbox's DeZip & a few other projects. C2Ds do some smart opcode self-interlacing, opcodes are faster (esp. divides), & on some occasions they can run the same XMM registers at the same time IIRC. A few SM primitives run faster also; IIRC the Selector used ~10 cycles in stream mode on the Athlon but 2-3 cycles on the C2D+, so your design choices may be different.

I posted a few more tips in SM's forum over the past year; search "ASM" with user "infuzion".
TrojakEW
Posts: 111
Joined: Sat Dec 25, 2010 10:12 am
Location: Slovakia

Re: Optimized Chorus/Flanger/Echo&PingPong Delay/Tarrabia fi

Post by TrojakEW »

I can't afford to buy anything now, even used stuff. If the difference is only a few cycles then it's OK (for now). But this also shows that it's almost impossible to optimize anything for all CPU types. As for your tips from the SM forum, I think I've read almost all of them. I hope. Some are self-explanatory while others need time to "master". Now I'm trying to optimize arrays in asm, which is a little confusing for me. Most of the time I end up with the exact opposite effect and the code runs 4-8x slower :mrgreen: .
TrojakEW
Posts: 111
Joined: Sat Dec 25, 2010 10:12 am
Location: Slovakia

Re: Optimized Chorus/Flanger/Echo&PingPong Delay/Tarrabia fi

Post by TrojakEW »

I have updated the chorus pack. Both modules are even faster (66-67% on my CPU). Big reduction thanks to Trog Luddite's optimized delay. Replaced the sine LFO with an asm version and the log scaling with a Ruby version in order to reduce the number of modules/components.
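One common way to make a sine LFO cheap is to avoid calling a sine function per sample and use a two-term recurrence instead. The Python sketch below shows that idea only; it is an assumption about the general technique, not a description of the asm LFO in the pack, and the name `sine_lfo` and its parameters are hypothetical. It relies on the identity sin((n+1)w) = 2cos(w)sin(nw) - sin((n-1)w), so each new sample costs one multiply and one subtract.

```python
import math

def sine_lfo(rate_hz, sample_rate, num_samples):
    """Generate sin(w*n) for n = 0..num_samples-1 with a two-term recurrence."""
    w = 2.0 * math.pi * rate_hz / sample_rate
    coeff = 2.0 * math.cos(w)    # computed once, reused for every sample
    y1 = -math.sin(w)            # value at n = -1
    y2 = -math.sin(2.0 * w)      # value at n = -2
    out = []
    for _ in range(num_samples):
        y0 = coeff * y1 - y2     # one multiply and one subtract per sample
        out.append(y0)
        y2, y1 = y1, y0
    return out
```

Rounding error builds up slowly in this form, so a long-running oscillator would normally be re-seeded from time to time or use a more stable variant.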