less cpu hungry power approximation?

tester · Post by **tester** » Sun Feb 06, 2022 1:12 pm

Something like this in a stream:

pow(base,exp);

is pretty CPU hungry. Are there any faster (non-hopped, mono4) approximations that could do the job?

For the design, base can be any integer from 2 upward, or just a power of 2; exp is in the range (0;1).
The most minimalistic design only requires base=2 (it's for signal scaling).

nix · Post by **nix** » Sun Feb 06, 2022 6:39 pm

square: val * val
cube: val * val * val

does that simple thought from a simple soul help?

tester · Post by **tester** » Sun Feb 06, 2022 6:48 pm

Nix, base as integer, exp as continuous range between 0 and 1.
Like 2^0.432, 2^0.456, etc.
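
For integer bases other than 2, the problem can be reduced to the base-2 case through the identity base^exp = 2^(exp * log2(base)). The Python sketch below only illustrates that identity; `make_pow` is a hypothetical helper name, and the built-in `2.0 ** y` stands in for whatever fast 2^x routine is eventually used.

```python
import math

def make_pow(base: float):
    """Return a function computing base**exp via 2**(exp * log2(base)).

    Hypothetical helper for illustration: log2(base) is computed once,
    so each call costs one multiply plus one 2^x evaluation.
    """
    k = math.log2(base)  # constant per base; an integer when base is a power of 2

    def pow_base(exp: float) -> float:
        # Stand-in for a fast 2^x approximation.
        return 2.0 ** (exp * k)

    return pow_base

if __name__ == "__main__":
    pow8 = make_pow(8)
    print(pow8(0.5), 8 ** 0.5)
```

With a fixed base, the constant can be folded into the exponent before the stream code runs, so only the 2^x part needs to be fast.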

juha_tp · Post by **juha_tp** » Sun Feb 06, 2022 7:26 pm

Hmm... how much faster an approximation can be depends on the accuracy you need.

Is this assembler code FS compatible https://wurstcaptures.untergrund.net/as ... ricks.html ?

martinvicanek · Post by **martinvicanek** » Sun Feb 06, 2022 7:38 pm

This is my fast Mono4 2^x implementation for float x, accuracy close to machine precision.

streamin x;            streamout y;   // 2^x

// 2^x Approximation
// Author: Martin Vicanek
// Relative Error < 1e-7
// CPU load 2% of built-in pow() function

// y = 2^x
// decompose x = int + frac
// compute I = 2^int by bit shifting
// approximate F = 2^frac by polynomial
// so y = I*F

float xmax=127.5;      // yields 1.#INF
float xmin=-126.5;      // yields 0
float F0P5=0.5;         float a0=1;
float a1=0.693147034;   float a2=0.2402295  ;
float a3=0.055484164;   float a4=0.009678109;
float a5=0.001243999;   float a6=0.000217193;
int I127=127;

// decompose x into int and frac parts
movaps xmm0,x; minps xmm0,xmax; maxps xmm0,xmin; 
movaps xmm1,xmm0; subps xmm1,F0P5; cvtps2dq xmm1,xmm1;
cvtdq2ps xmm2,xmm1;      // xmm1 is the int part
subps xmm0,xmm2;      // xmm0 is the frac part

// evaluate 2^int
paddd xmm1,I127; pslld xmm1,23;   // xmm1 is 2^int

// evaluate 2^frac (polynomial approx.)
movaps xmm2,a6; mulps xmm2,xmm0;
addps xmm2,a5; mulps xmm2,xmm0;
addps xmm2,a4; mulps xmm2,xmm0;
addps xmm2,a3; mulps xmm2,xmm0;
addps xmm2,a2; mulps xmm2,xmm0;
addps xmm2,a1; mulps xmm2,xmm0;
addps xmm2,a0;         // xmm2 is 2^frac
mulps xmm2,xmm1;      // put it together
movaps y,xmm2;
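
For anyone who wants to check the scheme outside FlowStone, here is a rough scalar Python sketch of the same steps (clamp, split into int and frac, build 2^int from the exponent bits, evaluate the polynomial by Horner's rule). The coefficients and limits are copied from the code above; the function names are made up for illustration, and since Python computes in double precision the results will not be bit-identical to the SSE version.

```python
import struct

# Coefficients a0..a6 and limits copied from the assembler code above.
COEFFS = (1.0, 0.693147034, 0.2402295, 0.055484164,
          0.009678109, 0.001243999, 0.000217193)
X_MAX, X_MIN = 127.5, -126.5

def exp2_approx(x: float) -> float:
    """Approximate 2**x as 2**int * poly(frac), mirroring the SSE code."""
    x = max(min(x, X_MAX), X_MIN)        # minps / maxps clamp
    i = round(x - 0.5)                   # round-half-to-even, like cvtps2dq by default
    f = x - i                            # fractional part, in [0, 1]
    bits = (i + 127) << 23               # biased exponent shifted into place
    two_i = struct.unpack("<f", struct.pack("<I", bits))[0]
    poly = 0.0
    for a in reversed(COEFFS):           # Horner: ((a6*f + a5)*f + ...)*f + a0
        poly = poly * f + a
    return two_i * poly

def max_rel_error(lo: float = -10.0, hi: float = 10.0, steps: int = 20001) -> float:
    """Largest relative error against the built-in power on a uniform grid."""
    worst = 0.0
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        exact = 2.0 ** x
        worst = max(worst, abs(exp2_approx(x) - exact) / exact)
    return worst

if __name__ == "__main__":
    print("max relative error:", max_rel_error())
```

This is only meant for checking accuracy and following the logic; it says nothing about the CPU load, which depends on the SSE implementation.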

tester · Post by **tester** » Mon Feb 07, 2022 11:47 am

Thank you very much Martin, this should do the job for scaling cases.

nix · Post by **nix** » Mon Feb 07, 2022 10:10 pm

oh sorry
I see now that this can use decimals

thanks guys
