DSP Robotics Support

Posted: **Mon Jun 17, 2013 11:00 pm**

It's just a delicate little baby at the moment, so treat it with care...

Stream FFT v001 Build 8.fsm: (158.47 KiB) Downloaded 2378 times

Posted: **Mon Jun 17, 2013 11:24 pm**

First what I of course did - was to change that value in order to see the crash. But - it did not crashed. It looks that this value is changable when the sound is offline.

Posted: **Tue Jun 18, 2013 7:17 pm**

Yes, that's right - it's only when the assembly blocks are running that there can be problems.
Because it is such an intensive algorithm, I left out the parts that check whether memory addresses are valid inside the code arrays - if they go out of range, there's an instant crash. Saves a fair chunk of CPU, but means that there are many hard-wired values that need to be right.

Ultimately, I'll probably just release a set of modules, each preset to a different buffer length - there's a big trade off with this job between versatility and CPU load due to the limitations of the FS assembly instruction set. There's still a lot more code to add yet too - FFT -> iFFT on its own is pretty dull; messing with the data in between is where things will get much more interesting.

Posted: **Tue Jun 18, 2013 9:52 pm**

//edit:

redirecting this to new topic.
viewtopic.php?f=4&t=1513#p6442

Posted: **Tue Jun 18, 2013 9:55 pm**

The problem with realtime FFT and iFFT is, you have to overlap the buffers and apply the correct window to the FFT. And finally you have to find the right spot where you start the overlapping FFT. That's actually a very hard task.

Have you done the conversion to SSE from x86 code? It's very hard to read, but it looks like you don't really use the benefits of SSE.

When I understand it correctly, you buffer the 1024 samples linear into "inArray" and at the end of the buffer you calculate the FFT. Wouldn't it be faster to buffer the incoming samples at the bitshifted positions in an Array? Then you could leave this step (Loop1) out of the main code.

Posted: **Tue Jun 18, 2013 10:17 pm**

I think I've found a bug in the FFT code. On line 125:

Code: Select all

movaps xmm1,xmm3;

xmm3 wasn't set before

Posted: **Tue Jun 18, 2013 10:29 pm**

Another related bug:

Code: Select all

 mov eax,1;
cmp eax,0;
jnz JgteMtest;

the Jump is always executed because 1-0 is not zero

Posted: **Tue Jun 18, 2013 10:39 pm**

@Trog:
your implementation seems to use this algorithm.

Code: Select all

class Array
   # DFT and inverse.
   # 
   # Algorithm from 
   # 'Meyberg, Vachenauer: Hoehere Mathematik II, Springer Berlin, 1991, page 332'
   #
   # See http://blog.mro.name/2011/04/simple-ruby-fast-fourier-transform/ 
   #
   def fft doinverse = true
      src = self
      # raise ArgumentError.new "Expected array input but was '#{src.class}'" unless src.kind_of? Array
      n = src.length
      nlog2 = Math.log( n ) / Math.log( 2 )
      raise ArgumentError.new "Input array size must be a power of two but was '#{n}'" unless nlog2.floor - nlog2 == 0.0
      n2 = n / 2
      phi = Math::PI / n2
      if doinverse
         phi = -phi
      else
         src.collect!{|c| c /= n.to_f}
      end
 
      # build a sine/cosine table
      wt = Array.new n2
      wt.each_index { |i| wt[i] = Complex(Math.cos(i * phi), Math.sin(i * phi)) }
 
      # bit reordering
      n1 = n - 1
      j = 0
      1.upto(n1) do |i|
         m = n2
         while j >= m
            j -= m
            m /= 2
         end
         j += m
         src[i],src[j] = src[j],src[i] if j > i
      end
 
      # 1d(N) Ueberlagerungen
      mm = 1
      begin
         m = mm
         mm *= 2
         0.upto(m - 1) do |k|
            w = wt[ k * n2 ]
            k.step(n1, mm) do |i|
               j = i + m
               src[j] = src[i] - (temp = w * src[j])
               src[i] += temp
            end
         end
         n2 /= 2
      end while mm != n
      src
   end
end

//usage
real = []
imag = []

@in.fft(false).each {|i| real.push i.real; imag.push i.imag}
output 0, real; output 1, imag

Posted: **Wed Jun 19, 2013 9:01 am**

Hi Guys

MyCo wrote: you don't really use the benefits of SSE.

Saving that for later - I'll look into that once the other coding is done, but I thought I'd keep the other SSE channels free for now to give the option of parallel (e.g. stereo) operation.

Bugs! - nope!
There are a few unconditional jumps (why no 'jmp' opcode!) - they are just used to get the execution order right to switch between testing conditions at loop start/end; so the xmm3 value is actually set later in the code.

MyCo wrote:buffer the incoming samples at the bitshifted positions in an Array? Then you could leave this step (Loop1) out of the main code.

Yes, I have tried several methods of doing that - but oddly enough, they've all come out with higher CPU load so far, despite, as you say, less read/writes and way fewer opcodes. Seems that streamlining the memory pipeline is the crucial thing here - a handful less cache misses per iteration makes a huge difference.
I'll keep working on it though - that transfer is where scaling/windowing would be applied, so it needs tightening up!

digitalwhitebyte wrote:your implementation seems to use this algorithm.

Yup, that's the one!

Posted: **Wed Jun 19, 2013 11:28 am**

I think it is a good read for understanding the world of the FFT
The FFT Demystified

DSP Robotics Support

Stream FFT and iFFT

Stream FFT and iFFT

Re: Stream FFT and iFFT

Re: Stream FFT and iFFT

Re: Stream FFT and iFFT

Re: Stream FFT and iFFT

Re: Stream FFT and iFFT

Re: Stream FFT and iFFT

Re: Stream FFT and iFFT

Re: Stream FFT and iFFT

Re: Stream FFT and iFFT