pwtools.signal.smooth

pwtools.signal.smooth(data, kern, axis=0, edge='m', norm=True)[source]

Smooth N-dim data by convolution with a kernel kern.

Uses scipy.signal.fftconvolve().

Note that due to edge effect handling (padding) and kernel normalization, the convolution identity convolve(data,kern) == convolve(kern,data) does not hold here. We always return an array of data.shape.
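
A minimal check of the shape guarantee (assuming a scipy version that still provides scipy.signal.hann; newer releases moved it to scipy.signal.windows.hann):

>>> import numpy as np
>>> import scipy.signal
>>> from pwtools import signal
>>> a = np.random.rand(100)
>>> signal.smooth(a, scipy.signal.hann(11)).shape
(100,)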

Parameters:
  • data (nd array) – The data to smooth. Example: 1d (N,) or (N,K,3) for trajectory

  • kern (nd array) – Convolution kernel. Example: 1d (M,) or (M,1,1) for trajectory along axis=0 (data length N)

  • axis (int) – Axis along which to smooth. The convolution fftconvolve(data, kern) itself does not need it, but it is used to pad the data along axis for edge effect handling before the convolution.

  • edge (str) –

    Method for edge effect handling.
    'm' : pad with mirrored signal
    'c' : pad with constant values (i.e. data[0] and data[-1] in the 1d case)
  • norm (bool) – Normalize the kernel. Default is True. This assures that the smoothed signal lies within the range of the data. Note that this does not hold for kernels with very big spread (e.g. hann(N*10) or gaussian(N/2, std=N*10)), since such a kernel is effectively a constant. See the sketch after this list for what normalization amounts to.
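
With norm=True, smoothing amounts to convolution with kern/kern.sum(), so e.g. a constant signal stays constant away from the edges. A minimal sketch of this idea in plain numpy/scipy (not pwtools' actual implementation, which additionally pads the data):

>>> import numpy as np
>>> import scipy.signal
>>> a = np.full(50, 3.0)                       # constant signal
>>> k = scipy.signal.hann(11)
>>> ret = scipy.signal.fftconvolve(a, k / k.sum(), 'same')
>>> np.allclose(ret[10:-10], 3.0)              # constant away from the edges
True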

Returns:

ret – Convolved signal of shape data.shape.

Return type:

nd array

Examples

>>> import scipy.signal
>>> from numpy import linspace, sin, cos, pi
>>> from numpy.random import rand
>>> from matplotlib.pyplot import plot, figure, title, legend
>>> from pwtools import signal
>>> from pwtools.signal import welch
>>> x = linspace(0,2*pi,500); a = cos(x) + rand(500)
>>> plot(a, color='0.7')
>>> k=scipy.signal.hann(21)
>>> plot(signal.smooth(a,k), 'r', label='hann')
>>> k=scipy.signal.gaussian(21, 3)
>>> plot(signal.smooth(a,k), 'g', label='gauss')
>>> k=welch(21)
>>> plot(signal.smooth(a,k), 'y', label='welch')
>>> legend()
>>> # odd kernel [0,1,0] reproduces data exactly, i.e. convolution with
>>> # delta peak
>>> figure(); title('smooth with delta [0,1,0]')
>>> x=linspace(0,2*pi,15); k=scipy.signal.hann(3)
>>> plot(cos(x), label='data')
>>> plot(signal.smooth(cos(x),k), 'r', label='smooth')
>>> legend()
>>> # edge effects with normal convolution
>>> figure(); title('edge effects')
>>> x=rand(20)+10; k=scipy.signal.hann(11);
>>> plot(x); plot(signal.smooth(x,k),label="smooth");
>>> plot(scipy.signal.convolve(x,k/k.sum(),'same'), label='convolve')
>>> legend()
>>> # edge effect methods
>>> figure(); title('edge effect methods')
>>> x=rand(20)+10; k=scipy.signal.hann(20);
>>> plot(x); plot(signal.smooth(x,k,edge='m'),label="edge='m'");
>>> plot(signal.smooth(x,k,edge='c'),label="edge='c'");
>>> legend()
>>> # smooth a trajectory of atomic coordinates
>>> figure(); title('trajectory')
>>> x = linspace(0,2*pi,500)
>>> a = rand(500,2,3) # (nstep, natoms, 3)
>>> a[:,0,:] += cos(x)[:,None]
>>> a[:,1,:] += sin(x)[:,None]
>>> k=scipy.signal.hann(21)[:,None,None]
>>> y = signal.smooth(a,k)
>>> plot(a[:,0,0], color='0.7'); plot(y[:,0,0],'b',
...                                   label='atom1 x')
>>> plot(a[:,1,0], color='0.7'); plot(y[:,1,0],'r',
...                                   label='atom2 x')
>>> legend()

References

[1] http://wiki.scipy.org/Cookbook/SignalSmooth

See also

welch(), lorentz()

Notes

Kernels:

Kernels with an even number of points result in shifted signals; odd kernels are better. However, for N >> M, the difference is negligible.
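
A quick numeric check of the odd-kernel case: after normalization, hann(3) == [0, 1, 0] is a delta peak, so smoothing reproduces the data exactly (as also shown graphically in the Examples above):

>>> import numpy as np
>>> import scipy.signal
>>> from pwtools import signal
>>> a = np.random.rand(100)
>>> np.allclose(signal.smooth(a, scipy.signal.hann(3)), a)
True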

Usual kernels (window functions) are created by e.g. scipy.signal.hann(M). For kern=scipy.signal.gaussian(M, std), two values are needed: M determines the number of points calculated for the convolution kernel, as in the other cases, but what actually matters is std, which sets the "used width" of the gaussian. Say we use N=100 and M=50. That would be a massively wide window and we would smooth away all details. OTOH, gaussian(50, 3) generates a kernel with the same number M of data points, but the gauss peak effectively used for convolution is much smaller. For gaussian(), M should be bigger than std. The convolved signal converges with increasing M; good values are M=6*std and bigger. For lorentz(), much wider kernels are needed, such as M=100*std, because of the long tails of the Lorentz function. Testing is mandatory!
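
One way to carry out the suggested testing: smooth with gaussian kernels of fixed std but increasing M and check that the result no longer changes. The values below (std=5, M=6*std vs. M=20*std) are illustrative only:

>>> import numpy as np
>>> import scipy.signal
>>> from pwtools import signal
>>> a = np.cos(np.linspace(0,2*np.pi,500)) + np.random.rand(500)
>>> std = 5
>>> y1 = signal.smooth(a, scipy.signal.gaussian(6*std, std))
>>> y2 = signal.smooth(a, scipy.signal.gaussian(20*std, std))
>>> np.abs(y1 - y2).max()   # small if M=6*std has already converged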

Edge effects:

We pad the signal with M=len(kern) values at both ends such that the convolution with kern doesn't zero the data at the signal edges. We have two methods. edge='m': pad with the signal mirrored at 0 and -1; edge='c': use the constant values data[0] and data[-1]. Many more variants may be thought of. The choice of how to extend the data essentially involves an assumption about how the signal would continue, which is signal-dependent. In practice, we usually have M << N (e.g. scipy.signal.hann(M)) or std << N (scipy.signal.gaussian(M, std)). Then both methods are identical in the middle and show only very small differences at the edges. Essentially, edge effect handling shall only ensure that the smoothed signal doesn't go to zero at the edges, and that is independent of the method.
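
A minimal illustration of the two padding ideas in plain numpy (numpy.pad's 'reflect' and 'edge' modes correspond to the mirror and constant strategies; pwtools' internal padding is not necessarily implemented this way):

>>> import numpy as np
>>> a = np.array([1.0, 2.0, 4.0])
>>> np.pad(a, 2, mode='reflect')   # ~ edge='m': mirror at 0 and -1
array([4., 2., 1., 2., 4., 2., 1.])
>>> np.pad(a, 2, mode='edge')      # ~ edge='c': repeat data[0] and data[-1]
array([1., 1., 1., 2., 4., 4., 4.])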

Memory:

For big data, fftconvolve() can easily eat up all your memory, for example:

>>> # assume axis=0 is the axis along which to convolve
>>> import numpy as np
>>> import scipy.signal
>>> arr = np.ones((100000, 200, 3))
>>> kern = scipy.signal.hann(101)
>>> ret = scipy.signal.fftconvolve(arr, kern[:,None,None])

Then it is better to loop over some or all of the remaining dimensions:

>>> ret = np.empty_like(arr)
>>> for jj in range(arr.shape[1]):
...     ret[:,jj,:] = signal.smooth(arr[:,jj,:], kern[:,None])

or:

>>> for jj in range(arr.shape[1]):
...     for kk in range(arr.shape[2]):
...         ret[:,jj,kk] = signal.smooth(arr[:,jj,kk], kern)

The size of the chunk over which you explicitly loop depends on the data, of course. We do exactly this in pwtools.crys.smooth().
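
A hedged variant of the loops above: split the remaining dimension into a few larger chunks instead of single slices (np.array_split is standard numpy; the chunk count of 10 is arbitrary):

>>> import numpy as np
>>> ret = np.empty_like(arr)
>>> for idx in np.array_split(np.arange(arr.shape[1]), 10):
...     ret[:,idx,:] = signal.smooth(arr[:,idx,:], kern[:,None,None])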