pwtools.signal.smooth

pwtools.signal.smooth(data, kern, axis=0, edge='m', norm=True)

Smooth N-dim data by convolution with a kernel kern.

Uses scipy.signal.fftconvolve().

Note: This function is actually a special case of scipy.ndimage.convolve(), so you may also use that. See the Notes section below for details.

Note that due to edge effect handling (padding) and kernel normalization, the convolution identity convolve(data,kern) == convolve(kern,data) doesn't apply here. We always return an array of shape data.shape.
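To see why the identity breaks, here is a minimal 1d sketch of a pad-normalize-crop pipeline (an illustration only, not the pwtools implementation): plain convolution is commutative, but this pipeline treats its two arguments differently, so swapping them changes both the values and the output shape.

```python
import numpy as np

# plain discrete convolution is commutative
a = np.random.rand(50)
k = np.random.rand(7)
assert np.allclose(np.convolve(a, k), np.convolve(k, a))

def smooth1d(data, kern):
    """Minimal sketch (not the pwtools implementation): pad the data,
    normalize the kernel, crop back to data.shape."""
    m = len(kern)
    padded = np.pad(data, m, mode="reflect")           # edge='m' style padding
    out = np.convolve(padded, kern / kern.sum(), mode="same")
    return out[m:-m]                                   # always data.shape

# the pipeline is not symmetric in its arguments
assert smooth1d(a, k).shape == (50,)   # data smoothed by kernel
assert smooth1d(k, a).shape == (7,)    # swapped: different shape, different result
```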

Parameters:
  • data (nd array) – The data to smooth. Example: 1d (N,) or (N,K,3) for trajectory

  • kern (nd array) – Convolution kernel. Example: 1d (M,) or (M,1,1) for trajectory along axis=0 (data length N)

  • axis (int) – Axis along which to do the smoothing. That is actually not needed for the convolution fftconvolve(data, kern) but is used for padding the data along axis to handle edge effects before convolution.

  • edge (str) –

    Method for edge effect handling.
    'm' : pad with the mirrored signal
    'c' : pad with constant values (i.e. data[0] and data[-1] in the 1d case)

  • norm (bool) – Normalize kernel. Default is True. This assures that the smoothed signal lies within the data. Note that this does not hold for kernels with very large spread (e.g. hann(N*10) or gaussian(N/2, std=N*10)). Then the kernel is effectively a constant.

Returns:

ret – Convolved signal.

Return type:

ndarray of shape data.shape

Examples

>>> from numpy import linspace, sin, cos, pi
>>> from numpy.random import rand
>>> from matplotlib.pyplot import plot, figure, title, legend
>>> from scipy.signal.windows import hann, gaussian
>>> from scipy.signal import convolve
>>> from pwtools.signal import welch, smooth
>>> x = linspace(0,2*pi,500); a=cos(x)+rand(500)
>>> plot(a, color='0.7')
>>> k=hann(21)
>>> plot(smooth(a,k), 'r', label='hann')
>>> # to match a hann window, use a gaussian with std = 1/5 the hann window length
>>> k=gaussian(51, 21/5)
>>> plot(smooth(a,k), 'g', label='gauss')
>>> k=welch(21)
>>> plot(smooth(a,k), 'y', label='welch')
>>> legend()
>>> # odd kernel [0,1,0] reproduces data exactly, i.e. convolution with
>>> # delta peak
>>> figure(); title('smooth with delta [0,1,0]')
>>> x=linspace(0,2*pi,15); k=hann(3)
>>> plot(cos(x))
>>> plot(smooth(cos(x),k), 'r')
>>> legend()
>>> # edge effects with normal convolution
>>> figure(); title('edge effects')
>>> x=rand(20)+10; k=hann(11);
>>> plot(x); plot(smooth(x,k),label="smooth");
>>> plot(convolve(x,k/k.sum(),'same'), label='convolve')
>>> legend()
>>> # edge effect methods
>>> figure(); title('edge effect methods')
>>> x=rand(20)+10; k=hann(20);
>>> plot(x); plot(smooth(x,k,edge='m'),label="edge='m'");
>>> plot(smooth(x,k,edge='c'),label="edge='c'");
>>> legend()
>>> # smooth a trajectory of atomic coordinates
>>> figure(); title('trajectory')
>>> x = linspace(0,2*pi,500)
>>> a = rand(500,2,3) # (nstep, natoms, 3)
>>> a[:,0,:] += cos(x)[:,None]
>>> a[:,1,:] += sin(x)[:,None]
>>> k=hann(21)[:,None,None]
>>> y = smooth(a,k)
>>> plot(a[:,0,0], color='0.7'); plot(y[:,0,0],'b',
...                                   label='atom1 x')
>>> plot(a[:,1,0], color='0.7'); plot(y[:,1,0],'r',
...                                   label='atom2 x')
>>> legend()

References

[1] http://wiki.scipy.org/Cookbook/SignalSmooth
[2] https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.convolve.html#scipy.ndimage.convolve

See also

welch(), lorentz()

Notes

scipy.ndimage tools:

We apply padding and convolution only along axis (since the main ndarray use case is pwtools.crys.smooth()), whereas ndimage.convolve supports this for all dimensions and also offers more edge effect handling methods. It may, however, be slower than scipy.signal.fftconvolve().

>>> import numpy as np
>>> import scipy.ndimage
>>> import pwtools.signal
>>> from numpy.random import rand
>>> a=rand(100,200,300)
>>> k=rand(3,1,1)
>>> # Defaults
>>> np.allclose(pwtools.signal.smooth(a, k, edge="m", norm=True, axis=0),
...             scipy.ndimage.convolve(a, k/k.sum(), mode="mirror"))
>>> # Without kernel normalization
>>> np.allclose(pwtools.signal.smooth(a, k, edge="m", norm=False, axis=0),
...             scipy.ndimage.convolve(a, k, mode="mirror"))
>>> # Use another axis and another edge method
>>> k=rand(1,3,1)
>>> np.allclose(pwtools.signal.smooth(a, k, edge="c", norm=True, axis=1),
...             scipy.ndimage.convolve(a, k/k.sum(), mode="nearest"))

In addition, smoothing with a Gaussian kernel can be done directly with scipy.ndimage.gaussian_filter(), which implements the same edge effect handling as ndimage.convolve.
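As a cross-check (a sketch using only scipy; the kernel length is matched to gaussian_filter1d's truncation rule), convolving with a normalized scipy.signal.windows.gaussian kernel reproduces scipy.ndimage.gaussian_filter1d:

```python
import numpy as np
from scipy import ndimage
from scipy.signal.windows import gaussian

rng = np.random.default_rng(0)
a = rng.random(200)
std, truncate = 4.0, 4.0
# gaussian_filter1d evaluates its kernel on 2*int(truncate*std + 0.5) + 1 points
M = 2 * int(truncate * std + 0.5) + 1
k = gaussian(M, std)
r1 = ndimage.convolve1d(a, k / k.sum(), mode="mirror")
r2 = ndimage.gaussian_filter1d(a, std, mode="mirror", truncate=truncate)
# r1 and r2 agree to floating point precision
```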

Kernels:

Kernels of even length shift the signal by half a sample, so odd-length kernels are preferable. For N >> M, however, the difference is negligible.
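A delta-peak toy example (using plain fftconvolve with mode="same") makes the shift visible:

```python
import numpy as np
from scipy.signal import fftconvolve

a = np.zeros(11)
a[5] = 1.0                         # delta peak at index 5
k_odd = np.array([0.0, 1.0, 0.0])  # odd length: peak stays put
k_even = np.array([0.5, 0.5])      # even length: half-sample shift
r_odd = fftconvolve(a, k_odd, mode="same")
r_even = fftconvolve(a, k_even, mode="same")
# r_odd reproduces the delta at index 5;
# r_even smears it over indices 5 and 6
```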

Usual kernels (window functions) are created by e.g. scipy.signal.windows.hann(M). For kern=scipy.signal.windows.gaussian(M, std), two values are needed: M determines the number of points calculated for the convolution kernel, as in the other cases, but what actually matters is std, which sets the effective width of the Gaussian. For gaussian(), M should be bigger than std; the convolved signal converges with increasing M. Good values are M=6*std and bigger. To match the result of a Hann window of length M_hann, use approximately std=1/5*M_hann, as in hann(21) and gaussian(51, 21/5) with a large enough M, here 51. For lorentz(), much wider kernels are needed, such as M=100*std, because of the long tails of the Lorentz function. Testing is mandatory!
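The convergence with M can be checked directly (a sketch using plain fftconvolve with a normalized kernel; edge effects are ignored by comparing only the middle of the signal):

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.signal.windows import gaussian

x = np.linspace(0, 2*np.pi, 500)
a = np.cos(x)
std = 21 / 5
# normalized gaussian kernels with increasing support M
out = {}
for M in (15, 31, 51, 101):
    k = gaussian(M, std)
    out[M] = fftconvolve(a, k / k.sum(), mode="same")
# away from the edges, M=51 (> 6*std) and M=101 agree closely,
# while the too-short M=15 kernel still deviates
mid = slice(100, 400)
err_51 = np.abs(out[51][mid] - out[101][mid]).max()
err_15 = np.abs(out[15][mid] - out[101][mid]).max()
```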

Edge effects:

We pad the signal with M=len(kern) values at both ends such that the convolution with kern doesn't zero the data at the signal edges. There are two methods. edge='m': pad with the signal mirrored at 0 and -1, or edge='c': use the constant values data[0] and data[-1]. Many more variants are conceivable; the choice of how to extend the data is essentially an assumption about how the signal would continue, which is signal-dependent. In practice, we usually have M << N (e.g. scipy.signal.windows.hann(M)) or std << N (scipy.signal.windows.gaussian(M, std)). Then both methods are identical in the middle and show only very small differences at the edges. Essentially, edge effect handling only needs to ensure that the smoothed signal doesn't go to zero at the edges, and that is independent of the method, which is the case.
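The two padding methods correspond to np.pad's "reflect" and "edge" modes (a sketch of the padding step alone, assuming mirror-at-the-endpoint semantics):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
M = 2
# edge='m': mirror the signal at the first and last points
pad_m = np.pad(x, M, mode="reflect")   # [3, 2, 1, 2, 3, 4, 3, 2]
# edge='c': repeat the constant boundary values x[0] and x[-1]
pad_c = np.pad(x, M, mode="edge")      # [1, 1, 1, 2, 3, 4, 4, 4]
```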

Memory:

For big data, fftconvolve() can easily eat up all your memory, for example:

>>> # assume axis=0 is the axis along which to convolve
>>> arr = np.ones((int(1e5),200,3))
>>> kern = scipy.signal.windows.hann(101)
>>> ret = scipy.signal.fftconvolve(arr, kern[:,None,None])

Then it is better to loop over some or all of the remaining dimensions:

>>> ret = np.empty_like(arr)
>>> for jj in range(arr.shape[1]):
...     ret[:,jj,:] = smooth(arr[:,jj,:], kern[:,None])

or:

>>> for jj in range(arr.shape[1]):
...     for kk in range(arr.shape[2]):
...         ret[:,jj,kk] = smooth(arr[:,jj,kk], kern)

The size of the chunk over which you explicitly loop depends on the data of course. We do exactly this in pwtools.crys.smooth().
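That the chunked loop gives the same result as the full-size call can be checked on a small array (a sketch with plain scipy and mode="same", i.e. without pwtools' extra edge padding):

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.signal.windows import hann

rng = np.random.default_rng(0)
arr = rng.random((500, 8, 3))
kern = hann(21)
kern = kern / kern.sum()
# one big FFT over the whole array ...
full = fftconvolve(arr, kern[:, None, None], mode="same")
# ... vs. looping over axis 1 with much smaller temporaries
chunked = np.empty_like(arr)
for jj in range(arr.shape[1]):
    chunked[:, jj, :] = fftconvolve(arr[:, jj, :], kern[:, None], mode="same")
```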