# SPTK: Convolution and the Convolution Theorem

Convolution is an essential element in everyone’s signal-processing toolkit. We’ll look at it in detail in this post.

This installment of the Signal Processing Toolkit series of CSP Blog posts deals with the ubiquitous signal-processing operation known as convolution. We originally came across it in the context of linear time-invariant systems. In this post, we focus on the mechanics of computing convolutions and discuss their utility in signal processing and CSP.

[Jump straight to ‘Significance of Convolution and the Convolution Theorem in CSP’ below.]

### Review of Convolution Mathematics

The convolution integral, which is used for continuous-time signals, is defined as

$\displaystyle y(t) = \int_{-\infty}^\infty h(u) x(t-u) \, du \hfill (1)$

$\displaystyle = \int_{-\infty}^\infty h(t-u) x(u) \, du, \hfill (2)$

while the convolution sum is used for discrete-time signals

$\displaystyle y(k) = \sum_{j=-\infty}^\infty h(j) x(k - j) \hfill (3)$

$\displaystyle = \sum_{j=-\infty}^\infty h(k-j) x(j). \hfill (4)$

Convolution is intimately involved in the study and use of linear time-invariant systems, or filters, examples of which include lowpass, bandpass, bandstop, and highpass filters. A linear time-invariant continuous-time system is characterized by its impulse-response function, typically denoted by $h(t)$, which is equal to the output of the system when the input is an impulse function (Dirac’s delta function) applied at time $t=0$. For linear shift-invariant discrete-time systems, the impulse-response function is the output of the system when the input is a discrete-time impulse (Kronecker’s delta function). The impulse-response function is typically denoted by the symbol $h$ in either case. For continuous-time systems with arbitrary input $x(t)$, the output $y(t)$ is given by (1) or, equivalently, (2). For discrete-time linear shift-invariant systems with arbitrary input $x(k)$, the output $y(k)$ is given by (3) or, equivalently, (4).
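Equation (3) is easy to try out in code. Below is a minimal Python/NumPy sketch (my translation; the examples later in the post use MATLAB) that evaluates the convolution sum directly for finite-length vectors and checks it against a library routine:

```python
import numpy as np

def conv_sum(h, x):
    """Direct evaluation of the convolution sum y(k) = sum_j h(j) x(k-j)
    for finite-length h and x; the output has length len(h)+len(x)-1."""
    y = np.zeros(len(h) + len(x) - 1)
    for k in range(len(y)):
        for j in range(len(h)):
            if 0 <= k - j < len(x):   # x is zero outside its support
                y[k] += h[j] * x[k - j]
    return y

# A 3-tap averaging filter (illustrative impulse response) and a short input.
h = np.array([1.0, 1.0, 1.0]) / 3.0
x = np.array([0.0, 3.0, 6.0, 3.0, 0.0])
y = conv_sum(h, x)
```

The same numbers come out of np.convolve(h, x), which implements (3) for finite-length vectors.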

Convolution is typically difficult for new signal-processing students to comprehend and to compute with pencil-and-paper. And it can be difficult for everybody if $h(\cdot)$ and/or $x(\cdot)$ are complicated functions. So in this post we’ll go over a couple examples, illustrating how to compute convolutions on paper as well as how to compute them using MATLAB. At the end of the post, we’ll sketch how convolution is used in CSP.

### The Convolution Theorem

An important aid in computing convolutions as well as in various kinds of analysis involving linear systems is the convolution theorem. This theorem says that the Fourier transform of a convolution (say, the Fourier transform of $y(t)$ in (1)) is equal to the product of Fourier transforms for the signals undergoing the convolution. In mathematical terms, using our CSP-Blog notation,

$\displaystyle y(t) = \int_{-\infty}^\infty x(u) h(t-u) \, du \hfill (5)$

$\displaystyle = x(t) \otimes h(t) \hfill (6)$

implies that

$\displaystyle {\cal{F}} \left[ y(t) \right] = Y(f) = {\cal{F}}\left[x(t)\right] {\cal{F}} \left[ h(t) \right] = X(f) H(f). \hfill (7)$

Similarly, if we have a convolution in the frequency domain, the inverse transform of that convolution is the product of the inverse transforms of the frequency functions undergoing convolution,

$\displaystyle C(f) = A(f) \otimes B(f) \Rightarrow c(t) = a(t)b(t) \hfill (8)$

where

$\displaystyle A(f) = {\cal{F}} [a(t)] \hfill (9)$

and

$\displaystyle B(f) = {\cal{F}} [b(t)]. \hfill (10)$

These results are summed up by saying that “convolution in one domain is multiplication in the other.” This is useful because multiplication is an easy operation, and we already have good, fast, validated code for forward and inverse Fourier transforms. So we can sometimes avoid a difficult direct convolution by transforming, multiplying, and inverse transforming.
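Here is a hedged Python/NumPy sketch of that transform-multiply-inverse-transform recipe (the post's own code is MATLAB). The one subtlety is zero-padding: both signals must be padded to the full output length so that the circular convolution implied by the DFT matches the linear convolution:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = rng.standard_normal(16)

# Direct time-domain linear convolution.
y_direct = np.convolve(x, h)

# Convolution theorem: pad both signals to the full output length,
# multiply the transforms, then inverse transform.
N = len(x) + len(h) - 1
Y = np.fft.fft(x, N) * np.fft.fft(h, N)
y_fft = np.real(np.fft.ifft(Y))
```

For long signals this route is typically much faster than the direct sum, which is why FFT-based ("fast") convolution is a standard tool.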

### Convolution Example “By Hand”

The canonical example of convolution is convolving a rectangle with itself, so we’ll be looking at that example in detail, through a variety of lenses. In this first look, we’ll simply go through the calculations required to fully evaluate the convolution integral.

We want to convolve a rectangle $s(t)$ with itself. A rectangle with unit height and unit width is usually denoted by the $\mbox{\rm rect}(t)$ function,

$\displaystyle \mbox{\rm rect}(t) = \left\{ \begin{array}{ll} 1, & |t| \leq 1/2 \\ 0, & \mbox{\rm otherwise.} \end{array} \right. \hfill (11)$

There are alternative definitions for $\mbox{\rm rect}$ that deal in different ways with the value of the function at $t = \pm 1/2$. These won’t matter to us here at the CSP Blog because we are dealing with simple integrable functions, and the convolution result is usually continuous. But if you see some kind of edge problem in your convolution work, let everybody know in the Comments below.

A rectangle with height $A$, width $T$, and center at $t=0$ is then easily expressed in terms of the $\mbox{\rm rect}$ function

$\displaystyle s(t) = A\mbox{\rm rect} (t/T), \hfill (12)$

as shown in Figure 1. So our present question is: What is $\displaystyle s(t) \otimes s(t)$? The convolution integral in this case is

$\displaystyle y(t) = \int_{-\infty}^\infty s(t-u) s(u) \, du. \hfill (13)$

From (13), we need to form the integrand product $s(t-u)s(u)$ for each $t$. This means we need to understand the function $s(t-u)$. The function $s(-u)$ is a time-reversed version of $s(u)$, and since $s(u)$ is symmetric about $u=0$, $s(-u) = s(u)$ in the case of our centered rectangle function.

The presence of the $t$ variable in $s(t-u)$ then shifts this rectangle by $t$ units of time. Let’s say $t$ is large in magnitude and negative. Then the shifted rectangle $s(t-u)$ will have its center at $t-u = 0 \Rightarrow u = t$, which is a large and negative number. In other words, when $t$ is large and negative, the rectangle represented by $s(t-u)$ is shifted far to the left on the $u$ axis. Similarly, when $t$ is large and positive, the rectangle $s(t-u)$ is shifted far to the right.

The way we proceed is to start evaluating the convolution integral for a large negative $t$, then increase $t$ until it becomes large and positive. This can be visualized as first flipping the signal (which accounts for the negative sign in $s(t-u)$), shifting it far to the left (start with large negative $t$), then sliding the flipped shifted signal to the right, computing the value of $s(u)s(t-u)$ for each $t$, and using that to evaluate the convolution integral for that value of $t$. This is illustrated with a MATLAB-based video below. But for now, we continue with the pencil-and-paper analysis.

For $t \leq -T$, the flipped and shifted rectangle $s(t-u)$ does not overlap with the signal $s(u)$, so that the product $s(u)s(t-u)$ is zero, and therefore the convolution integral is zero. Or,

$\displaystyle y(t) = 0, \ \ \ t \leq -T. \hfill (14)$

Let’s keep sliding $s(t-u)$ to the right by increasing $t$. When $t=-T$, the right edge of $s(t-u)$ is coincident with the left edge of $s(u)$. As $t$ increases past $-T$, the two rectangles overlap. When we multiply $s(u)$ by $s(t-u)$ the resulting function of $u$ is non-zero only for $u$ in the overlap region.

The value of the convolution integral is therefore the area under the curve for values of $u$ that lie within the overlap region. When $t=0$, the overlap is complete. For this case of a simple rectangular overlap region, we can do this integral in our heads, but let’s write it out for the sake of really focusing on the details.

For $-T \leq t \leq 0$, then, we have

$\displaystyle y(t) = \int_{-T/2}^{t+T/2} (A) (A) \, du \hfill (15)$

$\displaystyle = \left. A^2 u \right|_{u=-T/2}^{t+T/2} \hfill (16)$

$\displaystyle = A^2 (t + T/2 - (-T/2)) = A^2 (t + T), -T \leq t \leq 0. \hfill (17)$

To check this result, note that for $t = -T$, $A^2(t+T) = 0$, as it should because that is the case of a single point of overlap. For $t = 0$, we have $A^2 T$, which is simply the area of the rectangle $s^2(u)$, which checks because at $t=0$, the two rectangles are coincident.

As $t$ increases past zero, the left edge of $s(t-u)$ slides into the interior of $s(u)$, and the right edge slides past $u=T/2$, so the overlap between the two rectangles begins to decrease. When $t=T$, the overlap region becomes a single point, and the convolution integral is again zero,

$\displaystyle y(t) = \int_{t-T/2}^{T/2} (A)(A) \, du = A^2 (T - t), 0\leq t \leq T. \hfill (18)$

Finally, when $t > T$, the two rectangles do not overlap, and the convolution is zero for all these values of $t$. Summing up the different results corresponding to different intervals for $t$, we have

$\displaystyle y(t) = \left\{ \begin{array}{ll} A^2(T - |t|), & |t| \leq T \\ 0, & \mbox{\rm otherwise.} \end{array} \right. \hfill (19)$

Graphing $y(t)$ shows that the function is a triangle with height $A^2 T$ and width $2T$, as shown in Figure 8.

You can likely see that if we were to convolve two rectangles with the same widths $T$ but different heights $A_1$ and $A_2$, the result would be a triangle with width $2T$ and height $A_1A_2T$.
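The closed-form triangle (19) can be verified numerically. This Python/NumPy sketch (my check, not the post's MATLAB) approximates the convolution integral by a Riemann sum and compares it to $A^2(T-|t|)$:

```python
import numpy as np

A, T, dt = 2.0, 1.0, 1e-3
t = np.arange(-T / 2, T / 2, dt)   # support of the rectangle
s = A * np.ones_like(t)            # samples of s(t) = A rect(t/T)

# Riemann-sum approximation of the convolution integral: discrete conv * dt.
y = np.convolve(s, s) * dt
ty = np.arange(len(y)) * dt - T    # the output is supported on [-T, T]

# Closed-form result (19): a triangle of height A^2 T and width 2T.
y_exact = np.where(np.abs(ty) <= T, A**2 * (T - np.abs(ty)), 0.0)
```

The discrete approximation agrees with the pencil-and-paper answer to within an error on the order of $A^2\,dt$.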

### Convolution Examples “By MATLAB”

Some convolutions can be performed by straightforward application of the convolution integral (or sum in discrete time), as we’ve just seen. For many, though, that method is either tedious or impractical, and so we resort to numerical evaluation using our own code or using the convolution functions provided by a sophisticated software package like MATLAB or a programming language like Python. This software can be used to perform the time-domain convolution directly, or it can leverage the convolution theorem to perform the convolution in the frequency domain.

Here we’ll focus on MATLAB, as we usually do at the CSP Blog. Much can be learned by writing your own generic convolver, so I do recommend you do that. But for the present post, we’ll explore how to use standard MATLAB commands to perform convolutions. Along the way, the examples I provide should also develop your intuition about what convolution ‘does’ or ‘means.’

The basic tools for convolution in MATLAB are conv.m and filter.m. To use the convolution theorem, we need the Fourier transform, which is fft.m, and the inverse Fourier transform, which is ifft.m, as well as simple pointwise vector (signal) multiplication. The following examples use conv.m when directly performing convolution, but they could easily be converted to use filter.m.

The MATLAB script used to create the following examples can be downloaded here.

#### Convolution of a Rectangle with Itself

Let’s first revisit the “by hand” example above, in which we convolved a rectangle with itself. Here we generate a rectangle as a vector in MATLAB, then convolve it with itself using conv.m. The rectangle has width $T = 20$ samples and is shown in Figure 9. The result of using conv.m on that rectangle is also shown in Figure 9. From our “by hand” analysis above, we expect a triangle with width $2T = 40$ samples and height $A^2 T = T = 20$, and the figure confirms both expectations.

To make the plots, you need the vector that is the rectangle, and you need the vector that is the convolution of the rectangle with itself, which is the output of conv.m. But you also need to specify the x-axis values that correspond to the values in these two vectors. Here is the code I use (but do download the full .m file):

T = 20;    % rectangle width in samples
Tpad = 50; % zero-padding on each side (illustrative value; see the full script)
rect = [zeros(1, Tpad) ones(1, T) zeros(1, Tpad)];
tvec = [1:length(rect)] - length(rect)/2.0 - 0.5;   % time axis for rect
rect_conv_rect = conv(rect, rect);
tvec2 = [1:length(rect_conv_rect)] - length(rect_conv_rect)/2.0 - 0.5; % time axis for the convolution

#### Convolution of a Rectangle with Itself Twice

Continuing on, let’s use conv.m to convolve a rectangle with itself, then convolve that result with the rectangle. If the rectangle is denoted by $s(t)$, we are attempting to find $s(t) \otimes s(t) \otimes s(t)$. The result is shown in Figure 10. Notice that the result of the convolution gets wider each time, so that $s(t) \otimes s(t)$ is twice as wide as $s(t)$, and $s(t) \otimes s(t) \otimes s(t)$ is wider than $s(t) \otimes s(t)$ by the width of $s(t)$. This is a general property of convolving with a ‘pulse-like function’ such as $s(t)$: the output of the convolution is smoother than the input. Whereas $s(t)$ contains two step-function discontinuities, and so is not particularly smooth, $s(t) \otimes s(t)$ is continuous, but has a discontinuous derivative (at $t=-20$, $t=0$, and $t = 20$). The function $s(t) \otimes s(t) \otimes s(t)$ is, by visual inspection, continuous, and has continuous first derivative. The convolution outputs are smoother (fewer discontinuities) than the convolution inputs. So sometimes convolution with a pulse-like function (e.g., a rectangle) is referred to as a smoothing operation.

#### Convolution of a Triangle with Itself

Now let’s tackle a more difficult convolution two ways. The first way is by using conv.m and the second way is by using the convolution theorem. The signal is the triangle in Figure 10, which results from convolving the rectangle $s(t)$ with itself. We wish to determine the convolution of the triangle with itself. We can easily apply conv.m to get the top plot in Figure 11.

T = 20;    % rectangle width in samples
Tpad = 50; % zero-padding on each side (illustrative value; see the full script)
rect = [zeros(1, Tpad) ones(1, T) zeros(1, Tpad)];
tvec = [1:length(rect)] - length(rect)/2.0 + 1;
triangle = conv(rect, rect, 'same');             % the triangle of Figure 10
tri_conv_tri = conv(triangle, triangle, 'same'); % triangle convolved with itself

Note that this function appears similar to the bottom plot of Figure 10, which is the convolution of the rectangle with itself twice, but is a bit smoother still, since it is actually the convolution of the rectangle with itself three times, or the convolution of the triangle with itself once.

The convolution theorem says we can also obtain the convolution of the triangle with itself by Fourier transforming the triangle, multiplying that transform by itself, and then inverse transforming the result.


% fvec is the frequency vector and ft_of_tri is the Fourier transform of
% the triangle (both computed earlier in the full script); exp_vec removes
% the linear phase due to the triangle's offset from the time origin
exp_vec = exp(sqrt(-1)*2*pi*fvec*(T/2 + Tpad - 1));
ft_of_tri = ft_of_tri .* exp_vec;

ft_of_tri_sq = ft_of_tri .* ft_of_tri;                     % multiply the transforms ...
inv_ft_of_tri_sq = fftshift(ifft(fftshift(ft_of_tri_sq))); % ... then inverse transform
tvec3 = [1:length(inv_ft_of_tri_sq)] - 1;
tvec3 = tvec3 - length(tvec3)/2;                           % center the time axis

The Fourier transform of the triangle is shown in the middle plot of Figure 11, and the output of our application of the convolution theorem is the final plot. We see that the top and bottom plots of Figure 11 match, as they should.

#### Convolution of a Modulated Impulse Train with Various Pulse-Shaping Functions

In this sequence of examples, I hope to strengthen your intuition about the effects and uses of convolution in the RF communication context. We’ll focus on convolving a modulated impulse train with various pulse-like functions, which is a good model for how a baseband RF communication signal is actually generated in practice.

A modulated impulse train is a set of impulses (here Kronecker delta functions) separated by a fixed amount, say $T$, and each multiplied by a random number called a symbol. We’ll restrict our symbols to the binary set $\{\pm 1\}$. A typical binary modulated impulse train with impulse (symbol) separation $T = 30$ is shown in the top plot of Figure 12.

The middle plot of Figure 12 shows the modulated impulse train after convolving it with a rectangular pulse with width much less than $T$. Imagine flipping that rectangle, shifting it toward negative infinity, then sliding it along the time axis to the right. At each time instant (slide value), multiply the pulse and the impulse train together and sum the resulting function values. What happens is that the narrow rectangle is replicated at the location of each impulse, multiplied by the random symbol value associated with that impulse. And that’s exactly what we see in the middle plot. The bottom plot in Figure 12 shows the same process, but now the rectangle is the same width as the separation between impulses, so there is no gap between adjacent replications of the pulse. So we can create a pulse-amplitude-modulated (PAM) signal with any pulse function we wish by starting with the modulated impulse train and simply convolving it with the desired pulse.
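This pulse-shaping-by-convolution picture is easy to reproduce in code. A minimal Python/NumPy sketch (parameter values are illustrative, not necessarily those used for Figure 12):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 30                                    # impulse (symbol) separation
nsym = 16
symbols = rng.choice([-1.0, 1.0], size=nsym)

# Modulated impulse train: a symbol value every T samples, zeros elsewhere.
train = np.zeros(nsym * T)
train[::T] = symbols

# Pulse-shape by convolving with a rectangle as wide as the symbol interval.
pulse = np.ones(T)
signal = np.convolve(train, pulse)[: nsym * T]
```

Each symbol interval of the output is a constant level scaled by that interval's symbol, which is the situation in the bottom plot of Figure 12; substituting any other pulse vector for pulse yields the corresponding PAM signal.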

If we multiply the signal in the bottom plot of Figure 12 by a complex-valued sine wave we obtain the textbook rectangular-pulse BPSK signal that is the signal used throughout the CSP Blog to illustrate and unify all discussed aspects of cyclostationarity.

In Figure 13 we perform a variation of the experiment of Figure 12, replacing the rectangular pulses with half-sine pulses (half a period of a sine wave). The middle plot shows the case of a half-sine pulse that is relatively narrow with respect to $T$, and the bottom plot shows the case where the width of the half period is equal to the separation between the impulses. The bottom plot is exactly what the minimum-shift-keyed (MSK) signal uses as its inphase component; the quadrature component is a delayed version of the signal in the bottom plot with independently chosen symbols. The delay is half the symbol period, $T/2$, as in conventional offset-QPSK-family signals such as OQPSK, MSK, and GMSK.

Finally, in Figure 14 we repeat the impulse-train experiment with a highly practical pulse: the square-root raised-cosine (SRRC) pulse. Keeping with the method employed for Figures 12 and 13, we show two cases: a narrow-in-time version of the SRRC pulse (middle) and an SRRC pulse designed for the symbol rate implied by the separation of the impulses in the top plot. The bottom signal is a good model for a perfectly downconverted noise-free BPSK signal that uses SRRC pulses.

#### Convolution of a Modulated Rect-Pulse Train with a Multipath Channel

In this final “by MATLAB” convolution example, we look at the effect of a multipath channel on a simple binary rectangular-pulse signal. I choose a pulse width that is smaller than the symbol interval so we can more clearly see the effect of the channel’s impulse response on the signal. The top plot of Figure 15 shows the undisturbed binary signal. For this signal, the symbol rate is $1/T$ but the pulse width is $T/3 < T$. So the pulses stand out clearly.

The impulse response of the multipath channel is shown in the middle plot of Figure 15. Since it is complex-valued, I show the real and imaginary parts separately. It is a simple channel with non-zero magnitudes only for three distinct delays $\tau = 0, 4,$ and $6$ samples.

The signal obtained by convolving the binary signal in the top plot with the channel impulse response in the middle plot is shown in the bottom plot. Because the pulse width is relatively narrow, we can clearly see the effects of the channel on one symbol before the next one arrives.
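A hedged Python/NumPy sketch of this experiment (the tap values below are illustrative stand-ins for the channel in Figure 15) makes the superposition interpretation explicit: convolving with a sparse channel impulse response lays down a delayed, complex-scaled copy of the signal for each tap:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 30
nsym = 8
symbols = rng.choice([-1.0, 1.0], size=nsym)
train = np.zeros(nsym * T, dtype=complex)
train[::T] = symbols
x = np.convolve(train, np.ones(T // 3))[: nsym * T]   # pulse width T/3 < T

# Sparse complex channel: taps at delays 0, 4, and 6 samples (illustrative).
h = np.zeros(7, dtype=complex)
h[0], h[4], h[6] = 1.0, 0.5 - 0.3j, 0.2 + 0.1j

y = np.convolve(x, h)   # received (channel-distorted) signal

# The same output, built as a sum of delayed, scaled copies of x.
y_check = np.zeros(len(x) + len(h) - 1, dtype=complex)
for d in (0, 4, 6):
    y_check[d:d + len(x)] += h[d] * x
```

Because the pulses are narrower than the symbol interval, the echo of each pulse dies out before the next symbol arrives, just as in the bottom plot of Figure 15.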

### Convolution Example “By Video”

In this section we present several videos that show convolution in action. Each frame of a video shows the two functions undergoing integration in the convolution integral, their product, and the result of the integration itself. That is, we consider convolutions between $x(t)$ and $y(t)$ given by

$\displaystyle z(t) = \int_{-\infty}^\infty y(u)x(t-u)\, du, \hfill (20)$

and we plot $x(t-u)$ and $y(u)$, $y(u)x(t-u)$, and $z(t)$ on separate axes. Each video frame corresponds to a different value for $t$. That is, these videos animate the ‘flip-slide-integrate’ operations inherent in convolution.

The first video is shown in Video 1, and it corresponds to convolving a unit-height rectangle $x(t) = \mbox{\rm rect}(t/40)$ with width $40$ with another rectangle $y(t) = 2\mbox{\rm rect}(t/40)$, which has height $2$ and width $40$. From our previous work, we know the result should be a triangle with width $2\times 40 = 80$ and height $2\times 1\times 40 = 80$.

In the second convolution video we convolve two triangles with width (base) of $40$ samples and two different heights. This is shown in Video 2.

Finally, in Video 3 we show the effect of a smoothing operation on a noisy sine wave. The noisy sine wave is convolved with a rectangle with unit area. The width of the rectangle is smaller than half a period of the sine wave. As can be seen by the convolution result $z(t)$, the effect of the convolution is to remove some of the noise experienced by the sine wave, more clearly revealing it in a plot. This convolution operation is also a low-pass filter because if you think of the rectangle as the impulse response of an LTI system, then the corresponding transfer function is a sinc function centered at the origin of the frequency axis. Only spectral components that lie within the main lobe of that sinc function are passed with significant amplitude by the linear system, thereby attenuating the noise components that lie outside of the main lobe. We talk more about low-pass, band-pass, and high-pass filters in the ideal filters Signal Processing ToolKit post.
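The smoothing experiment of Video 3 can be sketched in a few lines of Python/NumPy (the sine period, noise level, and window width here are my choices, not necessarily those in the video):

```python
import numpy as np

rng = np.random.default_rng(3)
n = np.arange(2000)
period = 200
clean = np.sin(2 * np.pi * n / period)
noisy = clean + 0.5 * rng.standard_normal(len(n))

# Unit-area rectangle narrower than half a period: a moving-average
# (low-pass) convolution kernel.
W = 40
kernel = np.ones(W) / W
smoothed = np.convolve(noisy, kernel, mode='same')

# Away from the edges, smoothing brings the signal closer to the clean sine.
err_in = np.std((noisy - clean)[W:-W])
err_out = np.std((smoothed - clean)[W:-W])
```

The unit-area constraint on the kernel keeps the sine's level roughly unchanged while the out-of-band noise is attenuated.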

### A Weird and Kind-of-Cool Convolution Theorem Result

Let’s reconsider the humble unit-height rectangle function with width $T$, which is denoted by $\displaystyle \mbox{\rm rect}(t/T)$. We’ve already seen that the Fourier transform of the rectangle is the sinc function:

$\displaystyle \mbox{\rm rect}(t/T) \Leftrightarrow T \mbox{\rm sinc} (fT), \hfill (20)$

as shown in Figure 16. Since the rectangle has unit height, when we multiply it by itself, it is unchanged,

$\displaystyle \mbox{\rm rect}^2(t/T) = \mbox{\rm rect}(t/T). \hfill (21)$

This immediately implies that the Fourier transform of the squared unit-height rectangle is also the sinc function,

$\displaystyle \mbox{\rm rect}^2(t/T) \Leftrightarrow T \mbox{\rm sinc}(fT). \hfill (22)$

But by the convolution theorem, the Fourier transform of the product of the two rectangles is the convolution of their individual Fourier transforms,

$\displaystyle {\cal{F}} \left[ \mbox{\rm rect}(t/T) \mbox{\rm rect}(t/T) \right] = \left(T\mbox{\rm sinc}(fT) \right) \otimes \left(T\mbox{\rm sinc}(fT)\right). \hfill (23)$

This implies that the sinc function is invariant under convolution with itself: a sinc convolved with a sinc is a sinc,

$\displaystyle T \mbox{\rm sinc}(fT) = \left(T\mbox{\rm sinc}(fT) \right) \otimes \left(T\mbox{\rm sinc}(fT)\right). \hfill (24)$

Since $\displaystyle \mbox{\rm rect}^n(t/T) = \mbox{\rm rect}(t/T)$, the convolution of $n$ sincs is also just the original sinc. In Figure 17 we plot the Fourier transform of the $\mbox{\rm rect}(t/T)$ function and also the convolution of that result with itself, and they are indeed identical.

The result (24) also implies that the sinc is equal to itself convolved with itself $N$ times,

$\displaystyle T \mbox{\rm sinc}(fT) = \left(T\mbox{\rm sinc}(fT)\right) \otimes \left(T\mbox{\rm sinc}(fT)\right) \otimes \cdots \otimes \left(T\mbox{\rm sinc}(fT)\right). \hfill (25)$

Here is a movie that shows the sinc function being convolved with itself, resulting in itself:

I can think of a few more signals that possess this property. If you can too, consider leaving a description in the Comments below.
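We can also check the self-reproducing property numerically. In the Python/NumPy sketch below (grid extent and tolerance are my choices, made to keep the truncation error small), the continuous convolution in (24) is approximated by a Riemann sum over a wide but finite frequency grid:

```python
import numpy as np

T = 1.0
df = 0.01
f = np.arange(-200.0, 200.0, df)     # wide grid to tame truncation error
s = T * np.sinc(f * T)               # np.sinc(x) = sin(pi x)/(pi x)

# Riemann-sum approximation of the continuous convolution: conv * df.
c_full = np.convolve(s, s) * df

# Align the full-convolution axis (2*f[0] + k*df) with the f grid.
k0 = int(round(-f[0] / df))
c = c_full[k0:k0 + len(f)]

center = np.abs(f) < 5               # compare away from the truncated tails
```

Near the main lobe the self-convolution reproduces the original sinc to within the small error caused by truncating the sinc's slowly decaying tails.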

### Convolution in Probability Theory

Convolution comes up in many contexts, but an important one for us, as statistical signal processors, is in probability theory. We look at random variables and random processes in detail in other Signal Processing ToolKit posts, but for now I note that if we add together two independent random variables, the probability density function of the sum is the convolution of the individual probability density functions.

That is, let the random variable $X$ have probability density function $P_X(x)$ and the random variable $Y$ have probability density function $P_Y(y)$. If the two random variables are statistically independent (sometimes simply referred to as independent), then their joint probability density function is the product of their individual (‘marginal’) density functions,

$\displaystyle P_{XY}(x, y) = P_X(x) P_Y(y). \hfill (26)$

If we define the random variable $Z$ by

$\displaystyle Z = X + Y, \hfill (27)$

then the density for $Z$ is given by (The Literature [R149])

$\displaystyle P_Z(z) = \int_{-\infty}^\infty P_X(x) P_Y(z-x) \, dx. \hfill (28)$
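A quick Monte Carlo check (my Python/NumPy illustration) ties this back to the rectangle example: the sum of two independent uniform random variables has the triangular density obtained by convolving two unit rectangles, exactly as (28) predicts:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
z = rng.uniform(0, 1, N) + rng.uniform(0, 1, N)   # Z = X + Y, X, Y ~ U(0, 1)

# Convolving two unit rectangles gives a triangle on [0, 2] with peak 1.
hist, edges = np.histogram(z, bins=40, range=(0, 2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
triangle = 1.0 - np.abs(centers - 1.0)
```

The empirical histogram of $Z$ matches the triangular density to within the expected statistical fluctuation.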

Convolution also figures prominently in modern machine learning in the convolutional neural networks used to perform various recognition and classification tasks. The ‘convolutional layers’ in the multi-layer deep-learning neural networks perform two-dimensional convolution for two-dimensional inputs such as images, a subject we did not cover in this post, but which is a straightforward generalization of the one-dimensional convolution we typically focus on in signal processing. (Image processing, which has greatly benefited from machine learning, employs two-dimensional convolution for various image-processing tasks because the basic data structure of interest is a two-dimensional signal.)
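For the curious, here is a minimal Python/NumPy sketch of the two-dimensional 'valid'-mode convolution such layers build on (real frameworks add strides, padding, channels, and learned kernels):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Direct 2-D convolution, 'valid' mode: the flipped kernel slides over
    the image and a weighted sum is computed at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    flipped = kernel[::-1, ::-1]   # true convolution flips both axes
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * flipped)
    return out

# A 3x3 averaging kernel smooths a small ramp image.
img = np.arange(25, dtype=float).reshape(5, 5)
ker = np.ones((3, 3)) / 9.0
out = conv2d_valid(img, ker)
```

With a symmetric kernel such as this averager, the flip has no visible effect, which is one reason neural-network "convolutions" are often implemented as correlations.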

### Significance of Convolution and the Convolution Theorem in CSP

Convolution is ubiquitous. The most obvious example for us is the frequency-smoothing method (FSM) of spectral correlation function (SCF) estimation. The FSM estimates the SCF by first calculating the cyclic periodogram for $T$ seconds of input $x(t)$,

$\displaystyle I_{x_T}^\alpha(t, f) = \frac{1}{T} X_T(t, f+\alpha/2) X_T^*(t, f-\alpha/2), \hfill (29)$

where $X_T(t, f)$ is the sliding Fourier transform of $x(t)$ (see the FSM post for details).

After calculating the cyclic periodogram, the FSM then smooths it by convolving it with a pulse-like function (such as a rectangle), which reduces the variability of the estimate and pushes it closer to the underlying theoretical SCF for the signal(s) in the data $x(t)$,

$\displaystyle \hat{S}_{x_T}^\alpha (t, f) = I_{x_T}^\alpha(t, f) \otimes g(f). \hfill (30)$

In addition, any time we talk about narrowband spectral components of a signal, we are actually talking about the application of a linear time-invariant system (‘filter’) to some data, and we have established that the output of an LTI system is the convolution of the impulse-response of the system with the input signal. So the narrowband spectral components that we extract and correlate to find spectral correlation are fundamentally connected to convolution.
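To make the FSM connection concrete, here is a hedged Python/NumPy sketch of frequency smoothing for the special case $\alpha = 0$, where the cyclic periodogram in (29) reduces to the ordinary periodogram and (30) reduces to smoothed PSD estimation (data length and window width are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4096
x = rng.standard_normal(N)   # white noise: the true PSD is flat

# Periodogram: the alpha = 0 case of the cyclic periodogram (29).
I = np.abs(np.fft.fft(x))**2 / N

# Frequency smoothing (30): convolve with a unit-area rectangle g(f).
# The smoothing is circular because the DFT is periodic in frequency.
W = 64
g = np.ones(W) / W
I_smooth = np.real(np.fft.ifft(np.fft.fft(I) * np.fft.fft(g, N)))
```

Smoothing leaves the average level untouched but sharply reduces the point-to-point variability of the estimate, which is exactly the job convolution does for the SCF in the FSM.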