SPTK: Sampling and The Sampling Theorem – Cyclostationary Signal Processing

Previous SPTK Post: Random Processes Next SPTK Post: Echo Detection

In this Signal Processing ToolKit post we take a close look at the basic sampling theorem used daily by signal-processing engineers. Application of the sampling theorem is a way to choose a sampling rate for converting an analog continuous-time signal to a digital discrete-time signal. The former is ubiquitous in the physical world–for example all the radio-frequency signals whizzing around in the air and through your body right now. The latter is ubiquitous in the computing-device world–for example all those digital-audio files on your ~~Discman~~ ~~Itunes~~ ~~Ipod~~ ~~DVD~~ ~~Smartphone~~ ~~Cloud~~ ~~Neuralink~~ Singularity.

So how are those physical real-world analog signals converted to convenient lists of finite-precision numbers that we can apply arithmetic to? For that’s all [digital or cyclostationary] signal processing is at bottom: arithmetic. You might know the basic rule-of-thumb for choosing a sampling rate: Make sure it is at least twice as big as the largest frequency component in the analog signal undergoing the sampling. But why, exactly, and what does ‘largest frequency component’ mean?

Jump straight to ‘Significance of the Sampling Theorem in CSP’ below.

Preliminaries

In the mathematical development, I’m mostly going to use real-valued signals so that the hand-drawn figures and MATLAB-based plots correspond closely to the equations. But the theorem applies to complex-valued signals too, and of course we’ll eventually apply it to things like the CSP-Blog mascot signal.

Let’s focus on a real-valued signal $x(t)$ , and let’s also assume that this signal has a Fourier transform ${\cal{F}}[x(t)] = X(f)$ or, in our previous notation, $x(t) \Leftrightarrow X(f)$ . Let’s also assume that the Fourier transform $X(f)$ has finite support of $2W$ , centered at the frequency origin, as shown in Figure 1. This just means that the transform is zero outside the interval of $[-W, W]$ . We will see that this condition makes the theorem work, but in practice we can relax it so that we can effectively sample signals with Fourier transforms that have infinite support, provided that the rate at which the transform magnitude gets small with increasing $|f|$ is large.

Figure 1. The Fourier transform of the signal to be sampled.

We want to obtain uniformly spaced samples of the signal $x(t)$ . This means we want to replace the analog signal $x(t)$ with an infinite sequence of numbers $x_k, k = 0, \pm 1, \pm 2, \ldots$ , for which $x_j$ corresponds in some sense to the value of the signal at time $t = jT_s$ , that is, to $x(jT_s)$ . Here $T_s$ is called the sampling increment (units of seconds) and its reciprocal is called the sampling rate (units of Hertz, or $\mbox{\rm s}^{-1}$ ).

The start of our basic sampling model is illustrated in Figure 2. The signal to be sampled, $x(t)$ , is multiplied by the sampling waveform, $s(t)$ , and the areas of each of the modulated rectangles in the result are the samples. For reasonably well-behaved $x(t)$ and small $\Delta t$ , the areas of the rectangles will be closely approximated by $x(kT_s)$ .

Figure 2. A basic approach to sampling. We end up saving the area of each distorted rectangle and consider that sequence the sampled waveform. If $\Delta t$ is small enough relative to the temporal fluctuations in $x(t)$ , the areas will be closely approximated by the values at the center of each rectangle: $\{x(kT_s)\}$ . [And what is a measure of the rapidity of a signal’s ‘temporal fluctuations?’]

Figure 2. A basic approach to sampling. We end up saving the area of each distorted rectangle and consider that sequence the sampled waveform. If $\Delta t$ is small enough relative to the temporal fluctuations in $x(t)$ , the areas will be closely approximated by the values at the center of each rectangle: $\{x(kT_s)\}$ . [And what is a measure of the rapidity of a signal’s ‘temporal fluctuations?’]

First we will idealize the sampling model in Figure 2 so that the sampling rectangles become impulses (Dirac delta functions). Then we’ll derive the condition under which the ideal sampling results in a set of numbers that are sufficient to enable recreation of the entire original signal–the sampling theorem. Later we’ll look at non-ideal sampling through the same lens, and discuss some issues surrounding practical uses of sampling.

Ideal Sampling

To idealize the sampling, which creates a sampling scheme or definition that is not physically possible, but is mathematically elegant, we allow the rectangle widths in Figure 2 to approach zero, $\Delta t \rightarrow 0$ . This leads to an impulse train for $s(t)$ ,

$\displaystyle s(t) = \sum_{k=-\infty}^\infty \delta(t - k T_s). \hfill (1)$

The sampled signal is then the multiplication of $x(t)$ with $s(t)$ ,

$\displaystyle y(t) = x(t) \sum_{k=-\infty}^\infty \delta(t - kT_s). \hfill (2)$

The sampled signal $y(t)$ is equivalent to

$\displaystyle y(t) = \sum_{k=-\infty}^\infty x(kT_s) \delta(t - k T_s). \hfill (3)$

The signal $y(t)$ is not a representation of $x(t)$ –many of the values of $x(t)$ are unavailable. The question becomes: Can we use $y(t)$ to reconstruct all the values of $x(t)$ without error? If so, what is the condition on the sampling increment $T_s$ that enables such reconstruction? Equivalently, what is the condition on the sampling rate $f_s$ , relative to properties of the signal-to-be-sampled $x(t)$ ? And what is the mathematical operation that we can apply to the samples to obtain $x(t)$ for all time?

Let’s investigate $y(t)$ , which depends only on the countable set of numbers $\{x(kT_s)\}$ , to see if we can answer the question about $T_s$ .

First, what is the Fourier transform of $y(t)$ ?

$\displaystyle Y(f) = {\cal{F}} [y(t)] = {\cal{F}}\left[ x(t) \sum_{k=-\infty}^\infty \delta(t - kT_s) \right] \hfill (4)$

$\displaystyle = {\cal{F}} \left[ x(t) s(t) \right] \hfill (5)$

$\displaystyle \Longrightarrow Y(f) = X(f) \otimes S(f), \hfill (6)$

where the last equality follows from the convolution theorem. We now recall our work on the periodic impulse train,

$\displaystyle s(t) = \sum_{k=-\infty}^\infty \delta(t-kT_s), \hfill (7)$

which, like any periodic function, is representable by a Fourier series,

$\displaystyle s(t) = \sum_{k=-\infty}^\infty c_k e^{i 2 \pi k t /T_s}, \hfill (8)$

where the Fourier coefficients $c_k$ are given by

$\displaystyle c_k = \frac{1}{T_s} \int_{-T_s/2}^{T_s/2} \sum_{k=-\infty}^\infty \delta(t-kT_s) e^{-i 2 \pi k t / T_s} \, dt \hfill (9)$

$\displaystyle = \frac{1}{T_s} \int_{-T_s/2}^{T_s/2} \delta(t) e^{-i 2 \pi k t / T_s} \, dt \hfill (10)$

$\displaystyle = \frac{1}{T_s} \int_{-T_s/2}^{T_s/2} \delta(t) \, dt = 1/T_s. \hfill (11)$

This means we can express the periodic impulse train, which is our sampling function, as the following infinite sum of complex sine waves,

$\displaystyle s(t) = \frac{1}{T_s} \sum_{k=-\infty}^\infty e^{i 2 \pi k t / T_s} = \sum_{k=-\infty}^\infty \delta(t-kT_s). \hfill (12)$

We are now in a position to evaluate the Fourier transform $S(f)$ appearing in our expression for $Y(f)$ in (6),

$\displaystyle S(f) = {\cal{F}} [s(t)] = {\cal{F}} \left[ \frac{1}{T_s} \sum_{k=-\infty}^\infty e^{i 2 \pi k t/T_s} \right] \hfill (13)$

$\displaystyle = \frac{1}{T_s} \sum_{k=-\infty}^\infty {\cal{F}} \left[ e^{i 2 \pi k t /T_s} \right] \hfill (14)$

$\displaystyle = \frac{1}{T_s} \sum_{k=-\infty}^\infty \delta(f - k/T_s). \hfill (15)$

We can now evaluate the Fourier transform for the sampled signal $y(t)$ in (6), and here is where the good-sampling condition on $T_s$ will begin to become evident:

$\displaystyle Y(f) = X(f) \otimes S(f) \hfill (16)$

$\displaystyle = X(f) \otimes \sum_{k=-\infty}^\infty \frac{1}{T_s} \delta(f - k/T_s) \hfill (17)$

$\displaystyle = \frac{1}{T_s} \int_{-\infty}^\infty X(\nu) \sum_{k=-\infty}^\infty \delta(f -\nu - k/T_s) \, d\nu \hfill (18)$

$\displaystyle = \frac{1}{T_s} \sum_{-\infty}^\infty \int_{-\infty}^\infty X(\nu) \delta(f-\nu-k/T_s) \, d\nu \hfill (19)$

$\displaystyle = \frac{1}{T_s}\sum_{k=-\infty}^\infty X(f - k/T_s) \int_{-\infty}^\infty \delta(f-\nu -k/T_s) \, d\nu \hfill (20)$

$\displaystyle = \frac{1}{T_s} \sum_{k=-\infty}^\infty X(f - k/T_s) = \frac{1}{T_s} \sum_{k=-\infty}^\infty X(f - kf_s). \hfill (21)$

At this point, we can use our cartoonish Fourier transform drawing in Figure (1) to sketch $Y(f)$ in (21) for different choices of $f_s = 1/T_s$ relative to the parameter $W$ . Notice that $Y(f)$ is just the sum of shifted and scaled (by $1/T_s$ ) versions of $X(f)$ . So first, let’s pick the sampling rate $f_s$ to be much larger than $2W$ . This will ensure that each of the shifted transforms $X(f-kf_s)$ (since they are all just other names for the basic function $X(f)$ , let’s call them aliases) do not overlap.

Figure 3. The Fourier transform (21) of the sampled signal with transform shown in Figure 1 provided that the sampling rate $f_s = 1/T_s$ is greater than $2W$ .

On the other hand, if the sampling rate is smaller than $2W$ , then the various aliases in (21) will overlap and add together, as in Figure 4.

Figure 4. The Fourier transform (21) of the sampled signal with transform shown in Figure 1 provided that the sampling rate $f_s$ is smaller than $2W$ . Here the aliaes (different $X(f-kf_s)$ ) overlap with their neighbors and so their sum $Y(f)$ is made up of portions of $X(f)$ that are free from contributions from other aliases and portions that are contaminated by other aliases.

Figure 4. The Fourier transform (21) of the sampled signal with transform shown in Figure 1 provided that the sampling rate $f_s$ is smaller than $2W$ . Here the aliaes (different $X(f-kf_s)$ ) overlap with their neighbors and so their sum $Y(f)$ is made up of portions of $X(f)$ that are free from contributions from other aliases and portions that are contaminated by other aliases.

Finally, if the sampling rate is exactly equal to the number $2W$ , the aliases abut as in Figure 5.

Figure 5. The Fourier transform (21) of the sampled signal with transform shown in Figure 1 provided that the sampling rate $f_s$ equals $2W$ , the two-sided width of $X(f)$ in Figure 1.

Figure 5. The Fourier transform (21) of the sampled signal with transform shown in Figure 1 provided that the sampling rate $f_s$ equals $2W$ , the two-sided width of $X(f)$ in Figure 1.

Now let’s consider using $Y(f)$ to obtain $x(t)$ for all $t$ . If we enforce $f_s > 2W$ , as in Figure 3, it might be clear to you how we can operate on $Y(f)$ to get back $X(f)$ , perhaps scaled, but still $X(f)$ . The answer is to use an ideal lowpass filter with cutoff frequency $f_c$ between $f = W$ and $f = f_s - W$ , as illustrated in Figure 6.

Figure 6. Illustration of the nature of the ideal lowpass filter $H_{LPF}(f)$ that can be used to select a single alias in the expression (21) for $Y(f)$ when the sampling rate is large relative to $2W$ .

Figure 6. Illustration of the nature of the ideal lowpass filter $H_{LPF}(f)$ that can be used to select a single alias in the expression (21) for $Y(f)$ when the sampling rate is large relative to $2W$ .

The idea is to pass the ideally sampled signal $y(t)$ through an ideal lowpass filter and show that the output is equal to the original signal $x(t)$ except for possibly a scaling factor and a delay, because we consider that a scaled and delayed version of a signal is equal to the signal itself in terms of what information can be extracted from it. (Generally a non-zero constant scaling factor and a delay are not considered distortion in signal processing.) So we find $y(t) \otimes h_{LPF}(t)$ and determine whether it is equal to $z(t) = Cx(t-D)$ . (Of course $C \neq 0$ must be the case or we fail.)

Proceeding in the frequency domain, we have

$\displaystyle Z(f) = Y(f) H_{LPF}(f) = f_sX(f). \hfill (22)$

So right away we see that $z(t) = f_s x(t)$ . But this leaves out the role of the samples $x(kT_s)$ themselves. We want to prove that there is a way to operate on those samples mathematically to produce $z(t)$ .

The function of frequency $Z(f)$ is zero outside the interval $[-W, W]$ due to the action of the ideal lowpass filter. Therefore we can represent this signal on that interval using a Fourier series. That is,

$\displaystyle Z(f) = \sum_{k=-\infty}^\infty c_k e^{i 2 \pi k f/(2W)}. \hfill (23)$

Notice that we can also choose the cutoff frequency of the lowpass filter to be $f_c = f_s/2$ , because

$\displaystyle f_c > W \Rightarrow \frac{1}{2T_s} > W \Rightarrow f_s > 2W, \hfill (24)$

as assumed, and

$\displaystyle f_c < 1/T_s - W \Rightarrow W < 1/T_s - 1/2T_s = 1/2T_s \Rightarrow f_s > 2W, \hfill (25)$

again, as assumed. So let’s represent the function Z(f) on the interval $[-f_s/2, f_s/2]$ using a Fourier series

$\displaystyle Z(f) = \sum_{k=-\infty}^\infty c_k e^{i 2 \pi f k /f_s}, \hfill (26)$

where the Fourier coefficients are given by

$\displaystyle c_k = \frac{1}{f_s} \int_{-f_s/2}^{f_s/2} Z(f) e^{-i2\pi f k /f_s} \, df \hfill (27)$

$\displaystyle = \int_{-W}^W X(f) e^{-i 2\pi f k/f_s} \, df \hfill (28)$

$\displaystyle = x(-k/f_s) = x(-kT_s). \hfill (29)$

Now the time function $z(t)$ is the inverse Fourier transform of $Z(f)$ , so we can apply the inverse transform to (26) to obtain an expression for $z(t)$ in terms of the samples $x(kT_s)$ ,

$\displaystyle z(t) = \int_{-\infty}^\infty Z(f) e^{i 2 \pi f t} \, df \hfill (30)$

$\displaystyle = \int_{-W}^W Z(f) e^{i 2 \pi f t} \, df \hfill (31)$

$\displaystyle = \int_{-W}^W \left[ \sum_{k=-\infty}^\infty c_k e^{i2\pi f k/f_s} \right] e^{i 2 \pi f t} \, df \hfill (32)$

$\displaystyle = \int_{-W}^W \left[ \sum_{k=-\infty}^\infty x(-kT_s) e^{i2\pi f k /f_s} e^{i 2 \pi f t} \right] \, df \hfill (33)$

$\displaystyle = \sum_{k=-\infty}^\infty x(-kT_s) \int_{-W}^W e^{i 2 \pi f (t - kT_s)} \, df \hfill (34)$

$\displaystyle = \sum_{k=-\infty}^\infty x(-kT_s) \left[ \frac{e^{i2\pi(t-kT_s)W} - e^{-i2\pi(t-kT_s)W}}{i 2 \pi (t-kT_s)} \right] \hfill (35)$

$\displaystyle = \sum_{k=-\infty}^\infty x(-kT_s) \left[ \frac{\sin(2\pi(t-kT_s)W}{\pi (t-kT_s)} \right] \hfill (36)$

$\displaystyle = 2W\sum_{k=-\infty}^\infty x(-kT_s) \mbox{\rm sinc} (2W(t-kT_s)) \hfill (37)$

$\displaystyle = 2W\sum_{k=-\infty}^\infty x(kT_s) \mbox{\rm sinc} (2W(t+kT_s)). \hfill (38)$

Since $z(t) = f_s x(t)$ ,

$\displaystyle x(t) = \frac{2W}{f_s} \sum_{k=-\infty}^\infty x(kT_s) \mbox{\rm sinc} (2W(t+kT_s)). \hfill (39)$

This shows that the signal $x(t)$ can be obtained for all time instants on the real line from the set of samples $x(kT_s)$ provided that the sampling rate $f_s = 1/T_s > 2W$ , and that $W$ is the largest positive frequency for which the Fourier transform of $x(t)$ is not zero.

The sampling theorem: A signal whose Fourier transform has support on the interval  can be perfectly reconstructed for all time from the countable set of samples spaced by  seconds provided that  seconds.

Undersampling, Critical Sampling, and Oversampling

When the sampling rate is exactly equal to twice the largest frequency component in a signal, $f_s = 2W$ , the sampling is referred to as critical. When the sampling rate is small relative to $2W$ , the sampling is referred to as undersampling, and when it is large relative to $2W$ , the sampling is referred to as oversampling. These conditions are illustrated in Figure 7.

This figure was created by first generating a highly oversampled signal in MATLAB. Such signals are clearly already discrete-time signals, so we’re cheating a little bit. But we pretend that this original signal is a continuous-time signal to be sampled, and blithely proceed. The $W$ parameter is $0.125$ , so that critical sampling happens when $f_s = 2*0.125 = 0.25$ , undersampling happens when $f_s < 0.25$ , and oversampling happens when $f_s > 0.25$ .

Figure 7. Illustration of oversampling, critical sampling, and undersampling using MATLAB.

The Case of a Noiseless Sine Wave

When we analyzed the Fourier transform $Y(f)$ in the sampling theorem development, we tacitly assumed all the infinite summations and the various integrals were well-behaved and didn’t pay too much attention to the limits on the integrals. That’s OK when we’re dealing with normal continuous functions, or even functions with step discontinuities. But when the signal transform contains impulses, we have to be more careful.

In the examples illustrated by Figures 1-6, the condition $f_s > 2W$ gives us the same answers as the condition $f_s = 2W$ . That is because the single point $f=W$ where two aliases join will not make any measurable difference in any of the integrals used in the theorem. But what if there is an impulse at $f=W$ ? In such cases, the sampling rate should be chosen so that it is strictly greater than $2W$ –setting it equal to $2W$ will cause problems with the aliases. Figure 8 illustrates these ideas with a sine wave having frequency $0.1$

Figure 8. Illustration of the special case of sampling a sine wave. An ideal lowpass filter can recover the complex sine wave with frequency $0.1$ if the sampling rate is strictly greater than twice the largest frequency component, or $f_s > 0.2$ , or $T_s < 5.$

Figure 8. Illustration of the special case of sampling a sine wave. An ideal lowpass filter can recover the complex sine wave with frequency $0.1$ if the sampling rate is strictly greater than twice the largest frequency component, or $f_s > 0.2$ , or $T_s < 5.$

Sampling of Random Signals

As we’ve seen in previous posts, random signals are not Fourier transformable. To assess their ‘largest frequency component,’ and thereby enable application of the sampling theorem, we can simply invoke the idea that the power spectrum of the random signal (which should exist for all communication-signal models we have any practical interest in here on the CSP Blog) will allow us to identify the ‘largest frequency component.’ If the signal possesses frequency components outside of the support region of the power spectrum, they will have zero average power, and so even though they might exist in some sense, they cannot substantially contribute to aliases.

So in practice we look at the power spectrum and find the largest frequency component, and then we know how to sample the random signal.

Bandwidth Measures

The rule-of-thumb expression of the sampling theorem is that you must sample with a rate at least as large as twice the largest frequency component in the signal. But what does that last phrase mean? What is ‘the largest frequency component’ in a signal? First, we clarify that the ‘largest’ refers to the frequency of the thing, not its magnitude. (One might sensibly talk about the ‘largest frequency component’ of some transform as the frequency with the largest transform magnitude.)

The remaining issue is to define the largest frequency component so that it makes sense for all kinds of Fourier transformable functions or functions with a well-behaved power spectrum. For the Fourier transform in Figure 1, the largest frequency component is clearly $W$ , because for all frequencies larger, the transform is zero. But most transforms and power spectra don’t possess such simple support regions.

So here let’s introduce the idea of bandwidth. Like the concept of signal-to-noise ratio (SNR), bandwidth does not have a single definition. The reason is that the basic problem is estimating the width of a function that doesn’t necessarily have edges. Or, in ordinary parlance, what is the width of a lump or bump? The way you might approach defining the width of a lump depends on what you might want to do with that width or with that lump. It might also depend on how much effort you want to put into calculating the lump width.

So here are some common definitions of bandwidth, or lumpwidth. Let the maximum value of the power spectrum be $P_{max}$ in decibels. This maximum occurs for frequency $f_{max}$ , which is usually the center frequency of the power spectrum.

3-dB Bandwidth. The frequency interval $[f_1, f_2]$ such that $f_1$ is the maximum frequency for which the PSD takes the value $P_{max} - 3$ and $f_2$ is the minimum frequency for which the PSD takes the value $P_{max} - 3$ .
10-dB Bandwidth. The frequency interval $[f_1, f_2]$ such that $f_1$ is the maximum frequency for which the PSD takes the value $P_{max} - 10$ and $f_2$ is the minimum frequency for which the PSD takes the value $P_{max} - 10$ .
20-dB Bandwidth. The frequency interval $[f_1, f_2]$ such that $f_1$ is the maximum frequency for which the PSD takes the value $P_{max} - 20$ and $f_2$ is the minimum frequency for which the PSD takes the value $P_{max} - 20$ .
99% Bandwidth. The frequency interval $[f_1, f_2]$ such that this frequency interval is the smallest interval that contains $99$ % of the power in the signal’s PSD.
Null-to-Null Bandwidth. The frequency interval $[f_1, f_2]$ such that $f_1$ is the first zero in the PSD to the left of $f_{max}$ , and $f_2$ is the first zero in the PSD to the right of $f_{max}$ .
Occupied Bandwidth. The frequency interval $[f_1, f_2]$ such that this interval is the minimum-length interval for which the signal’s power spectrum is zero outside the interval.

Some of the bandwidth definitions might not exist for a particular signal’s power spectrum. If the power spectrum never actually goes to zero as the frequency increases without bound, then the definition of occupied bandwidth doesn’t apply (the occupied bandwidth is infinite).

Let’s illustrate these bandwidth measures by looking at some familiar power spectra. These correspond to PSK/QAM signals with rectangular pulses, square-root raised-cosine pulses, and half-cosine pulses. Rectangular pulses are not particularly practical (and this bandwidth analysis reveals why), square-root raised-cosine pulses are widely used practical pulses, and half-cosine pulses figure in common modulations like minimum-shift keying (My Papers [8]).

Figure 9. Illustration of various bandwidth measures using ideal power spectra for textbook BPSK with rectangular pulses. The 99% bandwidth for this signal, as a fraction of the symbol rate, is 6.9. The occupied bandwidth is, as a fraction of the symbol rate, $\infty$ .

Figure 9. Illustration of various bandwidth measures using ideal power spectra for textbook BPSK with rectangular pulses. The 99% bandwidth for this signal, as a fraction of the symbol rate, is 6.9. The occupied bandwidth is, as a fraction of the symbol rate, $\infty$ .

Figure 10. Illustration of various bandwidth measures using ideal power spectra for signals with half-cosine pulse functions, such as MSK. The 99% bandwidth for this signal, as a fraction of its symbol rate, is 2.5. The occupied bandwidth is, as a fraction of its symbol rate, $\infty$ .

Figure 11. Illustration of various bandwidth measures using ideal power spectra for signals with square-root raised-cosine pulses and rolloff of one. The 99% bandwidth for this signal, as a fraction of its symbol rate, is 1.6. The occupied bandwidth is, as a fraction of its symbol rate, 2.

Figure 12. Illustration of various bandwidth measures using ideal power spectra for signals with square-root raised-cosine pulse functions and rolloff of 0.5. The 99% bandwidth for this signal, as a fraction of its symbol rate, is 1.3. The occupied bandwidth is, as a fraction of its symbol rate, is 1.5.

We’re safe in applying the sampling theorem if we always use occupied bandwidth. We’re less safe, but still secure, if we use the 99% bandwidth, because although the aliases will overlap and cause distortion, the amount of distortion is severely limited. On the other hand, if we use the 3-dB bandwidth to determine the ‘largest frequency component,’ we’ll end up in many cases with highly overlapped aliases and no ability to recover the signal using the ideal lowpass filtering described above.

Aliasing

We’ve seen that undersampling, which is always relative to the sampling-theorem rule-of-thumb for the minimum sampling rate value, causes the aliases $X(f-kf_s)$ in (21) to overlap and combine, which means various frequency components in $Y(f)$ are changed from their values in $X(f)$ . Recovering the signal $x(t)$ from the samples $x(kT_s)$ is then problematic–the recovered signal will be a distorted version of the original signal.

When the aliases in (21) overlap, we say we have aliasing in the sampled signal. Returning to the case of the signals in Figure 7, we can employ a sampling rate of $1/8$ to the signal if we first apply a lowpass filter with width $1/8$ . This means that the largest frequency component in the filter output will have frequency $1/16$ , and so our rate is twice that, or $1/8$ . This situation is illustrated in Figure 13.

In some situations, a sampling device is restricted to a smallish set of possible rates. After a rate is selected (for example, on the basis of some possible range of signals that might appear in the data), an appropriate anti-aliasing filter is then selected and routinely applied. This means that the obtained samples will always be a high-quality embodiment of the information at the output of the filter, even if the sampling rate and anti-aliasing filter may not be a good match for all inputs that could be experienced by the system.

Figure 13. Illustration of the use of an anti-aliasing filter. If we do not have a bandlimited signal prior to sampling, we can’t easily pick a good sampling rate by applying the sampling theorem. To circumvent this problem, we can apply a filter to the data prior to sampling and then use a sampling rate consistent with the bandwidth of the applied filter. This prevents aliasing, and if the filtering only mildly distorts the signal, we have a workable method of sampling. The filter that is applied to prevent aliasing is called, naturally, an anti-aliasing filter.

Downcoversion to Complex Baseband by Purposeful Undersampling

One of the interesting things about aliasing is that one can use it to one’s advantage instead of always considering it a nuisance. Suppose we apply the sampling theorem to a bandpass signal, rather than to the lowpass signals we’ve been considering so far. By that I mean let’s apply the sampling theorem to a carrier-modulated signal. Such signals have power spectra that are zero or very small near zero frequency, and have their power concentrated near some large carrier frequency, such as 800 MHz (PCS band) or 2.4 GHz (ISM band).

For example, consider the signal with spectrum plotted in red in the upper plot of Figure 14. This modulated signal was formed by mixing the baseband signal, plotted in blue, with a sine wave with frequency $0.25$ , plotted in black.

The largest frequency component of the modulated signal is at about $0.375$ Hz, so we’d expect to have to sample at at least twice that to obey the sampling theorem. But if we sample at a reduced rate, and we’re careful about choosing that rate, we can arrange for one of the aliases to land on or near zero frequency, effectively frequency translating the received modulated signal. We can then use our usual lowpass filter to recover the original baseband signal. The aliases of the modulated signal are shown in the lower plot of Figure 14 for the case of sampling the modulated signal at $T_s = 4$ .

Figure 14. Illustration of downconversion (frequency translation) by means of purposeful undersampling. The modulated signal is shown in red. We desire to convert it to complex baseband (center frequency of zero) without employing explicit frequency shifting operations.

Sampling in MATLAB

The basic tool is resample.m, which has the form Y = resample(X,P,Q). The vector of samples in X is resampled at the rate P/Q. So to double the sampling rate, pick P = 2, Q = 1. To halve the sampling rate, pick P = 1, Q = 2, but if you are going to make the sample rate smaller, take care of anti-aliasing filtering, although there are optional ways to call resample.m so that MATLAB does the anti-aliasing filtering for you. Just watch out.

When trying to convert the sampling rate of some data vector in MATLAB from, say $f_1$ to $f_2$ , we have to pick P and Q in resample.m. A useful tool for that is MATLAB’s rat.m function. Suppose we have data sampled at 61.44 MHz and we want to convert that data to a sampling rate of 50 MHz. What are P and Q for resample.m? If we use rat.m, we can easily find out:

[n, d] = rat(61.44/50.0)

gives n = 768, d = 625. We can then identify P = n = 768 and Q = d = 625, and we’re ready to call resample.m.

A tool I used to create some of the plots in this post is MATLAB’s upsample.m, which inserts zeros between the samples of its input vector. The call is Y = upsample (X, N). The returned vector Y is X but with N-1 zeros inserted between each pair of adjacent samples in X. If X = [1 2 3 4], and N = 3, then Y looks like

1 0 0 2 0 0 3 0 0 4 0 0

Use of upsample.m allowed easy modeling of the ideally sampled signal, which is an impulse train where the areas of the impulses are modulated by the values of the signal.

Practical Sampling

We can back off our ideal impulse-train sampling set up by just allowing the impulses to be very tall very narrow rectangles. If we do that, we have a sampling function

$\displaystyle s(t) = \sum_{k=-\infty}^\infty \frac{1}{\Delta t} \mbox{\rm rect} (\frac{t-kT_s}{\Delta t}), \hfill (40)$

and our sampled signal is $y(t) = s(t) x(t)$ . Following the same style of analysis as before, we find essentially the same result with the difference that the aliases have amplitudes that decrease with increasing $|k|$ . However, the basic condition for alias-free $y(t)$ is the same as in the case of ideal sampling: $f_s > 2W$ .

Normalized Frequencies and a Sampling Rate of One

The most common use of ‘sampling frequency’ or ‘sampling rate’ on the CSP Blog is a statement similar to ‘Here we use normalized frequencies, which means the sampling rate is 1.’ The idea is that once you choose a physical sampling rate, hopefully in accordance with the dictates of the sampling theorem, you now have a sequence of complex (usually) numbers to work with internally to your digital signal processing software. The operations you perform on that sequence don’t depend in any meaningful sense on the absolute value of the sampling rate. That is, you can work hard in your software to process the sequence $\{x(kT_s)\}$ or you can just consider it the sequence $\{x(k)\}$ , which means you’re temporarily setting $T_s = 1$ .

With the true physical value of $T_s = 1/f_s$ , you are processing data corresponding to the physical frequency band of $[-f_s/2, f_s/2]$ . When you pretend $T_s = 1$ , this band maps to the band $[-1/2, 1/2]$ and you’re using normalized frequencies internally to your software: all physical frequencies $f$ become $f/f_s$ and all physical times $t$ become $t/T_s$ . At the end of processing, perhaps you have a spectrum estimate or some estimate of a time constant, and these will be in normalized units. When these are reported to the outside (of your software) world, they can be scaled by the true physical sampling rate or sampling increment so that the involved humans can easily relate the signal-processing products to the physical situation that produced the sequence in the first place.

The use of normalized frequencies simplifies the coding of digital signal processing algorithms–they can almost always be written as if $T_s = 1$ .

Significance of the Sampling Theorem and Sampling Rate in CSP

For the most part, if your data is critically sampled or oversampled, you are in good shape applying second-order CSP, such as various estimators of the spectral correlation function. However, one must be on guard in the time domain. If you want to do direct estimation of the cyclic autocorrelation function, you need to form the lag product $x(t+\tau/2) x^*(t-\tau/2)$ or $x(t+\tau/2)x(t-\tau/2)$ , and then multiply that product by a complex-valued sine wave with frequency equal to a cycle frequency $\alpha$ , $e^{-i 2 \pi \alpha t}$ . However, the sampling rate that you are operating at must support the generation of this sine wave without ambiguity. That is, you can unambiguously represent sine waves up through the sine-wave frequency of $f_s/2$ , or $1/2$ for our usual case of $f_s = 1$ . This problem has come up for several readers; see the comments on the post on the cyclic autocorrelation for BPSK. If you want to consider larger cycle frequencies, upsample the data. Or, use frequency-domain approaches to estimation, such as the frequency-smoothing and time-smoothing methods of spectral correlation estimation.

I have encountered engineers over time that casually equate the symbol rate of a digital signal with its bandwidth. This isn’t a problem most of the time. However, if we take it seriously and declare that the significant bandwidth of a digital QAM/PSK signal is equal to its symbol rate, and then use an anti-aliasing filter with bandwidth equal to the symbol rate so we can critically sample at the symbol rate, we will eliminate the non-conjugate cyclic feature(s) of the signal.

To see why this is so, remember that the spectral correlation function measures the temporal correlation between spectrally distinct narrowband components in a time series. For non-conjugate spectral correlation to exist with cycle frequency $\alpha$ the data needs to possess two narrowband components that are separated by $\alpha$ Hz. But if you’ve filtered the signal so that its bandwidth is equal to the symbol rate, then there cannot be any such pairs of narrowband components to correlate. Moreover, there can’t be any correlated narrowband components with separations larger than the symbol rate in this case.

So we must preserve whatever bandwidth in excess of the symbol rate that is exhibited by a digital QAM/PSK/CPM signal in order to assess its symbol-rate cyclic features. From this point of view, preserving the occupied bandwidth of the signal throughout the sampling process is a critical system goal for CSP applications.

For higher-order cyclostationary (HOCS) signal processing, the situation is similar to the second-order time-domain situation, but a little worse. We typically want to apply simple homogeneous nonlinearities to our data, in pursuit of temporal moment functions, such as $x^2(t)$ or $x^4(t)$ or $x(t)x(t)x^*(t)x^*(t)$ , and find the additive sine-wave components in the outputs. But a typical cycle frequency (frequency of one of those additive sine-wave components) is a quadrupled carrier or quadrupled carrier plus the symbol rate. So these cycle frequencies can be large relative to $f_s/2$ . Once they exceed $f_s/2$ , they are not well-represented, and they essentially alias, or wrap around the digital signal processing principal domain of $[-f_s/2, f_s/2)$ , and end up somewhere else in the domain.

Since this HOCS complication is rather abstract, let’s illustrate with some plots. Consider a BPSK signal with symbol rate $1/3$ and carrier frequency offset of $0.05$ . When we subject this signal to various homogeneous $n$ th-order nonlinearities, we obtain the results shown in Figure 15.

Figure 15. Illustration of the arrangement of aliased cycle frequencies in the outputs of various nonlinear operations applied to a complex-valued BPSK signal with symbol rate of 1/3 and carrier offset of 0.05. The appearance of the PSD indicates the signal is slightly oversampled, but once we start using fourth-order nonlinearities, the cycle frequencies alias and their appearance in the principal domain $[-0.5, 0.5)$ is confusing–the cycle frequency pattern is not evident. For example, for fourth order, the cycle frequencies are $\{4*0.05, 4*0.05\pm 1/3\} = \{0.2, 0.5333, -0.1333\}$ . The cycle frequency $0.5333$ aliases to $0.4666$ .

Figure 15. Illustration of the arrangement of aliased cycle frequencies in the outputs of various nonlinear operations applied to a complex-valued BPSK signal with symbol rate of 1/3 and carrier offset of 0.05. The appearance of the PSD indicates the signal is slightly oversampled, but once we start using fourth-order nonlinearities, the cycle frequencies alias and their appearance in the principal domain $[-0.5, 0.5)$ is confusing–the cycle frequency pattern is not evident. For example, for fourth order, the cycle frequencies are $\{4*0.05, 4*0.05\pm 1/3\} = \{0.2, 0.5333, -0.1333\}$ . The cycle frequency $0.5333$ aliases to $0.4666$ .

If we increase the sampling rate by a factor of two, and reprocess, we obtain the results shown in Figure 16. Overall, this result is much more clear, and we can readily imagine processing these functions to find the correct cycle frequency values as well as their correct relative cyclic-moment amplitudes. However, for order six there is still an aliasing problem, as $6*(0.05/2) + 3*((1/3)/2) = 0.65$ . This cycle frequency is aliased to $0.35$ . If we increase the sampling rate by another factor of two, we obtain the results shown in Figure 17, where all generated cycle frequencies lie in the principal domain.

So in the end, if you are doing time-domain HOCS, it is a good idea to oversample your signal prior to the application of the various moment-related nonlinearities. Unfortunately this increases the estimator cost.

Figure 16. Nonlinear HOCS operations on the upsampled BPSK signal of Figure 15. Note that the cycle-frequency patterns are much more as expected, and the locations of the impulsive components at the nonlinearity outputs are not complicated by aliasing.

Figure 17. Generated cycle frequencies for a BPSK signal upsampled by a factor of four. No cycle frequencies are aliased.

For other aspects of and perspectives on sampling in digital signal processing, checkout WaveWalker’s posts on aliasing and spinning objects and on a challenge question regarding the sampling theorem involving sine waves.

Previous SPTK Post: Random Processes Next SPTK Post: Echo Detection

Author: Chad Spooner

I'm a signal processing researcher specializing in cyclostationary signal processing (CSP) for communication signals. I hope to use this blog to help others with their cyclo-projects and to learn more about how CSP is being used and extended worldwide. View all posts by Chad Spooner

11 thoughts on “SPTK: Sampling and The Sampling Theorem”

Matt @ WaveWalkerDSP says:

February 21, 2022 at 9:08 am

Really nice post Chad, thanks for writing this. I particularly appreciate the description of the different types of bandwidth and the images for the different signals.

Loading...

1. Chad Spooner says:
  
  February 25, 2022 at 12:47 pm
  
  Thanks Matt!
  
  Loading...
  
Rob says:

January 12, 2023 at 8:17 am

Hello!
Thanks for the great tutorial!
I am a bit lost in the steps (9)-(11), is there any issue with equation rendering?
(AFAIK, \int \delta(t)f(t) = f(0), and this justifies the 1/T_s you get at the end, but my math is a bit (too) rusty and so I’m probably missing a step there)
Thanks again!
Rob

Loading...

1. Chad Spooner says:
  
  January 12, 2023 at 8:30 am
  
  Thanks Rob! Yes, there were multiple typos: In (9), (10), and (13), which means there are probably more!
  
  (9): Missing equals sign
  (10): Incorrect upper limit on integral
  (13): Missing factor of $1/T_s$
  
  Let me know if you still have questions after looking over the corrected equations.
  
  Loading...
  
Rob says:

January 12, 2023 at 10:40 pm

Hello!
Thanks for the quick reply! Indeed, now it makes much more sense 🙂
However, wouldn’t you leave out altogether (10) — which is right now identical to (9) — or rewrite it with the exponential evaluated at t=0 (and thus =1)?
Thanks again and have a nice day!
Rob

Loading...

1. Chad Spooner says:
  
  January 13, 2023 at 7:52 am
  
  Rob: Keep those error detections coming! Made another change. Hopefully it is all good now.
  
  Loading...
  
A says:

March 14, 2023 at 12:51 pm

Thanks for the post!
My question is why is the signals being processed always converted to baseband?
The signals are always transmitted at a carrier frequency and received back at that carrier frequency. Why not do the processing at the carrier frequency?

Loading...

1. Chad Spooner says:
  
  March 16, 2023 at 8:56 am
  
  Welcome to the CSP Blog A! Thanks much for the comment.
  
  The main reason I want to process signals at complex baseband (the complex envelope) is that I can process the same number of seconds of data with a much lower sampling rate, thereby reducing computational complexity. That is, obeying the sampling theorem at complex baseband means I can use a sampling rate that is approximately equal to the signal’s bandwidth, which is much much smaller, typically, than the signal’s carrier frequency.
  
  To adequately represent an RF signal with samples, without aliases, you need to sample at approximately twice the carrier frequency (really twice the frequency that is equal to the largest frequency component in the signal, so something like $2(f_c + B/2)$ , where $B$ is the occupied bandwidth of the signal at baseband). But this is usually a huge number compared to the critical sampling rate at complex baseband, which is $2*(B/2) = B$ .
  
  Buying it?
  
  Loading...
  
  1. A says:
    
    April 27, 2023 at 12:27 pm
    
    It as been a while since I commented as I have been busy, but here is my late response. From your response, my understanding is that to sample a signal at a carrier frequency, I would need to sample it at a sampling frequency twice its highest frequency. This gives the equation 2(fc + B/2). From the posts, it is also possible to sample the signal at a lower sampling frequency under certain conditions, called under sampling. However the confusion arises when I read other sources that claim you just need to sample at twice the signals bandwidth regardless if it has a carrier frequency or not. I do not know if I am misreading the sources or they are making these statements under the assumption that the signal is already converted to baseband. If my understanding above is correct then it makes sense to me why signal are converted to IF and baseband frequencies before sampling.
    
    Loading...
    
    1. Mansoor Wahab says:
      
      June 7, 2026 at 3:43 pm
      
      When sampling a RF/bandpass signal with carrier frequency, you have several options:
      
      1) Downconvert the signal to baseband with analog parts (mixers, etc) and then sample it like an ordinary lowpass signal as explained in this post. Note that if the lowpass signal turns out to be a complex, you will need two ADCs to sample the real and imaginary (I/Q) parts.
      
      2) Sample the signal directly at $f_s \geq 2 \times (f_c +B/2)$ . This will always work (it basically treats the RF signal as a baseband signal with all its energy concentrated around $\pm f_c$ ) but is very wasteful as mentioned above by Dr. Spooner.
      
      3) Sample the signal directly at $f_s \geq 2B$ or more, but be careful since (unlike the baseband case) not all sampling frequencies above $2B$ are permissible. This restriction stems from the need to ensure there is no spectral mixing between the copies of the bandpass signal at $-f_c$ and $+f_c$ . Use the figure below to guide you in picking an appropriate sampling rate. Here, $F_H= f_c + B/2$ , and white regions represent allowed sampling frequencies.
      
      4) Sample the signal directly at ANY $f_s \geq 2B$ using time-interleaved ADC architecture (the signal is sampled by multiple ADCs separated in time with a small delay). There is no restriction on sampling frequency other than it has to be greater than $2B$ , but there are restrictions on the timing delay between ADCs if the signal is later to be reconstructed.
      
      The nomenclature around RF/bandpass sampling can be confusing so I hope this clarifies it for you.
      
      Loading...
      
      1. Chad Spooner says:
        
        June 10, 2026 at 8:45 am
        
        Here is Mansoor’s figure:
        
        Loading...