Previous SPTK Post: Random Processes Next SPTK Post: TBD

In this Signal Processing ToolKit post we take a close look at the basic sampling theorem used daily by signal-processing engineers. Application of the sampling theorem is a way to choose a sampling rate for converting an analog continuous-time signal to a digital discrete-time signal. The former is ubiquitous in the physical world–for example all the radio-frequency signals whizzing around in the air and through your body right now. The latter is ubiquitous in the computing-device world–for example all those digital-audio files on your ~~Discman~~ ~~Itunes~~ ~~Ipod~~ ~~DVD~~ ~~Smartphone~~ ~~Cloud~~ ~~Neuralink~~ Singularity.

So how are those physical real-world analog signals converted to convenient lists of finite-precision numbers that we can apply arithmetic to? For that’s all [digital or cyclostationary] signal processing is at bottom: arithmetic. You might know the basic rule-of-thumb for choosing a sampling rate: Make sure it is at least twice as big as the largest frequency component in the analog signal undergoing the sampling. But why, exactly, and what does ‘largest frequency component’ mean?

Jump straight to ‘Significance of the Sampling Theorem in CSP’ below.

### Preliminaries

In the mathematical development, I’m mostly going to use real-valued signals so that the hand-drawn figures and MATLAB-based plots correspond closely to the equations. But the theorem applies to complex-valued signals too, and of course we’ll eventually apply it to things like the CSP-Blog mascot signal.

Let’s focus on a real-valued signal , and let’s also assume that this signal has a Fourier transform or, in our previous notation, . Let’s also assume that the Fourier transform has *finite support* of , centered at the frequency origin, as shown in Figure 1. This just means that the transform is zero outside the interval of . We will see that this condition makes the theorem work, but in practice we can relax it so that we can effectively sample signals with Fourier transforms that have infinite support, provided that the rate at which the transform magnitude gets small with increasing is large.

We want to obtain uniformly spaced samples of the signal . This means we want to replace the analog signal with an infinite sequence of numbers , for which corresponds in some sense to the value of the signal at time , that is, to . Here is called the* sampling increment* (units of seconds) and its reciprocal is called the *sampling rate* (units of Hertz, or ).

The start of our basic sampling model is illustrated in Figure 2. The signal to be sampled, , is multiplied by the sampling waveform, , and the areas of each of the modulated rectangles in the result are the samples. For reasonably well-behaved and small , the areas of the rectangles will be closely approximated by .

First we will idealize the sampling model in Figure 2 so that the sampling rectangles become impulses (Dirac delta functions). Then we’ll derive the condition under which the ideal sampling results in a set of numbers that are sufficient to enable recreation of the entire original signal–the sampling theorem. Later we’ll look at non-ideal sampling through the same lens, and discuss some issues surrounding practical uses of sampling.

### Ideal Sampling

To idealize the sampling, which creates a sampling scheme or definition that is not physically possible, but is mathematically elegant, we allow the rectangle widths in Figure 2 to approach zero, . This leads to an impulse train for ,

The sampled signal is then the multiplication of with ,

The sampled signal is equivalent to

The signal is not a representation of –many of the values of are unavailable. The question becomes: Can we use to reconstruct *all* the values of without error? If so, what is the condition on the sampling increment that enables such reconstruction? Equivalently, what is the condition on the sampling rate , relative to properties of the signal-to-be-sampled ? And what is the mathematical operation that we can apply to the samples to obtain for all time?

Let’s investigate , which depends only on the countable set of numbers , to see if we can answer the question about .

First, what is the Fourier transform of ?

where the last equality follows from the convolution theorem. We now recall our work on the periodic impulse train,

which, like any periodic function, is representable by a Fourier series,

where the Fourier coefficients are given by

This means we can express the periodic impulse train, which is our sampling function, as the following infinite sum of complex sine waves,

We are now in a position to evaluate the Fourier transform appearing in our expression for in (6),

We can now evaluate the Fourier transform for the sampled signal in (6), and here is where the good-sampling condition on will begin to become evident:

At this point, we can use our cartoonish Fourier transform drawing in Figure (1) to sketch in (21) for different choices of relative to the parameter . Notice that is just the sum of shifted and scaled (by ) versions of . So first, let’s pick the sampling rate to be much larger than . This will ensure that each of the shifted transforms (since they are all just other names for the basic function , let’s call them *aliases*) do not overlap.

On the other hand, if the sampling rate is smaller than , then the various aliases in (21) will overlap and add together, as in Figure 4.

Finally, if the sampling rate is exactly equal to the number , the aliases abut as in Figure 5.

Now let’s consider using to obtain for all . If we enforce , as in Figure 3, it might be clear to you how we can operate on to get back , perhaps scaled, but still . The answer is to use an ideal lowpass filter with cutoff frequency between and , as illustrated in Figure 6.

The idea is to pass the ideally sampled signal through an ideal lowpass filter and show that the output is equal to the original signal except for possibly a scaling factor and a delay, because we consider that a scaled and delayed version of a signal is equal to the signal itself in terms of what information can be extracted from it. (Generally a non-zero constant scaling factor and a delay are not considered *distortion* in signal processing.) So we find and determine whether it is equal to . (Of course must be the case or we fail.)

Proceeding in the frequency domain, we have

So right away we see that . But this leaves out the role of the samples themselves. We want to prove that there is a way to operate on those samples mathematically to produce .

The function of frequency is zero outside the interval due to the action of the ideal lowpass filter. Therefore we can represent this signal on that interval using a Fourier series. That is,

Notice that we can also choose the cutoff frequency of the lowpass filter to be , because

as assumed, and

again, as assumed. So let’s represent the function Z(f) on the interval using a Fourier series

where the Fourier coefficients are given by

Now the time function is the inverse Fourier transform of , so we can apply the inverse transform to (26) to obtain an expression for in terms of the samples ,

Since ,

This shows that the signal can be obtained for all time instants on the real line from the set of samples provided that the sampling rate , and that is the largest positive frequency for which the Fourier transform of is not zero.

The sampling theorem: A signal whose Fourier transform has support on the interval can be perfectly reconstructed for all time from the countable set of samples spaced by seconds provided that seconds.

### Undersampling, Critical Sampling, and Oversampling

When the sampling rate is exactly equal to twice the largest frequency component in a signal, , the sampling is referred to as *critical*. When the sampling rate is small relative to , the sampling is referred to as *undersampling*, and when it is large relative to , the sampling is referred to as *oversampling*. These conditions are illustrated in Figure 7.

This figure was created by first generating a highly oversampled signal in MATLAB. Such signals are clearly already discrete-time signals, so we’re cheating a little bit. But we pretend that this original signal is a continuous-time signal to be sampled, and blithely proceed. The parameter is , so that critical sampling happens when , undersampling happens when , and oversampling happens when .

### The Case of a Noiseless Sine Wave

When we analyzed the Fourier transform in the sampling theorem development, we tacitly assumed all the infinite summations and the various integrals were well-behaved and didn’t pay too much attention to the limits on the integrals. That’s OK when we’re dealing with normal continuous functions, or even functions with step discontinuities. But when the signal transform contains impulses, we have to be more careful.

In the examples illustrated by Figures 1-6, the condition gives us the same answers as the condition . That is because the single point where two aliases join will not make any measurable difference in any of the integrals used in the theorem. But what if there is an impulse at ? In such cases, the sampling rate should be chosen so that it is strictly greater than –setting it equal to will cause problems with the aliases. Figure 8 illustrates these ideas with a sine wave having frequency

### Sampling of Random Signals

As we’ve seen in previous posts, random signals are not Fourier transformable. To assess their ‘largest frequency component,’ and thereby enable application of the sampling theorem, we can simply invoke the idea that the power spectrum of the random signal (which should exist for all communication-signal models we have any practical interest in here on the CSP Blog) will allow us to identify the ‘largest frequency component.’ If the signal possesses frequency components outside of the support region of the power spectrum, they will have zero average power, and so even though they might exist in some sense, they cannot substantially contribute to aliases.

So in practice we look at the power spectrum and find the largest frequency component, and then we know how to sample the random signal.

### Bandwidth Measures

The rule-of-thumb expression of the sampling theorem is that you must sample with a rate at least as large as twice the largest frequency component in the signal. But what does that last phrase mean? What is ‘the largest frequency component’ in a signal? First, we clarify that the ‘largest’ refers to the frequency of the thing, not its magnitude. (One might sensibly talk about the ‘largest frequency component’ of some transform as the frequency with the largest transform magnitude.)

The remaining issue is to define the largest frequency component so that it makes sense for all kinds of Fourier transformable functions or functions with a well-behaved power spectrum. For the Fourier transform in Figure 1, the largest frequency component is clearly , because for all frequencies larger, the transform is zero. But most transforms and power spectra don’t possess such simple support regions.

So here let’s introduce the idea of *bandwidth*. Like the concept of signal-to-noise ratio (SNR), bandwidth does not have a single definition. The reason is that the basic problem is estimating the width of a function that doesn’t necessarily have edges. Or, in ordinary parlance, what is the width of a lump or bump? The way you might approach defining the width of a lump depends on what you might want to do with that width or with that lump. It might also depend on how much effort you want to put into calculating the lump width.

So here are some common definitions of bandwidth, or lump width. Let the maximum value of the power spectrum be in decibels. This maximum occurs for frequency , which is usually the center frequency of the power spectrum.

**3-dB Bandwidth**. The frequency interval such that is the maximum frequency for which the PSD takes the value and is the minimum frequency for which the PSD takes the value .**10-dB Bandwidth**. The frequency interval such that is the maximum frequency for which the PSD takes the value and is the minimum frequency for which the PSD takes the value .**20-dB Bandwidth**. The frequency interval such that is the maximum frequency for which the PSD takes the value and is the minimum frequency for which the PSD takes the value .**99% Bandwidth**. The frequency interval such that this frequency interval is the smallest interval that contains % of the power in the signal’s PSD.**Null-to-Null Bandwidth**. The frequency interval such that is the first zero in the PSD to the left of , and is the first zero in the PSD to the right of .**Occupied Bandwidth**. The frequency interval such that this interval is the minimum-length interval for which the signal’s power spectrum is zero outside the interval.

Some of the bandwidth definitions might not exist for a particular signal’s power spectrum. If the power spectrum never actually goes to zero as the frequency increases without bound, then the definition of occupied bandwidth doesn’t apply (the occupied bandwidth is infinite).

Let’s illustrate these bandwidth measures by looking at some familiar power spectra. These correspond to PSK/QAM signals with rectangular pulses, square-root raised-cosine pulses, and half-cosine pulses. Rectangular pulses are not particularly practical (and this bandwidth analysis reveals why), square-root raised-cosine pulses are widely used practical pulses, and half-cosine pulses figure in common modulations like minimum-shift keying (My Papers [8]).

We’re safe in applying the sampling theorem if we always use occupied bandwidth. We’re less safe, but still secure, if we use the 99% bandwidth, because although the aliases will overlap and cause distortion, the amount of distortion is severely limited. On the other hand, if we use the 3-dB bandwidth to determine the ‘largest frequency component,’ we’ll end up in many cases with highly overlapped aliases and no ability to recover the signal using the ideal lowpass filtering described above.

### Aliasing

We’ve seen that undersampling, which is always relative to the sampling-theorem rule-of-thumb for the minimum sampling rate value, causes the aliases in (21) to overlap and combine, which means various frequency components in are changed from their values in . Recovering the signal from the samples is then problematic–the recovered signal will be a distorted version of the original signal.

When the aliases in (21) overlap, we say we have aliasing in the sampled signal. Returning to the case of the signals in Figure 7, we can employ a sampling rate of to the signal if we first apply a lowpass filter with width . This means that the largest frequency component in the filter output will have frequency , and so our rate is twice that, or . This situation is illustrated in Figure 13.

In some situations, a sampling device is restricted to a smallish set of possible rates. After a rate is selected (for example, on the basis of some possible range of signals that might appear in the data), an appropriate anti-aliasing filter is then selected and routinely applied. This means that the obtained samples will always be a high-quality embodiment of the information at the output of the filter, even if the sampling rate and anti-aliasing filter may not be a good match for all inputs that could be experienced by the system.

### Downcoversion to Complex Baseband by Purposeful Undersampling

One of the interesting things about aliasing is that one can use it to one’s advantage instead of always considering it a nuisance. Suppose we apply the sampling theorem to a bandpass signal, rather than to the lowpass signals we’ve been considering so far. By that I mean let’s apply the sampling theorem to a carrier-modulated signal. Such signals have power spectra that are zero or very small near zero frequency, and have their power concentrated near some large carrier frequency, such as 800 MHz (PCS band) or 2.4 GHz (ISM band).

For example, consider the signal with spectrum plotted in red in the upper plot of Figure 14. This modulated signal was formed by mixing the baseband signal, plotted in blue, with a sine wave with frequency , plotted in black.

The largest frequency component of the modulated signal is at about Hz, so we’d expect to have to sample at at least twice that to obey the sampling theorem. But if we sample at a reduced rate, and we’re careful about choosing that rate, we can arrange for one of the aliases to land on or near zero frequency, effectively frequency translating the received modulated signal. We can then use our usual lowpass filter to recover the original baseband signal. The aliases of the modulated signal are shown in the lower plot of Figure 11 for the case of sampling the modulated signal at .

### Sampling in MATLAB

The basic tool is resample.m, which has the form Y = resample(X,P,Q). The vector of samples in X is resampled at the rate P/Q. So to double the sampling rate, pick P = 2, Q = 1. To halve the sampling rate, pick P = 1, Q = 2, but if you are going to make the sample rate smaller, take care of anti-aliasing filtering, although there are optional ways to call resample.m so that MATLAB does the anti-aliasing filtering for you. Just watch out.

When trying to convert the sampling rate of some data vector in MATLAB from, say to , we have to pick P and Q in resample.m. A useful tool for that is MATLAB’s rat.m function. Suppose we have data sampled at 61.44 MHz and we want to convert that data to a sampling rate of 50 MHz. What are P and Q for resample.m? If we use rat.m, we can easily find out:

[n, d] = rat(61.44/50.0)

gives n = 768, d = 625. We can then identify P = n = 768 and Q = d = 625, and we’re ready to call resample.m.

A tool I used to create some of the plots in this post is MATLAB’s upsample.m, which inserts zeros between the samples of its input vector. The call is Y = upsample (X, N). The returned vector Y is X but with N-1 zeros inserted between each pair of adjacent samples in X. If X = [1 2 3 4], and N = 3, then Y looks like

1 0 0 2 0 0 3 0 0 4 0 0

Use of upsample.m allowed easy modeling of the ideally sampled signal, which is an impulse train where the areas of the impulses are modulated by the values of the signal.

### Practical Sampling

We can back off our ideal impulse-train sampling set up by just allowing the impulses to be very tall very narrow rectangles. If we do that, we have a sampling function

and our sampled signal is . Following the same style of analysis as before, we find essentially the same result with the difference that the aliases have amplitudes that decrease with increasing . However, the basic condition for alias-free is the same as in the case of ideal sampling: .

### Normalized Frequencies and a Sampling Rate of One

The most common use of ‘sampling frequency’ or ‘sampling rate’ on the CSP Blog is a statement similar to ‘Here we use normalized frequencies, which means the sampling rate is 1.’ The idea is that once you choose a physical sampling rate, hopefully in accordance with the dictates of the sampling theorem, you now have a sequence of complex (usually) numbers to work with internally to your digital signal processing software. The operations you perform on that sequence don’t depend in any meaningful sense on the absolute value of the sampling rate. That is, you can work hard in your software to process the sequence or you can just consider it the sequence , which means you’re temporarily setting .

With the true physical value of , you are processing data corresponding to the physical frequency band of . When you pretend , this band maps to the band and you’re using normalized frequencies internally to your software: all physical frequencies become and all physical times become . At the end of processing, perhaps you have a spectrum estimate or some estimate of a time constant, and these will be in normalized units. When these are reported to the outside (of your software) world, they can be scaled by the true physical sampling rate or sampling increment so that the involved humans can easily relate the signal-processing products to the physical situation that produced the sequence in the first place.

The use of normalized frequencies simplifies the coding of digital signal processing algorithms–they can almost always be written as if .

### Significance of the Sampling Theorem and Sampling Rate in CSP

For the most part, if your data is critically sampled or oversampled, you are in good shape applying second-order CSP, such as various estimators of the spectral correlation function. However, one must be on guard in the time domain. If you want to do direct estimation of the cyclic autocorrelation function, you need to form the lag product or , and then multiply that product by a complex-valued sine wave with frequency equal to a cycle frequency , . *However, the sampling rate that you are operating at must support the generation of this sine wave without ambiguity*. That is, you can unambiguously represent sine waves up through the sine-wave frequency of , or for our usual case of . This problem has come up for several readers; see the comments on the post on the cyclic autocorrelation for BPSK. If you want to consider larger cycle frequencies, upsample the data. Or, use frequency-domain approaches to estimation, such as the frequency-smoothing and time-smoothing methods of spectral correlation estimation.

I have encountered engineers over time that casually equate the symbol rate of a digital signal with its bandwidth. This isn’t a problem most of the time. However, if we take it seriously and declare that the significant bandwidth of a digital QAM/PSK signal is equal to its symbol rate, and then use an anti-aliasing filter with bandwidth equal to the symbol rate so we can critically sample at the symbol rate, we will eliminate the non-conjugate cyclic feature(s) of the signal.

To see why this is so, remember that the spectral correlation function measures the temporal correlation between spectrally distinct narrowband components in a time series. For non-conjugate spectral correlation to exist with cycle frequency the data needs to possess two narrowband components that are separated by Hz. But if you’ve filtered the signal so that its bandwidth is equal to the symbol rate, then there cannot be any such pairs of narrowband components to correlate. Moreover, there can’t be any correlated narrowband components with separations larger than the symbol rate in this case.

So we must preserve whatever bandwidth in excess of the symbol rate that is exhibited by a digital QAM/PSK/CPM signal in order to assess its symbol-rate cyclic features. From this point of view, preserving the *occupied bandwidth* of the signal throughout the sampling process is a critical system goal for CSP applications.

For higher-order cyclostationary (HOCS) signal processing, the situation is similar to the second-order time-domain situation, but a little worse. We typically want to apply simple homogeneous nonlinearities to our data, in pursuit of temporal moment functions, such as or or , and find the additive sine-wave components in the outputs. But a typical cycle frequency (frequency of one of those additive sine-wave components) is a quadrupled carrier or quadrupled carrier plus the symbol rate. So these cycle frequencies can be large relative to . Once they exceed , they are not well-represented, and they essentially alias, or wrap around the digital signal processing principal domain of , and end up somewhere else in the domain.

Since this HOCS complication is rather abstract, let’s illustrate with some plots. Consider a BPSK signal with symbol rate and carrier frequency offset of . When we subject this signal to various homogeneous th-order nonlinearities, we obtain the results shown in Figure 15.

If we increase the sampling rate by a factor of two, and reprocess, we obtain the results shown in Figure 16. Overall, this result is much more clear, and we can readily imagine processing these functions to find the correct cycle frequency values as well as their correct relative cyclic-moment amplitudes. However, for order six there is still an aliasing problem, as . This cycle frequency is aliased to . If we increase the sampling rate by another factor of two, we obtain the results shown in Figure 17, where all generated cycle frequencies lie in the principal domain.

So in the end, if you are doing time-domain HOCS, it is a good idea to oversample your signal prior to the application of the various moment-related nonlinearities. Unfortunately this increases the estimator cost.

For other aspects of and perspectives on sampling in digital signal processing, checkout WaveWalker’s posts on aliasing and spinning objects and on a challenge question regarding the sampling theorem involving sine waves.

Previous SPTK Post: Random Processes Next SPTK Post: TBD

Really nice post Chad, thanks for writing this. I particularly appreciate the description of the different types of bandwidth and the images for the different signals.

Thanks Matt!